AI video generation with LTX-2.3 22B — text-to-video, image-to-video clips for video production. Use when generating video clips, animating images, creating b-roll, animated backgrounds, or motion content. Triggers include video generation, animate image, b-roll, motion, video clip, text-to-video, image-to-video.
Generate ~5-second video clips from text prompts or images using the LTX-2.3 22B DiT model.
Runs on Modal (A100-80GB). Requires MODAL_LTX2_ENDPOINT_URL in .env.
```bash
# Text-to-video
python3 tools/ltx2.py --prompt "A sunset over the ocean, golden light on waves, cinematic" --output sunset.mp4

# Image-to-video (animate a still image)
python3 tools/ltx2.py --prompt "Gentle camera drift, soft ambient motion" --input photo.jpg --output animated.mp4

# Custom resolution and duration
python3 tools/ltx2.py --prompt "..." --width 1024 --height 576 --num-frames 161 --output wide.mp4

# Fast mode (fewer steps, quicker)
python3 tools/ltx2.py --prompt "..." --quality fast --output quick.mp4

# Reproducible output
python3 tools/ltx2.py --prompt "..." --seed 42 --output reproducible.mp4
```
| Parameter | Default | Description |
|---|---|---|
| --prompt | (required) | Text description of the video |
| --input | - | Input image for image-to-video |
| --width | 768 | Video width (must be divisible by 64) |
| --height | 512 | Video height (must be divisible by 64) |
| --num-frames | 121 | Frame count; must satisfy (n - 1) % 8 == 0 |
| --fps | 24 | Frames per second |
| --quality | standard | standard (30 steps) or fast (15 steps) |
| --steps | 30 | Override inference steps directly |
| --seed | random | Seed for reproducibility |
| --output | auto | Output file path |
| --negative-prompt | sensible default | What to avoid in the generated video |
Valid frame counts must satisfy (n - 1) % 8 == 0. Common values at 24 fps: 25 (~1s), 49 (~2s), 73 (~3s), 97 (~4s), 121 (~5s, default), 161 (~6.7s), 193 (~8s, practical max).
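The frame-count rule and resulting clip duration can be sanity-checked with a few lines of Python. This is a standalone sketch, not part of tools/ltx2.py:

```python
def valid_frame_count(n: int) -> bool:
    """LTX-2 requires (n - 1) to be a multiple of 8."""
    return n >= 1 and (n - 1) % 8 == 0

def duration_seconds(n: int, fps: int = 24) -> float:
    """Clip length for n frames at the given frame rate."""
    return n / fps

for n in (25, 121, 161):
    print(n, valid_frame_count(n), round(duration_seconds(n), 2))
```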
| Resolution | Ratio | Notes |
|---|---|---|
| 768x512 | 3:2 | Default, good balance |
| 512x512 | 1:1 | Square, fastest |
| 1024x576 | 16:9 | Widescreen |
| 576x1024 | 9:16 | Portrait/vertical |
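Width and height must both be divisible by 64 (see the parameter table). A minimal pre-flight check, assuming nothing beyond that rule, might look like:

```python
def valid_resolution(width: int, height: int) -> bool:
    # LTX-2 expects both dimensions to be multiples of 64
    return width % 64 == 0 and height % 64 == 0

assert valid_resolution(768, 512)        # default
assert not valid_resolution(1920, 1080)  # 1080 is not a multiple of 64
```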
LTX-2 responds well to cinematographic descriptions that layer shot type, subject, motion, lighting, and mood. Keep prompts under 200 words and be specific about the scene.
```
# Atmospheric b-roll
"Aerial drone shot slowly flying over turquoise ocean waves breaking on white sand, golden hour sunlight, cinematic"

# Product/tech scene
"Close-up of hands typing on a mechanical keyboard, shallow depth of field, soft desk lamp lighting, cozy atmosphere"

# Abstract background
"Dark moody abstract background with flowing blue light streaks, subtle geometric grid, bokeh particles floating, cinematic tech atmosphere"

# Animate a portrait
"Professional headshot, subtle natural head movement, confident warm expression, studio lighting, shallow depth of field"

# Animate a slide/screenshot
"Gentle subtle particle effects floating across a presentation slide, soft ambient light shifts, very slight camera drift"
```
```
# Too vague
"A cool video"

# Too many competing ideas
"A cat riding a skateboard while juggling fire on the moon during a thunderstorm"

# Describing text/UI (model can't render text reliably)
"A website showing the text 'Welcome to our platform'"
```
Generate atmospheric 5s shots for cutaways between narrated scenes:
```bash
python3 tools/ltx2.py --prompt "Futuristic holographic interface, glowing data visualizations, clean workspace, cinematic" --output broll_tech.mp4
python3 tools/ltx2.py --prompt "Aerial view of European city at golden hour, modern architecture" --output broll_europe.mp4
```
Feed a slide screenshot and add subtle motion:
```bash
python3 tools/ltx2.py --prompt "Gentle particle effects, soft ambient light shifts, very slight camera drift" --input slide.png --output animated_slide.mp4
```
Bring still headshots to life:
```bash
python3 tools/ltx2.py --prompt "Subtle natural head movement, warm expression, professional lighting" --input headshot.png --output animated_portrait.mp4
```
Generate abstract motion backgrounds for title cards:
```bash
python3 tools/ltx2.py --prompt "Dark moody background with flowing blue and coral light streaks, bokeh particles, cinematic tech atmosphere, no text" --output intro_bg.mp4
```
LTX-2 generates raw clips. Combine with the rest of the toolkit:
| Workflow | Tools |
|---|---|
| Generate clip → upscale | ltx2.py → upscale.py |
| Generate clip → add to Remotion | ltx2.py → use as <OffthreadVideo> in composition |
| Generate image → animate | flux2.py → ltx2.py --input |
| Generate clip → extract audio | ltx2.py → ffmpeg -i clip.mp4 -vn audio.wav |
| Generate clip → add voiceover | ltx2.py → mix with qwen3_tts.py output |
```bash
# 1. Create Modal secret for HuggingFace (one-time)
modal secret create huggingface-token HF_TOKEN=hf_your_token

# 2. Deploy (downloads ~55GB of weights, takes ~10 min)
modal deploy docker/modal-ltx2/app.py

# 3. Save endpoint URL to .env
echo "MODAL_LTX2_ENDPOINT_URL=https://yourname--video-toolkit-ltx2-ltx2-generate.modal.run" >> .env

# 4. Test
python3 tools/ltx2.py --prompt "A candle flickering on a dark table, cinematic" --output test.mp4
```
Important: HuggingFace token needs read-access scope. Accept the Gemma 3 license before deploying. Unauthenticated downloads are severely rate-limited.