Turn concepts into animated explainer videos using Manim (Python), with MP4/GIF output, audio overlay, and multi-scene composition. Triggers on: "create a video", "animate this", "make an explainer", "manim animation", "motion graphic". NOT for React-based video; use remotion-video for that.
Manim is the "SVG of video" — you write Python code that describes animations declaratively, and it renders to MP4/GIF at any resolution. The Python scene file IS the editable intermediate: the user can see the code, request changes ("make the arrows red", "add a third step", "slow down the transition"), and only do a final high-quality render once satisfied. This makes the workflow iterative and controllable, exactly like concept-to-image uses HTML as an intermediate.
Workflow

Concept → Manim scene (.py) → Preview (low-quality) → Iterate → Final render (MP4/GIF)

1. Interpret the user's concept — determine the best animation approach.
2. Design a self-contained Manim scene file — one file, one Scene class.
3. Preview by rendering at low quality (-ql) for fast iteration.
4. Iterate on the scene based on user feedback.
5. Export the final video at high quality using scripts/render_video.py.
Step 0: Ensure dependencies
Before writing any scene, ensure Manim is installed:
When using a template: copy it to the working directory, edit the config constants at the top, do not restructure the class.
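The setup commands themselves are not reproduced here; as a minimal sketch for a Debian-based environment (package names are assumptions; adjust for your platform):

```shell
# System dependencies Manim needs for text rendering and video output
apt-get install -y ffmpeg libpango1.0-dev pkg-config python3-dev
# Manim itself (community edition)
pip install manim
# Sanity check: should print the installed version
manim --version
```
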
Core rules:

- Single file, single Scene class: everything in one .py file with one class XxxScene(Scene).
- Self-contained: no external assets unless absolutely necessary. Use Manim primitives for everything.
- Readable code: the scene file IS the user's artifact. Use clear variable names and a comment for each animation beat.
- Color with intention: use Manim's color constants (BLUE, RED, GREEN, YELLOW, etc.) or hex colors. Max 4-5 colors; every color should encode meaning.
- Pacing: include self.wait() calls between logical sections: 0.5s for breathing room, 1-2s for major transitions.
- Text legibility: use font_size=36 minimum for body text, font_size=48+ for titles. Test at the target resolution.
- Scene dimensions: the default Manim canvas is 14.2 × 8 units (16:9). Keep content within ±6 horizontal and ±3.5 vertical.
Animation best practices

```python
# DO: use animation groups for simultaneous effects
self.play(FadeIn(box), Write(label), run_time=1)

# DO: use the .animate syntax for property changes
self.play(box.animate.shift(RIGHT * 2).set_color(GREEN))

# DO: stagger related elements
self.play(LaggedStart(*[FadeIn(item) for item in items], lag_ratio=0.2))

# DON'T: add/remove without animation (jarring)
self.add(box)  # only for setup before the first frame

# DON'T: make animations too fast
self.play(Transform(a, b), run_time=0.3)  # too fast to read
```
Structure template

```python
from manim import *


class ConceptScene(Scene):
    def construct(self):
        # === Section 1: Title / Setup ===
        title = Text("Concept Name", font_size=56, weight=BOLD)
        self.play(Write(title))
        self.wait(1)
        self.play(FadeOut(title))

        # === Section 2: Core animation ===
        # ... main content here ...

        # === Section 3: Summary / Conclusion ===
        # ... wrap-up animation ...

        self.wait(2)
```
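For the preview step, the standard Manim CLI renders the scene at low quality (the file and class names here follow the template above):

```shell
manim -ql scene.py ConceptScene
```
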
The -ql preview renders at 480p/15fps — fast enough to check timing and layout. Present the preview video to the user.
Step 4: Iterate
Common refinement requests and how to handle them:

| Request | Action |
|---|---|
| "Slower/faster" | Adjust run_time= params and self.wait() durations |
| "Change colors" | Update color constants |
| "Add a step" | Insert a new animation block between sections |
| "Reorder" | Move code blocks around |
| "Different layout" | Adjust .shift(), .next_to(), .arrange() calls |
| "Add labels/annotations" | Add Text or MathTex objects with .next_to() |
| "Make it loop" | Add matching intro/outro states |
Step 5: Final export
Once the user is satisfied:
```shell
python3 scripts/render_video.py scene.py ConceptScene --quality high --format mp4
```
Quality presets

| Preset | Resolution | FPS | Flag | Use case |
|---|---|---|---|---|
| low | 480p | 15 | -ql | Fast preview |
| medium | 720p | 30 | -qm | Draft review |
| high | 1080p | 60 | -qh | Final delivery |
| 4k | 2160p | 60 | -qk | Presentation quality |
Format options

| Format | Flag | Use case |
|---|---|---|
| mp4 | --format mp4 | Standard video delivery |
| gif | --format gif | Embeddable in docs, social |
| webm | --format webm | Web-optimized |
Delivering the output

Present both:

- The .py scene file (for future editing)
- The rendered video file (final output)

Copy the final video to /mnt/user-data/outputs/ and present it.
Step 5.5: Optional audio overlay
If the user provides audio (music or voiceover), or requests it:
```shell
# Background music at 25% volume with fade-in/out
python3 scripts/add_audio.py final.mp4 music.mp3 \
    --output final_with_audio.mp4 \
    --volume 0.25 --fade-in 2 --fade-out 3 --trim-to-video

# Voiceover at full volume, trimmed to video length
python3 scripts/add_audio.py final.mp4 voiceover.mp3 \
    --output final_narrated.mp4 --trim-to-video
```
For voiceover scripting before recording, read references/rules/voiceover-scaffold.md.
For subtitles/captions, read references/rules/subtitles.md.
For advanced multi-track mixing, read references/rules/audio-overlay.md.
Error Handling

| Error | Cause | Resolution |
|---|---|---|
| ModuleNotFoundError: manim | Manim not installed | Run the Step 0 setup commands |
| pangocairo build error | Missing system dev headers | apt-get install -y libpango1.0-dev |
| FileNotFoundError: ffmpeg | ffmpeg not installed | apt-get install -y ffmpeg |
| Scene class not found | Class name mismatch | Verify the class name matches the CLI argument |
| Overlapping objects | Positions not calculated | Use .next_to(), .arrange(), explicit .shift() calls |
| Text cut off | Text too large or positioned near an edge | Reduce font_size or reposition within ±6, ±3.5 |
| Slow render | Too many objects or complex transformations | Reduce object count, simplify paths, use a lower quality |
| LaTeX Error | LaTeX not installed (needed for MathTex) | Use Text instead, or install texlive-latex-base |
LaTeX fallback
If LaTeX is not available, avoid MathTex and Tex. Use Text with Unicode math symbols instead:
Single-shot mode (default) is fast and cheap — the coder writes scene.py directly from a concept. Use agentic mode for production-quality renders where layout correctness and asset resolution matter enough to justify additional LLM and VLM calls.
Pipeline

```text
concept
  └─► plan_storyboard.py ──► storyboard.json
                                  │
                                  ▼
                     fetch_assets.py (optional)
                                  │
                                  ▼
                       coder writes scene.py
                                  │
                                  ▼
             render_video.py --max-fix-attempts N
                                  │ ▲
                                  │ └─ LLM fixup loop (on failure, up to N retries)
                                  ▼
                      critic_pass.py --critic
                                  │ ▲
                                  │ └─ VLM layout patch (1 call with M image blocks)
                                  ▼
                              final MP4
```
Flag Reference

| Script | Flag | Default | Hard cap | Effect | Cost impact |
|---|---|---|---|---|---|
| render_video.py | --max-fix-attempts | 0 | 3 | LLM-assisted auto-fix on render failure; 0 = disabled | +1 LLM call per retry |
| critic_pass.py | --critic | disabled | — | Enable the VLM critic pass; no-op without this flag | +1 VLM call (N image blocks) |
| critic_pass.py | --critic-budget | 50000 | — | Token budget for the critic call; aborts loudly if exceeded | Sets a ceiling; use to prevent runaway spend |
| critic_pass.py | --frames | 5 | 10 | Frames sampled from the rendered video for the critic | More frames → higher token cost per critic run |
| fetch_assets.py | --adapter | none | — | Asset backend: local, iconfinder, none | iconfinder adds external API calls |
| fetch_assets.py | --asset-dir | — | — | Root directory for --adapter=local; required with local | None |
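Putting the flags together, one possible agentic invocation (the positional arguments to critic_pass.py are an assumption modeled on render_video.py's interface):

```shell
# Render with up to 2 LLM-assisted fix attempts on failure
python3 scripts/render_video.py scene.py ConceptScene \
    --quality high --format mp4 --max-fix-attempts 2

# Then run the VLM critic over 5 sampled frames with a tight budget
python3 scripts/critic_pass.py scene.py ConceptScene \
    --critic --frames 5 --critic-budget 20000
```
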
Cost Tradeoffs
The fixup loop adds one LLM call per failed render attempt — with --max-fix-attempts 3 you may pay up to 3 extra calls before the loop exhausts or succeeds. The critic pass adds one VLM call containing N PNG image blocks (default 5, max 10); each frame adds roughly 1 token per 800 bytes of base64-encoded PNG, so complex scenes at high resolution are materially more expensive. Setting --critic-budget to a conservative token ceiling (e.g. 20000) causes BudgetExceededError before the API call is made, so you never pay for an accidentally oversized request — the error is loud and non-recoverable by design.
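To make the arithmetic concrete, a back-of-the-envelope estimator using the ~1 token per 800 bytes rule of thumb (the function name and frame sizes are illustrative):

```python
def estimate_critic_tokens(frame_png_sizes_bytes):
    """Rough cost of one critic call from raw PNG frame sizes,
    using ~1 token per 800 bytes of base64-encoded PNG."""
    total = 0
    for raw in frame_png_sizes_bytes:
        b64_len = 4 * ((raw + 2) // 3)  # length after base64 encoding
        total += b64_len // 800
    return total

# Five 1080p frames at ~200 KB each: well under a 20000-token budget
print(estimate_critic_tokens([200_000] * 5))  # → 1665
```
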
Agentic pipeline design (storyboard planner, auto-fix loop, VLM critic) is adapted from Code2Video (arXiv 2510.01174, MIT). Vendored prompt templates live in references/code2video/ alongside the upstream LICENSE. Full vendoring record, pinned commit, and re-sync policy are tracked in root ATTRIBUTIONS.md.
Limitations

- Manim + ffmpeg required — cannot render without these dependencies.
- Audio is post-render only — Manim renders silent MP4s. Use scripts/add_audio.py to overlay audio after export.
- LaTeX optional — MathTex requires a LaTeX installation. Fall back to Text with Unicode for math.
- Render time scales with complexity — a 30-second 1080p scene with many objects can take 1-2 minutes to render.
- 3D scenes require OpenGL — ThreeDScene may not work in headless containers. Stick to the 2D Scene class.
- No interactivity — output is a static video file, not an interactive widget.
- GIF output is silent — audio overlay only works with MP4/WEBM output formats.
Design anti-patterns to avoid

- Walls of text on screen — keep to 3-5 words per label, max 2 lines
- Everything appearing at once — use staged animations with LaggedStart
- Uniform timing — vary run_time to create rhythm (fast for simple, slow for important)
- No visual hierarchy — use size, color, and position to guide attention
- Rainbow colors — 3-4 intentional colors max
- Ignoring the grid — align objects to consistent positions using arrange/align