Create academic presentation slide decks and optionally demo videos from research papers. Use when the user asks to "make slides", "create a deck", "make a presentation", "demo video", "paper slides", "conference talk slides", or wants to turn a paper into a visual presentation. Covers slide generation, narration scripts, TTS audio, and video assembly.
Produce slide decks (and optionally narrated demo videos) from research papers. The human drives all outline and visual decisions — the agent executes.
[1] Script Draft ──→ [2] Slide Generation ──→ [3] TTS Audio (optional) ──→ [4] Video Assembly (optional)
Claude Code nanobanana /edit edge-tts / Kokoro / ElevenLabs ffmpeg
Skip stages 3–4 for slide-only output. User can enter at any stage.
Input: paper + user-provided outline or slide plan
Output: video-scripts.md or slide-outline.md — per-slide content with talking points
The agent drafts scripts based on the user's outline. The user owns the structure — agent does not decide slide count, order, or what to emphasize.
Full reference: references/slide-generation.md
Tool: nanobanana (Gemini CLI extension)
Priority order (edit-first):
/edit to wrap into slide frame/edit to adapt/edit to refineKey principle: prefer /edit on existing HQ paper figures over generating from scratch.
Deck style: create deck-style.md once per deck, prepend to all generate-from-scratch prompts. For /edit, style is inherited from the base image.
Example deck-style.md:
- Canvas: 1920x1080, white background
- Accent: #2563EB blue, text: #1e293b dark slate
- Clean sans-serif, flat design, no gradients/shadows
- Bottom bar: blue accent with white affiliation text
Full reference: references/tts-engines.md Batch scripts: scripts/batch_tts_edge.py, scripts/batch_tts_kokoro.py
Output: one audio file per narrated slide
| Engine | Quality | Cost | Latency | Best For |
|---|---|---|---|---|
| edge-tts (default) | Very good | Free, unlimited | ~6s/slide (cloud) | Quick generation, good male voices |
| Kokoro | Very good | Free, unlimited | ~1.5s/slide (local) | Offline use, fast batch, good female voices |
| ElevenLabs | Premium | 10k chars free/mo | ~3s/slide (cloud) | Highest quality, voice cloning |
Default: Use edge-tts unless user requests offline or premium quality.
import edge_tts, asyncio
async def tts_slide(text, output, voice="en-US-AndrewNeural"):
await edge_tts.Communicate(text, voice).save(output)
asyncio.run(tts_slide("Your slide text here", "slide_01.mp3"))
Voices: AndrewNeural (male, presenter), AriaNeural (female), GuyNeural (male, warm), JennyNeural (female, pro)
Tool: ffmpeg Input: slide PNGs + audio files + optional demo recording
# Use symlink to avoid iCloud path spaces: ln -sfn "long path" /tmp/workdir
# Slide with audio:
ffmpeg -y -loop 1 -i slide.png -i audio.mp3 \
-c:v libx264 -tune stillimage -pix_fmt yuv420p \
-c:a aac -ar 44100 -ac 2 -shortest seg.mp4
# Silent slide (N seconds):
ffmpeg -y -loop 1 -i slide.png -f lavfi -i anullsrc=r=44100:cl=stereo \
-c:v libx264 -tune stillimage -pix_fmt yuv420p \
-c:a aac -ar 44100 -ac 2 -t N seg.mp4
# Concat (always re-encode, never -c copy):
printf "file 'seg1.mp4'\nfile 'seg2.mp4'\n..." > concat.txt
ffmpeg -y -f concat -safe 0 -i concat.txt \
-c:v libx264 -pix_fmt yuv420p -c:a aac -ar 44100 -ac 2 final.mp4
All segments MUST share: 44100Hz sample rate, stereo, AAC codec.
Full reference: references/pptx-conversion.md
If starting from an existing PPTX, convert slides to PNG images first:
soffice --headless --convert-to pdf --outdir output/ presentation.pptx
pdftoppm -png -r 300 output/presentation.pdf output/slide
The agent must NOT auto-invoke NotebookLM or use its outputs to drive slide/script decisions. The human owns the outline, visual arrangement, and deck direction.
When to recommend: only when the user says they're unsure what to put on slides or need inspiration.
/tmp/-ar 44100 -ac 2mp3_22050_32 only, 10k chars/month/edit distorts figure — be more explicit: "Keep the original figure exactly as-is, only add framing"/edit from base slide or prepend shared deck-style.md| Tool | Stage | Install |
|---|---|---|
| Gemini CLI + nanobanana | 2 | gemini extensions install https://github.com/gemini-cli-extensions/nanobanana |
| LibreOffice + poppler | 2 (PPTX) | brew install --cask libreoffice && brew install poppler |
| edge-tts | 3 | pip install edge-tts |
| Kokoro | 3 (offline) | pip install kokoro soundfile |
| ElevenLabs | 3 (premium) | pip install elevenlabs + ELEVENLABS_API_KEY |
| ffmpeg | 4 | brew install ffmpeg |