Record terminal demo videos of the polli CLI using VHS (charmbracelet) with polli-generated voiceover and background music. Ships a working `example/demo.tape` + `example/render.sh` pipeline. Use when the user asks to make, record, or iterate on a polli demo video, LinkedIn/social clip of the CLI, or any terminal screencast for pollinations.
VHS records silent terminal video. Polli generates audio. ffmpeg merges. VHS cannot capture system audio — either capture it live via BlackHole, or overlay narration in post with adelay.
.tape scriptmkdir -p temp/vhs-demo && cp .claude/skills/polli-video/example/* temp/vhs-demo/
cd temp/vhs-demo
# Set system default output → BlackHole 2ch (see Pipeline A)
./render.sh
Produces demo.mp4. Edit demo.tape to change content.
Single command, no adelay math, perfect sync. Requires BlackHole 2ch and default system output set to it (Option+click menu-bar speaker, or SwitchAudioSource -s "BlackHole 2ch").
See example/render.sh — archives old renders, captures while VHS runs, muxes trimmed to video length with a 5s audio pre-roll shift (ffmpeg startup + API roundtrip offset) and a 3s audio fade-out at the end. Copy and adapt.
Capture tool: sox over ffmpeg avfoundation. ffmpeg dropped samples on long recordings (choppy audio) even with -thread_queue_size 4096, -async 1, nice -20. sox + coreaudio is stable end-to-end, addresses device by name ("BlackHole 2ch") instead of a flaky index, and needs no post-trim:
AUDIODRIVER=coreaudio sox -q -c 2 -r 48000 -t coreaudio "BlackHole 2ch" \
-c 2 -r 48000 -b 16 captured.wav 2>/tmp/sox-capture.log &
Music gen must use --play so BlackHole captures it. Switch default output back to Speakers after.
When BlackHole unavailable. Pre-generate narration + music, overlay with manual timestamps:
vhs demo.tape # produces demo-silent.mp4
ffmpeg -y -i demo-silent.mp4 -i speech.mp3 -i music.mp3 \
-filter_complex "[1:a]adelay=16000|16000[narr]; \
[2:a]volume=0.18,atrim=end=75[bg]; \
[narr][bg]amix=inputs=2:duration=longest[a]" \
-map 0:v -map "[a]" -c:v copy -c:a aac demo.mp4
adelay=16000|16000 = ms, left|right (must be stereo pair). Tune to tape timestamp where narration should hit. Ducking: 0.12–0.20 under narration, 0.25–0.35 music-only. Never pass -shortest — truncates video to shortest audio.
Set Shell "bash" is mandatory. zsh + long Type lines triggers a VHS command-concatenation bug: two consecutive long commands get glued onto a single prompt line and the Enter between them is lost. Root cause is zsh's ZLE + VHS's wall-clock-timed CDP keystroke injection (no prompt-wait primitive). Bash's simpler line editor has no such race. Not fixable with Wait+Line (regex fails on colored prompts), precmd hooks (VHS #691), longer Sleeps, or Ctrl+L substitution. Just use bash.polli gen text --model X "...".Hide-block cd; relative path in Output (parser splits on /).export FORCE_COLOR=1 in Hide — VHS pty fails isTTY, chalk strips colors otherwise.\033[1;38;5;141m = polli purple.announce() pattern: one function that prints the bold title AND fires backgrounded TTS. Every later scene is a one-liner. See example/demo.tape.Canonical bash Hide block (copy verbatim — fixes job notices, heredoc prompts, prompt trailing-line):
Set Shell "bash"
Hide
Type "cd /absolute/path/to/working/dir"
Enter
Type "export FORCE_COLOR=1"
Enter
Type "export PS1='» '"
Enter
Type "export PS2='» '"
Enter
Type "PROMPT_COMMAND='echo'"
Enter
Type "set +m"
Enter
Type "clear"
Enter
Show
PS2='» ' — hides heredoc/continuation > prompts that leak on wrapped Type lines.PROMPT_COMMAND='echo' — trailing blank line between scenes so output doesn't crash into the next prompt.set +m — disables job-control, suppresses [1] 12345 notices from & backgrounding.( cmd & ) never prints a PID even in interactive shells.announce functionOne-liner per scene: prints a bold purple title AND fires backgrounded TTS. The sed strip removes ElevenLabs emotion cues ([whispers], [excited], …) from the printed title while keeping them in the spoken audio.
announce() {
printf '\n\033[1;38;5;141m»» %s\033[0m\n\n' "$(sed -E 's/\[[^]]*\] *//g' <<< "$1")"
( polli gen audio --play --output /tmp/narr-$$-$RANDOM.mp3 "$1" >/dev/null 2>&1 & )
}
Usage:
Type `announce "[excited] first — we need some backing music"`
Enter
ElevenLabs emotion cues (must use ElevenLabs-backed voice, e.g. default sage): [whispers], [excited], [confident], [curious], [sighs], [laughs]. Place inline — the TTS engine interprets them; the sed strip keeps the on-screen text clean.
Why subshell ( cmd & ), not bare & or disown:
& in a non-interactive bash (VHS) still prints [1] 12345 — even with set +m, occasional leakage.disown after & still shows the job-start notice.( cmd & ) spawns the background job inside a subshell that exits immediately, so no job record is ever attached to the parent shell — nothing to print.| Symptom | Cause | Fix |
|---|---|---|
Two commands merge on one line, Enter eaten | zsh + long Type (ZLE race) | Set Shell "bash" |
heredoc> or > prompts visible | Wrapped line, zsh/bash PROMPT2 | export PS2='» ' |
[1] 12345 leaks on screen | Bash job-control notice | set +m + ( cmd & ) subshell |
| Scene collides with previous output | No trailing newline | PROMPT_COMMAND='echo' |
Nested temp/vhs-demo/temp/vhs-demo/ | Absolute path in Output | Use relative; cd in Hide |
| Typing takes forever | Default TypingSpeed 50ms | Set 12–20ms |
| Colors stripped | VHS pty, no FORCE_COLOR | export FORCE_COLOR=1 |
| Font renders wrong | JetBrains Mono not on macOS | Use Menlo |
| FontSize >18 wraps prompts | 960 width | Stick with 16 |
| BG output corrupts next command | & with stdout live | cmd >/dev/null 2>&1 & |
↑ / multibyte char eats adjacent chars | VHS Type + Unicode edge case | Drop the arrow; use plain ASCII |
| Choppy/dropped audio in long capture | ffmpeg avfoundation | Use sox -t coreaudio "BlackHole 2ch" |
| Video only 2s long | -shortest on ffmpeg | Drop -shortest |
| No sound in final mp4 | Missing audio map | -map 0:v -map "[a]" |
| BlackHole not found by ffmpeg | Wrong device index | Use sox + device name instead |
| Disk fills during capture | Uncompressed wav, forgot to trap | trap "kill $FFPID" EXIT |
| Final MP4 narration lands too late |
Default: sage. Others: fin, callum, onyx, rachel. Preview:
polli gen audio --voice <name> --output sample.mp3 "test line" && afplay sample.mp3
Full list: polli models --type audio --json | jq '.[].voices'.
Deterministic cache — same prompt+duration = same bytes. Always --instrumental. Match duration to ffprobe demo-silent.mp4. Example prompt: "lofi instrumental hip hop, warm analog, jazzy bass".
polli --help — orients without narration.--no-stream when used in a recorded tape so the audience sees generation happening live).See social/prompts/tone/linkedin.md, social/prompts/brand/about.md.
Model selection (ranked for this task):
claude-fast — pick this for live demos. Consistent ~4s latency, never leaks scaffolding in this prompt shape. Voice slightly flatter than glm but reliability wins when a Sleep 30000ms hold has to cover the call.glm — best voice quality when it works. But latency is wildly variable (12s first call → 31s next → sometimes >60s); not safe behind a fixed Sleep budget. Use for offline one-shots, not live-tape renders.kimi — strong rhythm, hacker-genz feel. Solid backup.openai — leaks prompt scaffolding verbatim (e.g. sign-off:, 110w, your own line-1 hook), overshoots word count. Avoid.gemini-search / perplexity-fast — web-search adds citation markers [1][2]; hallucinate features, inflate corpo voice. Avoid for this.Call flags:
--no-stream. Streaming + > file hangs on long prompts; --no-stream returns in 3–5s.timeout N — it breaks the stdout flush and always kills the call.Input context (what to pipe in):
polli --help only — leanest, forces the model to describe the command surface without repackaging README bullet points.--budget etc).--help limitation: doesn't expose npm package name → model guesses polli-cli. Patch install line manually.Prompt shape (compressed, declarative, no imperative wordiness):
announce polli on LinkedIn), not a procedure.· separators. Let the model interpret rather than listing banned phrases.7-verb CLI) — every model echoes them. Just list the verbs.no md · no 🚀 · no corpo as the only negative constraints. More bans → defensive, boring output.sign-off not CTA.Final prompt used (LinkedIn launch post):
↑ polli --help for polli CLI.
| ffmpeg startup + API pre-roll |
In mux: ffmpeg -ss 5 -i captured.wav … (tune 3–6s) |
| Music cuts abruptly at end | No fade filter | Add -af "afade=t=out:st=$((VID_DUR-3)):d=3" |
afade filter errors No option name near '3' on st=66,36 | Locale uses comma for decimals | Wrap the awk computing FADE_START with LC_ALL=C |
| Punctuation/wording cached across renders | Pollinations text cache keys on exact string | Change a character (swap · ↔ ; ↔ ,, add !) each re-render |