Create a polished product demo video with motion graphics intro, narrated audio, and terminal recordings. Use when the user asks to build a demo video, product walkthrough, or promotional clip for a CLI tool or software project.
Build a professional product demo video combining narrated audio (TTS), motion graphics (Remotion), and terminal recordings (VHS). The final output is a single .mp4 with synced audio.
A demo video has three content layers, assembled in a four-stage pipeline:
1. Narration Audio → TTS CLI generates speech from scripts
2. Motion Graphics → Remotion renders animated intro/transitions
3. Terminal Demos → VHS records scripted terminal sessions
4. Assembly → ffmpeg concatenates video + merges audio
Directory structure:
demo/
├── build.sh # Master build script (orchestrates everything)
├── build_narration.sh # Narration pipeline: TTS → scribe → cues
├── narration_script.md # Narration plan & source file list
├── transcript.md # Final transcript with timestamps & beat markers
├── narration/ # Per-beat narration scripts (one sentence each)
│ ├── manifest.json # Beat manifest (id, sequence, role, beatIndex, script)
│ ├── 01_hook.txt # Act 1 beats (story)
│ ├── 02_stars.txt
│ ├── ...
│ ├── 08_engine.txt # Act 2 beats (tech)
│ ├── ...
│ ├── 12_voice_cloning.txt # Act 3 beats (features)
│ ├── ...
│ ├── 18_demo_say.txt # Act 4 beats (demo)
│ ├── ...
│ └── 23_closing.txt # Act 5 beats (cta)
├── terminal_voices.tape # VHS tape: install & setup
├── terminal_speech.tape # VHS tape: voice cloning & speech
├── terminal_config.tape # VHS tape: generate, export & workflow
├── ttscli_demo.tape # VHS tape: full demo (alternative single-take)
├── intro/ # Remotion project
│ ├── package.json
│ ├── remotion.config.ts
│ ├── tsconfig.json
│ ├── tailwind.config.js
│ ├── public/ # Static assets (audio, images)
│ │ └── ttscli_intro.wav
│ └── src/
│ ├── Root.tsx # Remotion entry — registers compositions
│ ├── TtsIntro.tsx # Main composition — scene sequencing
│ ├── design.ts # Shared palette, fonts, shadows
│ ├── narrationCues.ts # Auto-generated timing constants from scribe
│ ├── index.ts
│ ├── style.css
│ ├── scenes/ # One component per visual act
│ │ ├── OpenClawStory.tsx # Act 1: AI Agent story (6 beats)
│ │ ├── HowItWorks.tsx # Act 2: Engine, backends, install (4 beats)
│ │ ├── FeatureHighlights.tsx # Act 3: 6 unique feature beats
│ │ ├── LiveDemo.tsx # Act 4: Persistent terminal (3 beats)
│ │ └── CallToAction.tsx # Act 5: GitHub CTA + logo lock (3 beats)
│ └── effects/ # Reusable visual effects
│ ├── Backdrop.tsx
│ ├── RhythmOverlay.tsx
│ ├── TerminalChrome.tsx # Shared terminal window chrome
│ └── Waveform.tsx # Animated waveform SVG
└── out/ # Build artifacts (gitignored)
├── ttscli_demo.mp4
├── intro.mp4
├── terminal1.mp4
├── terminal2.mp4
├── terminal3.mp4
├── narration/ # Per-segment WAV files
│ ├── 01_title.wav
│ ├── 02_tech.wav
│ └── ...
├── narration_transcripts/ # Scribe JSON outputs per segment
│ ├── 01_title.json
│ ├── 02_tech.json
│ └── ...
└── narration_timestamps.json # Combined timeline with all beat markers
Plan the story arc first. A good demo narration follows this structure:
| Act | Purpose | Beats | Duration |
|---|---|---|---|
| Story / Hook | Grab attention, establish the problem | 5–7 | 12–18s |
| How It Works | Engine, backends, install | 3–5 | 10–14s |
| Feature Highlights | One unique visual per feature | 4–6 | 14–20s |
| Live Demo | Terminal with accumulating commands | 3 | 8–12s |
| CTA | GitHub link + logo lock | 2–3 | 6–10s |
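As a sanity check, the act budget above can be totaled in plain shell. The midpoint of each act's duration range (15, 12, 17, 10, 8 seconds) is an illustrative assumption, not a spec:

```shell
# Rough runtime estimate from the act plan (midpoint seconds per act are assumed)
awk 'BEGIN {
  total = 15 + 12 + 17 + 10 + 8   # story + how-it-works + features + demo + cta
  printf "~%ds narration, %d frames at 30fps\n", total, total * 30
}'
# prints: ~62s narration, 1860 frames at 30fps
```

That lands inside the 50–74s range implied by the table's per-act bounds, which is a good length for a promotional clip.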
Each scene renders one `<Sequence>` per beat.

Beat manifest (`narration/manifest.json`) — define all beats, their ordering, and their sequence grouping:
{
"fps": 30,
"segments": [
{ "id": "01_hook", "sequence": "story", "role": "beat", "beatIndex": 0, "script": "01_hook.txt" },
{ "id": "02_agents", "sequence": "story", "role": "beat", "beatIndex": 1, "script": "02_stars.txt" },
...
{ "id": "07_engine", "sequence": "tech", "role": "beat", "beatIndex": 0, "script": "08_engine.txt" },
...
{ "id": "22_closing", "sequence": "cta", "role": "beat", "beatIndex": 2, "script": "23_closing.txt" }
]
}
Fields:
- `id` — unique segment identifier (used as filename for WAV + scribe JSON)
- `sequence` — groups beats into Remotion scenes (story, tech, features, demo, cta)
- `role` — always `"beat"` in the fast-cut architecture
- `beatIndex` — zero-based index within the sequence (drives internal `<Sequence>` positioning)
- `script` — filename of the narration text file in `narration/`

# Generate per-segment audio
tts generate "Hey, meet TTS CLI, a text-to-speech tool that runs entirely on your machine." \
--voice james -o demo/out/01_title.wav
# Or from a file
tts generate --file demo/narration/01_title.txt --voice james -o demo/out/01_title.wav
# Build a concat list
for f in demo/out/0*.wav; do echo "file '$f'"; done > demo/out/concat.txt
# Concatenate
ffmpeg -f concat -safe 0 -i demo/out/concat.txt -c copy demo/out/narration.wav
After generating audio, use scribe to transcribe each segment and extract precise timestamps. Scribe is a CLI that calls the ElevenLabs transcription API and returns word-level timing data — this is what drives the Remotion animation timeline.
# Install (Node.js CLI)
npm install -g scribe-cli
# Authenticate with ElevenLabs API key (one-time)
scribe auth
# Transcribe a single audio segment to JSON (includes duration + word timestamps)
scribe transcribe demo/out/narration/01_title.wav -f json -o demo/out/narration_transcripts/
# Output formats: json, md, txt, srt, all
scribe transcribe demo/out/narration/01_title.wav -f all -o demo/out/narration_transcripts/
# Print to stdout instead of file
scribe transcribe demo/out/narration/01_title.wav -f json --stdout
| Flag | Description |
|---|---|
| `-f, --format <type>` | Output format: json, md, txt, srt, all (default: json) |
| `-o, --output-dir <dir>` | Output directory (default: `.`) |
| `-d, --diarize` | Enable speaker diarization |
| `-s, --speakers <count>` | Speaker count hint (1–32) |
| `-l, --language <code>` | Language code (ISO-639, e.g. en, zh) |
| `--stdout` | Print to stdout instead of writing a file |
| `-q, --quiet` | Suppress progress output |
Scribe JSON output contains the metadata needed for timeline sync:
{
"text": "Meet TTS CLI, a fully local text-to-speech toolkit...",
"metadata": {
"duration": 15.30,
"language": "en"
},
"words": [
{ "word": "Meet", "start": 0.0, "end": 0.32, "confidence": 0.98 },
{ "word": "TTS", "start": 0.35, "end": 0.72, "confidence": 0.95 },
...
]
}
Key fields:
- `metadata.duration` — exact segment length in seconds (more accurate than ffprobe for timing)
- `text` — verified transcript (catches TTS mispronunciations)
- `words[].start` / `words[].end` — word-level timestamps for fine-grained sync

The `demo/build_narration.sh` script automates scribe across all segments:
# Transcribe each segment, extract duration + text, accumulate running offset
for id in "${segment_ids[@]}"; do
tts generate --file "$script_path" --output "$wav_path" --model "$MODEL"
if [[ "$RUN_SCRIBE" == "1" ]]; then
scribe transcribe "$wav_path" -f json -o "$TRANS_DIR"
duration="$(jq -r '.metadata.duration' "$TRANS_DIR/$id.json")"
text="$(jq -r '.text' "$TRANS_DIR/$id.json")"
else
# Fallback: ffprobe for duration, source script for text
duration="$(ffprobe -v error -show_entries format=duration \
-of default=nokey=1:noprint_wrappers=1 "$wav_path")"
text="$(cat "$script_path")"
fi
# Compute frame offset: start_frame = running_seconds × fps
start_frame=$(awk "BEGIN { printf \"%d\", $running_sec * 30 + 0.5 }")
# ... accumulate into timeline JSON
done
Control with environment variable:
RUN_SCRIBE=1 ./build_narration.sh # Use scribe (default) — accurate timestamps
RUN_SCRIBE=0 ./build_narration.sh # Skip scribe — use ffprobe fallback (offline/faster)
The pipeline converts scribe timestamps into three artifacts:
1. Timeline JSON (demo/out/narration_timestamps.json):
{
"fps": 30,
"total_seconds": 84.80,
"total_frames": 2544,
"segments": [
{
"id": "01_title",
"sequence": "title",
"text": "Meet TTS CLI...",
"start_sec": 0.0,
"end_sec": 15.30,
"start_frame": 0,
"end_frame": 459,
"duration_frames": 459
},
...
]
}
2. Transcript markdown (demo/transcript.md):
| Segment | Start | End | Frame | Text |
|---|---:|---:|---:|---|
| 01_title | 0.00s | 15.30s | 0 | Meet TTS CLI... |
| 02_tech | 15.30s | 38.36s | 459 | Under the hood... |
Frame number = start_seconds × 30 (at 30fps).
3. Remotion narration cues (demo/intro/src/narrationCues.ts):
// Auto-generated from scribe transcription timestamps
export const narrationCues = {
fps: 30,
  totalFrames: 1845,
scenes: {
story: { from: 0, duration: 480, beatDurations: [75, 90, 105, 60, 75, 75] },
tech: { from: 480, duration: 330, beatDurations: [90, 90, 90, 60] },
features: { from: 810, duration: 510, beatDurations: [90, 90, 90, 90, 75, 75] },
demo: { from: 1320, duration: 285, beatDurations: [105, 90, 90] },
cta: { from: 1605, duration: 240, beatDurations: [75, 75, 90] },
},
} as const;
Each scene has uniform shape: from (start frame), duration (total frames), beatDurations[] (per-beat frame counts). This is auto-generated by build_narration.sh from scribe timestamps.
This file is imported by TtsIntro.tsx for top-level <Sequence> placement, and by each scene component for internal beat <Sequence> positioning.
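The seconds-to-frames conversion behind all three artifacts is the same one-liner the build script uses; adding 0.5 before truncation rounds to the nearest frame. The values below reuse the 01_title example:

```shell
# frame = seconds x fps, rounded to the nearest integer frame
seconds=15.30
fps=30
frame="$(awk -v s="$seconds" -v f="$fps" 'BEGIN { printf "%d", s * f + 0.5 }')"
echo "$frame"   # 459, matching 01_title's end_frame above
```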
| | scribe | ffprobe fallback |
|---|---|---|
| Duration accuracy | From speech model — accounts for silence trimming | File-level — includes trailing silence |
| Verified transcript | Catches TTS errors (mispronunciations, skipped words) | Uses source script (assumes TTS was perfect) |
| Word-level timing | Available — enables per-word animation sync | Not available |
| Offline use | ❌ Requires ElevenLabs API | ✅ Fully offline |
| Speed | ~2-5s per segment (API call) | Instant |
Recommendation: Use scribe for the final build (accurate timing), use ffprobe fallback during rapid iteration.
TTS output has randomness — the same text produces different results each run. When a segment sounds bad, generate 3 versions, let the user pick, then patch all downstream artifacts.
# Generate 3 versions for comparison (run in parallel)
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v1.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v2.wav --model 0.6B
tts generate --file demo/narration/05_silence.txt --output demo/out/narration/05_reveal_v3.wav --model 0.6B
Present durations to the user so they can audition and pick.
cp demo/out/narration/05_reveal_v2.wav demo/out/narration/05_reveal.wav
ffprobe -v error -show_entries format=duration -of default=nokey=1:noprint_wrappers=1 demo/out/narration/05_reveal.wav
Use jq to patch the single segment's duration and recompute all subsequent offsets:
jq '
.segments |= (
map(if .id == "SEGMENT_ID" then .duration_sec = NEW_DUR | .duration_frames = (NEW_DUR * 30 | round) else . end) |
reduce range(length) as $i (
.;
if $i == 0 then
.[$i].start_sec = 0 | .[$i].start_frame = 0 |
.[$i].end_sec = .[$i].duration_sec | .[$i].end_frame = .[$i].duration_frames
else
.[$i].start_sec = .[$i-1].end_sec | .[$i].start_frame = .[$i-1].end_frame |
.[$i].end_sec = (.[$i].start_sec + .[$i].duration_sec) |
.[$i].end_frame = (.[$i].start_frame + .[$i].duration_frames)
end
)
) |
.total_seconds = .segments[-1].end_sec |
.total_frames = .segments[-1].end_frame
' demo/out/narration_timestamps.json > tmp.json && mv tmp.json demo/out/narration_timestamps.json
After patching timestamps JSON, regenerate these three (can run in parallel):
- Combined narration audio — rebuild `concat.txt` from manifest order, run `ffmpeg -y -f concat`, copy the result to `public/` and the project root
- `narrationCues.ts` — rebuild scene blocks from the timeline JSON (same logic as `write_cues_ts()`)
- `transcript.md` — rebuild the markdown table from the timeline JSON (same logic as `write_transcript_md()`)

In short: `ffprobe` → get the new duration; `jq` → patch the timestamps JSON and recompute offsets; then regenerate the merged audio (`public/` and root), `narrationCues.ts`, and `transcript.md`. This avoids re-generating all other segments and takes ~10 seconds vs minutes for the full pipeline.
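Rebuilding the concat list in manifest order is plain shell. The segment ids below are illustrative; the real order comes from `manifest.json`:

```shell
# Rebuild the ffmpeg concat list in manifest order (ids are illustrative)
segments="01_hook 02_stars 03_reveal"
: > concat.txt
for id in $segments; do
  printf "file 'demo/out/narration/%s.wav'\n" "$id" >> concat.txt
done
cat concat.txt
```

Feed the result to `ffmpeg -y -f concat -safe 0 -i concat.txt -c copy narration.wav`, the same concat invocation used earlier in the narration pipeline.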
Scaffold the Remotion project:

cd demo
npx create-video@latest intro --template blank --tailwind
cd intro && npm install
Design tokens (`design.ts`) — define a shared palette, fonts, and shadows so all scenes look consistent:
export const palette = {
ink: "#111827",
inkMuted: "#5B6475",
bg: "#FFF8F5",
bgPanel: "#FFFFFF",
accent: "#FF6154",
cool: "#3B82F6",
// ...
} as const;
export const fonts = {
display: "'Avenir Next', sans-serif",
mono: "'JetBrains Mono', monospace",
} as const;
Each scene is a React component using Remotion primitives:
- `useCurrentFrame()` — current frame number (drives all animation)
- `useVideoConfig()` — fps, width, height, duration
- `spring()` — physics-based easing for entrances
- `interpolate()` — map frame ranges to CSS values (opacity, translateY, scale)
- `<Sequence>` — place a component at a specific time range

Pattern for a scene component:
import { AbsoluteFill, interpolate, spring, useCurrentFrame, useVideoConfig } from "remotion";
export const TitleCard: React.FC = () => {
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
// Entrance animation
const enter = spring({ frame: frame - 8, fps, config: { damping: 14, stiffness: 120 } });
// Fade-out before next scene
const fadeOut = interpolate(frame, [437, 487], [1, 0], {
extrapolateLeft: "clamp",
extrapolateRight: "clamp",
});
return (
<AbsoluteFill style={{ opacity: fadeOut }}>
<div style={{
opacity: enter,
transform: `translateY(${interpolate(enter, [0, 1], [34, 0])}px)`,
fontSize: 178,
fontWeight: 800,
}}>
TTS CLI
</div>
</AbsoluteFill>
);
};
The main composition (`TtsIntro.tsx`) sequences scenes using beat markers from the transcript:
import { AbsoluteFill, Audio, Sequence, staticFile } from "remotion";
export const TtsIntro: React.FC = () => (
<AbsoluteFill>
<Audio src={staticFile("ttscli_intro.wav")} />
<Sequence from={0} durationInFrames={520}>
<TitleCard />
</Sequence>
<Sequence from={487} durationInFrames={723}>
<TechOverview />
</Sequence>
{/* ... more scenes ... */}
</AbsoluteFill>
);
Narration cues (`narrationCues.ts`) — auto-generate this file from transcript timestamps so scene timing stays in sync:
export const narrationCues = {
fps: 30,
totalFrames: 2544,
scenes: {
title: { from: 0, duration: 520 },
tech: { from: 487, duration: 723 },
features: { from: 1177, duration: 891 },
terminal: { from: 2035, duration: 509 },
},
};
cd demo/intro
npx remotion render TtsIntro --output ../out/intro.mp4 --codec h264
- Use `spring()` for entrances — feels natural, avoids linear motion.
- Use `interpolate()` for a fade-out on the outgoing scene.
- Share reusable visual effects across scenes (e.g. `Backdrop.tsx`).

When building capability/feature cards in a row:
- Give each card a fixed `height` (e.g. 160px) so all cards match visually. Use `display: "flex", alignItems: "center"` inside to vertically center varied content.
- Center content with `display: "flex", flexDirection: "column", alignItems: "center"` on the container instead of `textAlign: "center"` — the latter won't reliably center inline SVG elements.

When a scene references an external brand or platform, define a local token object for that theme instead of using the global palette. This keeps the scene self-contained and visually distinct.
// GitHub light theme tokens — scoped to one scene
const gh = {
bg: "#ffffff",
bgSubtle: "#f6f8fa",
cardBg: "#ffffff",
border: "#d0d7de",
text: "#1f2328",
textMuted: "#656d76",
btnBg: "#f6f8fa",
btnBorder: "#d0d7de",
starYellow: "#e3b341",
link: "#0969da",
} as const;
Tips for themed scenes:
- Skip the global `<Backdrop>` — use a flat `backgroundColor` matching the platform's style instead.
- Animate one small detail (a `spring()` pop, a counter rolling from N to N+1). It makes the scene feel alive.

Prefer concrete, terminal-style content inside card illustration boxes over abstract graphics:
| Abstract (avoid) | Concrete (prefer) |
|---|---|
| Neural network dots | Agent thinking steps: 🔍 read codebase... → 🧠 analyzing... → 📋 plan: 3 steps |
| Floating particles | Code snippet with syntax highlighting |
| Generic waveform | Terminal pipeline: $ running... → ✓ git done → ✓ test done |
Concrete illustrations are more readable at video resolution and immediately communicate what the feature does.
VHS records scripted terminal sessions as video.
brew install charmbracelet/tap/vhs
Each terminal segment gets its own `.tape` file:
# Terminal Scene: Install & Setup
Output out/terminal1.mp4
Set Width 1920
Set Height 1080
Set Framerate 30
Set FontFamily "Menlo"
Set FontSize 22
Set Theme "Github"
Set Padding 40
Set TypingSpeed 30ms
Set CursorBlink true
Set Shell zsh
Sleep 400ms
Type "curl -fsSL https://example.com/install.sh | bash"
Sleep 150ms
Enter
Sleep 4000ms
Type "mytool --version"
Sleep 150ms
Enter
Sleep 1500ms
Sleep 400ms
| Setting | Recommended Value | Why |
|---|---|---|
| `Width` / `Height` | 1920 × 1080 | Match Remotion resolution |
| `Framerate` | 30 | Match Remotion fps |
| `Theme` | "Github" (light) or "Dracula" (dark) | Consistent look |
| `TypingSpeed` | 30ms | Fast enough to not bore, slow enough to read |
| `Sleep` after `Enter` | 2000–4000ms | Let output render before the next command |
vhs terminal_voices.tape
vhs terminal_speech.tape
vhs terminal_config.tape
- End each tape with a short `Sleep 400ms` buffer.

The build script (`demo/build.sh`) orchestrates everything:
Segment Start Duration Frames
Intro (motion) 0:00 28s 840
Label 1 0:28 2s 60 (optional title card)
Terminal 1 0:30 16s 480
Label 2 0:46 2s 60
Terminal 2 0:48 16s 480
Label 3 1:04 2s 60
Terminal 3 1:06 16s 480
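The start times in the timeline above are accumulated segment durations. A sketch that derives them (durations are the planned values from the table):

```shell
# Derive assembly offsets by accumulating segment durations (seconds)
t=0
for d in 28 2 16 2 16 2 16; do   # intro, label1, term1, label2, term2, label3, term3
  printf "start=%ss dur=%ss frames=%s\n" "$t" "$d" "$((d * 30))"
  t=$((t + d))
done
printf "total=%ss (%s frames)\n" "$t" "$((t * 30))"
# last segment starts at 66s (1:06), total 82s
```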
# Trim each terminal recording to its planned length
ffmpeg -y -i out/terminal1.mp4 -t 16 \
-c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
out/terminal1_trimmed.mp4
cat > out/concat_list.txt <<EOF
file 'intro.mp4'
file 'label1.mp4'
file 'terminal1_trimmed.mp4'
file 'label2.mp4'
file 'terminal2_trimmed.mp4'
file 'label3.mp4'
file 'terminal3_trimmed.mp4'
EOF
ffmpeg -y -f concat -safe 0 -i out/concat_list.txt \
-c:v libx264 -preset fast -crf 18 -pix_fmt yuv420p -r 30 -an \
out/concat.mp4
ffmpeg -y -i out/concat.mp4 -i ttscli_intro.wav \
-c:v copy -c:a aac -b:a 128k -ar 44100 -ac 2 \
-shortest -movflags +faststart \
out/ttscli_demo.mp4
cd demo
./build.sh # Build everything
./build.sh remotion # Only re-render motion graphics
./build.sh terminals # Only re-record terminal demos
./build.sh merge # Only re-assemble final video
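A minimal dispatch skeleton for `build.sh` (the real script's internals are assumed; this only shows the target structure):

```shell
#!/usr/bin/env bash
# Sketch of build.sh's target dispatch; step bodies are placeholders.
set -euo pipefail
target="${1:-all}"
case "$target" in
  remotion)  echo "render motion graphics" ;;
  terminals) echo "record terminal tapes" ;;
  merge)     echo "assemble final video" ;;
  all)       echo "render motion graphics"
             echo "record terminal tapes"
             echo "assemble final video" ;;
  *)         echo "usage: $0 [remotion|terminals|merge]" >&2; exit 1 ;;
esac
```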
When asked to create a product demo, follow these steps:
1. Plan the story arc; write `narration/manifest.json` + per-segment `.txt` files.
2. Run `tts generate` per segment, concatenate with ffmpeg.
3. Run `scribe transcribe` on each segment to get accurate durations and verified text. Compute beat markers (frame = seconds × fps).
4. Generate `narration_timestamps.json`, `transcript.md`, and `narrationCues.ts` from scribe output. Or run `build_narration.sh` to automate steps 2–4.
5. Set up the Remotion project and define shared design tokens in `design.ts`.
6. Build scene components with `spring()` + `interpolate()`, synced to audio beat markers from `narrationCues.ts`.
7. Write one `.tape` per terminal segment (1920×1080 @ 30fps, ~16s each); record with `vhs <tape>.tape` for each.
8. Assemble with ffmpeg: trim terminals, concatenate segments, merge narration audio.

| Tool | Install | Purpose |
|---|---|---|
| `tts` | `pip install tts-cli` | Narration audio generation |
| `node` / `npx` | `brew install node` | Remotion rendering |
| `remotion` | `npx create-video@latest` | Motion graphics |
| `vhs` | `brew install charmbracelet/tap/vhs` | Terminal recording |
| `ffmpeg` | `brew install ffmpeg` | Video/audio processing |
| `scribe` | `npm install -g scribe-cli` + `scribe auth` | Transcription for accurate timestamps (ElevenLabs API) |
A complete working example lives in the `demo/` directory in this repo:

- `demo/build.sh` — renders Remotion, records VHS, assembles final MP4
- `demo/build_narration.sh` — TTS generation → scribe transcription → timestamp extraction → `narrationCues.ts`
- `demo/narration/manifest.json` — defines segment order, roles, and script files
- `demo/narration/0*.txt` — one text file per segment
- `demo/intro/` — Remotion project
- `demo/intro/src/narrationCues.ts` — timing cues (auto-generated from scribe)
- `demo/terminal_*.tape` — VHS tapes
- `demo/narration_script.md` — narration plan
- `demo/transcript.md` — final transcript with timestamps
- `remotion-tip.md`