Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
This skill follows the Agent Skills specification.
Turn any script into a YouTube-ready voiceover — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.
Built for YouTube creators who want studio-quality narration without the studio. Powered by Voice.ai.
| Scenario | Why it fits |
|---|---|
| YouTube long-form | Full narration with chapter markers and captions |
| YouTube Shorts | Quick hooks with punchy delivery |
| Course content | Professional narration for educational videos |
| Screen recordings | Dub a screencast with clean AI voiceover |
| Quick iteration | Smart caching — edit one section, only that segment re-renders |
| Batch production | Same voice, consistent quality across every video |
Have a script and a video? Dub it in one shot:
```bash
node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My YouTube Video" \
  --video ./my-recording.mp4 \
  --mux \
  --template youtube
```
This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:
- `out/my-youtube-video/muxed.mp4`: your video dubbed with the AI voiceover
- `out/my-youtube-video/master.wav`: the standalone audio
- `out/my-youtube-video/review.html`: listen to and review each segment
- `out/my-youtube-video/chapters.txt`: paste directly into your YouTube description
- `out/my-youtube-video/captions.srt`: upload to YouTube as subtitles
- `out/my-youtube-video/description.txt`: a ready-made YouTube description with chapters

Use `--sync pad` if the audio is shorter than the video, or `--sync trim` if it is longer and should be cut to match.
Get an API key at voice.ai/dashboard, then set `VOICE_AI_API_KEY` as an environment variable before running:
```bash
export VOICE_AI_API_KEY=your-key-here
```
The skill does not read .env files or access any files for credentials — only the environment variable.
Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
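A minimal sketch of the credential rule above, assuming a Node.js caller: the key comes only from the `VOICE_AI_API_KEY` environment variable, and mock mode skips the check entirely. `resolveApiKey` is a hypothetical name for illustration, not a function exported by voiceai-vo.cjs.

```javascript
// Hypothetical helper mirroring the documented rule: the API key is read only
// from the VOICE_AI_API_KEY environment variable, never from files.
function resolveApiKey(env, { mock = false } = {}) {
  if (mock) {
    // Mock mode produces placeholder audio, so no key is needed.
    return null;
  }
  const key = (env.VOICE_AI_API_KEY || "").trim();
  if (!key) {
    throw new Error(
      "VOICE_AI_API_KEY is not set. Export it, or pass --mock to run without an API key."
    );
  }
  return key;
}

// Usage: pass process.env explicitly so the function stays easy to test.
// const key = resolveApiKey(process.env, { mock: process.argv.includes("--mock") });
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the check testable without mutating global state.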
`build`: generate a YouTube voiceover from a script

```bash
node voiceai-vo.cjs build \
  --input <script.md or script.txt> \
  --voice <voice-alias-or-uuid> \
  --title "My YouTube Video" \
  [--template youtube] \
  [--video input.mp4 --mux --sync shortest] \
  [--force] [--mock]
```
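What it does: splits the script into segments (by `##` headings for .md, or by sentence boundaries for .txt), renders each segment with Voice.ai TTS, stitches the master audio (requires ffmpeg), and generates chapters, captions, a YouTube description, and a review page. With `--video` and `--mux`, it also dubs the video.

Full options: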
| Option | Description |
|---|---|
| `-i, --input <path>` | Script file (.txt or .md), required |
| `-v, --voice <id>` | Voice alias or UUID, required |
| `-t, --title <title>` | Video title (defaults to filename) |
| `--template youtube` | Auto-inject YouTube intro/outro |
| `--mode <mode>` | `headings` or `auto` (default: `headings` for .md) |
| `--max-chars <n>` | Max characters per auto-chunk (default: 1500) |
| `--language <code>` | Language code (default: `en`) |
| `--video <path>` | Input video to dub |
| `--mux` | Enable video dubbing (requires `--video`) |
| `--sync <policy>` | `shortest`, `pad`, or `trim` (default: `shortest`) |
| `--force` | Re-render all segments (ignore cache) |
| `--mock` | Mock mode: no API calls, placeholder audio |
| `-o, --out <dir>` | Custom output directory |
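Headings mode can be sketched as follows, assuming each `##` heading starts a new segment, any text before the first heading becomes an `intro` segment, and files are named with a zero-padded index plus a slug (matching the `001-intro.wav` pattern in the output directory). `splitByHeadings` and `slugify` are hypothetical names; the skill's real parser may handle edge cases (for example a top-level `#` title) differently.

```javascript
// Hypothetical sketch of "headings" mode: split Markdown on level-2 headings.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into dashes
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes
}

function splitByHeadings(markdown) {
  const segments = [];
  let current = { title: "intro", lines: [] };

  for (const line of markdown.split(/\r?\n/)) {
    // "## Title" matches; "### Title" does not (third char is not whitespace).
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      segments.push(current); // a new heading closes the previous segment
      current = { title: match[1], lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  segments.push(current);

  return segments
    .map((s) => ({ title: s.title, text: s.lines.join("\n").trim() }))
    .filter((s) => s.text.length > 0) // drop empty segments, e.g. no preamble
    .map((s, i) => ({
      ...s,
      file: `${String(i + 1).padStart(3, "0")}-${slugify(s.title) || "section"}.wav`,
    }));
}
```

Numbering after filtering keeps file names contiguous even when the script has no preamble.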
`replace-audio`: dub an existing video

```bash
node voiceai-vo.cjs replace-audio \
  --video ./my-video.mp4 \
  --audio ./out/my-video/master.wav \
  [--out ./out/my-video/dubbed.mp4] \
  [--sync shortest|pad|trim]
```
Requires ffmpeg. If ffmpeg is not installed, the command generates helper shell/PowerShell scripts instead.
| Sync policy | Behavior |
|---|---|
| `shortest` (default) | Output ends when the shorter track ends |
| `pad` | Pad audio with silence to match video duration |
| `trim` | Trim audio to match video duration |
Video stream is copied without re-encoding (-c:v copy). Audio is encoded as AAC for YouTube compatibility.
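To make the sync policies concrete, here is one way they could map onto ffmpeg arguments. This is a sketch under assumptions, not the skill's actual command line: `buildReplaceAudioArgs` is a hypothetical name, `pad` is approximated with ffmpeg's `apad` filter plus `-shortest`, and `trim` caps the output at a video duration the caller obtains separately (for example with ffprobe).

```javascript
// Hypothetical sketch: build ffmpeg arguments that replace a video's audio.
// The video stream is copied (-c:v copy); the new audio is encoded as AAC.
function buildReplaceAudioArgs({ video, audio, out, sync = "shortest", videoSec }) {
  const args = [
    "-y",
    "-i", video,     // input 0: original video
    "-i", audio,     // input 1: voiceover, e.g. master.wav
    "-map", "0:v:0", // first video stream from input 0
    "-map", "1:a:0", // first audio stream from input 1
    "-c:v", "copy",  // no video re-encode
    "-c:a", "aac",   // AAC audio for YouTube compatibility
  ];

  switch (sync) {
    case "shortest":
      args.push("-shortest"); // stop when the shorter stream ends
      break;
    case "pad":
      // apad appends silence indefinitely, so -shortest stops at the video's end.
      args.push("-af", "apad", "-shortest");
      break;
    case "trim":
      // Cap the output at the video's duration, supplied by the caller.
      if (!(videoSec > 0)) throw new Error("trim needs the video duration in seconds");
      args.push("-t", String(videoSec));
      break;
    default:
      throw new Error(`unknown sync policy: ${sync}`);
  }

  args.push(out);
  return args;
}
```

Usage: pass the result to `require("node:child_process").spawnSync("ffmpeg", args, { stdio: "inherit" })`. Because the video is stream-copied, only the audio is re-encoded, which keeps dubbing fast.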
Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.
`voices`: list available voices

```bash
node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]
```
Use short aliases or full UUIDs with --voice:
| Alias | Voice | Gender | Best for YouTube |
|---|---|---|---|
| `ellie` | Ellie | F | Vlogs, lifestyle, social content |
| `oliver` | Oliver | M | Tutorials, narration, explainers |
| `lilith` | Lilith | F | ASMR, calm walkthroughs |
| `smooth` | Smooth Calm Voice | M | Documentaries, long-form essays |
| `corpse` | Corpse Husband | M | Gaming, entertainment |
| `skadi` | Skadi | F | Anime, character content |
| `zhongli` | Zhongli | M | Gaming, dramatic intros |
| `flora` | Flora | F | Kids content, upbeat videos |
| `chief` | Master Chief | M | Gaming, action trailers |
The voices command also returns any additional voices available on the API. Voice list is cached for 10 minutes.
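Accepting either an alias or a UUID, plus a short-lived voice-list cache, can be sketched like this. The alias-to-UUID map is a parameter because the real voice IDs are not listed here; the UUIDs in any example are placeholders, and `resolveVoice`/`createVoiceCache` are hypothetical names. Only the 10-minute lifetime comes from the text above.

```javascript
// Hypothetical sketch: resolve --voice and cache the voice list for 10 minutes.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const VOICE_CACHE_TTL_MS = 10 * 60 * 1000;

function resolveVoice(input, aliasToId) {
  const value = input.trim();
  if (UUID_RE.test(value)) return value; // full UUIDs pass through unchanged
  const key = value.toLowerCase();        // aliases are matched case-insensitively
  if (!Object.prototype.hasOwnProperty.call(aliasToId, key)) {
    throw new Error(`unknown voice alias: ${value}`);
  }
  return aliasToId[key];
}

function createVoiceCache(fetchVoices, now = Date.now) {
  let entry = null; // { value, at }
  return function getVoices() {
    if (entry && now() - entry.at < VOICE_CACHE_TTL_MS) return entry.value;
    // Caches whatever fetchVoices returns. If that is a Promise, concurrent
    // callers share one request (note: a rejected Promise would also be cached).
    entry = { value: fetchVoices(), at: now() };
    return entry.value;
  };
}
```

Injecting `now` lets the expiry be tested without waiting ten minutes.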
After a build, the output directory contains everything you need to publish on YouTube:
```text
out/<title-slug>/
  segments/         # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav        # Stitched voiceover (requires ffmpeg)
  master.mp3        # MP3 for upload (requires ffmpeg)
  muxed.mp4         # Dubbed video (if --video --mux used)
  chapters.txt      # Paste into YouTube description
  captions.srt      # Upload as YouTube subtitles
  description.txt   # Ready-made YouTube description with chapters
  review.html       # Interactive review page with audio players
  manifest.json     # Build metadata: voice, template, segment list
  timeline.json     # Segment durations and start times
```
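Both `chapters.txt` and `captions.srt` come down to formatting segment start times. A minimal sketch, assuming each timeline entry carries `title`, `start`, `duration` (in seconds) and `text`; that shape is hypothetical, since the actual `timeline.json` schema is not shown here. YouTube only recognizes chapters when the first timestamp is 0:00, there are at least three, and each lasts at least 10 seconds, so a real generator should also check those rules.

```javascript
// Hypothetical sketch: chapter lines and SRT cues from segment timings (seconds).

// YouTube chapter timestamps: M:SS, or H:MM:SS past the first hour.
function chapterStamp(sec) {
  const s = Math.floor(sec);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const ss = String(s % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${ss}` : `${m}:${ss}`;
}

// SRT timestamps: HH:MM:SS,mmm
function srtStamp(sec) {
  const totalMs = Math.round(sec * 1000);
  const p = (n, w = 2) => String(n).padStart(w, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  return `${p(h)}:${p(m)}:${p(s)},${p(totalMs % 1000, 3)}`;
}

function toChapters(segments) {
  return segments.map((seg) => `${chapterStamp(seg.start)} ${seg.title}`).join("\n");
}

// One cue per segment; real captions would usually be split into shorter cues.
function toSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${srtStamp(seg.start)} --> ${srtStamp(seg.start + seg.duration)}\n${seg.text}\n`)
    .join("\n");
}
```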
To publish:

1. Upload `muxed.mp4` (or your original video with `master.mp3` as its audio).
2. Paste the `chapters.txt` content into your YouTube description.
3. Upload `captions.srt` as subtitles in YouTube Studio.

Use `--template youtube` to auto-inject a branded intro and outro:
| Segment | Source file |
|---|---|
| Intro (prepended) | templates/youtube_intro.txt |
| Outro (appended) | templates/youtube_outro.txt |
Edit the files in templates/ to customize your channel's branding.
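The template step amounts to wrapping the script's segments. A minimal sketch, assuming the intro and outro become their own segments and that an empty template file is simply skipped; `applyYoutubeTemplate` is a hypothetical name, and reading the files is left to the caller (for example `fs.readFileSync("templates/youtube_intro.txt", "utf8")`).

```javascript
// Hypothetical sketch of --template youtube: prepend the intro text and append
// the outro text as separate segments. Blank template text is skipped.
function applyYoutubeTemplate(segments, introText, outroText) {
  const result = [...segments]; // do not mutate the caller's array
  if (introText && introText.trim()) {
    result.unshift({ title: "Intro", text: introText.trim() });
  }
  if (outroText && outroText.trim()) {
    result.push({ title: "Outro", text: outroText.trim() });
  }
  return result;
}
```

Keeping the intro and outro as separate segments means they are cached independently, so editing the script body would not re-render your channel branding.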
Segments are cached by a hash of: text content + voice ID + language.
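The cache key only needs to change when one of those three inputs changes. A minimal sketch, assuming SHA-256 over a JSON encoding and a cache held as a plain object keyed by hash; the skill's actual hash function and cache layout are not documented here, and `segmentCacheKey`/`needsRender` are hypothetical names.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch: hash exactly the three documented inputs.
// SHA-256 and the JSON encoding are assumptions.
function segmentCacheKey({ text, voiceId, language }) {
  // Encoding as a JSON array keeps fields unambiguous; plain concatenation
  // would make ("ab", "c") and ("a", "bc") collide.
  const payload = JSON.stringify([text, voiceId, language]);
  return crypto.createHash("sha256").update(payload, "utf8").digest("hex");
}

// A segment is rendered when --force is set or its key is not yet cached.
function needsRender(segment, cache, force = false) {
  return force || !Object.prototype.hasOwnProperty.call(cache, segmentCacheKey(segment));
}
```

Because the title is not part of the key, renaming a section without changing its text would still hit the cache under this scheme.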
Use `--force` to re-render everything. The cache is stored in `segments/.cache.json`.

Voice.ai supports 11 languages, so you can dub your YouTube videos for global audiences:
en, es, fr, de, it, pt, pl, ru, nl, sv, ca
```bash
node voiceai-vo.cjs build \
  --input script-spanish.md \
  --voice ellie \
  --title "Mi Video" \
  --language es \
  --video ./my-video.mp4 \
  --mux
```
The pipeline auto-selects the multilingual TTS model for non-English languages.
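A sketch of how `--language` could be validated against the 11 supported codes before picking a model. The return values `"english"` and `"multilingual"` are placeholder labels, not real Voice.ai model identifiers (see references/VOICEAI_API.md for those), and `pickTtsModel` is a hypothetical name.

```javascript
// The 11 language codes listed above.
const SUPPORTED_LANGUAGES = new Set(["en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca"]);

// Hypothetical sketch: reject unknown codes early, then choose a model family.
// "english" and "multilingual" are placeholders, not Voice.ai model IDs.
function pickTtsModel(language = "en") {
  const code = language.toLowerCase();
  if (!SUPPORTED_LANGUAGES.has(code)) {
    throw new Error(
      `unsupported language "${language}"; expected one of: ${[...SUPPORTED_LANGUAGES].join(", ")}`
    );
  }
  return code === "en" ? "english" : "multilingual";
}
```

Failing fast on an unsupported code avoids spending credits on segments that would have to be re-rendered anyway.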
| Issue | Solution |
|---|---|
| ffmpeg missing | Pipeline still works — you get segments, review page, chapters, captions. Install ffmpeg for stitching and video dubbing. |
| Rate limits (429) | Segments render sequentially, which stays under most limits. Wait and retry. |
| Insufficient credits (402) | Top up at voice.ai/dashboard. Cached segments are not re-rendered on retry, so they don't consume credits again. |
| Long scripts | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| Windows paths | Wrap paths with spaces in quotes: --input "C:\My Scripts\script.md" |
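The 490-character split in the table above could work roughly like this. It is a sketch under assumptions: it prefers sentence boundaries, falls back to word boundaries for an over-long sentence, and hard-cuts only a single word longer than the limit. The skill's real splitting strategy is not documented here, and `splitForApi` is a hypothetical name.

```javascript
// Hypothetical sketch: split a segment's text into API-sized pieces.
function splitForApi(text, maxChars = 490) {
  const normalized = text.replace(/\s+/g, " ").trim();
  // Each match is a sentence plus trailing space; the second alternative
  // catches trailing text with no closing punctuation. Matches cover everything.
  const sentences = normalized.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = "";

  const flush = () => {
    if (current.trim()) chunks.push(current.trim());
    current = "";
  };

  for (const sentence of sentences) {
    if ((current + sentence).trim().length <= maxChars) {
      current += sentence; // the sentence still fits in the current chunk
      continue;
    }
    flush();
    if (sentence.trim().length <= maxChars) {
      current = sentence; // start a new chunk with this sentence
      continue;
    }
    // The sentence alone exceeds the limit: pack it word by word.
    for (const word of sentence.trim().split(" ")) {
      const candidate = current ? `${current} ${word}` : word;
      if (candidate.length <= maxChars) {
        current = candidate;
        continue;
      }
      flush();
      let rest = word;
      while (rest.length > maxChars) {
        chunks.push(rest.slice(0, maxChars)); // hard cut for an oversized word
        rest = rest.slice(maxChars);
      }
      current = rest;
    }
    flush(); // close the long sentence so the next one starts cleanly
  }
  flush();
  return chunks;
}
```

Splitting at sentence boundaries first keeps each API call prosodically complete, which usually sounds more natural when the pieces are stitched back together.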
See references/TROUBLESHOOTING.md for more.
- `references/VOICEAI_API.md`: API endpoints, audio formats, models
- `references/TROUBLESHOOTING.md`: common issues and fixes