Download a YouTube video (or use a local file), replace the original audio with a Chinese TTS WAV track, and produce three output files: the original video, a ZH-dubbed version without subtitles, and a ZH-dubbed version with hard-burned subtitles. Use this skill when the user wants to create a Chinese-dubbed video or asks for "视频配音" (video dubbing), "中文配音" (Chinese dubbing), "替换音轨" (replace the audio track), or "烧入字幕" (burn in subtitles). It also triggers as the final step of the YouTube dubbing pipeline, after tts-gen produces a ZH WAV.
Given a YouTube URL (or local video), a ZH TTS WAV, and a ZH SRT, produce:
| File | Contents |
|---|---|
| `<prefix>_original.mp4` | Best quality, original audio (reference copy) |
| `<prefix>_zh_nosub.mp4` | Best quality, ZH TTS audio, no subtitles |
| `<prefix>_zh.mp4` | Best quality, ZH TTS audio, hard-burned ZH subtitles |
For YouTube sources, the default prefix should be a filesystem-safe title_slug__video_id, not just the raw video ID. That keeps outputs readable without losing a stable identity.
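The naming rule above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the helper names `slugify_title` and `default_prefix`, the 60-character cap, and the choice to drop non-ASCII characters are all assumptions made for the example.

```python
import re
import unicodedata


def slugify_title(title: str, max_len: int = 60) -> str:
    """Reduce a video title to a lowercase, filesystem-safe ASCII slug (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark, so the
    # ASCII encode below keeps the base letter and drops the mark.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_text).strip("_").lower()
    # Assumed length cap; trim a trailing underscore left by the cut.
    return slug[:max_len].rstrip("_")


def default_prefix(title: str, video_id: str) -> str:
    """Build title_slug__video_id, falling back to the bare ID if the slug is empty."""
    slug = slugify_title(title)
    return f"{slug}__{video_id}" if slug else video_id


def output_names(prefix: str) -> list[str]:
    """List the three output filenames described in the table above."""
    return [f"{prefix}_original.mp4", f"{prefix}_zh_nosub.mp4", f"{prefix}_zh.mp4"]
```

Because the video ID is always appended, two videos with similar titles still get distinct prefixes, and a title with no ASCII characters at all degrades to the raw ID instead of an empty name.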
pip install yt-dlp Pillow
brew install ffmpeg # macOS (no libass needed — uses Pillow overlay)
Related skills: tts-gen, subtitle-translate
yt-dlp --version 2>/dev/null || pip install yt-dlp
python3 -c "import PIL" 2>/dev/null || pip3 install Pillow --break-system-packages
ffmpeg -version 2>/dev/null | head -1
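The same checks can be done from Python before any work starts. The sketch below is illustrative only: `missing_dependencies` is a hypothetical helper, and it only reports what is missing; it does not install anything.

```python
import importlib.util
import shutil


def missing_dependencies(commands=("ffmpeg", "yt-dlp"), modules=("PIL",)):
    """Return the names of required CLI tools and Python modules that are not available."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing dependencies:", ", ".join(problems))
    else:
        print("All dependencies found.")
```

Failing early with a list of missing tools is usually easier to act on than an ffmpeg or import error halfway through a long download.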
Write the following to /tmp/video_dub.py:
#!/usr/bin/env python3
"""
video_dub.py — Full Chinese dubbing pipeline. Produces 3 output files:
<prefix>_original.mp4 — original video, best quality, original audio
<prefix>_zh_nosub.mp4 — best quality video + ZH TTS audio, no subtitles
<prefix>_zh.mp4 — best quality video + ZH TTS audio + hard-burned ZH subtitles
Subtitle burning: Pillow renders each entry as transparent RGBA PNG,
ffmpeg overlay chain composites them frame-accurately (no libass needed).
"""
import subprocess, sys, os, argparse, re, tempfile, json, unicodedata
from pathlib import Path