Use when the user wants to repurpose a YouTube video for Bilibili, add bilingual (English-Chinese) subtitles to a video, or create hardcoded subtitle versions for Chinese platforms.
Six-step pipeline: download → transcribe → translate → merge → burn subtitles → generate publish info. Produces a video with hardcoded bilingual (EN/ZH) subtitles and a publish_info.md with Bilibili upload metadata.
| Step | Tool | Command | Output |
|---|---|---|---|
| 0. Update | git | Auto-check for skill updates | — |
| 1. Download |
yt-dlp |
yt-dlp --cookies-from-browser chrome -f ... -o ... |
{slug}.mp4 |
| 2. Transcribe | whisper* | srt_utils.py check-whisper then transcribe | {slug}_{lang}.srt |
| 2.5 Validate | srt_utils.py | srt_utils.py validate / fix | {slug}_{lang}.srt (fixed) |
| 3. Translate | AI | SRT-aware batch translation | {slug}_zh.srt |
| 4. Merge | srt_utils.py | srt_utils.py merge ... | {slug}_bilingual.srt |
| 4.5 Style | srt_utils.py | srt_utils.py to_ass --preset netflix|clean|glow | {slug}_bilingual.ass |
| 5. Burn | ffmpeg | ffmpeg -c:v libx264 -vf ass=... | {slug}_bilingual.mp4 |
| 6. Publish | AI | Analyze content, generate metadata | publish_info.md |
Run this BEFORE any pipeline step. Locates the skill directory and checks for updates. The SKILL_DIR variable is reused by later steps for script paths.
# Find skill directory (works across Claude Code, OpenClaw, Hermes, Pi)
SKILL_DIR="$(find ~/.claude/skills ~/.openclaw/skills ~/.hermes/skills ~/.pi/agent/skills ~/.agents/skills ~/myagents/myskills -maxdepth 2 -name 'yt2bb' -type d 2>/dev/null | head -1)"
echo "yt2bb: SKILL_DIR=$SKILL_DIR"
if [ -n "$SKILL_DIR" ] && [ -d "$SKILL_DIR/.git" ]; then
git -C "$SKILL_DIR" fetch --quiet origin main 2>/dev/null
LOCAL=$(git -C "$SKILL_DIR" rev-parse HEAD)
REMOTE=$(git -C "$SKILL_DIR" rev-parse origin/main 2>/dev/null)
if [ "$LOCAL" != "$REMOTE" ]; then
echo "yt2bb: new version available. Run: git -C $SKILL_DIR pull origin main"
else
echo "yt2bb: up to date."
fi
fi
Note: Does not auto-pull — the current session already loaded the old SKILL.md. Notify the user and let them update between sessions.
Single video:
slug="video-name" # or: slug=$(python3 "$SKILL_DIR/srt_utils.py" slugify "Video Title")
mkdir -p "${slug}"
yt-dlp --cookies-from-browser chrome \
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
-o "${slug}/${slug}.mp4" "https://www.youtube.com/watch?v=VIDEO_ID"
Playlist / series:
yt-dlp --cookies-from-browser chrome \
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]" \
-o "%(playlist_index)03d-%(title)s/%(playlist_index)03d-%(title)s.mp4" \
"https://www.youtube.com/playlist?list=PLAYLIST_ID"
After downloading, rename each folder to a clean slug and run Steps 2–6 for each video sequentially.
-f "bv*[ext=mp4]+ba[ext=m4a]/b[ext=mp4]": ensure mp4 output, avoid webm%(playlist_index)03d: zero-padded index to preserve playlist order--cookies-from-browser fails, export cookies first — see TroubleshootingFirst run the environment check to detect your platform and get a tailored whisper command:
python3 "$SKILL_DIR/srt_utils.py" check-whisper
This auto-detects OS, GPU (CUDA/Metal/CPU), memory, and installed backends, then recommends the best backend + model for your hardware. If memory detection is unavailable, it falls back conservatively instead of assuming a low-memory machine. Use the command it prints.
Manual fallback (openai-whisper, works everywhere):
src_lang="en" # Change to ja/ko/es/etc. based on source video
whisper_model="medium" # check-whisper recommends the best model for your hardware
whisper "${slug}/${slug}.mp4" \
--model "$whisper_model" \
--language "$src_lang" \
--word_timestamps True \
--condition_on_previous_text False \
--output_format srt \
--max_line_width 40 --max_line_count 1 \
--output_dir "${slug}"
mv "${slug}/${slug}.srt" "${slug}/${slug}_${src_lang}.srt"
Supported backends:
| Backend | Best for | Install |
|---|---|---|
mlx-whisper | macOS Apple Silicon (fastest) | pip install mlx-whisper |
whisper-ctranslate2 | Windows/Linux CUDA, or CPU (~4x faster) | pip install whisper-ctranslate2 |
openai-whisper | Universal fallback | pip install openai-whisper |
Model selection (auto-recommended by check-whisper):
tiny — fast draft, low accuracy, CPU-friendly (~1 GB)medium — default, good balance (~5 GB)large-v3 — best accuracy, recommended for JA/KO/ZH source (~10 GB)Notes:
--language: explicitly set to avoid misdetection; supports en, ja, ko, es, etc.--word_timestamps True: more precise subtitle timing--condition_on_previous_text False: prevent hallucination loopspython3 "$SKILL_DIR/srt_utils.py" validate "${slug}/${slug}_${src_lang}.srt"
# If issues found:
python3 "$SKILL_DIR/srt_utils.py" fix "${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_${src_lang}.srt"
Read {slug}_{src_lang}.srt and translate to Chinese. Critical rules:
These rules are modeled on the Netflix Simplified Chinese Timed Text Style Guide; follow them to produce broadcast-grade subtitles.
--> lines) exactly as-is。, !, ?; keep mid-sentence ,, 、, ; only when they add clarity,。!?、;:); use 「」 for inner quotes, not "" or ''GPT-4, 30fps, 2026); only punctuation is full-width的, 了, 吗, 呢, 吧, 啊); never split an English phrasal unit across a line break; keep modifiers with their heads{slug}/{slug}_zh.srtpython3 "$SKILL_DIR/srt_utils.py" merge \
"${slug}/${slug}_${src_lang}.srt" "${slug}/${slug}_zh.srt" "${slug}/${slug}_bilingual.srt"
Run lint on the merged bilingual SRT to catch Netflix Timed Text Style Guide violations that validate doesn't cover — reading speed (CPS), per-line length, inter-cue gaps, and line count.
python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt"
Defaults (all overridable via flags):
| Rule | Threshold | Flag |
|---|---|---|
| Reading speed (English) | ≤ 17 CPS | --max-cps-en |
| Reading speed (Simplified Chinese) | ≤ 9 CPS | --max-cps-zh |
| Min cue duration | 833 ms (5/6 s) | --min-duration-ms |
| Max cue duration | 7000 ms | --max-duration-ms |
| Min inter-cue gap | 83 ms (2 frames @ 24 fps) | --min-gap-ms |
| Max chars/line (English) | 42 | --max-chars-en |
| Max chars/line (Chinese, full-width) | 16 | --max-chars-zh |
| Max lines per cue | 2 | — |
Severity model:
When CPS errors fire, the fix is almost always upstream — go back to Step 3 and rewrite the offending Chinese entry to fit the time window. Do not solve CPS by extending the cue past the source's spoken duration.
Agent-friendly output:
python3 "$SKILL_DIR/srt_utils.py" lint "${slug}/${slug}_bilingual.srt" --format json
Returns {ok, error_count, warning_count, issues: [{index, code, severity, message}, ...]} for programmatic filtering.
Convert the bilingual SRT to an ASS file. ASS enables per-line color, font size, and glow effects that are impossible with SRT force_style. Layout rule: subtitles always stay at the bottom. Default stack: ZH on the upper line of the bottom stack, EN on the lower line. The presets are tuned to keep the block readable while reducing overlap risk with lower-screen content.
IMPORTANT — Ask before proceeding. Present the preset table below to the user and ask which style they prefer. Do NOT silently pick a default. If the user has no preference, use
clean.
Available presets:
| Preset | Look | Best for |
|---|---|---|
netflix | Pure white text, thin black outline, soft drop shadow, no box — modeled on the Netflix Timed Text Style Guide | Professional, broadcast-grade look. Best default for documentaries, interviews, long-form content, and anything that should feel "streaming-platform native". Use with --font "Source Han Sans SC" on Linux / "PingFang SC" on macOS for closest Netflix Sans feel |
clean | Yellow text on gray box — golden ZH + light yellow EN, semi-transparent light gray background | Readability safety net for busy or mixed-brightness footage where netflix's outline-only text could get visually lost. The gray box guarantees a readable contrast pad |
glow | Yellow ZH + white EN with colored glow — bright yellow ZH + white EN, blurred outer glow, no background box | Entertainment, vlogs, energetic edits. Most eye-catching, but weakest on bright or busy backgrounds |
Example prompt to user:
字幕有三套样式可选:
netflix— 纯白字体 + 细黑描边 + 柔和阴影(默认推荐,Netflix 专业观感,适合纪录片/访谈/长内容)clean— 黄色字体 + 灰色半透明底框(亮背景或花背景的兜底选项,底框保证对比度)glow— 黄色/白色字体 + 彩色外发光(更抢眼,适合娱乐/Vlog)- 自定义 — 提供
.ass样式文件,完全控制字体、颜色、大小(可用 Aegisub 可视化编辑)选哪个?默认推荐
netflix;如果画面特别花哨或底部信息多,可改用clean。
# Netflix-grade default (white + outline + soft shadow), ZH on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--preset netflix
# Gray-box fallback for busy backgrounds, EN on top
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--preset clean --top en
# Vibrant glow (B站 entertainment style)
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--preset glow
Custom style file — for full control, provide an external .ass file with your own [V4+ Styles] section. It must contain styles named EN and ZH, or to_ass will fail early with a validation error. You can design styles visually with Aegisub and export.
python3 "$SKILL_DIR/srt_utils.py" to_ass \
"${slug}/${slug}_bilingual.srt" "${slug}/${slug}_bilingual.ass" \
--style-file my_styles.ass
Optionally add ; en_tag= and ; zh_tag={\blur5} comment lines in the .ass file to inject ASS override tags per language.
Font by platform (pass with --font, ignored when using --style-file):
| Platform | Flag |
|---|---|
| macOS | --font "PingFang SC" (default) |
| Linux | --font "Noto Sans CJK SC" |
| Windows | --font "Microsoft YaHei" |
Other options:
--top zh|en — which language on the upper line of the bottom stack (default: zh)--res WxH — video resolution (default: 1920x1080)Readability notes for all presets:
--res so 720p and 1080p keep similar visual balanceclean is the safest choice when you must keep subtitles at the bottom in every shotUse the ass= filter (not subtitles=) — all styling comes from the ASS file.
ffmpeg -i "${slug}/${slug}.mp4" \
-vf "ass='${slug}/${slug}_bilingual.ass'" \
-c:v libx264 -crf 23 -preset medium \
-c:a copy "${slug}/${slug}_bilingual.mp4"
-c:v libx264 -crf 23: good quality with reasonable file size-preset medium: balance between speed and compression (use fast for quicker encode)force_style needed — styles are embedded in the ASS fileBased on the video content (from {slug}_{src_lang}.srt and {slug}_zh.srt), generate {slug}/publish_info.md.
All output in this file must be in Chinese (targeting Bilibili audience).
# Publish Info
## Source
{YouTube URL}
## Titles (5 variants)
1. {Suspense/question style — spark curiosity}
2. {Data/achievement driven — emphasize results}
3. {Controversial/opinion style — spark discussion}
4. {Tutorial/practical style — emphasize utility}
5. {Emotional/relatable style — connect with audience}
## Tags
{~10 comma-separated keywords covering topic, technology, domain}
## Description
{3-5 sentences summarizing core content and highlights}
## Chapter Timestamps