Turn any long YouTube interview, talk, or podcast into high-retention Chinese short clips for Shorts/Reels/Douyin/TikTok. Triggers: YouTube URL + short clips / 切片 / 短视频 / 中文字幕 / hard subtitles. Pipeline: source fetch -> subtitle parsing -> candidate analysis -> user review -> clip export -> subtitle burn.
Convert one long YouTube video into 5–15 short clips with Chinese packaging and hard-burned subtitles.
This document is the single source of truth for any AI agent executing the AutoCliper pipeline. Follow every numbered step in order. Do not skip steps. Do not guess file names — inspect the filesystem after each write operation.
Before doing anything else, verify the environment. Report the exact blocker and stop if any check fails.
| Check | Command | Pass Condition |
|---|---|---|
| yt-dlp installed | which yt-dlp | returns a path |
| ffmpeg installed | which ffmpeg | returns a path; if missing, try python3 -c "import imageio_ffmpeg; print(imageio_ffmpeg.get_ffmpeg_exe())" |
| Python 3.9+ | python3 --version | >= 3.9 |
| Skill repo available | check that corekit/ directory exists relative to the skill root | corekit/__init__.py is present |
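The checks in the table above could be scripted roughly as follows. This is a minimal sketch, not part of corekit: the `preflight` function and its messages are hypothetical, and it inspects the interpreter that runs the script rather than parsing the output of `python3 --version`.

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def preflight(skill_root: str) -> list:
    """Return a list of blockers; an empty list means every check passed."""
    blockers = []
    if shutil.which("yt-dlp") is None:
        blockers.append("yt-dlp is not on PATH")
    # ffmpeg may come from PATH or from the imageio-ffmpeg fallback package.
    has_ffmpeg = shutil.which("ffmpeg") is not None
    has_fallback = importlib.util.find_spec("imageio_ffmpeg") is not None
    if not (has_ffmpeg or has_fallback):
        blockers.append("FFMPEG_MISSING")
    # Assumption: this script runs under the same python3 the pipeline will use.
    if sys.version_info < (3, 9):
        blockers.append("Python 3.9+ is required")
    if not (Path(skill_root) / "corekit" / "__init__.py").is_file():
        blockers.append("corekit/__init__.py not found under the skill root")
    return blockers
```

If the returned list is non-empty, report its entries to the user and stop.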
Set the environment for all subsequent commands:
export PYTHONPATH="<path-to-AutoCliper.AI-repo>"
All python3 -m corekit.* commands below assume this PYTHONPATH is set.
Create the output directory structure for this video. Derive <slug> from the video title or ID — lowercase, hyphens, no spaces, no special characters.
mkdir -p "studio/<slug>/intake"
mkdir -p "studio/<slug>/intel"
mkdir -p "studio/<slug>/exports"
Expected result:
studio/<slug>/
├── intake/ ← raw video + subtitles land here
├── intel/ ← analysis artifacts land here
└── exports/ ← per-clip folders land here
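One possible way to derive the slug and create the folders is sketched below. The helper names `make_slug` and `create_workspace` are illustrative assumptions, not corekit functions. Titles with no ASCII letters or digits (for example, a fully Chinese title) are rejected so that the caller falls back to the video ID.

```python
import re
import unicodedata
from pathlib import Path


def make_slug(title_or_id: str) -> str:
    # Fold accented Latin letters to ASCII; characters with no ASCII form are dropped.
    folded = unicodedata.normalize("NFKD", title_or_id)
    ascii_text = folded.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not slug:
        raise ValueError("no ASCII letters or digits in title; use the video ID instead")
    return slug


def create_workspace(slug: str, root: str = "studio") -> Path:
    # Mirrors the three mkdir -p commands for intake, intel, and exports.
    base = Path(root) / slug
    for name in ("intake", "intel", "exports"):
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```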
Download the video and subtitles from YouTube:
python3 -m corekit.fetch_source "<YouTube_URL>" "studio/<slug>/intake"
What happens inside: the downloader tries English subtitles first (en, en-US, en-orig), then falls back to Chinese (zh-Hans, zh-CN, zh). It uses --cookies-from-browser chrome for authenticated access.
After the command finishes, list the intake folder and identify:
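- The .mp4 video file (may contain the video title and ID in the filename)
- The .srt subtitle file (may have a language tag like .en.srt or .zh-Hans.srt)
- Any other files (.ytdl, .jpg, etc.): note them, but they are not needed

If the download fails, report the matching blocker (COOKIES_EXPIRED, NO_SUBTITLES, or DOWNLOAD_FAILED) and stop.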
Record the exact filenames for the next steps. Do not guess or hardcode names.
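Inspecting the intake folder instead of guessing names might look like the sketch below. `find_intake_files` is a hypothetical helper. Returning the first subtitle in alphabetical order when several exist is an assumption, so confirm the choice by eye in that case.

```python
from pathlib import Path


def find_intake_files(intake_dir: str) -> tuple:
    """Return (video_path, subtitle_path) discovered in the intake folder."""
    folder = Path(intake_dir)
    videos = sorted(folder.glob("*.mp4"))
    subtitles = sorted(folder.glob("*.srt"))
    if len(videos) != 1:
        raise FileNotFoundError(f"expected exactly one .mp4 in {folder}, found {len(videos)}")
    if not subtitles:
        raise FileNotFoundError(f"no .srt file in {folder}")
    # Assumption: with several subtitle files, the first alphabetical one is used.
    return videos[0], subtitles[0]
```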
Convert the SRT subtitle file into structured JSON for easier analysis:
python3 -m corekit.subtitle_to_json \
"studio/<slug>/intake/<exact-srt-filename>" \
"studio/<slug>/intel/transcript.json"
Verify the output file exists and contains an array of cue objects. Each cue has:
{
"index": 1,
"start": "00:00:01,234",
"end": "00:00:03,456",
"start_seconds": 1.234,
"end_seconds": 3.456,
"text": "the spoken content"
}
If the transcript is empty or has fewer than 10 cues, report "transcript too short or corrupted" and stop.
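The verification above can be expressed as a short check. This is a sketch under the assumption that transcript.json is a top-level JSON array of cue objects with exactly the keys shown in the example; `validate_cues` and `load_transcript` are hypothetical names.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"index", "start", "end", "start_seconds", "end_seconds", "text"}
MIN_CUES = 10  # threshold stated in the step above


def validate_cues(cues):
    """Raise ValueError unless cues is a list of at least MIN_CUES well-formed cues."""
    if not isinstance(cues, list) or len(cues) < MIN_CUES:
        raise ValueError("transcript too short or corrupted")
    for cue in cues:
        if not isinstance(cue, dict) or not REQUIRED_KEYS.issubset(cue):
            raise ValueError("transcript too short or corrupted")
    return cues


def load_transcript(path: str):
    return validate_cues(json.loads(Path(path).read_text(encoding="utf-8")))
```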
This is the most important intellectual step. Read the following files before starting analysis:
selected_clips.json

Do NOT try to pick clips in a single pass. Follow this sequence:
Pass 1 — Skim and flag: Read through transcript.json looking for stretches that contain memorable, information-dense, opinionated, counterintuitive, or emotionally sharp content. Flag generously — it is better to flag too many than too few.
Pass 2 — Boundary refinement: For each flagged stretch, re-read the surrounding cues (5–10 cues before and after). Choose a clean start and end:
Pass 3 — Completeness check: For each candidate, verify:
Pass 4 — Score and rank: Score each candidate on four dimensions (1–5):
- hook: how compelling is the opening in the first 3 seconds?
- clarity: is it one clean idea with low ambiguity?
- standalone: can it stand alone without prior context?
- payoff: is the ending useful, memorable, or share-worthy?

Rank by weighted total: hook × 0.35 + clarity × 0.25 + standalone × 0.2 + payoff × 0.2
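The weighted ranking can be computed as in the sketch below. The weights come from the formula above. The candidate layout (a dict with a `scores` entry) is a hypothetical shape used only for illustration, not the schema from playbooks/clip-contract.md.

```python
WEIGHTS = {"hook": 0.35, "clarity": 0.25, "standalone": 0.20, "payoff": 0.20}


def weighted_total(scores: dict) -> float:
    """Combine the four 1-5 dimension scores into one ranking value."""
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def rank(candidates: list) -> list:
    # Highest weighted total first. "scores" is an assumed key name.
    return sorted(candidates, key=lambda c: weighted_total(c["scores"]), reverse=True)
```

For example, a candidate scored hook 5, clarity 4, standalone 3, payoff 4 totals 1.75 + 1.00 + 0.60 + 0.80 = 4.15.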
| Source Duration | Target Candidates |
|---|---|
| < 20 min | 5 – 8 |
| 20 – 60 min | 8 – 12 |
| > 60 min | 10 – 15 |
The failure mode to guard against is too few candidates, not too many. When in doubt, include more; the user will prune.
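The duration table maps to a small lookup. In this sketch, `target_candidates` is a hypothetical helper, and a source of exactly 60 minutes is treated as part of the 20 to 60 minute band.

```python
def target_candidates(duration_seconds: float) -> tuple:
    """Return the (minimum, maximum) number of candidates for a source duration."""
    minutes = duration_seconds / 60
    if minutes < 20:
        return (5, 8)
    if minutes <= 60:
        return (8, 12)
    return (10, 15)
```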
YouTube auto-captions often contain:
When the source subtitles are auto-generated:
Write two files:
studio/<slug>/intel/selected_clips.json — the machine-readable clip decisions. Must follow the schema in playbooks/clip-contract.md exactly.
studio/<slug>/intel/candidate-board.md — the human-readable review table.
Show the candidate list in a review-friendly format. For each candidate, display:
| Field | Format | Example |
|---|---|---|
| ID | clip-XX | clip-01 |
| Time range | HH:MM:SS → HH:MM:SS | 00:12:03 → 00:13:22 |
| Duration | human-readable | 1m 19s |
| Title | one provocative Chinese title, ≤12 characters | AI终将取代一切? |
| Summary | exactly two sentences in Chinese | 句1: 在讲什么。句2: 为什么值得看。 |
Present as a numbered table so the user can reply with IDs.
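Rows of the review table could be rendered as follows. This is a sketch only: `hms`, `human_duration`, and `board_row` are hypothetical helpers, and the column order follows the field table above.

```python
def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping any fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def human_duration(seconds: float) -> str:
    """Format a duration as '1m 19s', or '45s' when under a minute."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


def board_row(number: int, start: float, end: float, title: str, summary: str) -> str:
    clip_id = f"clip-{number:02d}"
    return (
        f"| {clip_id} | {hms(start)} → {hms(end)} | "
        f"{human_duration(end - start)} | {title} | {summary} |"
    )
```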
If the user says "proceed" or "auto-pick": select the top-ranked candidates yourself. For a 1-hour video, export at least 8.
If the user picks specific IDs: export only those.
For each clip the user chose (or you auto-picked), execute steps 6a through 6f in order. Complete all sub-steps for one clip before moving to the next.
python3 -m corekit.cut_video \
"studio/<slug>/intake/<exact-mp4-filename>" \
<start_seconds> <end_seconds> \
"studio/<slug>/exports/<clip-folder>/clip.mp4"
The <clip-folder> naming convention is XX-<title-slug>, e.g., 01-ai-will-replace.
Verify the output file exists and has a non-zero size.
python3 -m corekit.window_subtitles \
"studio/<slug>/intake/<exact-srt-filename>" \
<start_seconds> <end_seconds> \
"studio/<slug>/exports/<clip-folder>/clip.src.srt"
This extracts only the cues that overlap the clip window and shifts all timestamps so the clip starts at 00:00:00,000.
Verify the output SRT has at least one cue.
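To make the windowing behavior concrete, the sketch below shows one way to select overlapping cues and shift them. It is not the corekit.window_subtitles implementation. Clamping cues that straddle a window edge is an assumption, and the `(start_seconds, end_seconds, text)` tuple layout is hypothetical.

```python
def window_cues(cues, start, end):
    """Keep cues overlapping [start, end) and shift them so the clip begins at 0."""
    windowed = []
    for cue_start, cue_end, text in cues:
        if cue_end <= start or cue_start >= end:
            continue  # entirely outside the clip window
        # Assumption: cues that straddle an edge are clamped to the window.
        new_start = max(cue_start, start) - start
        new_end = min(cue_end, end) - start
        windowed.append((new_start, new_end, text))
    return windowed


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
```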
Read clip.src.srt and translate it into Simplified Chinese. Write the result to clip.zh.srt in the same folder.
Translation rules (follow strictly):
For each clip, create:
Title (for the first-second overlay):
Description (for platform distribution):
Write both to studio/<slug>/exports/<clip-folder>/metadata.txt:
标题:AI终将取代一切?
描述:Sam Altman 在 Lex Fridman 播客中谈到 AGI 的时间表——他认为大多数人严重低估了 AI 的发展速度,未来三年将改变一切。
python3 -m corekit.render_hardsubs \
"studio/<slug>/exports/<clip-folder>/clip.mp4" \
"studio/<slug>/exports/<clip-folder>/clip.zh.srt" \
"studio/<slug>/exports/<clip-folder>/clip.hardsub.mp4" \
--title "<the-title-from-6d>"
CRITICAL: Pass the Chinese clip.zh.srt file, NOT clip.src.srt. This is the most common agent mistake — passing the source-language subtitle instead of the translation.
The burn step will:
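Verify that the output clip.hardsub.mp4 exists and has a non-zero size. It is usually larger than clip.mp4 because the burn step re-encodes the video with the subtitle and title overlay, but the final size depends on encoder settings, so treat a smaller file as a prompt to inspect the output rather than as an automatic failure.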
After each clip is exported, append its title and description to the combined packaging file:
studio/<slug>/intel/packaging-copy.md
Format:
## clip-01: AI终将取代一切?
**标题**: AI终将取代一切?
**描述**: Sam Altman 在 Lex Fridman 播客中谈到 AGI 的时间表...
**文件**: studio/<slug>/exports/01-ai-will-replace/clip.hardsub.mp4
**时长**: 1m 19s
---
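Appending an entry in this format could be scripted as in the sketch below. `packaging_entry` and `append_packaging` are hypothetical helpers. The blank line before the `---` separator is an addition of this sketch, so that Markdown renders a horizontal rule instead of treating the previous line as a heading.

```python
from pathlib import Path


def packaging_entry(clip_id, title, description, file_path, duration):
    """Build one packaging-copy.md entry using the field labels shown above."""
    lines = [
        f"## {clip_id}: {title}",
        f"**标题**: {title}",
        f"**描述**: {description}",
        f"**文件**: {file_path}",
        f"**时长**: {duration}",
        "",  # blank line so '---' is a rule, not a setext heading underline
        "---",
        "",
    ]
    return "\n".join(lines) + "\n"


def append_packaging(packaging_file, entry):
    path = Path(packaging_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps entries written for earlier clips.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```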
After all clips are exported, return to the user:
- studio/<slug>/intake/
- studio/<slug>/intel/packaging-copy.md
- The clip.hardsub.mp4 path for each exported clip

studio/<video-slug>/
├── intake/                          # raw assets from YouTube
│   ├── <title> [<id>].mp4           # source video (filename from yt-dlp)
│   └── <title> [<id>].<lang>.srt    # source subtitle
├── intel/                           # analysis artifacts
│   ├── transcript.json              # structured subtitle cues (from step 3)
│   ├── selected_clips.json          # clip decisions (from step 4)
│   ├── candidate-board.md           # review table (from step 4)
│   └── packaging-copy.md            # all titles + descriptions (from step 6f)
└── exports/                         # one folder per exported clip
    └── 01-<slug>/
        ├── clip.mp4                 # raw cut (no subtitles)
        ├── clip.src.srt             # windowed source-language subtitle
        ├── clip.zh.srt              # translated Chinese subtitle
        ├── clip.hardsub.mp4         # final deliverable (burned subtitles + title)
        └── metadata.txt             # title + description for this clip
| Module | Invocation | Arguments |
|---|---|---|
| corekit.fetch_source | python3 -m corekit.fetch_source <url> <output_dir> | YouTube URL, output directory |
| corekit.subtitle_to_json | python3 -m corekit.subtitle_to_json <input.srt> <output.json> | SRT file path, JSON output path |
| corekit.cut_video | python3 -m corekit.cut_video <input.mp4> <start_sec> <end_sec> <output.mp4> | source video, start seconds (float), end seconds (float), output path |
| corekit.window_subtitles | python3 -m corekit.window_subtitles <input.srt> <start_sec> <end_sec> <output.srt> | source SRT, start seconds (float), end seconds (float), output path |
| corekit.render_hardsubs | python3 -m corekit.render_hardsubs <input.mp4> <input.srt> <output.mp4> --title "..." | clip video, Chinese SRT, output path, optional title text |
Optional flags for render_hardsubs:
- --fontfile <path>: override the auto-detected Chinese font
- --subtitle-fontsize <int>: subtitle font size for drawtext fallback (default 28)

Paths with spaces or non-ASCII: Always pass file paths as separate arguments to shell commands. Never build a command string with f"ffmpeg -i {path}"; the downloader produces filenames like My Interview [abc123].mp4. Use list-based subprocess.run(cmd).
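A list-based invocation might look like the sketch below, using the cut step as the example. `cut_video_command` and `run` are hypothetical wrappers. The argument order follows the module table, and PYTHONPATH is assumed to be exported already.

```python
import subprocess


def cut_video_command(source: str, start: float, end: float, output: str) -> list:
    # Each path is its own list element, so spaces and brackets need no quoting.
    return ["python3", "-m", "corekit.cut_video", source, str(start), str(end), output]


def run(cmd: list) -> None:
    # No shell=True: arguments reach the program exactly as written.
    subprocess.run(cmd, check=True)
```

Because no shell parses the command, a filename such as My Interview [abc123].mp4 arrives as a single argument.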
Cookie failures: The downloader uses --cookies-from-browser chrome. If it fails with a 403 or login-required error, tell the user to open YouTube in Chrome and verify they are logged in, then retry. Do not modify the download command.
Verify after every write: After every command that produces a file, check that the file exists and has non-zero size before proceeding. If a step produces no output, report it immediately.
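The existence and size check can be wrapped in one helper, sketched here under the hypothetical name `require_output`.

```python
from pathlib import Path


def require_output(path: str) -> Path:
    """Return the path if it is a non-empty file; otherwise raise immediately."""
    target = Path(path)
    if not target.is_file() or target.stat().st_size == 0:
        raise FileNotFoundError(f"expected a non-empty file at {target}")
    return target
```

Calling it after each command turns a silent empty output into an immediate, reportable failure.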
Vertical shorts: If the user asks for vertical (9:16) shorts, complete the entire horizontal pipeline first, then crop/reframe as a post-processing step. Do not attempt vertical crop during the main pipeline.
Encoder detection is automatic: Do not hardcode libx264 or aac in any manual ffmpeg commands. The corekit modules handle encoder selection. If you need to run ffmpeg directly for any reason, use corekit.ffmpeg_locator.h264_encoder() and corekit.ffmpeg_locator.aac_encoder() to get the right encoder names.
If the pipeline cannot complete, report the exact blocker using one of these categories:
| Blocker | What to tell the user |
|---|---|
| COOKIES_EXPIRED | "YouTube download failed due to expired cookies. Please log in to YouTube in Chrome and retry." |
| NO_SUBTITLES | "No English or Chinese subtitles found for this video. The pipeline requires subtitles." |
| FFMPEG_MISSING | "ffmpeg is not installed. Install via brew install ffmpeg or pip install imageio-ffmpeg." |
| DOWNLOAD_FAILED | "Video download failed. [include the error message from yt-dlp]" |
| TRANSCRIPT_EMPTY | "The subtitle file is empty or contains too few cues to analyze." |
| LOW_QUALITY_TRANSCRIPT | "The auto-generated subtitles are too noisy to produce reliable clips. Consider finding a manually transcribed version." |
Do not attempt workarounds for blockers. Report and stop.