Use when the user asks to extract, get, or fetch YouTube video transcripts, subtitles, or captions. Writes video details and the transcription into a structured markdown file.
Test variant: Keeps all polish steps (4, 7, 8). Combines the summary steps (5+6) and the comment steps (10a+10b) each into a single subagent.
Execute all steps sequentially without asking for user approval.
python3 ./check_existing.py "<YOUTUBE_URL>" "<output_directory>"
If the output contains exists: true, skip to Step 10 (comments).
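For context, a minimal sketch of what the existence check might do, assuming the finished file follows the "youtube - * (VIDEO_ID).md" naming pattern referenced in the comments step (the function name and return shape are illustrative, not the actual check_existing.py):

```python
import glob
import os
import re

def check_existing(url, out_dir):
    # Hypothetical sketch; the real check_existing.py may differ.
    # Pull the 11-character video ID out of common YouTube URL forms.
    m = re.search(r"(?:v=|youtu\.be/)([\w-]{11})", url)
    video_id = m.group(1) if m else None
    # A finished run leaves a markdown file whose name ends with "(<id>).md".
    matches = glob.glob(os.path.join(out_dir, f"*({video_id}).md")) if video_id else []
    return {"exists": bool(matches), "files": matches}
```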
python3 ./extract_data.py "<YOUTUBE_URL>" "<output_directory>"
If the video language is English (en), proceed directly. If it is non-English, ask the user which language to use.
python3 ./extract_transcript.py "<YOUTUBE_URL>" "<output_directory>" "<LANG_CODE>"
python3 ./extract_transcript_whisper.py "<YOUTUBE_URL>" "<output_directory>"
python3 ./deduplicate_vtt.py "<output_directory>/${BASE_NAME}_transcript.vtt" "<output_directory>/${BASE_NAME}_transcript_dedup.md" "<output_directory>/${BASE_NAME}_transcript_no_timestamps.txt"
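As a rough illustration of the deduplication pass (auto-generated YouTube VTT captions repeat each cue as it scrolls into view), a naive sketch that yields the no-timestamps text; the real deduplicate_vtt.py also emits a timestamped markdown variant:

```python
import re

# Matches a VTT cue-timing line, e.g. "00:00:01.000 --> 00:00:03.000".
CUE = re.compile(r"^\d{2}:\d{2}:\d{2}[.,]\d{3} --> ")

def dedup_vtt(lines):
    # Illustrative only: drop header, blanks, cue timings, and
    # consecutive duplicate caption lines.
    out = []
    for raw in lines:
        line = raw.strip()
        if not line or line == "WEBVTT" or CUE.match(line):
            continue
        if not out or line != out[-1]:
            out.append(line)
    return out
```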
task_tool:
INPUT: <output_directory>/${BASE_NAME}_transcript_no_timestamps.txt
CHAPTERS: <output_directory>/${BASE_NAME}_chapters.json
OUTPUT: <output_directory>/${BASE_NAME}_transcript_paragraphs.txt
Analyze INPUT and identify natural paragraph break line numbers.
Read CHAPTERS. If it contains chapters, use them as the primary break points.
Target ~500 chars per paragraph.
Write the break line numbers to OUTPUT as a comma-separated list, e.g. 15,42,78,103,...
python3 ./apply_paragraph_breaks.py "<output_directory>/${BASE_NAME}_transcript_dedup.md" "<output_directory>/${BASE_NAME}_transcript_paragraphs.txt" "<output_directory>/${BASE_NAME}_transcript_paragraphs.md"
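A minimal sketch of what applying the break list could look like, assuming the break file holds 1-based line numbers and a paragraph break is a blank line (the real apply_paragraph_breaks.py may handle timestamps differently):

```python
def apply_breaks(lines, break_numbers):
    # Insert a blank line after each listed (1-based) source line.
    breaks = set(break_numbers)
    out = []
    for i, line in enumerate(lines, start=1):
        out.append(line)
        if i in breaks:
            out.append("")
    return out
```

The break file itself parses as `[int(n) for n in text.split(",") if n.strip()]`.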
task_tool:
INPUT: <output_directory>/${BASE_NAME}_transcript_no_timestamps.txt
OUTPUT: <output_directory>/${BASE_NAME}_summary_tight.md
FORMATS: ./summary_formats.md
1. Classify content type:
- TIPS: gear reviews, rankings, practical advice
- INTERVIEW: podcasts, conversations, Q&A
- EDUCATIONAL: concept explanations, analysis
- TUTORIAL: step-by-step instructions
2. Read FORMATS and create summary using format for detected type.
3. Self-review: Cut fluff, enforce <10% of transcript bytes, prefer lists over prose.
4. Save final tightened summary to OUTPUT.
Rules:
- Skip ads, sponsors, self-promotion
- Preserve original language
ACTION REQUIRED: Use Write tool NOW to save to OUTPUT file.
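The <10% size budget in the self-review step can be checked mechanically; a sketch using UTF-8 byte counts (the helper name and default ratio are illustrative):

```python
def within_budget(summary, transcript, ratio=0.10):
    # True if the summary is at most `ratio` of the transcript's UTF-8 size.
    return len(summary.encode("utf-8")) <= ratio * len(transcript.encode("utf-8"))
```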
task_tool:
Read <output_directory>/${BASE_NAME}_transcript_paragraphs.md and clean speech artifacts.
Tasks:
- Remove fillers (um, uh, like, you know)
- Fix transcription errors
- Add proper punctuation
- Keep timestamps at end of paragraphs
ACTION REQUIRED: Write to <output_directory>/${BASE_NAME}_transcript_cleaned.md
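A naive sketch of the filler-removal pass. It only strips unambiguous fillers; "like" is deliberately excluded because it is often a legitimate verb or preposition and needs context-sensitive handling a plain regex cannot provide:

```python
import re

# Only fillers that are safe to strip blindly; "like" is excluded.
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", flags=re.IGNORECASE)

def strip_fillers(text):
    cleaned = FILLERS.sub("", text)
    # Collapse any double spaces the removal left behind.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```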
task_tool:
INPUT: <output_directory>/${BASE_NAME}_transcript_cleaned.md
OUTPUT: <output_directory>/${BASE_NAME}_transcript.md
CHAPTERS: <output_directory>/${BASE_NAME}_chapters.json
Read INPUT. Add markdown headings.
If CHAPTERS has content: Use chapter names as ### headings
If empty: Add ### headings where major topics change
ACTION REQUIRED: Write to OUTPUT file.
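One plausible shape for the chapter-heading step, assuming chapters.json holds entries with "title" and "start" (seconds) keys and each cleaned paragraph ends with an [MM:SS] stamp as kept in the cleaning step; both the schema and the timestamp regex are assumptions, not the actual format:

```python
import re

TS = re.compile(r"\[(\d+):(\d{2})\]\s*$")  # assumed trailing [MM:SS] stamp

def insert_headings(paragraphs, chapters):
    # chapters: assumed [{"title": str, "start": seconds}, ...] schema.
    chapters = sorted(chapters, key=lambda c: c["start"])
    out, idx = [], 0
    for para in paragraphs:
        m = TS.search(para)
        end = int(m.group(1)) * 60 + int(m.group(2)) if m else None
        # Emit every chapter heading whose start falls within this paragraph.
        while idx < len(chapters) and end is not None and chapters[idx]["start"] <= end:
            out.append(f"### {chapters[idx]['title']}")
            idx += 1
        out.append(para)
    return out
```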
python3 ./finalize.py "${BASE_NAME}" "<output_directory>"
Use --debug flag to keep intermediate work files.
python3 ./extract_comments.py "<YOUTUBE_URL>" "<output_directory>"
python3 ./prefilter_comments.py "<output_directory>/${BASE_NAME}_comments.md" "<output_directory>/${BASE_NAME}_comments_prefiltered.md"
task_tool:
SUMMARY_FILE: Find the main summary file "youtube - * (${VIDEO_ID}).md" in <output_directory>
COMMENTS: <output_directory>/${BASE_NAME}_comments_prefiltered.md
Read SUMMARY_FILE to understand video content and type.
Extract insights from COMMENTS that ADD VALUE beyond summary:
- Corrections/Extensions
- Alternative applications
- Debates/controversies
Format as:
## Comment Insights
**Key Takeaway**: [one sentence]
[Categories based on video type with @username attributions]
Rules:
- Only insights NOT already in summary
- Be ruthlessly concise
ACTION REQUIRED: Read the summary file, append Comment Insights section, and write back using Write tool.