Download transcripts for all data folders sequentially. Use for overnight batch processing or when you need to download pending transcripts across all channels and collections.
Why? Manually downloading transcripts folder-by-folder is tedious and error-prone. This skill automates overnight batch processing across all channels and collections with built-in rate limiting and resumability.
# Run from repository root - handles everything automatically
./scripts/download_all_transcripts.sh
That's it. The script finds all folders with videos.csv, downloads pending transcripts, and resumes safely if interrupted.
Before running, ensure:
- The data/ folder contains at least one subfolder with a videos.csv file
- The transcript-download CLI is installed (comes with the project's Python package)
# Check for valid data folders
ls data/*/videos.csv
[!TIP] If no videos.csv files exist, first run extract-videos or sync-all-channels to populate them.
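The same prerequisite check can be sketched in Python, which may be handy when driving the batch from other tooling. This is a minimal illustration, not the script's actual discovery logic; the function name `find_data_folders` is hypothetical, and it assumes the one-level `data/<folder>/videos.csv` layout implied by the `ls` command above.

```python
from pathlib import Path


def find_data_folders(data_root="data"):
    """Return the sorted subfolders of data_root that contain a videos.csv file.

    Hypothetical helper: mirrors `ls data/*/videos.csv`, one level deep only.
    """
    root = Path(data_root)
    return sorted(p.parent for p in root.glob("*/videos.csv") if p.is_file())
```

An empty list means there is nothing to download yet, which corresponds to the case covered by the tip above.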
./scripts/download_all_transcripts.sh
The script will:
- Find all folders under data/ containing videos.csv
- Save transcripts to <folder>/transcripts/

[!CAUTION] This is a long-running operation. For a channel with 500 videos, expect 8+ hours. Run overnight or in a tmux/screen session.
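The 8+ hour figure follows from the delay between videos. As a rough sketch, assuming the 60-second delay shown in the sample output and ignoring the download time itself (so this is a lower bound), the arithmetic is as follows. The function name `estimate_hours` and its parameters are illustrative only.

```python
def estimate_hours(video_count, delay_seconds=60, per_video_seconds=0):
    """Lower-bound runtime in hours.

    Each video costs the fixed delay plus an optional per-video download time
    (assumed to be 0 here because the real figure is not documented).
    """
    return video_count * (delay_seconds + per_video_seconds) / 3600


# 500 videos * 60 s = 30000 s
print(f"{estimate_hours(500):.1f} hours")  # prints "8.3 hours"
```

Multiply accordingly when several folders are pending, since folders are processed one after another.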
The script outputs real-time progress:
📝 YTScribe - Download All Transcripts
=======================================
Started at: Thu Dec 26 09:00:00 PST 2024
Delay between videos: 60s
Found 12 folders with videos.csv
────────────────────────────────────────
[1/12] Processing: lex-fridman
CSV: /path/to/data/lex-fridman/videos.csv
Output: /path/to/data/lex-fridman/transcripts
On successful completion:
✅ All transcripts downloaded!
Finished at: Thu Dec 26 17:30:00 PST 2024
Summary of folders processed:
- lex-fridman: 342 transcripts
- huberman-lab: 156 transcripts
...
On interruption or IP block:
Simply run the script again. It automatically skips videos where transcript_downloaded=True in the CSV.
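To see how much work remains before re-running, the skip rule can be approximated in a few lines of Python. This is a sketch under assumptions, not the script's own code: it assumes the flag is stored as the text `True` in a `transcript_downloaded` column, and it treats a missing or empty value as pending. The function name `pending_rows` is hypothetical.

```python
import csv


def pending_rows(csv_path):
    """Return the rows of a videos.csv whose transcript is not yet downloaded.

    Assumption: a row is done only when transcript_downloaded reads "True"
    (case-insensitive); anything else, including a blank cell, is pending.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            row
            for row in csv.DictReader(f)
            if (row.get("transcript_downloaded") or "").strip().lower() != "true"
        ]
```

For example, `len(pending_rows("data/huberman-lab/videos.csv"))` would give the number of videos a re-run still has to process for that folder.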
Transcripts are saved as markdown with YAML frontmatter:
data/huberman-lab/
├── videos.csv
└── transcripts/
    ├── 2024-01-15-abc123.md
    ├── 2024-01-20-def456.md
    └── ...
Each transcript file contains:
---
video_id: abc123