Use when the user wants to transcribe a YouTube video, get a transcript from a YouTube URL, convert YouTube audio to text, or summarize a YouTube video by first getting its transcript.
Transcribe YouTube videos locally using faster-whisper (GPU-accelerated). Transcripts are saved as .md files named after the video title.
All scripts are in the scripts/ folder relative to this skill:
- scripts/transcribe.py: full pipeline (download audio → transcribe with Whisper)
- scripts/get_transcript.py: fast alternative using YouTube's built-in transcript API (no GPU)

Python: use the pollen conda env (defined in CLAUDE.md)
Before running anything, ask the user these questions in a single message:
- Timestamps: every segment / every N seconds (how many?) / none?
- Format: .md (default, works with Obsidian) or .txt?
- Model: large-v3-turbo (~3 GB VRAM, recommended). Change? Options: tiny / base / small / medium / large-v3-turbo / large-v3

Wait for the user's answers, then map them to script inputs and run:
printf "y\n1\n{ts_choice}\n{fmt_choice}\n{model_choice}\n" | $PYTHON scripts/transcribe.py "URL"
Prompt mapping:
- First prompt: y
- Compute: 1 (local GPU)
- Timestamps: 1 = every segment, 2 + follow-up for interval, 3 = none
- Format: 1 = .md, 2 = .txt
- Model: 1–6 (large-v3-turbo = 5)

After transcription completes:
# Basic
python scripts/transcribe.py "https://youtube.com/watch?v=..."
# Plain text, no timestamps
python scripts/transcribe.py "URL" --no-timestamps
# Timestamp every 60 seconds
python scripts/transcribe.py "URL" --timestamp-interval 60
# Force language
python scripts/transcribe.py "URL" --language en
# Keep the downloaded audio
python scripts/transcribe.py "URL" --keep-audio
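When the command is assembled programmatically rather than typed, the flags shown above can be collected into an argument list. The sketch below is only illustrative: `build_command` and its parameters are hypothetical names, and it assumes the flags can be combined freely, which the examples above do not confirm.

```python
from typing import List, Optional


def build_command(
    url: str,
    *,
    timestamps: bool = True,
    interval: Optional[int] = None,
    language: Optional[str] = None,
    keep_audio: bool = False,
    python: str = "python",
) -> List[str]:
    """Build an argv list for scripts/transcribe.py (hypothetical helper)."""
    cmd = [python, "scripts/transcribe.py", url]
    if not timestamps:
        # Plain text, no timestamps
        cmd.append("--no-timestamps")
    elif interval is not None:
        # One timestamp every `interval` seconds
        cmd += ["--timestamp-interval", str(interval)]
    if language:
        # Force the language instead of relying on detection
        cmd += ["--language", language]
    if keep_audio:
        # Keep the downloaded audio file
        cmd.append("--keep-audio")
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with URLs.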
- Compute: [1] Local GPU (default) · [2] Vast.ai
- Timestamps: [1] Every segment · [2] Every N seconds · [3] None
- Format: [1] .md (default, Obsidian-friendly) · [2] .txt

| # | Model | VRAM |
|---|---|---|
| 1 | tiny | ~0.5 GB |
| 2 | base | ~1 GB |
| 3 | small | ~2 GB |
| 4 | medium | ~4 GB |
| 5 | large-v3-turbo | ~3 GB ← default |
| 6 | large-v3 | ~6 GB (tight) |
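The menu numbers above can be turned into the piped answers for the interactive prompts with a small helper. This is a sketch under stated assumptions: `build_stdin` is a hypothetical name, the leading `y` and `1` (local GPU) are fixed as in the `printf` command, and the interval is assumed to be requested immediately after choosing option 2.

```python
from typing import Optional

# Menu numbers taken from the prompts and model table above.
MODEL_MENU = {
    "tiny": 1, "base": 2, "small": 3,
    "medium": 4, "large-v3-turbo": 5, "large-v3": 6,
}
FORMAT_MENU = {"md": 1, "txt": 2}


def build_stdin(
    timestamps: str = "segment",
    interval: Optional[int] = None,
    fmt: str = "md",
    model: str = "large-v3-turbo",
) -> str:
    """Return the newline-separated answers to pipe into transcribe.py."""
    answers = ["y", "1"]  # leading "y", then 1 = local GPU
    if timestamps == "segment":
        answers.append("1")
    elif timestamps == "interval":
        if interval is None or interval <= 0:
            raise ValueError("interval must be a positive number of seconds")
        # Assumption: the follow-up interval prompt comes right after "2".
        answers += ["2", str(interval)]
    elif timestamps == "none":
        answers.append("3")
    else:
        raise ValueError(f"unknown timestamp mode: {timestamps!r}")
    answers.append(str(FORMAT_MENU[fmt]))
    answers.append(str(MODEL_MENU[model]))
    return "\n".join(answers) + "\n"
```

The result can be supplied as `input=` to `subprocess.run(..., text=True)` in place of the `printf` pipe.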
Transcripts are saved as <video title>.md (or .txt, if that format was chosen) in the skill root directory.
Model cache: ~/.cache/huggingface/hub/ (downloads once, reused after)
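How the timestamp choice shapes the saved transcript can be sketched as follows. This is not the code in transcribe.py: the `[HH:MM:SS] text` line format, the `render` function, and the `(start, text)` segment tuples are assumptions made for illustration only.

```python
from typing import Iterable, List, Optional, Tuple

Segment = Tuple[float, str]  # (start time in seconds, text)


def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: Iterable[Segment], interval: Optional[int] = 0) -> str:
    """Render segments as text.

    interval=0    -> a timestamp on every segment
    interval=N>0  -> at most one timestamp per N-second window
    interval=None -> no timestamps, one running paragraph
    """
    segments = list(segments)
    if interval is None:
        return " ".join(text.strip() for _, text in segments)

    lines: List[str] = []
    next_mark = 0.0
    for start, text in segments:
        if interval == 0 or not lines or start >= next_mark:
            # Start a new timestamped line.
            lines.append(f"[{fmt_ts(start)}] {text.strip()}")
            if interval:
                # The next timestamp is due at the following window boundary.
                next_mark = (start // interval + 1) * interval
        else:
            # Still inside the current window: extend the last line.
            lines[-1] += " " + text.strip()
    return "\n".join(lines)
```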
Try this first — much faster, no download needed:
python scripts/get_transcript.py "URL"
Falls back to transcribe.py if no transcript is available on YouTube.
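If the fallback needs to be driven from outside the scripts, one way to express "try the fast path first" is shown below. It is a sketch only: `transcript_with_fallback` is a hypothetical helper, and it assumes get_transcript.py signals "no transcript available" with a non-zero exit code, which should be checked against the actual script.

```python
import subprocess
import sys
from typing import Callable


def transcript_with_fallback(
    url: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    python: str = sys.executable,
) -> str:
    """Try the YouTube transcript API first, then the Whisper pipeline.

    Returns the name of the script that produced the transcript.
    """
    fast = run([python, "scripts/get_transcript.py", url])
    if fast.returncode == 0:
        return "get_transcript.py"
    # Assumption: a non-zero exit code means no transcript exists on YouTube.
    run([python, "scripts/transcribe.py", url], check=True)
    return "transcribe.py"
```

The `run` parameter exists so the logic can be exercised without launching either script.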
After transcription, always read the transcript and fix obvious errors before reporting done.
Do NOT rephrase or change meaning — only fix clear transcription errors. Briefly tell the user what was corrected.
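To report what was corrected, a word-level diff between the raw and the corrected transcript is usually enough. The following is a minimal sketch using the standard-library `difflib`; `list_corrections` is a hypothetical helper and not part of the skill's scripts.

```python
import difflib
from typing import List, Tuple


def list_corrections(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for every span of words that changed."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions have an empty "before"; deletions an empty "after".
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

A long list of pairs is a signal that the edit went beyond fixing clear transcription errors and drifted into rephrasing.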