Use this skill whenever a local audio or video file needs transcription, especially when an upstream source has no usable subtitle track. Also use it when the user asks to transcribe media that already exists locally or was downloaded by another workflow.
This subskill is part of the self-contained video-study-notes skill. Its helper scripts live under subskills/media-transcribe/scripts/.
Reach for this subskill in particular when neither an upstream download nor a local sidecar subtitle file provides usable subtitle text for a local audio or video file.
Prefer the bundled Python script subskills/media-transcribe/scripts/transcribe_audio.py. Run it from the skill-local .venv managed by uv. If the input is a video file and the parent workflow wants a deterministic copy under <project_root>/audio/, first use scripts/prepare_audio.py.
If the input is a video, write the extracted audio to <project_root>/audio/<stem>.wav with scripts/prepare_audio.py.
Run subskills/media-transcribe/scripts/transcribe_audio.py to create .txt, .srt, and .json outputs.
Pass a language hint when the language is known, for example --language zh.
Write outputs to the transcripts/ directory when the parent workflow already chose a per-video project root. Otherwise fall back to downloads/media-transcribe/transcripts/.
GPU default: model turbo, device cuda, compute type float16, and batched inference.
CPU default: model small, device cpu, compute type int8.
This is a pragmatic default: it favors high throughput on GPU while keeping CPU fallback usable.
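The default selection and the output-directory fallback described above can be sketched in a few lines of Python. This is only an illustration, not the bundled script's code: the names pick_defaults and resolve_transcript_dir are hypothetical, the caller is assumed to already know whether CUDA is usable, and turning batched inference off on CPU is an assumption (the text only states that it is on for GPU).

```python
from pathlib import Path
from typing import Optional

# Values taken from the defaults described in this document.
GPU_DEFAULTS = {"model": "turbo", "device": "cuda", "compute_type": "float16", "batched": True}
# "batched": False is an assumption; the document does not say what CPU mode does.
CPU_DEFAULTS = {"model": "small", "device": "cpu", "compute_type": "int8", "batched": False}


def pick_defaults(cuda_available: bool) -> dict:
    """Return a copy of the GPU defaults when CUDA is usable, else the CPU defaults."""
    return dict(GPU_DEFAULTS if cuda_available else CPU_DEFAULTS)


def resolve_transcript_dir(project_root: Optional[Path]) -> Path:
    """Use <project_root>/transcripts when a per-video project root exists, else the fallback."""
    if project_root is not None:
        return Path(project_root) / "transcripts"
    return Path("downloads/media-transcribe/transcripts")
```

Keeping both decisions in small pure functions makes the fallback behavior easy to check without a GPU or any media file at hand.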
The examples below assume the current directory is the skill root.
Prepare audio deterministically from a local video before transcription:
mkdir -p output/example/audio
.venv/bin/python scripts/prepare_audio.py --input "output/example/video/example.mp4" --output-dir output/example/audio
Transcribe a local audio file with automatic device/model selection:
mkdir -p downloads/media-transcribe/transcripts
.venv/bin/python subskills/media-transcribe/scripts/transcribe_audio.py --input "downloads/media-transcribe/audio/example.m4a" --output-dir downloads/media-transcribe/transcripts --language zh
Force CPU mode:
.venv/bin/python subskills/media-transcribe/scripts/transcribe_audio.py --input "downloads/media-transcribe/audio/example.m4a" --device cpu --compute-type int8 --model small
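When a parent workflow calls the script from Python rather than from a shell, the same invocations can be assembled as an argument list. The sketch below uses only the flags shown in the commands above; the helper name build_transcribe_command is hypothetical, and it assumes the current directory is the skill root.

```python
import shlex
from typing import List, Optional

SCRIPT = "subskills/media-transcribe/scripts/transcribe_audio.py"


def build_transcribe_command(
    input_path: str,
    output_dir: Optional[str] = None,
    language: Optional[str] = None,
    device: Optional[str] = None,
    compute_type: Optional[str] = None,
    model: Optional[str] = None,
    python: str = ".venv/bin/python",
) -> List[str]:
    """Build the argv list; options left as None are omitted so the script's defaults apply."""
    cmd = [python, SCRIPT, "--input", str(input_path)]
    optional = [
        ("--output-dir", output_dir),
        ("--language", language),
        ("--device", device),
        ("--compute-type", compute_type),
        ("--model", model),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def as_shell_line(cmd: List[str]) -> str:
    """Render the argv list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with file names that contain spaces.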
For an input named example.m4a, the script writes:
example.txt
example.srt
example.transcription.json
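A parent workflow that wants to check for these files after a run could derive their paths from the input name. The following is a minimal sketch under the naming shown above; expected_outputs is a hypothetical helper, not part of the bundled scripts.

```python
from pathlib import Path
from typing import List


def expected_outputs(input_path: str, output_dir: str) -> List[Path]:
    """Return the three output paths derived from the input file's stem."""
    stem = Path(input_path).stem  # "example" for "example.m4a"
    out = Path(output_dir)
    return [
        out / f"{stem}.txt",
        out / f"{stem}.srt",
        out / f"{stem}.transcription.json",
    ]
```

Checking that every returned path exists is a cheap way to confirm a transcription run completed before later steps read the files.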