Transcribe video audio and burn hardcoded subtitles using mlx-whisper and ffmpeg. Use when the user wants to add subtitles to a video, transcribe audio to SRT, burn captions into video, or process video for web delivery.
Three-step pipeline: Extract audio → Transcribe → Burn subtitles
Optimized for Apple Silicon using mlx-whisper (Metal-accelerated Whisper).
# ffmpeg (media processing)
brew install ffmpeg
# mlx-whisper (Apple Silicon optimized transcription)
brew install pipx
pipx install mlx-whisper
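Do NOT run pip install directly against the system or Homebrew Python on macOS: it fails with an externally-managed-environment error (PEP 668). pipx installs the tool into its own isolated environment instead.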
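Before running the pipeline, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch assuming a POSIX shell; `require_cmd` is a hypothetical helper name, not part of ffmpeg or mlx-whisper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on PATH,
# otherwise print a hint to stderr and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing: $1 (install it as shown above)" >&2
    return 1
}

# Report on both tools the pipeline depends on.
for tool in ffmpeg mlx_whisper; do
    if require_cmd "$tool"; then
        echo "found: $tool"
    fi
done
```

If `mlx_whisper` is reported missing right after `pipx install`, running `pipx ensurepath` and opening a new shell is usually the fix.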
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/audio.wav
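Whisper models process audio as 16 kHz mono, so the command above resamples to 16 kHz (-ar 16000), downmixes to one channel (-ac 1), and drops the video stream (-vn) to produce exactly what the model consumes.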
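To run the extraction on arbitrary inputs, the command can be wrapped so that the WAV path is derived from the video name. This is a sketch under assumptions: `wav_path_for` and `extract_audio` are hypothetical helper names, and the output always goes to /tmp as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: map a video path to its WAV path under /tmp.
wav_path_for() {
    base=$(basename "$1")
    printf '/tmp/%s.wav\n' "${base%.*}"
}

# Hypothetical helper: extract 16 kHz mono PCM audio with the same
# flags as the command above, then print the resulting WAV path.
extract_audio() {
    out=$(wav_path_for "$1")
    # -y overwrites an existing file; -vn drops the video stream
    ffmpeg -y -i "$1" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$out" && echo "$out"
}

wav_path_for "input.mp4"   # prints /tmp/input.wav
```

Calling `extract_audio input.mp4` then prints the path to hand to the transcription step.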
mlx_whisper /tmp/audio.wav \
--model mlx-community/whisper-large-v3-turbo \
--language zh \
--output-format srt \
--output-dir /tmp/
Important CLI quirks:
Use --output-format and --output-dir (with dashes, not underscores).
Output is written to /tmp/audio.srt.

| Model | Size | Speed | Quality |
|---|---|---|---|
| whisper-large-v3-turbo | ~1.5GB | Fast | Good for most languages |
| whisper-large-v3 | ~3GB | Slower | Best accuracy |
| whisper-medium | ~750MB | Fastest | Acceptable for clear audio |
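Once transcription finishes, a quick automated check can confirm that the SRT exists and contains at least one cue before any manual review. The sketch below assumes standard SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`; `count_cues` and `check_srt` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: count cues by matching SRT timing lines.
count_cues() {
    grep -c -E '^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}' "$1"
}

# Hypothetical helper: fail if the SRT is missing, empty, or has no cues.
check_srt() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    # grep -c exits non-zero on zero matches but still prints 0
    n=$(count_cues "$1" || true)
    if [ "$n" -eq 0 ]; then
        echo "no cues found in $1" >&2
        return 1
    fi
    echo "$1: $n cues"
}
```

For the pipeline above, this would be run as `check_srt /tmp/audio.srt`. It only catches structural failures such as an empty or malformed file, not transcription mistakes.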
Always review the SRT file before burning. Common issues:
SRT format reference:
1