Transcribe local audio/video files offline using whisper.cpp (the C++ port of OpenAI Whisper), generating plain text, timestamped, SRT, and JSON outputs. Use when the user wants fast native-speed transcription with GGML quantized models, or prefers whisper.cpp over Python-based alternatives like faster-whisper. Triggers on mentions of whisper.cpp, whisper-cli, GGML models, or requests for high-performance local transcription.
Use this skill for local-only transcription with whisper.cpp (whisper-cli).
The key advantage over Python-based whisper (faster-whisper) is raw speed: whisper.cpp runs optimized C++ inference with optional GPU acceleration, quantized GGML models, and minimal memory footprint.
python3 scripts/transcribe_whispercpp.py "path/to/audio.mp4" \
--model-path ~/models/ggml-small.bin \
--output-dir ./output/transcribe-whispercpp
This assumes whisper-cli is installed and a GGML model is downloaded. Read .transcript.txt for plain text and .transcript.timed.txt for timestamps. For higher quality, use a larger model (ggml-medium.bin or ggml-large-v3-q5_0.bin).

Single file:
python3 scripts/transcribe_whispercpp.py "./input/video.mp4" \
--model-path ~/models/ggml-small.bin \
--language pt \
--output-dir ./output/transcribe-whispercpp
Multiple files:
python3 scripts/transcribe_whispercpp.py "./a.mp3" "./b.wav" \
--model-path ~/models/ggml-small.bin \
--output-dir ./output/transcribe-whispercpp
Force WAV conversion (useful for formats whisper-cli struggles with):
python3 scripts/transcribe_whispercpp.py "./input/video.mp4" \
--model-path ~/models/ggml-small.bin \
--force-wav \
--output-dir ./output/transcribe-whispercpp
For each input file <name>:
- <name>.transcript.txt — plain text transcript
- <name>.transcript.timed.txt — [start --> end] text format
- <name>.transcript.json — structured JSON with segments
- <name>.srt — SRT subtitle file (generated by whisper-cli)

Download GGML models from Hugging Face:
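The timed-text format is easy to post-process into structured segments. A minimal sketch, assuming each line has the shape `[HH:MM:SS.mmm --> HH:MM:SS.mmm] text` (the exact timestamp precision may vary; `parse_timed_transcript` is a hypothetical helper, not part of the bundled script):

```python
import re

# Assumed line shape: "[00:00:00.000 --> 00:00:02.500] hello"
LINE_RE = re.compile(r"\[(\S+) --> (\S+)\]\s*(.*)")

def parse_timed_transcript(text: str) -> list[dict]:
    """Turn '[start --> end] text' lines into {start, end, text} dicts,
    skipping any lines that do not match the expected shape."""
    segments = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            start, end, content = m.groups()
            segments.append({"start": start, "end": end, "text": content})
    return segments

# Usage: feed it the contents of <name>.transcript.timed.txt
segments = parse_timed_transcript(open_text := "[00:00:00.000 --> 00:00:02.500] hello")
```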
# Small model (~500MB, good balance of speed and quality)
curl -L -o ~/models/ggml-small.bin \
'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin'
# Large v3 quantized (~1GB, best quality with reasonable size)
curl -L -o ~/models/ggml-large-v3-q5_0.bin \
'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-q5_0.bin'
Or use the bundled download script from whisper.cpp:
sh ./models/download-ggml-model.sh small
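If neither curl nor the bundled script is convenient, the same download can be sketched with Python's standard library. The URL pattern mirrors the curl commands above; `download_ggml_model` is a hypothetical helper, not part of the bundled script:

```python
import urllib.request
from pathlib import Path

# Base URL from the curl commands above
BASE = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

def model_url(name: str) -> str:
    """Build the download URL for a model short name like 'small'."""
    return f"{BASE}/ggml-{name}.bin"

def download_ggml_model(name: str, dest_dir: str = "~/models") -> Path:
    """Download ggml-<name>.bin into dest_dir, skipping if already present."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"ggml-{name}.bin"
    if not target.exists():
        urllib.request.urlretrieve(model_url(name), target)
    return target
```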
Install whisper-cli (one of):
# macOS via Homebrew
brew install whisper-cpp
# pip (cross-platform, no GPU accel)
pip install whisper.cpp-cli
# Or build from source
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp && cmake -B build && cmake --build build -j --config Release
Required:
- ffmpeg — for audio conversion to 16kHz WAV when needed.

Notes:
- Pass --threads N to control CPU thread count (default: 4).
- The script runs whisper-cli with the -osrt flag and then parses stdout for the timed text output.
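The ffmpeg conversion step can be sketched as follows. This is a minimal version of what the wrapper presumably does, assuming whisper-cli wants 16 kHz mono 16-bit PCM WAV input; `to_whisper_wav` and `ffmpeg_cmd` are hypothetical helpers:

```python
import subprocess

def ffmpeg_cmd(src: str, dst: str) -> list[str]:
    """Build the ffmpeg command for a 16 kHz mono 16-bit PCM WAV conversion."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        dst,
    ]

def to_whisper_wav(src: str, dst: str) -> str:
    """Convert any ffmpeg-readable file to WAV suitable for whisper-cli."""
    subprocess.run(ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return dst
```

Splitting the command construction out of the subprocess call keeps the conversion logic testable without ffmpeg installed.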