Transcribe audio files (m4a, mp3, wav, ogg, flac, aac, webm, mp4) to text using local whisper-cpp (offline, no API). Use when asked to transcribe locally, convert speech to text offline, or process audio recordings without sending data externally. Converts to 16kHz WAV via ffmpeg, then runs whisper-cli locally. Does not do speaker diarization or advanced transcription.
Transcribe audio files to text using whisper-cpp (local, offline).
skills/local-transcribe/scripts/transcribe.sh
skills/local-transcribe/scripts/transcribe.sh "<audio-file>"
The transcript is printed to stdout. Use --output to save to a file instead.
- --output PATH: save the transcript to a file instead of stdout
- --model PATH: path to a whisper.cpp GGML model file (auto-detected if omitted)
# Transcribe and print to stdout
skills/local-transcribe/scripts/transcribe.sh recording.m4a
# Transcribe and save to file
skills/local-transcribe/scripts/transcribe.sh recording.m4a --output transcript.txt
# Use a specific model
skills/local-transcribe/scripts/transcribe.sh recording.m4a --model ~/.cache/whisper-cpp/ggml-medium.bin
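For more than one recording, the same invocation can be wrapped in a loop. The following is a minimal sketch, not part of the skill itself: the helper names `transcript_path_for` and `transcribe_all` and the `TRANSCRIBE` variable are hypothetical, and only the documented `--output` option is used.

```shell
# Sketch: transcribe several files, writing one .txt next to each input.
# TRANSCRIBE and both function names are illustrative assumptions.
TRANSCRIBE="${TRANSCRIBE:-skills/local-transcribe/scripts/transcribe.sh}"

# Map an input path to its transcript path: notes/call.m4a -> notes/call.txt
transcript_path_for() {
  local input="$1" dir base
  dir="$(dirname -- "$input")"
  base="$(basename -- "$input")"
  # Strip only the last extension of the file name, never part of a directory.
  printf '%s/%s.txt\n' "$dir" "${base%.*}"
}

# Transcribe every file passed as an argument, skipping finished ones.
transcribe_all() {
  local input output
  for input in "$@"; do
    output="$(transcript_path_for "$input")"
    if [ -e "$output" ]; then
      echo "skip: $output already exists" >&2
      continue
    fi
    "$TRANSCRIBE" "$input" --output "$output" || echo "failed: $input" >&2
  done
}
```

Calling `transcribe_all recordings/*.m4a` would then leave `recordings/<name>.txt` beside each input. Skipping existing transcripts keeps a re-run after a failure cheap.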
Any format ffmpeg can decode: .m4a, .mp3, .wav, .ogg, .flac, .aac, .webm, .mp4
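Any ffmpeg-decodable format works because the input is first converted to 16 kHz WAV and only then handed to whisper-cli. The two steps can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of transcribe.sh: the mono, 16-bit PCM settings follow the conversion command shown in the whisper.cpp README, and the names `run`, `transcribe_sketch`, `MODEL`, and `DRY_RUN` are hypothetical.

```shell
# Sketch of the convert-then-transcribe pipeline (not the actual script).
# MODEL, DRY_RUN, run and transcribe_sketch are illustrative assumptions.
MODEL="${MODEL:-$HOME/.cache/whisper-cpp/ggml-small.bin}"

# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

transcribe_sketch() {
  local input="$1" wav status
  wav="$(mktemp "${TMPDIR:-/tmp}/transcribe.XXXXXX")" || return 1
  # Step 1: decode any input to 16 kHz, mono, 16-bit PCM WAV.
  # -f wav is explicit because the temp file has no .wav extension.
  run ffmpeg -nostdin -loglevel error -y -i "$input" \
      -ar 16000 -ac 1 -c:a pcm_s16le -f wav "$wav" &&
    # Step 2: transcribe locally; -nt drops timestamps, -np hides progress logs.
    run whisper-cli -m "$MODEL" -f "$wav" -nt -np
  status=$?
  rm -f "$wav"   # always clean up the intermediate WAV
  return "$status"
}
```

Running `DRY_RUN=1 transcribe_sketch recording.m4a` prints the two commands without needing ffmpeg or whisper-cli installed, which is a convenient way to inspect the flags before a real run.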
brew install whisper-cpp
sudo apt-get install whisper.cpp
yay -S whisper.cpp-git
brew install ffmpeg
sudo apt-get install ffmpeg
A model file in ~/.cache/whisper-cpp/, or specify one with --model
curl -L -o ~/.cache/whisper-cpp/ggml-small.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin

| Model | Size | Speed | Quality |
|---|---|---|---|
| ggml-tiny.bin | 75 MB | Fastest | Lower accuracy |
| ggml-base.bin | 142 MB | Fast | Decent |
| ggml-small.bin | 466 MB | Moderate | Good (recommended) |
| ggml-medium.bin | 1.5 GB | Slower | Better |
| ggml-large.bin | 3.1 GB | Slowest | Best |
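The curl command for ggml-small.bin fails if ~/.cache/whisper-cpp/ does not exist yet, because `curl -o` does not create missing directories. A small wrapper along the following lines avoids that. It is only a sketch: `model_url`, `download_model`, `MODEL_DIR`, and `BASE_URL` are hypothetical names, and applying the ggml-small.bin URL pattern to other file names is an assumption that should be checked against the model repository.

```shell
# Sketch: fetch a model into the cache directory, creating it if needed.
# All names here are illustrative; the URL pattern is extrapolated from
# the single ggml-small.bin command and may not hold for every file.
MODEL_DIR="${MODEL_DIR:-$HOME/.cache/whisper-cpp}"
BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# Build the download URL for a model file name such as ggml-small.bin.
model_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# Download a model unless it is already cached; print its local path.
download_model() {
  local name="$1" dest
  dest="$MODEL_DIR/$name"
  if [ -f "$dest" ]; then
    echo "$dest"
    return 0
  fi
  mkdir -p "$MODEL_DIR" || return 1
  # -f turns HTTP errors into failures; the .part file is renamed only
  # after a complete download, so an interrupted run leaves no broken model.
  curl -fL -o "$dest.part" "$(model_url "$name")" &&
    mv "$dest.part" "$dest" &&
    echo "$dest"
}
```

For example, `download_model ggml-small.bin` prints the cached path, which can be passed straight to --model.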
The script searches ~/.cache/whisper-cpp/ for models automatically, preferring small > base > medium > tiny.
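The preference order can be pictured as a first-match search over file names. The sketch below assumes the models are stored under the file names listed in the table and that the first existing file wins. The function name `find_model` and the `MODEL_DIR` variable are hypothetical, and the real script may detect models differently.

```shell
# Sketch of the auto-detection order: small, then base, medium, tiny.
# find_model and MODEL_DIR are illustrative names, not the script's own.
MODEL_DIR="${MODEL_DIR:-$HOME/.cache/whisper-cpp}"

# Print the first model found in preference order; fail if none exists.
find_model() {
  local dir="${1:-$MODEL_DIR}" size candidate
  for size in small base medium tiny; do
    candidate="$dir/ggml-$size.bin"
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no model found in $dir; pass one with --model" >&2
  return 1
}
```

With only ggml-tiny.bin and ggml-base.bin present, `find_model` picks ggml-base.bin. Note that ggml-large.bin is not in the stated preference list, so under this reading it would be used only when passed explicitly with --model.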
The raw transcript is plain text. You can then: