Use this skill whenever the user asks to transcribe audio, meetings, or voice memos. It securely launches local transcription in the background on Apple Silicon.
Launches the local transcription script in the background. Uses mlx-whisper on Apple Silicon GPU with Sortformer speaker diarization -- all on-device, no audio leaves the machine.
Input: Picks up audio files from inbox/recordings/ (m4a, mp3, wav, ogg, flac, webm, mp4). Drop recordings there from Voice Memos, Zoom, or any source.
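To illustrate which files count as input, here is a minimal sketch of filtering a folder by the extensions listed above. The helper name `find_recordings` is hypothetical, and the real scripts/transcribe.py may select files differently (for example, it also skips files already recorded in its log).

```python
from pathlib import Path

# Extensions named in the Input description above.
AUDIO_EXTENSIONS = {".m4a", ".mp3", ".wav", ".ogg", ".flac", ".webm", ".mp4"}


def find_recordings(inbox: Path) -> list:
    """Return audio files directly inside `inbox`, sorted by path.

    Hypothetical helper for illustration only; not the script's actual code.
    """
    if not inbox.is_dir():
        return []
    return sorted(
        p for p in inbox.iterdir()
        # Compare suffixes case-insensitively so "memo.M4A" also matches.
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Calling `find_recordings(Path("inbox/recordings"))` would list candidate recordings without touching them.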
Output: Writes .txt files to inbox/transcripts/ with a header (date, time, duration, speakers, audio path) followed by the transcript. This is the same inbox that /process-interaction reads from, so after transcription finishes you just run "process transcripts" as usual.
Trigger: /transcribe, "transcribe my meetings", "transcribe voice memos", etc.
Optional argument: a number to limit how many recordings to transcribe. E.g., /transcribe 1 transcribes only the first new recording. Without a number, transcribes all new recordings.
First, check whether transcription is already running:
~/.whisperx-env/bin/python scripts/transcribe.py --status
If transcription is already running, report that and stop. Don't launch a second instance.
If it is not running, launch the script:
~/.whisperx-env/bin/python scripts/transcribe.py [--limit N]
Run this with the Bash tool using run_in_background: true. Include --limit N if a number was specified.
Timeout: Set to at least 600000 ms (10 min). For large batches, budget roughly 30 min of processing per hour of audio on Apple Silicon.
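The timeout arithmetic can be sketched as follows. This assumes the 30-minutes-per-audio-hour estimate holds and adds no safety margin; the function name `estimate_timeout_ms` is hypothetical and is not part of the script.

```python
import math

MIN_TIMEOUT_MS = 600_000             # 10-minute floor from the guidance above
PROCESSING_MIN_PER_AUDIO_HOUR = 30   # rough estimate from the guidance above


def estimate_timeout_ms(total_audio_minutes: float) -> int:
    """Estimate a timeout in milliseconds for a batch of recordings."""
    # Scale audio length to expected processing time in minutes.
    processing_minutes = total_audio_minutes / 60 * PROCESSING_MIN_PER_AUDIO_HOUR
    # Convert to milliseconds and never go below the 10-minute floor.
    return max(MIN_TIMEOUT_MS, math.ceil(processing_minutes * 60_000))
```

For example, 15 minutes of audio stays at the 600000 ms floor, while 3 hours of audio works out to 5400000 ms (90 min).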
Tell the user:
That's it. Do not poll. Do not wait.
If asked "is transcription done?" or "how's the transcription going?":
~/.whisperx-env/bin/python scripts/transcribe.py --status
If not running, check for pending transcripts:
ls inbox/transcripts/*.txt 2>/dev/null | wc -l
The script writes a PID file (inbox/.transcribe.pid) on start and removes it on exit.
inbox/transcribed.log tracks which Voice Memo files have already been processed, so re-running is safe.
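A minimal sketch of how a PID file and a processed-files log of this kind typically work. This is an illustration under assumptions, not the actual code in scripts/transcribe.py: the helper names (`pid_file`, `is_running`, `load_processed`, `mark_processed`) and the one-name-per-line log format are hypothetical.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def pid_file(path: Path):
    """Write the current PID on entry and remove the file on exit."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(os.getpid()))
    try:
        yield
    finally:
        path.unlink(missing_ok=True)


def is_running(path: Path) -> bool:
    """Return True if the PID file names a live process (POSIX semantics)."""
    try:
        pid = int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False     # stale PID file: the process is gone
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def load_processed(log: Path) -> set:
    """Read already-processed file names, one per line (assumed format)."""
    if not log.exists():
        return set()
    return {line.strip() for line in log.read_text().splitlines() if line.strip()}


def mark_processed(log: Path, name: str) -> None:
    """Append a file name to the log so later runs skip it."""
    with log.open("a") as fh:
        fh.write(name + "\n")
```

Checking liveness rather than mere file existence means a crash that leaves a stale PID file behind would not block the next launch.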