Transcribe audio or video locally using mlx-whisper on Apple Silicon. Produces a markdown transcript file. TRIGGER when user says "transcribe this", "what does this audio say", "transcribe the video", provides a YouTube URL for transcription, or sends a voice message to transcribe. Supports YouTube URLs, local files, and Telegram voice messages. Invoked with /transcribe.
Local audio/video transcription using Apple Silicon-optimized whisper models. No cloud APIs.
When the user invokes /transcribe, they will provide one of:

- A YouTube URL
- A path to a local audio or video file
- A Telegram voice message saved under ~/.homaruscc/telegram-media/

For YouTube URLs, extract the audio with yt-dlp:

```
yt-dlp -x --audio-format wav -o "/tmp/transcribe-%(id)s.%(ext)s" "<URL>"
```

If yt-dlp isn't found, tell the user to install it:

```
brew install yt-dlp
```

Then transcribe the audio:

```
python3 -c "
import mlx_whisper
result = mlx_whisper.transcribe('<audio_file>', path_or_hf_repo='mlx-community/whisper-large-v3-turbo', language='en')
print(result['text'])
"
```
Model selection:

- `mlx-community/whisper-large-v3-turbo` (best quality, still fast on Apple Silicon)
- `mlx-community/whisper-base-mlx` (use for voice messages under 30s)

Fallback chain: mlx-whisper -> faster-whisper -> whisper-cli (whisper-cpp)
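The fallback chain above can be sketched as a simple availability check. `pick_backend` is a hypothetical helper name, not part of the skill; it only detects which backend is installed, in preference order:

```python
import importlib.util
import shutil


def pick_backend():
    """Return the first available transcription backend, or None.

    Preference order mirrors the fallback chain:
    mlx-whisper -> faster-whisper -> whisper-cli (whisper-cpp).
    """
    # Python packages: check importability without importing.
    if importlib.util.find_spec("mlx_whisper") is not None:
        return "mlx-whisper"
    if importlib.util.find_spec("faster_whisper") is not None:
        return "faster-whisper"
    # whisper-cli is a standalone binary, so look it up on PATH.
    if shutil.which("whisper-cli") is not None:
        return "whisper-cli"
    return None  # nothing installed; prompt the user to install mlx-whisper
```

If this returns None, fall back to the install instructions below rather than failing silently.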
If mlx-whisper isn't installed: `pip3 install mlx-whisper`
For YouTube videos, save the transcript to ClawdBot/HalShare/transcripts/<date>-<video-id>.md with frontmatter:
```
# <Video Title>

**Video:** <URL>
**Video ID:** <id>
**Source:** mlx-whisper (local)
**Model:** <model used>
**Date extracted:** <YYYY-MM-DD>

---

<transcript text>
```
For local files, save the transcript next to the source as `<filename>.transcript.md`, or to /tmp/ if the source is in a read-only location.

The `language` parameter defaults to English; ask the user if the content might be in another language.
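Writing the template above can be sketched as a small helper. `write_transcript` and its `out_dir` default are hypothetical and used here only for illustration; the skill's real destinations are the ClawdBot/HalShare/transcripts/ directory for YouTube sources and `<filename>.transcript.md` for local files:

```python
from datetime import date
from pathlib import Path


def write_transcript(title, url, video_id, model, text, out_dir="/tmp"):
    """Render the transcript markdown template and write it to disk.

    out_dir defaults to /tmp purely for illustration (hypothetical helper).
    Returns the path of the written file.
    """
    today = date.today().isoformat()
    body = (
        f"# {title}\n\n"
        f"**Video:** {url}\n"
        f"**Video ID:** {video_id}\n"
        "**Source:** mlx-whisper (local)\n"
        f"**Model:** {model}\n"
        f"**Date extracted:** {today}\n\n"
        "---\n\n"
        f"{text}\n"
    )
    path = Path(out_dir) / f"{today}-{video_id}.md"
    path.write_text(body, encoding="utf-8")
    return path
```

The date-prefixed filename matches the `<date>-<video-id>.md` convention above, so repeated runs on different days do not overwrite each other.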