Transcribe any audio or video file to text using Whisper (Groq or OpenAI). Use when the agent receives voice messages, audio files, video messages, or any media with speech. Triggers on: 'transcribe', 'what does this say', 'voice message', 'speech to text', 'audio', or any file path ending in .ogg, .oga, .mp3, .mp4, .wav, .webm, .m4a, or .flac.
Transcribe any audio or video file to text. Uses Groq Whisper (fastest, near-instant) with OpenAI fallback.
kwhisper --file /path/to/audio.ogg
kwhisper --file /workspace/telegram-files/voice.oga
kwhisper --file /workspace/slack-files/audio.mp3 --language en
kwhisper --file /workspace/meeting.mp4 --timestamps
{"ok": true, "text": "The transcribed text...", "provider": "groq", "language": "en", "duration": 12.5}
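A caller that shells out to kwhisper will usually want to check the `ok` field before using `text`. The sketch below assumes that kwhisper prints exactly one JSON object on stdout and exits with status 0 on success. The shape of a failure response is not documented here, so the code only checks for `"ok": false`. The helper names `parse_result` and `transcribe` are hypothetical.

```python
import json
import subprocess


def parse_result(raw: str) -> dict:
    """Parse kwhisper's JSON output and raise if the run did not succeed."""
    result = json.loads(raw)
    if not result.get("ok"):
        # Assumption: failures report "ok": false. The error field name is undocumented.
        raise RuntimeError(f"transcription failed: {result}")
    return result


def transcribe(path: str) -> dict:
    """Run kwhisper on one file and return the parsed result (requires kwhisper on PATH)."""
    proc = subprocess.run(
        ["kwhisper", "--file", path],
        capture_output=True,  # assumption: the JSON result is written to stdout
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_result(proc.stdout)


if __name__ == "__main__":
    sample = '{"ok": true, "text": "The transcribed text...", "provider": "groq", "language": "en", "duration": 12.5}'
    res = parse_result(sample)
    print(res["provider"], res["duration"], res["text"])
```

With the sample output from above, this prints `groq 12.5 The transcribed text...`.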
| Flag | Description |
|---|---|
| --file | Path to audio/video file (required) |
| --language | ISO-639-1 code (en, de, es); optional, auto-detected if omitted |
| --timestamps | Include segment-level timestamps |
| --prompt | Hint text to guide transcription |
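The flags combine into a single argument list. The hypothetical `build_command` helper below assembles one from the options in the table. It checks the language code client-side as two lowercase letters, which is the ISO-639-1 form. That check is an illustration and is not a documented kwhisper behavior. It also assumes that `--prompt` takes the hint text as its next argument, while `--timestamps` is a bare switch, as in the meeting example.

```python
import re


def build_command(path, language=None, timestamps=False, prompt=None):
    """Assemble a kwhisper argument list from the documented flags."""
    if not path:
        raise ValueError("--file is required")
    cmd = ["kwhisper", "--file", path]
    if language is not None:
        # ISO-639-1 codes are two lowercase letters, e.g. "en", "de", "es".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"not an ISO-639-1 code: {language!r}")
        cmd += ["--language", language]
    if timestamps:
        cmd.append("--timestamps")  # bare switch, no value
    if prompt:
        cmd += ["--prompt", prompt]  # assumption: hint text follows the flag
    return cmd


print(build_command("/workspace/meeting.mp4", timestamps=True))
```

If `language` is left out, the flag is omitted and the provider auto-detects the language.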
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, oga, flac
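An agent can decide whether a received file is worth passing to kwhisper by checking its extension against the list above. The sketch below does this case-insensitively with `pathlib`. `is_supported` is a hypothetical helper. It checks only the file name and does not inspect the file contents.

```python
from pathlib import Path

# Extensions copied from the supported-formats list.
SUPPORTED = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg", "oga", "flac"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED


for p in ["/workspace/telegram-files/voice.oga", "/workspace/notes.txt"]:
    print(p, is_supported(p))
```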
Requires GROQ_API_KEY (preferred, near-instant) or OPENAI_API_KEY (fallback).
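The preference order (Groq first, OpenAI as fallback) amounts to a simple rule over the available keys. The sketch below assumes that the keys are read from environment variables with these names. It also assumes that `"openai"` is the provider label used alongside the documented `"groq"`. `provider_order` is a hypothetical helper and does not show how kwhisper itself is implemented.

```python
from typing import List, Mapping


def provider_order(env: Mapping[str, str]) -> List[str]:
    """Return providers to try, Groq first, then OpenAI, based on which keys are set."""
    order = []
    if env.get("GROQ_API_KEY"):
        order.append("groq")  # preferred: near-instant
    if env.get("OPENAI_API_KEY"):
        order.append("openai")  # fallback
    if not order:
        raise RuntimeError("set GROQ_API_KEY or OPENAI_API_KEY")
    return order


print(provider_order({"GROQ_API_KEY": "gsk-example", "OPENAI_API_KEY": "sk-example"}))
```

In practice you would pass `os.environ` as `env`. Passing a mapping as a parameter keeps the function easy to test.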