Transcribe local audio/video files via Triton HTTP (whisper_model). Use when the user asks to get a transcript from a voice file or attached media, and the input is a local file path (e.g., `{{MediaPath}}`) that should be decoded to PCM and sent to Triton as `AUDIO_BYTES` to receive `TRANSCRIPT`.
Transcribe a local audio/video file by decoding it to PCM and calling the Triton `whisper_model` ensemble over HTTP, returning JSON with `result.value`.
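The decode step can be sketched as follows. This is a minimal sketch, not the script's actual implementation: the 16 kHz mono s16le format is an assumption (common for Whisper models), and the helper names are illustrative.

```python
import subprocess

def ffmpeg_pcm_args(path, sample_rate=16000):
    """Build an ffmpeg command line that decodes any audio/video file
    to raw mono signed 16-bit PCM on stdout. Rate/format are assumptions."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", path,
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-acodec", "pcm_s16le",
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-",                      # write to stdout
    ]

def decode_to_pcm(path, sample_rate=16000):
    """Run ffmpeg and return the raw PCM bytes."""
    proc = subprocess.run(ffmpeg_pcm_args(path, sample_rate),
                          capture_output=True, check=True)
    return proc.stdout
```

The raw bytes returned here are what would then be shipped to Triton as the `AUDIO_BYTES` input.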
```shell
cd /home/openclaw/skills/whisper-transcribe
python3.12 -m venv .venv
. .venv/bin/activate
pip install -r scripts/requirements.txt
```
(ffmpeg is required.)

Usage:

```shell
python3.12 scripts/whisper_transcribe.py --path "{{MediaPath}}"
```
Output:

```json
{"result":{"value":"<transcript>"}}
```
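The script's exact HTTP call isn't shown here; below is a minimal, stdlib-only sketch of one way to call Triton's KServe-v2 JSON inference endpoint. The tensor names `AUDIO_BYTES` and `TRANSCRIPT` come from this skill, but the payload shape is an assumption (the real script may use Triton's binary tensor extension instead, which avoids encoding audio as a JSON integer list).

```python
import json
import urllib.request

def normalize_server_url(url):
    """TRITON_SERVER_URL may carry a scheme; strip it, as the skill does."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url

def build_infer_payload(pcm):
    """Wrap raw PCM bytes as a 1 x N UINT8 tensor in KServe-v2 JSON form."""
    return {
        "inputs": [{
            "name": "AUDIO_BYTES",
            "shape": [1, len(pcm)],
            "datatype": "UINT8",
            "data": list(pcm),
        }],
        "outputs": [{"name": "TRANSCRIPT"}],
    }

def infer(server, model, pcm):
    """POST one inference request and return the TRANSCRIPT output data."""
    url = f"http://{normalize_server_url(server)}/v2/models/{model}/infer"
    req = urllib.request.Request(
        url,
        data=json.dumps(build_infer_payload(pcm)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["outputs"][0]["data"]
```

A call would look like `infer(os.environ["TRITON_SERVER_URL"], "whisper_model", pcm)`; note the default `TRITON_SERVER_URL` already includes the `/whisper-triton` path prefix, so it is joined directly with `/v2/models/...`.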
Notes:

- `{{MediaPath}}` is the local temp file path for inbound attachments in OpenClaw.
- ffmpeg must be available in PATH.
- Decoded audio is sent to Triton as `AUDIO_BYTES`; the transcript comes back as `TRANSCRIPT`.
- Chunking is controlled with `--chunk-seconds` and `--overlap-seconds`, or the env vars `CHUNK_SECONDS` and `OVERLAP_SECONDS`.
- Env vars: `TRITON_SERVER_URL=moderation-ingress-controller.moderation.k8s.moderation-xs/whisper-triton`, `ENSEMBLE_NAME=whisper_model` (a scheme is optional; it is stripped).

Files:

- `scripts/whisper_transcribe.py`: calls Triton and prints JSON.
- `scripts/requirements.txt`: Python deps.
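The chunk/overlap behavior can be sketched as a simple splitter. This is illustrative only: it assumes 16 kHz mono s16le PCM (2 bytes per sample), and the real script's windowing may differ. Consecutive chunks share `overlap_seconds` of audio so words that straddle a boundary land intact in at least one chunk.

```python
def chunk_pcm(pcm, chunk_seconds, overlap_seconds,
              sample_rate=16000, bytes_per_sample=2):
    """Split raw PCM bytes into overlapping fixed-length windows."""
    if overlap_seconds >= chunk_seconds:
        raise ValueError("overlap must be shorter than the chunk")
    frame = sample_rate * bytes_per_sample          # bytes per second
    chunk = int(chunk_seconds * frame)              # window size in bytes
    step = int((chunk_seconds - overlap_seconds) * frame)  # stride in bytes
    chunks = []
    for start in range(0, max(len(pcm), 1), step):
        chunks.append(pcm[start:start + chunk])
        if start + chunk >= len(pcm):               # final window reached the end
            break
    return chunks
```

Each chunk would then be transcribed separately, with the overlapping region used to stitch adjacent transcripts together.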