Transcribe audio/video files using speech-to-text providers. Use when the user has audio or video files to transcribe.
Transcribe audio and video files to text using speech-to-text providers (currently Soniox).
uv run scripts/transcribe.py --input recording.mp4
uv run scripts/transcribe.py --input /path/to/videos/ --output-dir /path/to/output/
uv run scripts/transcribe.py --input meeting.m4a \
  --context "Board meeting Q4 review" \
  --terms "EBITDA,YoY,ARR,Zone,Simply South"
| Flag | Short | Description | Default |
|---|---|---|---|
| `--input` | `-i` | Audio/video file or directory (required) | -- |
| `--output-dir` | `-o` | Output directory | Same as input |
| `--provider` | `-p` | STT provider (soniox) | soniox |
| `--context` | `-c` | Free-text context for accuracy | "" |
| `--terms` | `-t` | Comma-separated domain terms | "" |
| `--language` | `-l` | Language hint (ISO code) | en |
| `--no-cleanup` | | Keep remote files after transcription | false |
| `--no-combined` | | Skip combined transcript for directories | false |
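
For readers who want to wrap or test the CLI, the flag table above can be mirrored in a small `argparse` parser. This is only a sketch based on the table: `build_parser` and `split_terms` are hypothetical names, and the real `scripts/transcribe.py` may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flag table; not the script's actual code."""
    p = argparse.ArgumentParser(prog="transcribe.py")
    p.add_argument("--input", "-i", required=True, help="Audio/video file or directory")
    # None stands in for the documented default "Same as input".
    p.add_argument("--output-dir", "-o", default=None, help="Output directory")
    p.add_argument("--provider", "-p", default="soniox", choices=["soniox"])
    p.add_argument("--context", "-c", default="", help="Free-text context")
    p.add_argument("--terms", "-t", default="", help="Comma-separated domain terms")
    p.add_argument("--language", "-l", default="en", help="Language hint (ISO code)")
    p.add_argument("--no-cleanup", action="store_true", help="Keep remote files")
    p.add_argument("--no-combined", action="store_true", help="Skip combined transcript")
    return p


def split_terms(raw: str) -> list[str]:
    """Split the comma-separated --terms value, dropping blanks and stray spaces."""
    return [t.strip() for t in raw.split(",") if t.strip()]


args = build_parser().parse_args(
    ["-i", "meeting.m4a", "-t", "EBITDA,YoY,ARR,Zone,Simply South"]
)
print(split_terms(args.terms))
# prints ['EBITDA', 'YoY', 'ARR', 'Zone', 'Simply South']
```

Note that multi-word terms such as "Simply South" survive intact because the split happens only on commas.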
.mov, .mp4, .m4a, .mp3, .wav, .webm, .ogg, .flac, .aac, .aiff, .amr, .asf
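
When `--input` points at a directory, only files with one of the extensions above are relevant. The following is a minimal sketch of that selection step. `find_media` is a hypothetical helper, and both the case-insensitive matching and the non-recursive scan are assumptions rather than documented behavior of the script.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED = {
    ".mov", ".mp4", ".m4a", ".mp3", ".wav", ".webm",
    ".ogg", ".flac", ".aac", ".aiff", ".amr", ".asf",
}


def find_media(input_path: Path) -> list[Path]:
    """Return supported media files for a file or directory input, sorted by path."""
    if input_path.is_file():
        candidates = [input_path]
    else:
        # Assumption: only the top level of the directory is scanned.
        candidates = [p for p in input_path.iterdir() if p.is_file()]
    # Assumption: extensions are compared case-insensitively.
    return sorted(p for p in candidates if p.suffix.lower() in SUPPORTED)
```

Calling `find_media(Path("/path/to/videos/"))` would then give the list of files a batch run is expected to pick up.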
For each input file example.mp4:
- example-transcript.json - Raw API response with tokens
- example-transcript.md - Readable markdown with speaker labels

For multi-file runs (unless --no-combined):
- combined-transcript.md - All transcripts in one file

SONIOX_API_KEY must be set in the environment or in ~/pro/personal_os/.env

Meeting recording:
uv run scripts/transcribe.py -i ~/Downloads/standup-2026-02-05.m4a \
  -o context/daily/2026-02-05/standup/ \
  --context "Daily standup meeting, Zone team" \
  --terms "Zone,ZonEye,Simply South,Vinoz"
Batch factory tour videos:
uv run scripts/transcribe.py -i /path/to/factory-videos/ \
  -o context/daily/2026-02-05/factory-tour/ \
  --context "Factory tour at candy manufacturing facility" \
  --terms "tempering,enrobing,fondant,ganache"
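
Before launching a long batch like the ones above, it can help to check which files a run should produce and whether the API key is reachable. The sketch below encodes the output naming and the `SONIOX_API_KEY` lookup described earlier. `expected_outputs` and `resolve_api_key` are hypothetical helpers, and the simple `KEY=VALUE` parsing of the `.env` file is an assumption, not necessarily how the script reads it.

```python
import os
from pathlib import Path
from typing import Optional


def expected_outputs(input_file: Path, output_dir: Optional[Path] = None) -> dict:
    """Map an input file to its transcript paths (default output dir: same as input)."""
    out = output_dir if output_dir is not None else input_file.parent
    stem = input_file.stem
    return {
        "json": out / f"{stem}-transcript.json",
        "markdown": out / f"{stem}-transcript.md",
    }


def resolve_api_key(env: dict, dotenv_path: Path) -> Optional[str]:
    """Return SONIOX_API_KEY from env, else from a KEY=VALUE line in the .env file."""
    if env.get("SONIOX_API_KEY"):
        return env["SONIOX_API_KEY"]
    if dotenv_path.is_file():
        for line in dotenv_path.read_text().splitlines():
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "SONIOX_API_KEY":
                # Drop optional surrounding quotes around the value.
                return value.strip().strip('"').strip("'")
    return None


paths = expected_outputs(
    Path("standup-2026-02-05.m4a"), Path("context/daily/2026-02-05/standup")
)
print(paths["markdown"].as_posix())
# prints context/daily/2026-02-05/standup/standup-2026-02-05-transcript.md

dotenv = Path.home() / "pro" / "personal_os" / ".env"
print("API key found:", resolve_api_key(dict(os.environ), dotenv) is not None)
```

The environment variable takes precedence over the `.env` file here, which is a design choice of this sketch.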