Name: Transcription Skill
Author: openclaw

Converts audio and video files into clean, readable text using OpenAI's Whisper API and ffmpeg for media handling.

Overview

This skill handles the full pipeline:

Media extraction — use ffmpeg to strip audio from video files and convert to a Whisper-compatible format
Chunking — split large files (>25 MB) into overlapping segments to stay within API limits
Transcription — send each chunk to OpenAI's Whisper API
Assembly — merge chunk transcripts, adjusting timestamps, into a single clean output
Post-processing — optionally clean up with Claude (punctuation, speaker labels, summaries)
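The extraction, chunking, and assembly steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names, the 16 kHz mono MP3 target, the time-based (rather than size-based) chunk planning, and the overlap de-duplication rule are all assumptions made for the example.

```python
def build_extract_command(src, dst):
    """Return an ffmpeg argv that drops video and writes compact mono audio.

    Assumed target: 16 kHz mono MP3 at 64 kbps (a format Whisper accepts).
    """
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", src,           # input media (audio or video)
        "-vn",               # discard any video stream
        "-ac", "1",          # downmix to mono
        "-ar", "16000",      # resample to 16 kHz
        "-b:a", "64k",       # audio bitrate
        dst,
    ]


def plan_chunks(duration, chunk_len, overlap):
    """Return (start, end) windows in seconds that overlap by `overlap`."""
    if chunk_len <= overlap:
        raise ValueError("chunk_len must be larger than overlap")
    step = chunk_len - overlap
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        chunks.append((start, end))
        if end >= duration:
            break
        start += step
    return chunks


def merge_segments(chunk_results):
    """Shift chunk-relative timestamps to absolute time and merge.

    `chunk_results` is a list of (chunk_start_seconds, segments), where each
    segment is a dict with "start", "end", and "text" relative to its chunk.
    Segments that end inside the already-covered region are treated as
    overlap duplicates and skipped (a simple heuristic, assumed here).
    """
    merged = []
    covered_until = 0.0
    for chunk_start, segments in chunk_results:
        for seg in segments:
            start = seg["start"] + chunk_start
            end = seg["end"] + chunk_start
            if end <= covered_until:
                continue
            merged.append({"start": start, "end": end, "text": seg["text"]})
            covered_until = end
    return merged
```

The command list from `build_extract_command` can be passed to `subprocess.run(..., check=True)`. Planning chunks by duration is a simplification; a real implementation would derive the chunk length from the audio bitrate so that each file stays under the size limit.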

Requirements

ffmpeg must be installed (run `which ffmpeg` to verify; it is usually pre-installed in claude.ai's environment)
OpenAI API key stored in the environment as OPENAI_API_KEY — the user must provide this
Python packages: openai (install via pip if needed)
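A preflight check for the first two requirements might look like the sketch below. The function name `check_requirements` and its injectable `env` and `which` parameters are hypothetical, added only so the check is easy to test; the skill itself may verify these differently.

```python
import os
import shutil


def check_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; empty means ready to run."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Treat a missing or empty key as unset.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `check_requirements()` with no arguments inspects the real environment; any returned messages can be shown to the user before transcription starts.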

Supported Formats

Category	Formats
Audio	mp3, wav, m4a, ogg, flac, aac, opus, wma
Video	mp4, mov, avi, mkv, webm, wmv, m4v
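One way to decide whether a file needs the audio-extraction step is to look at its extension, using the two lists in the table above. This is a sketch only: `classify_media` is a hypothetical helper, and checking the extension says nothing about whether the file actually contains a usable audio stream.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "ogg", "flac", "aac", "opus", "wma"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm", "wmv", "m4v"}


def classify_media(path):
    """Return "audio", "video", or None based on the file extension."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A result of `None` is a reasonable point to stop and tell the user the format is not supported.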

Options

Flag	Default	Description
`--model`	`whisper-1`	Whisper model to use (`whisper-1`, `gpt-4o-transcribe`)
`--language`	auto-detect	ISO 639-1 language code (e.g. `en`, `ar`, `fr`)
`--format`	`txt`	Output format: `txt`, `srt`, `vtt`, `json`
`--timestamps`	off	Include timestamps in output
`--chunk-size`	`20`	Max chunk size in MB (must be ≤ 25)
`--prompt`	none	Context hint to improve accuracy (e.g. domain vocab)
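The flags in the table could be parsed with `argparse` along these lines. This is an illustrative sketch, not the skill's real entry point: the program name `transcribe`, the positional `input` argument, and the `chunk_size_mb` validator are assumptions.

```python
import argparse


def chunk_size_mb(value):
    """Validate --chunk-size: a positive whole number of MB, at most 25."""
    size = int(value)
    if not 0 < size <= 25:
        raise argparse.ArgumentTypeError("chunk size must be between 1 and 25 MB")
    return size


def build_parser():
    parser = argparse.ArgumentParser(prog="transcribe")  # hypothetical name
    parser.add_argument("input", help="audio or video file to transcribe")
    parser.add_argument("--model", default="whisper-1",
                        choices=["whisper-1", "gpt-4o-transcribe"])
    # None means "let the API auto-detect the language".
    parser.add_argument("--language", default=None)
    parser.add_argument("--format", default="txt",
                        choices=["txt", "srt", "vtt", "json"])
    parser.add_argument("--timestamps", action="store_true")
    parser.add_argument("--chunk-size", type=chunk_size_mb, default=20)
    parser.add_argument("--prompt", default=None)
    return parser
```

For example, `build_parser().parse_args(["talk.mp4", "--format", "srt", "--language", "en"])` yields a namespace with the table's defaults for every flag that was not given.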

Troubleshooting

Error	Fix
`AuthenticationError`	Invalid API key — ask user to verify
`RateLimitError`	Wait 60s and retry, or use `--chunk-size 10`
`InvalidRequestError: file too large`	Reduce `--chunk-size` below 25
`ffmpeg not found`	`sudo apt install ffmpeg` or `brew install ffmpeg`
`No audio stream found`	File may be corrupt or wrong format
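The "wait 60s and retry" advice for `RateLimitError` can be wrapped in a small helper such as the sketch below. `with_retry` is a hypothetical name, and the exception type is passed in rather than imported so the example stays self-contained; with the `openai` package installed, one would presumably pass its rate-limit exception class as `retry_on`.

```python
import time


def with_retry(call, retry_on, attempts=3, wait_seconds=60.0, sleep=time.sleep):
    """Run `call()`; on `retry_on`, wait and try again up to `attempts` times.

    The last failure is re-raised so the caller can report it to the user.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(wait_seconds)
```

Errors that retrying cannot fix, such as an invalid API key or a missing audio stream, should not be listed in `retry_on`; they propagate immediately.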