Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when the user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription.
This skill provides video processing utilities including audio extraction, format conversion, and audio transcription using FFmpeg and OpenAI's Whisper model.
Required tools (must be installed in your environment):
FFmpeg: Multimedia framework for video/audio processing
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Verify installation
ffmpeg -version
OpenAI Whisper: Speech-to-text transcription model
# Install via pip
pip install -U openai-whisper
# Verify installation
whisper --help
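Before running any command, it can help to confirm that both executables are on PATH. A minimal preflight sketch, assuming the script shells out to the ffmpeg and whisper executables; missing_tools is a hypothetical helper, and the which parameter exists only so the check can be tested without either tool installed.

```python
import shutil
from typing import Callable, Optional

# Assumption: video_processor.py invokes these executables directly.
REQUIRED_TOOLS = ("ffmpeg", "whisper")


def missing_tools(
    tools=REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling missing_tools() with no arguments returns an empty list when both tools are installed, and otherwise the names of the ones still missing.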
Python packages: declared inline in the script (PEP 723 metadata), so uv run installs them automatically.
Use the scripts/video_processor.py script for all video processing tasks. The script provides a simple CLI with the following commands:
Extract the audio track from a video file:
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav
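For orientation, an equivalent FFmpeg invocation can be built as shown below. This is a minimal sketch, not the script's actual code: extract_audio_cmd and the format-to-encoder mapping are hypothetical, although -vn, -c:a and the encoder names (pcm_s16le, libmp3lame, aac, flac) are standard FFmpeg options.

```python
# Hypothetical mapping from --format values to FFmpeg audio encoders;
# the real script may pick different encoders or settings.
AUDIO_CODECS = {
    "wav": "pcm_s16le",   # uncompressed 16-bit PCM
    "mp3": "libmp3lame",
    "aac": "aac",         # FFmpeg's native AAC encoder
    "flac": "flac",
}


def extract_audio_cmd(src: str, dst: str, fmt: str = "wav") -> list[str]:
    """Build an ffmpeg argument list that drops the video and encodes the audio."""
    if fmt not in AUDIO_CODECS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    return [
        "ffmpeg", "-y",   # overwrite dst without prompting
        "-i", src,
        "-vn",            # disable video output
        "-c:a", AUDIO_CODECS[fmt],
        dst,
    ]
```

Run it with subprocess.run(extract_audio_cmd("input.mp4", "output.wav"), check=True).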
Options:
--format: Output audio format (default: wav). Supports: wav, mp3, aac, flac
Convert any video file to MP4 format:
uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4
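A comparable FFmpeg command for MP4 output might look like the following. It is a minimal sketch under stated assumptions: to_mp4_cmd is hypothetical, -crf 23 is libx264's default quality level, and -pix_fmt yuv420p and -movflags +faststart are choices made here for broad player compatibility and progressive web playback, not settings this document attributes to the script.

```python
# Presets listed for --preset; libx264/libx265 also define superfast,
# veryfast, faster, slower and placebo, which this sketch rejects.
PRESETS = ("ultrafast", "fast", "medium", "slow", "veryslow")


def to_mp4_cmd(src: str, dst: str, codec: str = "libx264",
               preset: str = "medium", crf: int = 23) -> list[str]:
    """Build an ffmpeg argument list that re-encodes src into an MP4 file."""
    if preset not in PRESETS:
        raise ValueError(f"unsupported preset: {preset!r}")
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", codec,
        "-preset", preset,          # slower presets compress better at the same CRF
        "-crf", str(crf),           # constant-quality mode; 23 is libx264's default
        "-pix_fmt", "yuv420p",      # assumption: widest player compatibility
        "-c:a", "aac",
        "-movflags", "+faststart",  # move the moov index to the front of the file
        dst,
    ]
```

For libx265, a higher CRF such as 28 (its default) gives roughly comparable quality.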
Options:
--codec: Video codec (default: libx264). Common options: libx264, libx265, h264
--preset: Encoding speed/quality preset (default: medium). Options: ultrafast, fast, medium, slow, veryslow
Convert any video file to WebM format (web-optimized):
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm
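The two WebM codecs need different rate-control flags in FFmpeg, which a sketch makes concrete. Assumptions: to_webm_cmd is hypothetical, and the CRF values and Opus audio are illustrative choices. The flag pairings follow FFmpeg's encoding guides: libvpx-vp9 enters constant-quality mode only when -crf is combined with -b:v 0, while libvpx (VP8) treats -b:v as a bitrate ceiling in CRF mode.

```python
def to_webm_cmd(src: str, dst: str, codec: str = "libvpx-vp9") -> list[str]:
    """Build an ffmpeg argument list for WebM output with VP9 or VP8 video."""
    if codec == "libvpx-vp9":
        rate = ["-crf", "31", "-b:v", "0"]   # VP9 constant quality
    elif codec == "libvpx":
        rate = ["-crf", "10", "-b:v", "1M"]  # VP8: CRF under a 1 Mbit/s ceiling
    else:
        raise ValueError(f"unsupported WebM codec: {codec!r}")
    return ["ffmpeg", "-y", "-i", src, "-c:v", codec, *rate,
            "-c:a", "libopus",  # Opus audio is valid in WebM with either codec
            dst]
```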
Options:
--codec: Video codec (default: libvpx-vp9). Options: libvpx, libvpx-vp9
Transcribe audio or video files to text using OpenAI's Whisper model:
# Transcribe video file (audio will be extracted automatically)
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt
# Transcribe audio file directly
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt
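The whisper CLI accepts video files directly because it decodes input through FFmpeg. It does not take an output filename, though. It writes <input stem>.<format> into --output_dir, so a wrapper that accepts an explicit output path has to locate and rename that file. A minimal sketch of building the command and predicting the file, where whisper_cmd and whisper_output_path are hypothetical helpers and --model, --output_format, --output_dir and --language are real whisper CLI options:

```python
from pathlib import Path
from typing import Optional

# Sizes listed for --model; Whisper also ships variants such as large-v3 and turbo.
MODELS = ("tiny", "base", "small", "medium", "large")
FORMATS = ("txt", "srt", "vtt", "json")


def whisper_cmd(src: str, out_dir: str, model: str = "base",
                fmt: str = "txt", language: Optional[str] = None) -> list[str]:
    """Build a whisper CLI argument list for one audio or video file."""
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["whisper", src, "--model", model,
           "--output_format", fmt, "--output_dir", out_dir]
    if language is not None:  # omit the flag to let Whisper auto-detect
        cmd += ["--language", language]
    return cmd


def whisper_output_path(src: str, out_dir: str, fmt: str) -> Path:
    """Where the whisper CLI writes its result: <out_dir>/<input stem>.<fmt>."""
    return Path(out_dir) / f"{Path(src).stem}.{fmt}"
```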
Options:
--model: Whisper model size (default: base). Options:
  tiny: Fastest, lowest accuracy (~1GB RAM)
  base: Fast, good accuracy (~1GB RAM) [DEFAULT]
  small: Balanced (~2GB RAM)
  medium: High accuracy (~5GB RAM)
  large: Best accuracy, slowest (~10GB RAM)
--language: Language code (default: auto-detect). Examples: en, es, fr, de, zh
--format: Output format (default: txt). Options: txt, srt, vtt, json
Transcription workflow:
Process a video end-to-end:
# 1. Extract audio for analysis
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
# 2. Transcribe to SRT subtitles
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 lecture.srt --format srt --model small
# 3. Convert to web format
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm
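The three steps above can be chained so that a failure in one step stops the rest. This is a minimal Python sketch. It assumes the script exits with a non-zero status on error, which this document does not confirm. workflow_commands and run_workflow are hypothetical helpers, and the run parameter lets the sequencing be tested without invoking uv.

```python
import subprocess
from pathlib import Path

# Script path taken from the commands in this document.
SCRIPT = ".claude/skills/video-processor/scripts/video_processor.py"


def workflow_commands(video: str) -> list[list[str]]:
    """Build the extract, transcribe and convert commands for one video."""
    stem = Path(video).with_suffix("")  # lecture.mp4 -> lecture
    base = ["uv", "run", SCRIPT]
    return [
        base + ["extract-audio", video, f"{stem}.wav"],
        base + ["transcribe", video, f"{stem}.srt", "--format", "srt", "--model", "small"],
        base + ["to-webm", video, f"{stem}.webm"],
    ]


def run_workflow(video: str, run=subprocess.run) -> None:
    """Run each step in order; check=True makes a failing step raise."""
    for cmd in workflow_commands(video):
        run(cmd, check=True)  # CalledProcessError skips the remaining steps
```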
FFmpeg and Whisper Integration:
Audio Format for Transcription:
Output Formats:
The script includes comprehensive error handling:
tiny or base models for quick drafts
small or medium for production transcriptions
large only when maximum accuracy is required
User request:
I have an AVI file from my old camera. Can you convert it to MP4?
You would:
uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4
User request:
I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it?
You would:
uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.wav transcript.txt --model base
User request:
I need to put this video on my website with subtitles. Can you help?
You would:
uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.vtt --format vtt --model small
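On a web page, subtitles are attached through the HTML track element, and browsers expect WebVTT there. A .vtt file produced with --format vtt is what the page should reference, while an SRT file would need converting first. A minimal sketch of the embed markup, where video_embed_html and its default language and label are hypothetical:

```python
from html import escape


def video_embed_html(video_src: str, subs_src: str,
                     srclang: str = "en", label: str = "English") -> str:
    """Return a <video> element with a WebM source and a WebVTT subtitle track."""
    return (
        "<video controls>\n"
        f'  <source src="{escape(video_src)}" type="video/webm">\n'
        f'  <track kind="subtitles" src="{escape(subs_src)}" '
        f'srclang="{escape(srclang)}" label="{escape(label)}" default>\n'
        "</video>"
    )
```

For example, video_embed_html("presentation.webm", "subtitles.vtt") yields markup that plays the converted video with the transcript shown as captions.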
User request:
I have a Spanish interview video that needs an accurate transcript for publication.
You would:
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.txt --model medium --language es
uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.srt --format srt --model medium --language es
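The two commands above run the medium model twice over the same audio. With the openai-whisper Python API, one transcription can feed both outputs: model.transcribe returns a dict with the full "text" and a list of "segments" carrying "start", "end" and "text". The sketch below assumes that API. transcribe_once, segments_to_srt and srt_timestamp are hypothetical helpers, and the SRT writer is hand-rolled here, not Whisper's own.

```python
from pathlib import Path
from typing import Optional


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as numbered SRT cues."""
    cues = [
        f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
        f"{seg['text'].strip()}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(cues)  # a blank line separates consecutive cues


def transcribe_once(model, media: str, out_stem: str,
                    language: Optional[str] = None) -> dict:
    """Run the model a single time and write both .txt and .srt files."""
    result = model.transcribe(str(media), language=language)
    stem = Path(out_stem)
    stem.with_suffix(".txt").write_text(result["text"].strip() + "\n", encoding="utf-8")
    stem.with_suffix(".srt").write_text(segments_to_srt(result["segments"]), encoding="utf-8")
    return result
```

Typical use: import whisper, load the model with whisper.load_model("medium"), then call transcribe_once(model, "interview.mp4", "transcript", language="es").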
User request:
I have a folder of training videos that all need to be converted to WebM and transcribed.
You would:
mkdir -p output
for video in training_videos/*.mp4; do
  name=$(basename "$video" .mp4)
  uv run .claude/skills/video-processor/scripts/video_processor.py to-webm "$video" "output/$name.webm"
  uv run .claude/skills/video-processor/scripts/video_processor.py transcribe "$video" "output/$name.txt" --model base
done
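If a batch like this is interrupted, rerunning it reprocesses every file. A small planning step can restrict the rerun to videos whose outputs are incomplete. This is a minimal sketch: pending_jobs is a hypothetical helper, and the training_videos/ and output/ layout matches the example paths.

```python
from pathlib import Path


def pending_jobs(src_dir, out_dir) -> list[tuple[Path, Path, Path]]:
    """Return (video, webm, txt) paths for videos missing either output."""
    out = Path(out_dir)
    jobs = []
    for video in sorted(Path(src_dir).glob("*.mp4")):
        webm = out / f"{video.stem}.webm"
        txt = out / f"{video.stem}.txt"
        if not (webm.exists() and txt.exists()):
            jobs.append((video, webm, txt))
    return jobs
```

Each returned tuple supplies the input and output paths for one to-webm and one transcribe call.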
The video-processor skill provides a unified interface for common video processing tasks: audio extraction, conversion to MP4 and WebM, and Whisper transcription.
All operations are handled through a single, well-documented script with sensible defaults and comprehensive error handling.