Converts audio and video files into text, subtitles, or captions using OpenAI Whisper CLI. Supports multiple models for accuracy/speed tradeoff, language detection, and output formats (txt/srt/vtt/json). Triggers: user asks to transcribe audio, transcribe video, extract speech from media, generate subtitles, create captions, or build searchable transcripts.
Meeting recordings, interview videos, and webinars contain valuable information locked in audio/video format. Transcription extracts this as searchable text for documentation, accessibility, and downstream processing (summarization, citation, captioning). Whisper CLI provides fast, accurate offline transcription without API rate limits or per-minute costs.
Transcribe video and audio files using OpenAI Whisper CLI.
Whisper must be installed:
pip install openai-whisper
Verify installation:
whisper --help
cd "{baseDir}/Videos"
Standard transcription (recommended defaults):
whisper "filename.mp4" --model medium --language en --output_format txt
Example with full path:
whisper "{baseDir}/Videos/my_video.mp4" --model medium --language en --output_format txt
Whisper writes the transcript to the current working directory, using the source file's base name:

my_video.mp4 → my_video.txt

To verify:
ls -la *.txt
cat "my_video.txt"
Or use the Read tool to view in Claude.
| Task | Command |
|---|---|
| Transcribe to text | whisper "file.mp4" --model medium --language en --output_format txt |
| Generate subtitles | whisper "file.mp4" --model medium --language en --output_format srt |
| Web captions (VTT) | whisper "file.mp4" --model medium --language en --output_format vtt |
| All formats at once | whisper "file.mp4" --model medium --language en --output_format all |
| Model | Speed | Accuracy | VRAM | When to Use |
|---|---|---|---|---|
| tiny | ~1 min/10 min | Lower | ~1GB | Quick test, draft (accuracy sacrifice acceptable) |
| base | ~2 min/10 min | Moderate | ~1GB | Clear speech, low background noise |
| small | ~4 min/10 min | Good | ~2GB | General use, balanced cost/accuracy |
| medium | ~8 min/10 min | High | ~5GB | Default: interviews, meetings, professional content |
| large | ~15 min/10 min | Highest | ~10GB | Accents, poor audio, critical accuracy required |
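The VRAM figures above can drive automatic model selection. A minimal sketch, assuming you already know the available VRAM in GB (the `choose_model` helper name is illustrative; the thresholds come from the tables in this document):

```python
def choose_model(vram_gb: float) -> str:
    """Pick a Whisper model size from available GPU VRAM (GB),
    following the VRAM thresholds documented above."""
    if vram_gb >= 10:
        return "large"   # ~10GB: highest accuracy
    if vram_gb >= 4:
        return "medium"  # ~5GB peak; recommended default down to 4GB
    if vram_gb >= 2:
        return "small"   # ~2GB: balanced fallback
    return "base"        # ~1GB: acceptable for clear speech

print(choose_model(6))  # medium
```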
| Language | Code |
|---|---|
| English | en |
| Dutch | nl |
| German | de |
| French | fr |
| Spanish | es |
| Auto-detect | (omit --language flag) |
cd "{baseDir}/Videos/"
whisper "CEO_Interview.mp4" --model medium --language en --output_format txt
cat "CEO_Interview.txt"
whisper "Presentation_2025.mp4" --model medium --language en --output_format srt
Output: Presentation_2025.srt (can be imported into video editors)
whisper "Teammeeting_Jan.mp4" --model medium --language nl --output_format txt
whisper "3hour_webinar.mp4" --model tiny --language en --output_format txt
(Use tiny for speed when accuracy is less critical)
whisper "International_Panel.mp4" --model large --output_format txt
(Omit --language to auto-detect, use large for best accuracy)
Save to specific directory:
whisper "video.mp4" --model medium --language en --output_format txt --output_dir "/path/to/output/"
Save next to source file:
cd "/path/to/source/folder" && whisper "video.mp4" --model medium --language en --output_format txt
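When scripting batches, it can help to assemble the CLI invocation programmatically instead of hand-writing each command. A sketch using only the flags shown in this document (the `build_cmd` helper is illustrative, not part of Whisper):

```python
from typing import Optional

def build_cmd(media: str, model: str = "medium",
              language: Optional[str] = "en",
              fmt: str = "txt",
              output_dir: Optional[str] = None) -> list[str]:
    """Assemble a whisper CLI invocation as an argument list.
    Pass language=None to let Whisper auto-detect."""
    cmd = ["whisper", media, "--model", model, "--output_format", fmt]
    if language is not None:
        cmd += ["--language", language]  # omit flag entirely for auto-detect
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    return cmd

# Execute with subprocess.run(build_cmd(...), check=True) once Whisper is installed.
print(" ".join(build_cmd("video.mp4", output_dir="/path/to/output/")))
```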
| Problem | Solution |
|---|---|
| CUDA out of memory | Use smaller model: --model small or --model base |
| Very slow | Use --model tiny or --model base |
| Poor accuracy | Use --model large and/or specify correct --language |
| Output in wrong folder | Use --output_dir or cd to target folder first |
| Command not found | Run pip install openai-whisper |
For long videos, run in background:
whisper "long_video.mp4" --model medium --language en --output_format txt &
Check if still running:
ps aux | grep whisper
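Instead of grepping `ps`, a script can simply poll for the transcript file that the background run will eventually produce. A sketch, assuming the output path is known in advance (the helper name, timeout, and poll interval are illustrative):

```python
import os
import time

def wait_for_transcript(path: str, timeout: float = 3600,
                        poll: float = 5) -> bool:
    """Poll until the transcript file appears or the timeout expires.
    Returns True if the file exists within the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return False

# Example: wait_for_transcript("long_video.txt", timeout=2 * 3600)
```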
| Setting | Value | Rationale |
|---|---|---|
| Default Model | medium | Balances accuracy (roughly 90-95% word-level accuracy on clear speech) with reasonable speed (~8 min per 10 min of video) and VRAM (~5GB typical) without extreme overhead |
| Default Language | en | Assumes English content; auto-detect (omitting --language) adds ~5-10% processing overhead but handles mixed-language audio |
| VRAM Threshold - Switch to small | 4GB available | medium requires ~5GB peak; below 4GB, the small model is recommended to avoid OOM |
| VRAM Threshold - Switch to base | 2GB available | Extreme resource constraint; base still provides acceptable accuracy for clear speech |
| Processing Speed Ratio | ~1 min Whisper per 10 min video | Approximate for the medium model on a modern GPU; actual time varies with audio quality, sampling rate, and hardware |
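The per-model speed ratios in the model table above translate into a rough runtime estimate. A sketch (the ratios are the approximate minutes of processing per 10 minutes of media from that table; real times vary widely with hardware):

```python
# Approximate minutes of processing per 10 minutes of media,
# taken from the model comparison table above.
SPEED_RATIO = {"tiny": 1, "base": 2, "small": 4, "medium": 8, "large": 15}

def estimate_minutes(media_minutes: float, model: str = "medium") -> float:
    """Rough transcription time estimate; actual speed depends on
    hardware (CPU vs GPU) and audio quality."""
    return media_minutes * SPEED_RATIO[model] / 10

print(estimate_minutes(180, "tiny"))  # 3-hour webinar on tiny: ~18.0 minutes
```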
whisper --help
# Should print usage info; if "command not found", run: pip install openai-whisper
whisper "{baseDir}/test_sample.mp4" --model tiny --language en --output_format txt
# Rationale: Use tiny model for quick validation (30 seconds max processing)
cat "{baseDir}/test_sample.txt"
# Verify output is readable text with minimal formatting errors
Test each output format on same file:
whisper "{baseDir}/test_sample.mp4" --model small --output_format txt
whisper "{baseDir}/test_sample.mp4" --model small --output_format srt
whisper "{baseDir}/test_sample.mp4" --model small --output_format vtt
# Verify .txt is plain text, .srt has numbered cues with timestamps like 00:00:00,000 --> 00:00:05,000, and .vtt starts with a WEBVTT header
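A quick way to sanity-check the .srt output is to match its cue timing lines. A small sketch (the regex and helper name are illustrative):

```python
import re

# SRT cue timing lines look like: 00:00:00,000 --> 00:00:05,000
SRT_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")

def looks_like_srt_timing(line: str) -> bool:
    """True if the line is a well-formed SRT cue timing line."""
    return bool(SRT_TIMING.match(line.strip()))

print(looks_like_srt_timing("00:00:00,000 --> 00:00:05,000"))  # True
print(looks_like_srt_timing("WEBVTT"))                         # False
```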
whisper "{baseDir}/multilingual_sample.mp4" --output_format txt
# Omit --language flag to test auto-detection
# Whisper prints the detected language (e.g. "Detected language: English") before transcribing; the json output format also records it
After each transcription: