Name: Video Transcribe
Author: jamditis

Video transcription with Whisper

Batch transcribe video files using OpenAI Whisper with GPU acceleration. Produces word-level timestamp JSON and plain text transcripts for each video.
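A minimal sketch of how the two outputs could be produced, assuming the open-source `openai-whisper` package and its `word_timestamps=True` option. The helper names (`write_outputs`, `transcribe_video`), the JSON layout, and the `.json`/`.txt` naming are illustrative assumptions, not the skill's actual scripts:

```python
import json
from pathlib import Path
from typing import Tuple


def write_outputs(result: dict, out_dir: Path, stem: str) -> Tuple[Path, Path]:
    """Write a word-level timestamp JSON file and a plain-text transcript."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Flatten the per-segment word lists into a single list of timed words.
    words = [
        {"word": w["word"], "start": w["start"], "end": w["end"]}
        for seg in result.get("segments", [])
        for w in seg.get("words", [])
    ]
    json_path = out_dir / f"{stem}.json"
    txt_path = out_dir / f"{stem}.txt"
    json_path.write_text(json.dumps({"words": words}, indent=2), encoding="utf-8")
    txt_path.write_text(result.get("text", "").strip() + "\n", encoding="utf-8")
    return json_path, txt_path


def transcribe_video(video: Path, out_dir: Path, model_name: str = "base") -> Tuple[Path, Path]:
    """Transcribe one video file and write both outputs (hypothetical helper)."""
    # Imported lazily so write_outputs can be used without Whisper installed.
    import whisper  # openai-whisper; decoding video requires ffmpeg on PATH

    model = whisper.load_model(model_name)
    result = model.transcribe(str(video), word_timestamps=True)
    return write_outputs(result, out_dir, video.stem)
```

A batch run would then loop `transcribe_video` over the video files, ideally loading the model once outside the loop rather than per file as this short sketch does.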

Prerequisites

Verify before starting:

python -c "import whisper; print('Whisper OK')"
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

If Whisper fails to import, check for NumPy version conflicts:

Whisper depends on numba, which requires NumPy < 2.4.
Fix: pip install "numpy<2.4"
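The version constraint above can also be checked programmatically. This is a small sketch using only the standard library; the function names are hypothetical and the `(2, 4)` limit is taken directly from the note above:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def numpy_is_compatible(numpy_version: str, limit: Tuple[int, int] = (2, 4)) -> bool:
    """Return True if the major.minor part of numpy_version is below limit."""
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {numpy_version!r}")
    return (int(match.group(1)), int(match.group(2))) < limit


def installed_numpy_version() -> Optional[str]:
    """Return the installed NumPy version, or None if NumPy is not installed."""
    try:
        return version("numpy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("NumPy is not installed")
    elif numpy_is_compatible(found):
        print(f"NumPy {found} OK")
    else:
        print(f'NumPy {found} is too new; run: pip install "numpy<2.4"')
```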

If CUDA is unavailable, Whisper will run on CPU (much slower but functional).
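The CPU fallback can be made explicit when loading the model. A minimal sketch, where `pick_device` is a hypothetical helper name:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path at all.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The result can be passed to Whisper as `whisper.load_model(name, device=pick_device())`. On CPU, Whisper falls back from FP16 to FP32 and prints a warning; passing `fp16=False` to `transcribe` avoids the warning.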

Workflow

Step 1: Auto-detect model

Check available GPU memory and select the appropriate model:

Free VRAM	Model	Speed	Accuracy
>= 6 GB	turbo	Fast (5-8x real-time)	Near-large quality
>= 3 GB	medium	Moderate	Good for clear speech
>= 1 GB	base	Moderate	Acceptable
No GPU	base (CPU)	Slow (0.5-1x real-time)	Acceptable
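The thresholds in the table could be applied as in the sketch below. The function names are hypothetical, and the handling of a GPU with less than 1 GB free is an assumption, since the table does not cover that case:

```python
from typing import Optional

GIB = 1024 ** 3


def select_model(free_vram_gb: Optional[float]) -> str:
    """Map free VRAM in GB to a Whisper model name; None means no GPU."""
    if free_vram_gb is None:
        return "base"  # CPU fallback row of the table
    if free_vram_gb >= 6:
        return "turbo"
    if free_vram_gb >= 3:
        return "medium"
    # Assumption: below 1 GB the table gives no guidance, so use "base" too.
    return "base"


def query_free_vram_gb() -> Optional[float]:
    """Return free memory on the current CUDA device in GB, or None without a GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / GIB


if __name__ == "__main__":
    free = query_free_vram_gb()
    print(f"Free VRAM: {free}, model: {select_model(free)}")
```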
