Transcribe and summarize video or audio content. Use when the user shares a video URL (X/Twitter, direct mp4/webm link), asks to 'transcribe this', 'summarize this video', 'what does this video say', or provides a tweet URL containing a video.
Transcribe video/audio content and produce a structured summary.
Determine the video source from the user's input:
- X/Twitter URL: run go run . read <id> --json from the birdy repo root to get the media[].videoUrl. If multiple video qualities exist, prefer the highest resolution.
- If the source is an X/Twitter URL and go run . fails (not in the birdy repo), fall back to birdy read <id> --json or bird read <id> --json.
- Direct video URL (mp4/webm): use the URL as-is.
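For X/Twitter sources, the lookup needs a tweet id and, when several video variants come back, a choice among them. The following is a minimal sketch, assuming the id is the numeric segment after /status/ in the URL and that each candidate video URL carries a WIDTHxHEIGHT path segment. Both helper names are hypothetical and not part of birdy.

```python
import re
from typing import Iterable, Optional
from urllib.parse import urlparse

# Hostnames treated as X/Twitter; subdomains such as mobile.twitter.com also match.
_X_HOSTS = ("x.com", "twitter.com")


def tweet_id(url: str) -> Optional[str]:
    """Return the numeric status id from an X/Twitter URL, or None for other URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    is_x = any(host == h or host.endswith("." + h) for h in _X_HOSTS)
    match = re.search(r"/status/(\d+)", parsed.path)
    return match.group(1) if is_x and match else None


def best_video_url(urls: Iterable[str]) -> Optional[str]:
    """Pick the URL with the largest WIDTHxHEIGHT path segment (assumed URL layout)."""
    def pixels(url: str) -> int:
        m = re.search(r"/(\d+)x(\d+)/", url)
        # URLs without a resolution segment rank lowest.
        return int(m.group(1)) * int(m.group(2)) if m else 0

    return max(urls, key=pixels, default=None)
```

A URL for which tweet_id returns None can be treated as a direct media link and passed to the download step unchanged.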
Download the video and extract the audio as a 16 kHz, mono, 16-bit PCM WAV file:

    mkdir -p /tmp/transcribe-work
    curl -L -o /tmp/transcribe-work/video.mp4 "<url>"
    ffmpeg -y -i /tmp/transcribe-work/video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 /tmp/transcribe-work/audio.wav
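Before transcribing, it can help to confirm that the extracted file really has the format the ffmpeg flags request. The following is a small sketch using only the standard-library wave module. The function name is_whisper_ready is hypothetical.

```python
import wave


def is_whisper_ready(path: str) -> bool:
    """True if the WAV file is mono, 16 kHz, 16-bit PCM (what the ffmpeg flags request)."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnchannels() == 1          # -ac 1
                and wav.getframerate() == 16000  # -ar 16000
                and wav.getsampwidth() == 2      # pcm_s16le: 2 bytes per sample
            )
    except (wave.Error, EOFError):
        # Not a PCM WAV file, or the file is truncated.
        return False
```

Calling is_whisper_ready('/tmp/transcribe-work/audio.wav') after the ffmpeg step gives a quick sanity check before the slower transcription run.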
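Delete the downloaded video once the audio has been extracted:

    rm /tmp/transcribe-work/video.mp4

Check that mlx_whisper is importable in Python 3. If not, install it: pip3 install mlx-whisper.

Transcribe the audio and write a timestamped transcript:

    import mlx_whisper

    # Run Whisper on the extracted 16 kHz mono audio.
    result = mlx_whisper.transcribe(
        '/tmp/transcribe-work/audio.wav',
        path_or_hf_repo='mlx-community/whisper-small-mlx',
        language='en',
    )

    # Write one line per segment, prefixed with its [MM:SS] start time.
    with open('/tmp/transcribe-work/transcript.txt', 'w', encoding='utf-8') as f:
        for seg in result['segments']:
            start = int(seg['start'])
            m, s = divmod(start, 60)
            f.write(f'[{m:02d}:{s:02d}] {seg["text"].strip()}\n')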
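The timestamp logic above counts minutes past 59 for recordings longer than an hour (for example [62:05]). The following is an optional variant, sketched under the assumption that an [HH:MM:SS] prefix is preferable for long recordings. It runs on hand-written sample segments shaped like result['segments'], so it does not need mlx_whisper. The helper names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a start time as [MM:SS], switching to [HH:MM:SS] from one hour on."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"
    return f"[{minutes:02d}:{secs:02d}]"


def format_segments(segments) -> str:
    """Render Whisper-style segments (dicts with 'start' and 'text') as transcript lines."""
    return "".join(
        f"{format_timestamp(seg['start'])} {seg['text'].strip()}\n" for seg in segments
    )


# Hand-written sample shaped like result['segments']; not real model output.
sample = [
    {"start": 0.0, "text": " Hello and welcome."},
    {"start": 75.4, "text": " Let's get started."},
    {"start": 3725.0, "text": " Thanks for watching."},
]
print(format_segments(sample), end="")
```

For recordings under an hour the output matches the [MM:SS] format used above, so the two approaches are interchangeable for short clips.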
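- The model used is mlx-community/whisper-small-mlx.
- If the audio is not in English, omit the language parameter or set it appropriately.
- The transcript is saved to /tmp/transcribe-work/transcript.txt.
- Read /tmp/transcribe-work/transcript.txt and produce the structured summary from it.
- Delete /tmp/transcribe-work/audio.wav after transcription to free space. Keep transcript.txt.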
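The language note can be handled by building the keyword arguments conditionally. The following is a minimal sketch, assuming that leaving out the language key lets Whisper detect the language itself. transcribe_kwargs and MODEL_REPO are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Optional

MODEL_REPO = "mlx-community/whisper-small-mlx"  # model named in this document


def transcribe_kwargs(language: Optional[str] = "en") -> Dict[str, Any]:
    """Build keyword arguments for mlx_whisper.transcribe.

    Passing language=None leaves the key out entirely, which is the
    "omit the language parameter" case.
    """
    kwargs: Dict[str, Any] = {"path_or_hf_repo": MODEL_REPO}
    if language is not None:
        kwargs["language"] = language
    return kwargs


# Intended use (requires mlx-whisper to be installed):
#   result = mlx_whisper.transcribe(audio_path, **transcribe_kwargs(language=None))
```

Keeping the default at "en" preserves the behavior of the transcription snippet, while callers who know the source language can pass its code explicitly.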