Given a video/audio URL, fetches the transcript or lyrics, detects the source language, and presents an interleaved English translation with a word-by-word glossary per line. Use when the user provides a URL and asks to translate, transcribe, or understand content in any language — including song lyrics, podcasts, speeches, or any video/audio. Supports YouTube (auto-captions preferred), Instagram, Twitter/X, Facebook, LinkedIn, and any yt-dlp-supported URL. Optional directive: [transcript-only] to use captions only without Whisper fallback.
Fetch the transcript from a URL and produce an interleaved English translation with a word-by-word glossary.
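As a minimal sketch of how such a request might be read, assuming the input is a single line holding a URL optionally followed by one bracketed directive such as [transcript-only]. The names `Request`, `parse_request`, and `allow_whisper_fallback` are hypothetical and used only for illustration; they are not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """Hypothetical container for a parsed user request."""
    url: str
    directive: Optional[str]  # e.g. "transcript-only", or None if absent


# A URL (any run of non-space characters), then an optional "[directive]".
_REQUEST_RE = re.compile(
    r"^\s*(?P<url>\S+)(?:\s+\[(?P<directive>[^\]]+)\])?\s*$"
)


def parse_request(text: str) -> Request:
    """Split the raw input into a URL and an optional directive."""
    match = _REQUEST_RE.match(text)
    if match is None:
        raise ValueError(f"expected 'URL [directive]', got: {text!r}")
    directive = match.group("directive")
    # Normalise the directive so "[Transcript-Only]" is treated the same.
    normalised = directive.strip().lower() if directive else None
    return Request(url=match.group("url"), directive=normalised)


def allow_whisper_fallback(request: Request) -> bool:
    """Assumption: only [transcript-only] disables the Whisper fallback."""
    return request.directive != "transcript-only"
```

For example, `parse_request("https://youtu.be/abc [transcript-only]")` yields a request for which `allow_whisper_fallback` returns `False`, while a bare URL leaves the fallback enabled.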
The user provides a URL, optionally followed by a directive in [...]: