Download Instagram Reels, transcribe audio, and extract captions. Share a reel URL and get back a full transcript with the original description.
Download Instagram Reels, transcribe the audio, and extract the caption/description.
pip install yt-dlp
apt install ffmpeg # or: brew install ffmpeg
export GROQ_API_KEY="your-groq-api-key"
Process a reel in three steps: extract metadata, download audio, transcribe.
yt-dlp --write-info-json --skip-download -o "/tmp/reel" "REEL_URL"
This writes /tmp/reel.info.json with the caption, uploader, CDN URLs, and other metadata. No login required for public reels.
Extract the audio CDN URL from metadata and download it directly:
AUDIO_URL=$(python3 -c "
import json
d = json.load(open('/tmp/reel.info.json'))
for f in d.get('formats', []):
if f.get('ext') == 'm4a':
print(f['url'])
break
")
curl -sL "$AUDIO_URL" -o /tmp/reel-audio.m4a
ffmpeg -y -i /tmp/reel-audio.m4a -acodec libmp3lame -q:a 4 /tmp/reel-audio.mp3
curl -s https://api.groq.com/openai/v1/audio/transcriptions \
-H "Authorization: Bearer $GROQ_API_KEY" \
-F "file=@/tmp/reel-audio.mp3" \
-F "model=whisper-large-v3-turbo" \
-F "response_format=verbose_json"
Returns JSON with text (full transcript) and segments (with timestamps). Language is auto-detected.
python3 -c "
import json
d = json.load(open('/tmp/reel.info.json'))
print('Caption:', d.get('description', 'No caption'))
print('Author:', d.get('uploader', 'Unknown'))
print('Duration:', round(d.get('duration', 0)), 'seconds')
"
yt-dlp --cookies /path/to/cookies.txt --write-info-json --skip-download -o "/tmp/reel" "REEL_URL"rm -f /tmp/reel.info.json /tmp/reel-audio.*# Full transcription pipeline
yt-dlp --write-info-json --skip-download -o "/tmp/reel" "https://www.instagram.com/reel/ABC123/" && \
AUDIO_URL=$(python3 -c "import json; [print(f['url']) for f in json.load(open('/tmp/reel.info.json')).get('formats',[]) if f.get('ext')=='m4a'][:1]") && \
curl -sL "$AUDIO_URL" -o /tmp/reel-audio.m4a && \
ffmpeg -y -i /tmp/reel-audio.m4a -acodec libmp3lame -q:a 4 /tmp/reel-audio.mp3 2>/dev/null && \
curl -s https://api.groq.com/openai/v1/audio/transcriptions \
-H "Authorization: Bearer $GROQ_API_KEY" \
-F "file=@/tmp/reel-audio.mp3" \
-F "model=whisper-large-v3-turbo" \
-F "response_format=verbose_json"
# Just get the caption (no transcription)
yt-dlp --write-info-json --skip-download -o "/tmp/reel" "https://www.instagram.com/reel/ABC123/" && \
python3 -c "import json; d=json.load(open('/tmp/reel.info.json')); print(d.get('description',''))"
# Transcribe a TikTok video (same pipeline)
yt-dlp --write-info-json --skip-download -o "/tmp/reel" "https://www.tiktok.com/@user/video/123" && \
AUDIO_URL=$(python3 -c "import json; [print(f['url']) for f in json.load(open('/tmp/reel.info.json')).get('formats',[]) if f.get('ext')=='m4a'][:1]") && \
curl -sL "$AUDIO_URL" -o /tmp/reel-audio.m4a && \
ffmpeg -y -i /tmp/reel-audio.m4a -acodec libmp3lame -q:a 4 /tmp/reel-audio.mp3 2>/dev/null && \
curl -s https://api.groq.com/openai/v1/audio/transcriptions \
-H "Authorization: Bearer $GROQ_API_KEY" \
-F "file=@/tmp/reel-audio.mp3" \
-F "model=whisper-large-v3-turbo" \
-F "response_format=verbose_json"