Extract actionable steps from YouTube videos using Gemini video understanding. Use when the user provides a YouTube link and wants to learn procedures, extract steps, understand visual tutorials, or turn video content into executable instructions.
Take a YouTube URL → send the video to Gemini → get back structured, actionable steps that Claude can understand and execute. Works for tutorials, demos, how-tos, and any procedural video content.
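The URL → Gemini → steps flow above can be sketched with the `google-genai` SDK, which accepts a YouTube URL as a `file_uri` part. This is a minimal sketch, not the script's actual implementation: the prompt wording, the function names `build_prompt` and `extract_steps`, and the default model `gemini-2.5-flash` are all assumptions made for illustration.

```python
import os

# Hypothetical prompt text; the real script's prompt is not shown in this document.
PROMPT = (
    "Watch this video and list the actionable steps it demonstrates, "
    "in order, as a numbered list. Include exact menu names, "
    "keyboard shortcuts, and values shown on screen."
)


def build_prompt(question=None):
    """Return the instruction text; an optional question narrows the focus."""
    if question:
        return f"{PROMPT}\n\nFocus on this question: {question}"
    return PROMPT


def extract_steps(url, question=None, model="gemini-2.5-flash"):
    """Send a YouTube URL to Gemini and return the model's text response.

    The default model name is an assumption; pass another one to override it.
    """
    # Imported lazily so build_prompt works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["NANO_BANANA_API_KEY"])
    response = client.models.generate_content(
        model=model,
        contents=types.Content(
            parts=[
                # Gemini fetches the public YouTube video from this URI.
                types.Part(file_data=types.FileData(file_uri=url)),
                types.Part(text=build_prompt(question)),
            ]
        ),
    )
    return response.text
```

Calling `extract_steps("https://youtube.com/watch?v=VIDEO_ID")` would need a valid key in the environment and network access; `build_prompt` can be exercised on its own.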
Quick mode (--quick): extracts the transcript only → sends it to Gemini → faster and cheaper, but no visual context.
# Default: full video analysis (visual + audio)
python3 .claude/skills/video-to-action/video_to_action.py "https://youtube.com/watch?v=VIDEO_ID"
# Ask a specific question about the video
python3 .claude/skills/video-to-action/video_to_action.py "https://youtube.com/watch?v=VIDEO_ID" -q "What keyboard shortcuts are demonstrated?"
# Quick mode: transcript only (faster, cheaper)
python3 .claude/skills/video-to-action/video_to_action.py "https://youtube.com/watch?v=VIDEO_ID" --quick
# Save to file
python3 .claude/skills/video-to-action/video_to_action.py "https://youtube.com/watch?v=VIDEO_ID" -o active/steps.md
# JSON output
python3 .claude/skills/video-to-action/video_to_action.py "https://youtube.com/watch?v=VIDEO_ID" --json
# Use Gemini Pro for maximum detail (slower, more expensive)
python3 .claude/skills/video-to-action/video_to_action.py "https://youtube.com/watch?v=VIDEO_ID" -m gemini-2.5-pro
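The flags used in the commands above could be declared with `argparse` roughly as follows. This is only a sketch of one plausible parser: the flag spellings come from the examples, but the destination names (`question`, `output`, `as_json`, `model`) and help strings are hypothetical, and the real script may define long forms or defaults that are not shown here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="video_to_action.py")
    parser.add_argument("url", help="YouTube URL to analyze")
    parser.add_argument("-q", dest="question", default=None,
                        help="specific question to ask about the video")
    parser.add_argument("--quick", action="store_true",
                        help="transcript-only mode (faster, cheaper)")
    parser.add_argument("-o", dest="output", default=None,
                        help="write the result to this file")
    parser.add_argument("--json", dest="as_json", action="store_true",
                        help="emit JSON instead of plain text")
    parser.add_argument("-m", dest="model", default=None,
                        help="Gemini model name, e.g. gemini-2.5-pro")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Leaving the model default as `None` keeps the sketch from guessing which model the script uses when `-m` is omitted.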
| Mode | Flag | Speed | Cost | Best For |
|---|---|---|---|---|
| Full | (default) | ~1-3 min | ~$0.05-0.20 | Visual tutorials (Blender, Figma, code editors), demos with UI |
| Quick | --quick | ~15-30s | ~$0.01-0.03 | Talks, lectures, podcasts, text-heavy content |
After extracting steps, Claude can:
User: Learn how to model a donut from this video: https://youtube.com/watch?v=...
Requires in .env:
NANO_BANANA_API_KEY=your_gemini_api_key
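To check that the key is visible before running the script, a small standard-library reader like the one below may help. It is a sketch under stated assumptions: the script itself presumably loads `.env` through `python-dotenv`, and the helper names `read_env_file` and `get_api_key` are hypothetical. Only simple `KEY=value` lines are handled.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines; skip blanks, comments, and malformed lines."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Return the Gemini key from the environment, falling back to the .env file."""
    key = os.environ.get("NANO_BANANA_API_KEY") or read_env_file(path).get(
        "NANO_BANANA_API_KEY"
    )
    if not key:
        raise RuntimeError("NANO_BANANA_API_KEY is not set; add it to .env")
    return key
```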
- yt-dlp: video/transcript download
- ffmpeg: frame extraction (fallback mode)
- google-genai: Gemini API
- Pillow: image handling (fallback mode)
- python-dotenv

--quick mode or ask about specific timestamps
--quick mode will fail, use full mode
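A quick preflight check for the dependencies named above might look like this. It is a sketch resting on two assumptions: that `yt-dlp` and `ffmpeg` are invoked as command-line tools found on `PATH`, and that the Python packages use their usual import names (`google.genai`, `PIL`, `dotenv`). The function names are hypothetical.

```python
import importlib.util
import shutil

# Assumed to be invoked as executables on PATH.
CLI_TOOLS = ["yt-dlp", "ffmpeg"]

# pip package name -> import name
PY_MODULES = {
    "google-genai": "google.genai",
    "Pillow": "PIL",
    "python-dotenv": "dotenv",
}


def module_available(name):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is missing or malformed.
        return False


def missing_dependencies():
    """List the tools and packages that could not be found."""
    missing = [tool for tool in CLI_TOOLS if shutil.which(tool) is None]
    missing += [pkg for pkg, mod in PY_MODULES.items() if not module_available(mod)]
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies()
    print("All dependencies found" if not gaps else f"Missing: {', '.join(gaps)}")
```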