Full-stack video production assistant. Analyzes video content visually (Gemini), generates transcriptions/SRT subtitles, plans and creates motion graphics (Remotion), generates B-roll images/videos, produces timeline XMLs for Premiere/DaVinci. Downloads YouTube videos with yt-dlp. Use for: video analysis, visual analysis, describe video, what's in this video, transcription, subtitles, motion graphics, B-roll, shorts, timeline XML, clip cutting, silence removal, After Effects, Premiere Pro, DaVinci Resolve, YouTube download. Keywords: video edit, ffmpeg, remotion, after effects, premiere, davinci, shorts, subtitles, motion graphics, clip, render, transcribe, xml, timeline, b-roll, talking head, analyze, yt-dlp, youtube, download, gemini, vision
The agent has built-in vision for images. For videos, always use Gemini via Kolbo MCP.
| Media type | Action |
|---|---|
| Image (jpg, png, etc.) | Agent reads it directly — no upload needed |
| Video — "analyze", "describe", "what's in this?", "what prompts?", file path with no instruction | upload_media → chat_send_message + Gemini |
| Transcription — "transcribe", "subtitles", "SRT", "what's being said", "captions" | transcribe_audio only |
| Both visual + transcript | Run both |
Never use ffmpeg to extract frames for analysis. Never use local Ollama/vision models. Commit to the right action — do not ask the user. Wait for chat_send_message to return before proceeding — it polls until done (up to 2 min). Do NOT fall back to ffmpeg or any other approach if it takes time.
Once `kolbo auth login` is done, these are available as MCP tools — use them directly without any Python/API key setup:
| Tool | Use |
|---|---|
upload_media | Upload local file to Kolbo CDN → get stable public URL |
chat_send_message | Send message + media_urls array to Gemini for visual analysis |
transcribe_audio | Transcribe audio/video to text + SRT (ElevenLabs Scribe) |
generate_image | Generate B-roll images |
generate_video | Generate B-roll videos |
generate_video_from_image | Animate a still into video |
generate_music | Generate background music |
generate_speech | TTS for voiceover |
generate_sound | Sound effects |
list_models | Browse available models by type |
check_credits | Check remaining Kolbo credit balance |
Step 1 is NOT optional. You cannot skip upload_media or construct the URL yourself.
Step 1: upload_media({ source: "/absolute/path/to/video.mp4" })
→ Returns: { url, thumbnail_url, ... }
→ Save the "url" field — this is the CDN URL you will pass to Gemini
→ NEVER use thumbnail_url (it's a JPG preview, not the video)
Step 2: chat_send_message({
message: "Describe this video in detail. What is shown?",
media_urls: ["<url from step 1>"] ← must be an array, must be the "url" field
})
→ returns: { content: "..." }
❌ Common mistakes that break video analysis:
- Skipping upload_media and passing a local file path to chat_send_message — local paths don't work.
- Passing a .txt URL as the media_urls value — Gemini needs the actual video CDN URL.
- Passing thumbnail_url instead of url from the upload_media response.
- Running transcribe_audio first and then passing its output URL as the video — transcription gives text, not video.

Omit model — Smart Select detects video/audio and auto-routes to Gemini.
Sessions do NOT remember media between messages. On retry: reuse the same CDN url from step 1 (no re-upload needed) but always pass media_urls again.
Batch analysis (many videos): Pass model: "gemini-3.1-flash-lite-preview" explicitly for cheaper bulk runs.
For YouTube videos — download first with yt-dlp (see below), then follow steps 1–2 above.
Input: local video / YouTube URL / uploaded file
→ [DEFAULT] Visual Analysis: upload_media → chat_send_message (Gemini)
→ [EXPLICIT REQUEST] Transcription: transcribe_audio → SRT / text
→ [EDITING] FFmpeg: cut, silence removal, 9:16 conversion
→ [MOTION GRAPHICS] Remotion: compositions, captions, B-roll
→ Output: Premiere XML / DaVinci EDL / MP4s / SRT
| Service | Use |
|---|---|
| Kolbo MCP (upload_media + chat_send_message) | Primary — visual video/image analysis via Gemini |
| Kolbo MCP (transcribe_audio) | Primary — transcription, word-level SRT, multilingual |
| yt-dlp | Download YouTube/social media videos |
| FFmpeg | Local video editing, cutting, silence removal, format conversion |
| Remotion Lambda | Cloud render motion graphics |
| fal.ai (MCP) | Image & video B-roll generation |
| ElevenLabs | TTS, voice cloning, SFX (via Kolbo MCP generate_speech) |
| Suno | Background music (via Kolbo MCP generate_music) |
Kolbo MCP tools need no API keys — auth is handled by `kolbo auth login`. FFmpeg/yt-dlp need to be installed locally on the machine.
Download video from YouTube, TikTok, Instagram, Twitter, etc.:
# Best quality MP4
yt-dlp -f "bestvideo[height<=1080][ext=mp4]+bestaudio/best" \
--merge-output-format mp4 \
-o "%(id)s.%(ext)s" <url>
# With subtitles
yt-dlp -f "bestvideo[height<=1080][ext=mp4]+bestaudio/best" \
--write-auto-sub --sub-lang en --convert-subs srt \
--merge-output-format mp4 \
-o "%(id)s.%(ext)s" <url>
# Audio only (for transcription)
yt-dlp -f "bestaudio" --extract-audio --audio-format mp3 -o "%(id)s.%(ext)s" <url>
After download → upload to Kolbo CDN with upload_media → analyze visually with chat_send_message.
FFmpeg tips:
- Create working directories with tempfile.mkdtemp() first (handles spaces in paths)
- Use `\pos()` for RTL rendering
- Encode with `-c:v libx264 -crf 18 -c:a aac -b:a 128k`
- Silence removal: `silencedetect=noise=-35dB:d=0.4` → trim+concat → `atempo=1.14`

Use ElevenLabs Scribe for word-level SRT with speaker diarization:
import requests
def transcribe(audio_path, api_key, language="he"):
with open(audio_path, "rb") as f:
response = requests.post(
"https://api.elevenlabs.io/v1/speech-to-text",
headers={"xi-api-key": api_key},
files={"file": f},
data={"model_id": "scribe_v1", "language_code": language,
"timestamps_granularity": "word", "diarize": True}
)
return response.json()
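To turn the word-level result into an SRT file, a hedged sketch follows — it assumes a Scribe-style `words` array of `{"text", "start", "end"}` dicts (seconds), which you should verify against the actual response shape; the `words_to_srt` helper name is illustrative, not a library function:

```python
def words_to_srt(words, max_words=6):
    """Group word-level timestamps into SRT cues of up to max_words words each."""
    def ts(t):
        # SRT timestamp: HH:MM:SS,mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int((s % 1) * 1000):03d}"

    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append(
            f"{i // max_words + 1}\n"
            f"{ts(chunk[0]['start'])} --> {ts(chunk[-1]['end'])}\n"
            + " ".join(w["text"] for w in chunk) + "\n"
        )
    return "\n".join(cues)
```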
9:16 vertical conversion with blurred background:
filter_complex = (
"[0:v]split[bg][fg];"
"[bg]scale=1080:1920:force_original_aspect_ratio=increase,"
"crop=1080:1920,gblur=sigma=40[blurred];"
"[fg]scale=1080:1920:force_original_aspect_ratio=decrease,"
"pad=1080:1920:(ow-iw)/2:(oh-ih)/2:color=black@0[front];"
"[blurred][front]overlay=0:0"
)
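A sketch of wiring that filter graph into a full ffmpeg invocation — the `build_vertical_cmd` helper name is mine, and the encode settings reuse the `-crf 18` / AAC defaults from the tips above:

```python
def build_vertical_cmd(input_path, output_path):
    """Build the ffmpeg argv for a blurred-background 9:16 conversion."""
    filter_complex = (
        "[0:v]split[bg][fg];"
        "[bg]scale=1080:1920:force_original_aspect_ratio=increase,"
        "crop=1080:1920,gblur=sigma=40[blurred];"
        "[fg]scale=1080:1920:force_original_aspect_ratio=decrease,"
        "pad=1080:1920:(ow-iw)/2:(oh-ih)/2:color=black@0[front];"
        "[blurred][front]overlay=0:0"
    )
    return [
        "ffmpeg", "-y", "-i", input_path,
        "-filter_complex", filter_complex,
        "-c:v", "libx264", "-crf", "18",
        "-c:a", "aac", "-b:a", "128k",
        output_path,
    ]

# Run with: subprocess.run(build_vertical_cmd("in.mp4", "shorts/out.mp4"), check=True)
```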
Silence detection:
import re
import subprocess

def detect_silence(video_path, noise_db=-35, duration=0.4):
    result = subprocess.run([
        "ffmpeg", "-i", video_path,
        "-af", f"silencedetect=noise={noise_db}dB:d={duration}",
        "-f", "null", "-"
    ], capture_output=True, text=True)
    # silencedetect logs to stderr, e.g. "silence_start: 1.23" / "silence_end: 4.56"
    starts = [float(v) for v in re.findall(r"silence_start: ([\d.]+)", result.stderr)]
    ends = [float(v) for v in re.findall(r"silence_end: ([\d.]+)", result.stderr)]
    return list(zip(starts, ends))
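Those (silence_start, silence_end) pairs can then be inverted into the segments to keep before trim+concat — a minimal sketch (the `keep_segments` helper name is mine):

```python
def keep_segments(silences, total_duration):
    """Invert (silence_start, silence_end) pairs into (start, end) intervals to keep."""
    segments, cursor = [], 0.0
    for start, end in silences:
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments
```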
For comprehensive RTL subtitle handling, load the subtitle-production skill — it contains full patterns for:
- ASS `Encoding=177` for Hebrew (and the ~0.74 font scale factor)
- `direction: rtl` and all the flip rules
- the `geq` filter

CRITICAL: Any inline ASS tag (`\c`, `\K`, `\1c`, etc.) between RTL words breaks Unicode bidi in libass — words render LTR. Use separate Dialogue lines per word instead.
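A sketch of that per-word workaround, assuming Scribe-style word dicts with `text`/`start`/`end` fields in seconds — `per_word_dialogue` is an illustrative helper, not a library function:

```python
def per_word_dialogue(words, style="Default"):
    """Emit one ASS Dialogue line per word, avoiding inline tags that break RTL bidi."""
    def ts(t):
        # ASS timestamp: H:MM:SS.cc
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h)}:{int(m):02d}:{s:05.2f}"

    return "\n".join(
        f"Dialogue: 0,{ts(w['start'])},{ts(w['end'])},{style},,0,0,0,,{w['text']}"
        for w in words
    )
```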
For Remotion RTL layout rules (padding flips, transform-origin, gradient direction), load the typography-video skill.
For motion graphics rendering, use the remotion-best-practices skill for detailed Remotion patterns.
For cloud rendering via Remotion Lambda:
npx remotion lambda render <serve-url> <composition-id> --out output.mp4
def generate_premiere_xml(clips, output_path, fps=30):
# Generate FCP7 XML compatible with Premiere Pro
...
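A minimal FCP7 (xmeml) sketch of that generator. The clip dicts with `path`, `start`, `end` (frame numbers) are my assumed input shape, and a production timeline needs more metadata (rate/duration on file elements, audio tracks) — verify the output imports cleanly before relying on it:

```python
from xml.sax.saxutils import escape

def generate_premiere_xml(clips, output_path, fps=30):
    """Write a minimal FCP7 xmeml sequence; clips = [{"path", "start", "end"}] in frames."""
    items, timeline = [], 0
    for i, c in enumerate(clips):
        dur = c["end"] - c["start"]
        items.append(
            f'<clipitem id="clip-{i}">'
            f'<name>{escape(c["path"].rsplit("/", 1)[-1])}</name>'
            f"<start>{timeline}</start><end>{timeline + dur}</end>"
            f'<in>{c["start"]}</in><out>{c["end"]}</out>'
            f'<file id="file-{i}"><pathurl>file://{escape(c["path"])}</pathurl></file>'
            f"</clipitem>"
        )
        timeline += dur  # clips are laid back-to-back on one video track
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xmeml version="4"><sequence><name>timeline</name>'
        f"<rate><timebase>{fps}</timebase><ntsc>FALSE</ntsc></rate>"
        f"<media><video><track>{''.join(items)}</track></video></media>"
        "</sequence></xmeml>"
    )
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(xml)
```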
Organize outputs per project:
<project>/
├── raw/ # original footage
├── transcripts/ # SRT, word-level JSON
├── clips/ # cut segments
├── shorts/ # 9:16 vertical versions
├── b-roll/ # generated B-roll images/videos
├── motion/ # Remotion compositions
└── export/ # final deliverables + XML timelines
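A small helper to scaffold that layout (the `init_project` name is mine):

```python
from pathlib import Path

def init_project(root):
    """Create the standard per-project directory layout described above."""
    for d in ["raw", "transcripts", "clips", "shorts", "b-roll", "motion", "export"]:
        Path(root, d).mkdir(parents=True, exist_ok=True)
```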
Before writing a new script, ask the user if they already have one for the task — they may have existing tools for clipping, silence removal, or subtitle burning.