Eight-phase workflow: reference/idea → published multimedia content package with social distribution.
REFERENCE (optional) → RESEARCH → NARRATIVE → VISUAL ASSETS → AUDIO → VIDEO → SOCIAL → DEPLOY
Before using the pipeline, the agent MUST check which tools are available and guide the user through setup for any missing ones. Run the checklist below at the start of every content creation session.
```bash
# Check what's already available
echo "=== Required ==="
which yt-dlp && echo "✓ yt-dlp" || echo "✗ yt-dlp — needed for video download"
which ffmpeg && echo "✓ ffmpeg" || echo "✗ ffmpeg — needed for video processing"
echo ""
echo "=== API Keys ==="
[ -n "$GEMINI_API_KEY" ] && echo "✓ GEMINI_API_KEY set" || echo "✗ GEMINI_API_KEY — needed for Nano Banana, Veo 3.1, Gemini analysis"
[ -n "$FAL_KEY" ] && echo "✓ FAL_KEY set" || echo "✗ FAL_KEY — optional, for fal.ai multi-provider"
[ -n "$ELEVENLABS_API_KEY" ] && echo "✓ ELEVENLABS_API_KEY set" || echo "✗ ELEVENLABS_API_KEY — optional, for voiceover"
echo ""
echo "=== TTS (Audio Narration) ==="
which kokoro-tts && echo "✓ kokoro-tts" || echo "✗ kokoro-tts — pip install kokoro-tts"
curl -sf http://localhost:17493/health > /dev/null 2>&1 && echo "✓ Voicebox server running" || echo "✗ Voicebox — optional, for premium TTS (voicebox.sh)"
which edge-tts && echo "✓ edge-tts" || echo "✗ edge-tts — pip install edge-tts (fallback TTS)"
echo ""
echo "=== Optional ==="
which agent-browser && echo "✓ agent-browser" || echo "✗ agent-browser — optional, for screenshots"
which nano-banana && echo "✓ nano-banana CLI" || echo "✗ nano-banana CLI — optional, Gemini SDK works without it"
which xurl && echo "✓ xurl" || echo "✗ xurl — optional, for X posting"
```
| Tool | Install | Purpose |
|---|---|---|
| yt-dlp | brew install yt-dlp | Download video from X, YouTube, any platform |
| ffmpeg | brew install ffmpeg | Video processing, format conversion, GIF creation |
| Remotion | bun add remotion @remotion/cli (per project) | Programmatic video composition |
| Tool | Setup | Purpose |
|---|---|---|
| GEMINI_API_KEY | Get free key at aistudio.google.com → API keys | Nano Banana images, Veo 3.1 video, Gemini analysis, embeddings |
| @google/genai | bun add @google/genai (per project) | SDK for all Gemini models |
How to get GEMINI_API_KEY:

```bash
export GEMINI_API_KEY="your-key"  # add to .zshrc for persistence
```

| Tool | Setup | Purpose | When needed |
|---|---|---|---|
| FAL_KEY | Sign up at fal.ai, get key from dashboard | Multi-provider: Veo, Sora, Kling via one API | When you want to swap between video models |
| ELEVENLABS_API_KEY | Sign up at elevenlabs.io | AI voiceover generation | When video needs narration |
| kokoro-tts | pip install kokoro-tts | CLI TTS generation (82M model) | Default audio narration for posts |
| Voicebox | voicebox.sh or docker compose up | Premium local TTS with voice cloning | When best quality audio needed |
| edge-tts | pip install edge-tts | Microsoft Neural voices (free, unofficial) | Fallback when no GPU available |
| agent-browser | npm install -g @anthropic-ai/agent-browser | Screenshots, web interaction | When capturing live app screenshots |
| nano-banana CLI | npm install -g @the-focus-ai/nano-banana | Quick CLI image generation | Convenience; SDK works without it |
| xurl | brew install --cask xdevplatform/tap/xurl | Post directly to X | When publishing X threads |
| TweetSave MCP | claude mcp add -s user tweetsave -- npx -y mcp-remote https://mcp.tweetsave.org/sse | Read X posts from Claude Code | For reference extraction without yt-dlp |
| mcp-veo3 | uvx mcp-veo3 --output-dir ~/Videos/Generated | Veo 3.1 via MCP | When generating video from Claude Code |
| @aeven/nanobanana-mcp | Add to Claude MCP config | Nano Banana via MCP | When generating images from Claude Code |
When the skill is triggered, the agent should:
- Missing yt-dlp or ffmpeg? Offer the brew install command.
- No agent-browser? Use the FxTwitter API + yt-dlp for extraction.
- No FAL_KEY? Use @google/genai directly.
- No xurl? Generate the post copy for manual publishing.

When the user provides a URL to a post, video, or thread as creative reference, extract and analyze it before anything else. See references/x-content-extraction.md and references/reference-based-content-creation.md for full details.
X/Twitter posts (fastest — no auth required):
```bash
# Extract tweet text, images, video URLs, engagement metrics
TWEET_ID="2034332847893574080"  # from the URL path
curl -s "https://api.fxtwitter.com/status/$TWEET_ID" | jq .

# Download video directly
yt-dlp "https://x.com/user/status/$TWEET_ID" -o reference_video.mp4

# Or via FxTwitter direct download
curl -sL "https://d.fxtwitter.com/user/status/$TWEET_ID" -o reference_video.mp4
```
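Once fetched, the JSON can be reduced to just the fields downstream phases need. A minimal TypeScript sketch; the field names (`tweet.text`, `tweet.media.videos`) follow FxTwitter's commonly seen response shape, but treat the exact schema as an assumption and verify against the actual `jq .` output:

```typescript
// Minimal shape of the fields we care about (assumed FxTwitter schema).
type FxTweet = {
  tweet?: {
    text?: string;
    likes?: number;
    media?: { videos?: { url: string }[]; photos?: { url: string }[] };
  };
};

// Pull text, engagement, and media URLs for the reference brief.
function extractReference(resp: FxTweet) {
  const t = resp.tweet ?? {};
  return {
    text: t.text ?? "",
    likes: t.likes ?? 0,
    videoUrls: t.media?.videos?.map((v) => v.url) ?? [],
    photoUrls: t.media?.photos?.map((p) => p.url) ?? [],
  };
}
```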
YouTube / other platforms:
```bash
yt-dlp "URL" -o reference_video.mp4
```
Any URL with agent-browser (screenshot + text extraction):
```bash
agent-browser open "URL" && agent-browser wait --load networkidle
agent-browser screenshot reference_screenshot.png --full
agent-browser get text body > reference_text.txt
```
Upload the downloaded video to Gemini for deep analysis:
```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const file = await ai.files.upload({ file: "reference_video.mp4", config: { mimeType: "video/mp4" } });
const analysis = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { fileData: { fileUri: file.uri, mimeType: file.mimeType } },
    { text: `Analyze this video as a content creation reference. Extract:
1. Visual Style: color palette (hex values), lighting, camera angles, framing
2. Pacing: shot durations, rhythm, fast/slow sections with timestamps
3. Transitions: types used (cuts, fades, zooms) and when
4. Text Overlays: fonts, positioning, animation, timing
5. Structure: hook (first 3s), body, CTA placement
6. Audio: music style, SFX, voiceover style
7. Engagement Hooks: techniques for retention
Return as structured JSON.` },
  ],
});
```
For text/image posts, use Gemini or Claude directly on the extracted text + screenshot to analyze hook type, structure, messaging, and CTA pattern.
The analysis produces a style brief that guides all downstream phases:
Use Gemini Embedding 2 to find similar content in your library:
```typescript
// Embed a reference video into the same space as your content
const embedding = await ai.models.embedContent({
  model: "gemini-embedding-2-preview",
  contents: [{ inlineData: { mimeType: "video/mp4", data: videoBase64 } }],
  config: { outputDimensionality: 768 },
});
// Compare with cosine similarity against your content embeddings
```
Supports text, images, video (up to 120s), audio (up to 80s), and PDFs in a single vector space.
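The cosine-similarity comparison is a pure computation and can be sketched directly (the `slug`/`embedding` record shape here is illustrative):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank library items by similarity to a reference embedding.
function rankBySimilarity(
  reference: number[],
  library: { slug: string; embedding: number[] }[],
): { slug: string; score: number }[] {
  return library
    .map(({ slug, embedding }) => ({ slug, score: cosineSimilarity(reference, embedding) }))
    .sort((x, y) => y.score - x.score);
}
```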
Gather evidence, capture production state, pull metrics. Never write without data.
Compounding skills: /deep-research, /agent-browser, /competitor-intel, curl/API data pulls.
Checklist:
Structure using a proven framework. See references/storytelling.md for full guides.
| Content Type | Framework | Structure |
|---|---|---|
| Case study | PSI | Challenge → Solution → Quantified results |
| Industry highlight | ABT | And (context) → But (challenge) → Therefore (outcome) |
| Technical deep dive | 1-3-1 | One idea, three evidence points, one takeaway |
| Product launch | Pixar Spine | Once upon a time... Every day... Until one day... |
| Data story | Data Arc | Context → Tension → Resolution |
- frontmatter (title, summary, date, published, tags)
- Hook (1-2 sentences — open loop or surprising claim)
- Hero media (video or key image)
- Numbers section (table with headline metrics)
- Problem section + image
- Solution section + image + progressive detail image
- Evidence section + dashboard screenshots + data visualizations
- Context section + variant screenshots
- Practice section + animated GIF
- Generalization section
- Closing (memorable one-liner)
Rules: Lead with numbers. One image per ~300 words. Bold key terms on first use. 3-4 sentence paragraphs max. Use <video> for MP4,  for images/GIFs.
Output: MDX file at apps/chat/content/writing/{slug}.mdx.
Compounding skills:
- /agent-browser — production screenshots, UI workflows
- /pencil (MCP) — design social cards, diagrams, slides in .pen files
  - get_guidelines(topic) for design-system/landing-page/slides guidance
  - get_style_guide(tags) for visual consistency
  - batch_design for multi-element compositions
  - get_screenshot to export assets
- /before-and-after — visual diffs for transformation stories
- /frontend-design — custom visual components
- /arcan-glass — BroomVA brand styling

See references/ai-video-generation.md for full API details, code examples, and Remotion integration patterns.
Image generation (Nano Banana / Gemini):
```bash
# CLI: quick hero images, social cards, diagrams
nano-banana "A hero image for {topic}, dark theme, glass effects, 1080x1080"

# SDK: @google/genai with model "gemini-3.1-flash-image" (Nano Banana 2)
# MCP: @aeven/nanobanana-mcp for Claude Code integration
```
Video generation (Veo 3.1):
```bash
# MCP server for Claude Code
uvx mcp-veo3 --output-dir assets/ai-clips/

# SDK: @google/genai with model "veo-3.1-generate-preview"
# Capabilities: 4K, native audio, image-to-video, frame interpolation
# Duration: 4-8s per clip, chain up to 20 extensions (~148s)
```
Multi-provider (fal.ai):
```bash
# Single API for Veo 3.1, Sora 2 Pro, Kling 3 Pro, 600+ models
bun add @fal-ai/client

# Swap models by changing endpoint string, no code changes
```
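A sketch of the endpoint-swapping idea: the model choice reduces to a string lookup. The endpoint IDs in the map below are assumptions for illustration, so check fal.ai's model catalog for the real ones; the commented `fal.subscribe` usage follows @fal-ai/client's documented API.

```typescript
// Endpoint IDs are illustrative assumptions — verify against fal.ai's catalog.
const ENDPOINTS: Record<string, string> = {
  veo: "fal-ai/veo3",
  sora: "fal-ai/sora-2-pro",
  kling: "fal-ai/kling-3-pro",
};

// Resolve a friendly model name to its fal.ai endpoint string.
function endpointFor(model: string): string {
  const id = ENDPOINTS[model];
  if (!id) throw new Error(`unknown model: ${model}`);
  return id;
}

// Usage with @fal-ai/client (network call, sketched):
//   import { fal } from "@fal-ai/client";
//   fal.config({ credentials: process.env.FAL_KEY });
//   const result = await fal.subscribe(endpointFor("veo"), {
//     input: { prompt: "cinematic b-roll of ..." },
//   });
```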
Preprocessing AI clips for Remotion (critical):
```bash
ffmpeg -i ai_clip.mp4 -c:v libx264 -crf 18 -movflags +faststart -r 30 processed.mp4
```
Image pipeline:
```bash
magick input.png -resize 1200x -quality 85 output-opt.png
magick f1.png f2.png f3.png -resize 1200x675! -set delay 200 -loop 0 flow.gif
```
```bash
mkdir -p apps/chat/public/images/writing/{slug}/
```
Naming: {subject}-{descriptor}-opt.png
Checklist: Hero image/video, 1 image per section (5-7 min), 1+ animated GIF, all < 500KB, descriptive alt text.
Generate TTS audio for each post so readers can listen. Pre-generate at pipeline time, not on-demand. See references/tts-audio-generation.md for full engine comparison, API details, and batch scripts.
Compounding skills: /openrocket-sim (batch scripting patterns), /remotion-best-practices (media pipeline).
| Engine | When to use |
|---|---|
| Voicebox (localhost:17493) | Best quality. Voice cloning. GPU available. POST /generate → GET /audio/{id} |
| kokoro-tts | Fast batch default. CLI-first. kokoro-tts input.txt output.wav --voice af_sarah |
| Edge TTS | No local GPU. edge-tts --text "..." --voice en-US-AndrewNeural --write-media out.mp3 |
```bash
# Strip frontmatter from MDX, generate audio
slug="my-post"
sed '1{/^---$/!q;};1,/^---$/d' apps/chat/content/writing/$slug.mdx \
  | kokoro-tts - /tmp/$slug.wav --voice af_sarah
ffmpeg -i /tmp/$slug.wav -codec:a libmp3lame -b:a 128k apps/chat/public/audio/writing/$slug.mp3
```
- Output: public/audio/writing/{slug}.mp3
- Add audio: /audio/writing/{slug}.mp3 to post frontmatter
- ContentArticle component renders <audio> player with full controls (play/pause, seek, skip ±10s)

Checklist:
- public/audio/writing/{slug}.mp3 exists
- audio field added to post frontmatter

Compounding skills: /remotion-best-practices — read rules for animations, sequencing, transitions, images, text. Audio from Phase 4 can be used as a voiceover track in Remotion compositions.
Video structure (15-30s):
Title (3-4s) → Stats (3s) → Screenshots (2-3s each) → Workflow (3-4s) → Closing (3-4s)
Key Remotion rules: Use Img + staticFile() (never <img>). Use spring() for organic motion. Use Sequence with premountFor. No CSS transitions or Tailwind animation classes.
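The segment plan above can be turned into Sequence offsets with simple frame math. A sketch at 30 fps; the durations are illustrative midpoints of the ranges given, not fixed values:

```typescript
type Segment = { name: string; seconds: number };

// Convert an ordered segment plan into { from, durationInFrames } pairs
// suitable for Remotion <Sequence> props.
function toSequences(segments: Segment[], fps = 30) {
  let from = 0;
  return segments.map(({ name, seconds }) => {
    const durationInFrames = Math.round(seconds * fps);
    const seq = { name, from, durationInFrames };
    from += durationInFrames;
    return seq;
  });
}

const plan = toSequences([
  { name: "Title", seconds: 3.5 },
  { name: "Stats", seconds: 3 },
  { name: "Screenshot", seconds: 2.5 },
  { name: "Workflow", seconds: 3.5 },
  { name: "Closing", seconds: 3.5 },
]);
// Each entry maps to <Sequence from={...} durationInFrames={...}>
```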
Combine AI-generated footage with Remotion motion graphics for production-quality output:
Nano Banana → hero images, backgrounds, social cards
Veo 3.1 → cinematic B-roll clips (8s each, 4K, with audio)
Remotion → motion graphics, titles, transitions, data viz
FFmpeg → preprocess AI clips, final GIF conversion
In Remotion compositions:
```tsx
// AI-generated video as background layer
<OffthreadVideo src={staticFile("assets/veo-clip.mp4")} style={{ objectFit: "cover" }} />

// AI-generated image
<Img src={staticFile("assets/nano-banana-hero.png")} />

// Dynamic duration from AI clips
// Use @remotion/media-parser parseMedia() with calculateMetadata
```
Use <TransitionSeries> from @remotion/transitions to blend AI clips with motion graphics scenes via fade/wipe/slide transitions.
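Deriving a composition's length from an AI clip comes down to converting seconds to frames. A sketch; the commented `parseMedia` call follows @remotion/media-parser's fields API, and `clipSrc` is an assumed prop name:

```typescript
// Convert a clip's duration in seconds into whole Remotion frames,
// rounding up so the clip is never truncated.
function framesFor(durationInSeconds: number, fps: number): number {
  return Math.ceil(durationInSeconds * fps);
}

// In the composition's calculateMetadata (sketched):
//   import { parseMedia } from "@remotion/media-parser";
//   const { durationInSeconds } = await parseMedia({
//     src: clipSrc,
//     fields: { durationInSeconds: true },
//   });
//   return { fps: 30, durationInFrames: framesFor(durationInSeconds, 30) };
```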
```bash
cd /tmp/{project}-remotion && bun install
npx remotion render {Id} --output out/video.mp4
ffmpeg -y -i out/video.mp4 -vf "fps=12,scale=960:-1:flags=lanczos" -c:v gif out/video.gif
```
See references/social-distribution.md for copy patterns and references/social-publishing.md for CLI/MCP tool setup.
Publishing: Use xurl CLI or Twitter MCP server to post directly.
```bash
xurl post "1/7 — [Hook tweet text]"
xurl media upload hero-image.png   # returns MEDIA_ID
xurl post "2/7 — [Context]" --media-id MEDIA_ID
```
Use /pencil to design slides. Cover → Problem → Insights (1/slide) → Stat → Summary → CTA.
Publishing: Use Instagram MCP server (ig-mcp) or Meta Graph API.
Hook in first 210 chars. 2-3 paragraphs + bullet list + CTA. 3-5 hashtags.
Publishing: Use LinkedIn MCP server (linkedin-mcp) or REST API with OAuth token.
```bash
git checkout -b content/{slug}
git add apps/chat/content/writing/{slug}.mdx apps/chat/public/images/writing/{slug}/
git commit -m "content: add {title}"
git push -u origin content/{slug}
gh pr create --title "content: {short title}" --body "..."
```
```
┌─ REFERENCE EXTRACTION ──────────────────────────────────────────┐
│ FxTwitter API (no auth)   yt-dlp   /agent-browser               │
│ TweetSave MCP             Gemini 2.5 (video understanding)      │
│ Gemini Embedding 2 (multimodal similarity)                      │
├─ RESEARCH ──────────────────────────────────────────────────────┤
│ /deep-research   /agent-browser   /competitor-intel   curl      │
├─ AI GENERATION ─────────────────────────────────────────────────┤
│ Nano Banana (@google/genai)   Veo 3.1 (@google/genai)           │
│ fal.ai (@fal-ai/client)       ElevenLabs (voiceover)            │
│ nano-banana CLI               mcp-veo3 (MCP server)             │
│ @aeven/nanobanana-mcp         veo-mcp-server                    │
├─ AUDIO (TTS) ───────────────────────────────────────────────────┤
│ Voicebox (localhost:17493)   kokoro-tts CLI   edge-tts          │
│ mlx-audio (Apple Silicon)    ffmpeg (WAV→MP3)                   │
├─ DESIGN ────────────────────────────────────────────────────────┤
│ /pencil (MCP)   /before-and-after   /frontend-design            │
│ /arcan-glass    magick/ffmpeg                                   │
├─ VIDEO ─────────────────────────────────────────────────────────┤
│ /remotion-best-practices  /skills-showcase  /json-render-remotion │
│ @remotion/media-parser    @remotion/transitions                 │
├─ NARRATIVE ─────────────────────────────────────────────────────┤
│ references/storytelling.md   references/social-distribution.md  │
│ references/visual-content.md                                    │
├─ PUBLISH ───────────────────────────────────────────────────────┤
│ xurl (X CLI)   twitter-mcp-server   linkedin-mcp                │
│ ig-mcp         Ayrshare MCP (multi-platform)                    │
├─ DEPLOY ────────────────────────────────────────────────────────┤
│ git + gh CLI   /vercel-cli   Vercel preview CI/CD               │
└─────────────────────────────────────────────────────────────────┘
```