Eight-phase workflow: reference/idea → published multimedia content package with social distribution.
REFERENCE (optional) → RESEARCH → NARRATIVE → VISUAL ASSETS → AUDIO → VIDEO → SOCIAL → DEPLOY
Before using the pipeline, the agent MUST check which tools are available and guide the user through setup for any missing ones. Run the checklist below at the start of every content creation session.
```bash
# Check what's already available
echo "=== Required ==="
which yt-dlp && echo "✓ yt-dlp" || echo "✗ yt-dlp — needed for video download"
which ffmpeg && echo "✓ ffmpeg" || echo "✗ ffmpeg — needed for video processing"
echo ""
echo "=== API Keys ==="
[ -n "$GEMINI_API_KEY" ] && echo "✓ GEMINI_API_KEY set" || echo "✗ GEMINI_API_KEY — needed for Nano Banana, Veo 3.1, Gemini analysis"
[ -n "$FAL_KEY" ] && echo "✓ FAL_KEY set" || echo "✗ FAL_KEY — optional, for fal.ai multi-provider"
[ -n "$ELEVENLABS_API_KEY" ] && echo "✓ ELEVENLABS_API_KEY set" || echo "✗ ELEVENLABS_API_KEY — optional, for voiceover"
echo ""
echo "=== TTS (Audio Narration) ==="
which kokoro-tts && echo "✓ kokoro-tts" || echo "✗ kokoro-tts — pip install kokoro-tts"
curl -sf http://localhost:17493/health > /dev/null 2>&1 && echo "✓ Voicebox server running" || echo "✗ Voicebox — optional, for premium TTS (voicebox.sh)"
which edge-tts && echo "✓ edge-tts" || echo "✗ edge-tts — pip install edge-tts (fallback TTS)"
echo ""
echo "=== Optional ==="
which agent-browser && echo "✓ agent-browser" || echo "✗ agent-browser — optional, for screenshots"
which nano-banana && echo "✓ nano-banana CLI" || echo "✗ nano-banana CLI — optional, Gemini SDK works without it"
which xurl && echo "✓ xurl" || echo "✗ xurl — optional, for X posting"
```
| Tool | Install | Purpose |
|---|---|---|
| yt-dlp | brew install yt-dlp | Download video from X, YouTube, any platform |
| ffmpeg | brew install ffmpeg | Video processing, format conversion, GIF creation |
| Remotion | bun add remotion @remotion/cli (per project) | Programmatic video composition |
| Tool | Setup | Purpose |
|---|---|---|
| GEMINI_API_KEY | Get free key at aistudio.google.com → API keys | Nano Banana images, Veo 3.1 video, Gemini analysis, embeddings |
| @google/genai | bun add @google/genai (per project) | SDK for all Gemini models |
How to get GEMINI_API_KEY:

```bash
export GEMINI_API_KEY="your-key"  # add to .zshrc for persistence
```

| Tool | Setup | Purpose | When needed |
|---|---|---|---|
| FAL_KEY | Sign up at fal.ai, get key from dashboard | Multi-provider: Veo, Sora, Kling via one API | When you want to swap between video models |
| ELEVENLABS_API_KEY | Sign up at elevenlabs.io | AI voiceover generation | When video needs narration |
| kokoro-tts | pip install kokoro-tts | CLI TTS generation (82M model) | Default audio narration for posts |
| Voicebox | voicebox.sh or docker compose up | Premium local TTS with voice cloning | When best quality audio needed |
| edge-tts | pip install edge-tts | Microsoft Neural voices (free, unofficial) | Fallback when no GPU available |
| agent-browser | npm install -g @anthropic-ai/agent-browser | Screenshots, web interaction | When capturing live app screenshots |
| nano-banana CLI | npm install -g @the-focus-ai/nano-banana | Quick CLI image generation | Convenience; SDK works without it |
| xurl | brew install --cask xdevplatform/tap/xurl | Post directly to X | When publishing X threads |
| TweetSave MCP | claude mcp add -s user tweetsave -- npx -y mcp-remote https://mcp.tweetsave.org/sse | Read X posts from Claude Code | For reference extraction without yt-dlp |
| mcp-veo3 | uvx mcp-veo3 --output-dir ~/Videos/Generated | Veo 3.1 via MCP | When generating video from Claude Code |
| @aeven/nanobanana-mcp | Add to Claude MCP config | Nano Banana via MCP | When generating images from Claude Code |
When the skill is triggered, the agent should:
- Missing yt-dlp or ffmpeg? Offer the brew install command.
- No agent-browser? Use the FxTwitter API + yt-dlp for extraction.
- No FAL_KEY? Use @google/genai directly.
- No xurl? Generate the post copy for manual publishing.

When the user provides a URL to a post, video, or thread as creative reference, extract and analyze it before anything else. See references/x-content-extraction.md and references/reference-based-content-creation.md for full details.
X/Twitter posts (fastest — no auth required):
```bash
# Extract tweet text, images, video URLs, engagement metrics
TWEET_ID="2034332847893574080"  # from the URL path
curl -s "https://api.fxtwitter.com/status/$TWEET_ID" | jq .

# Download video directly
yt-dlp "https://x.com/user/status/$TWEET_ID" -o reference_video.mp4

# Or via FxTwitter direct download
curl -sL "https://d.fxtwitter.com/user/status/$TWEET_ID" -o reference_video.mp4
```
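Once fetched, the JSON can be reduced to just the fields downstream phases need. A minimal TypeScript sketch; the field names (`tweet.text`, `tweet.media.videos`) follow FxTwitter's commonly seen response shape, but treat the exact schema as an assumption and verify against the actual `jq .` output:

```typescript
// Minimal shape of the fields we care about (assumed FxTwitter schema).
type FxTweet = {
  tweet?: {
    text?: string;
    likes?: number;
    media?: { videos?: { url: string }[]; photos?: { url: string }[] };
  };
};

// Pull text, engagement, and media URLs for the reference brief.
function extractReference(resp: FxTweet) {
  const t = resp.tweet ?? {};
  return {
    text: t.text ?? "",
    likes: t.likes ?? 0,
    videoUrls: t.media?.videos?.map((v) => v.url) ?? [],
    photoUrls: t.media?.photos?.map((p) => p.url) ?? [],
  };
}
```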
YouTube / other platforms:
```bash
yt-dlp "URL" -o reference_video.mp4
```
Any URL with agent-browser (screenshot + text extraction):
```bash
agent-browser open "URL" && agent-browser wait --load networkidle
agent-browser screenshot reference_screenshot.png --full
agent-browser get text body > reference_text.txt
```
Upload the downloaded video to Gemini for deep analysis:
```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const file = await ai.files.upload({ file: "reference_video.mp4", config: { mimeType: "video/mp4" } });
const analysis = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { fileData: { fileUri: file.uri, mimeType: file.mimeType } },
    { text: `Analyze this video as a content creation reference. Extract:
1. Visual Style: color palette (hex values), lighting, camera angles, framing
2. Pacing: shot durations, rhythm, fast/slow sections with timestamps
3. Transitions: types used (cuts, fades, zooms) and when
4. Text Overlays: fonts, positioning, animation, timing
5. Structure: hook (first 3s), body, CTA placement
6. Audio: music style, SFX, voiceover style
7. Engagement Hooks: techniques for retention
Return as structured JSON.` },
  ],
});
```
For text/image posts, use Gemini or Claude directly on the extracted text + screenshot to analyze hook type, structure, messaging, and CTA pattern.
The analysis produces a style brief that guides all downstream phases:
Use Gemini Embedding 2 to find similar content in your library:
```typescript
// Embed a reference video into the same space as your content
const embedding = await ai.models.embedContent({
  model: "gemini-embedding-2-preview",
  contents: [{ inlineData: { mimeType: "video/mp4", data: videoBase64 } }],
  config: { outputDimensionality: 768 },
});
// Compare with cosine similarity against your content embeddings
```
Supports text, images, video (up to 120s), audio (up to 80s), and PDFs in a single vector space.
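The cosine-similarity comparison is a pure computation and can be sketched directly (the `slug`/`embedding` record shape here is illustrative):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank library items by similarity to a reference embedding.
function rankBySimilarity(
  reference: number[],
  library: { slug: string; embedding: number[] }[],
): { slug: string; score: number }[] {
  return library
    .map(({ slug, embedding }) => ({ slug, score: cosineSimilarity(reference, embedding) }))
    .sort((x, y) => y.score - x.score);
}
```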
Gather evidence, capture production state, pull metrics. Never write without data.
Compounding skills: /deep-research, /agent-browser, /competitor-intel, curl/API data pulls.
Checklist:
Structure using a proven framework. See references/storytelling.md for full guides.
| Content Type | Framework | Structure |
|---|---|---|
| Case study | PSI | Challenge → Solution → Quantified results |
| Industry highlight | ABT | And (context) → But (challenge) → Therefore (outcome) |
| Technical deep dive | 1-3-1 | One idea, three evidence points, one takeaway |
| Product launch | Pixar Spine | Once upon a time... Every day... Until one day... |
| Data story | Data Arc | Context → Tension → Resolution |
- frontmatter (title, summary, date, published, tags)
- Hook (1-2 sentences — open loop or surprising claim)
- Hero media (video or key image)
- Numbers section (table with headline metrics)
- Problem section + image
- Solution section + image + progressive detail image
- Evidence section + dashboard screenshots + data visualizations
- Context section + variant screenshots
- Practice section + animated GIF
- Generalization section
- Closing (memorable one-liner)
Rules: Lead with numbers. One image per ~300 words. Bold key terms on first use. 3-4 sentence paragraphs max. Use <video> for MP4,  for images/GIFs.
Output: MDX file at apps/chat/content/writing/{slug}.mdx.
Compounding skills:
- /agent-browser — production screenshots, UI workflows
- /pencil (MCP) — design social cards, diagrams, slides in .pen files
  - get_guidelines(topic) for design-system/landing-page/slides guidance
  - get_style_guide(tags) for visual consistency
  - batch_design for multi-element compositions
  - get_screenshot to export assets
- /before-and-after — visual diffs for transformation stories
- /frontend-design — custom visual components
- /arcan-glass — BroomVA brand styling

See references/ai-video-generation.md for full API details, code examples, and Remotion integration patterns.
Image generation (Nano Banana / Gemini):
```bash
# CLI: quick hero images, social cards, diagrams
nano-banana "A hero image for {topic}, dark theme, glass effects, 1080x1080"

# SDK: @google/genai with model "gemini-3.1-flash-image" (Nano Banana 2)
# MCP: @aeven/nanobanana-mcp for Claude Code integration
```
Video generation (Veo 3.1):
```bash
# MCP server for Claude Code
uvx mcp-veo3 --output-dir assets/ai-clips/

# SDK: @google/genai with model "veo-3.1-generate-preview"
# Capabilities: 4K, native audio, image-to-video, frame interpolation
# Duration: 4-8s per clip, chain up to 20 extensions (~148s)
```
Multi-provider (fal.ai):
```bash
# Single API for Veo 3.1, Sora 2 Pro, Kling 3 Pro, 600+ models
bun add @fal-ai/client

# Swap models by changing endpoint string, no code changes
```
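A sketch of the endpoint-swapping idea: the model choice reduces to a string lookup. The endpoint IDs in the map below are assumptions for illustration, so check fal.ai's model catalog for the real ones; the commented `fal.subscribe` usage follows @fal-ai/client's documented API.

```typescript
// Endpoint IDs are illustrative assumptions — verify against fal.ai's catalog.
const ENDPOINTS: Record<string, string> = {
  veo: "fal-ai/veo3",
  sora: "fal-ai/sora-2-pro",
  kling: "fal-ai/kling-3-pro",
};

// Resolve a friendly model name to its fal.ai endpoint string.
function endpointFor(model: string): string {
  const id = ENDPOINTS[model];
  if (!id) throw new Error(`unknown model: ${model}`);
  return id;
}

// Usage with @fal-ai/client (network call, sketched):
//   import { fal } from "@fal-ai/client";
//   fal.config({ credentials: process.env.FAL_KEY });
//   const result = await fal.subscribe(endpointFor("veo"), {
//     input: { prompt: "cinematic b-roll of ..." },
//   });
```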
Preprocessing AI clips for Remotion (critical):
```bash
ffmpeg -i ai_clip.mp4 -c:v libx264 -crf 18 -movflags +faststart -r 30 processed.mp4
```
Image pipeline:
```bash
magick input.png -resize 1200x -quality 85 output-opt.png
magick f1.png f2.png f3.png -resize 1200x675! -set delay 200 -loop 0 flow.gif
```
```bash
mkdir -p apps/chat/public/images/writing/{slug}/
```
Naming: {subject}-{descriptor}-opt.png
Checklist: Hero image/video, 1 image per section (5-7 min), 1+ animated GIF, all < 500KB, descriptive alt text.
Generate TTS audio for each post so readers can listen. Pre-generate at pipeline time, not on-demand. See references/tts-audio-generation.md for full engine comparison, API details, and batch scripts.
Compounding skills: /openrocket-sim (batch scripting patterns), /remotion-best-practices (media pipeline).
| Engine | When to use |
|---|---|
| Voicebox (localhost:17493) | Best quality. Voice cloning. GPU available. POST /generate → GET /audio/{id} |
| kokoro-tts | Fast batch default. CLI-first. kokoro-tts input.txt output.wav --voice af_sarah |
| Edge TTS | No local GPU. edge-tts --text "..." --voice en-US-AndrewNeural --write-media out.mp3 |
```bash
# Strip frontmatter from MDX, generate audio
slug="my-post"
sed '1{/^---$/!q;};1,/^---$/d' apps/chat/content/writing/$slug.mdx \
  | kokoro-tts - /tmp/$slug.wav --voice af_sarah
ffmpeg -i /tmp/$slug.wav -codec:a libmp3lame -b:a 128k apps/chat/public/audio/writing/$slug.mp3
```
- Output: public/audio/writing/{slug}.mp3
- Add audio: /audio/writing/{slug}.mp3 to post frontmatter
- ContentArticle component renders <audio> player with full controls (play/pause, seek, skip ±10s)

Checklist:
- public/audio/writing/{slug}.mp3 exists
- audio field added to post frontmatter

Compounding skills: /remotion-best-practices — read rules for animations, sequencing, transitions, images, text. Audio from Phase 4 can be used as a voiceover track in Remotion compositions.
Video structure (15-30s):
Title (3-4s) → Stats (3s) → Screenshots (2-3s each) → Workflow (3-4s) → Closing (3-4s)
Key Remotion rules: Use Img + staticFile() (never <img>). Use spring() for organic motion. Use Sequence with premountFor. No CSS transitions or Tailwind animation classes.
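The segment plan above can be turned into Sequence offsets with simple frame math. A sketch at 30 fps; the durations are illustrative midpoints of the ranges given, not fixed values:

```typescript
type Segment = { name: string; seconds: number };

// Convert an ordered segment plan into { from, durationInFrames } pairs
// suitable for Remotion <Sequence> props.
function toSequences(segments: Segment[], fps = 30) {
  let from = 0;
  return segments.map(({ name, seconds }) => {
    const durationInFrames = Math.round(seconds * fps);
    const seq = { name, from, durationInFrames };
    from += durationInFrames;
    return seq;
  });
}

const plan = toSequences([
  { name: "Title", seconds: 3.5 },
  { name: "Stats", seconds: 3 },
  { name: "Screenshot", seconds: 2.5 },
  { name: "Workflow", seconds: 3.5 },
  { name: "Closing", seconds: 3.5 },
]);
// Each entry maps to <Sequence from={...} durationInFrames={...}>
```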
Combine AI-generated footage with Remotion motion graphics for production-quality output:
Nano Banana → hero images, backgrounds, social cards
Veo 3.1 → cinematic B-roll clips (8s each, 4K, with audio)
Remotion → motion graphics, titles, transitions, data viz
FFmpeg → preprocess AI clips, final GIF conversion
In Remotion compositions:
```tsx
// AI-generated video as background layer
<OffthreadVideo src={staticFile("assets/veo-clip.mp4")} style={{ objectFit: "cover" }} />

// AI-generated image
<Img src={staticFile("assets/nano-banana-hero.png")} />

// Dynamic duration from AI clips
// Use @remotion/media-parser parseMedia() with calculateMetadata
```
Use <TransitionSeries> from @remotion/transitions to blend AI clips with motion graphics scenes via fade/wipe/slide transitions.
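Deriving a composition's length from an AI clip comes down to converting seconds to frames. A sketch; the commented `parseMedia` call follows @remotion/media-parser's fields API, and `clipSrc` is an assumed prop name:

```typescript
// Convert a clip's duration in seconds into whole Remotion frames,
// rounding up so the clip is never truncated.
function framesFor(durationInSeconds: number, fps: number): number {
  return Math.ceil(durationInSeconds * fps);
}

// In the composition's calculateMetadata (sketched):
//   import { parseMedia } from "@remotion/media-parser";
//   const { durationInSeconds } = await parseMedia({
//     src: clipSrc,
//     fields: { durationInSeconds: true },
//   });
//   return { fps: 30, durationInFrames: framesFor(durationInSeconds, 30) };
```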
```bash
cd /tmp/{project}-remotion && bun install
npx remotion render {Id} --output out/video.mp4
ffmpeg -y -i out/video.mp4 -vf "fps=12,scale=960:-1:flags=lanczos" -c:v gif out/video.gif
```
See references/social-distribution.md for copy patterns and references/social-publishing.md for CLI/MCP tool setup.
Publishing: Use xurl CLI or Twitter MCP server to post directly.
```bash
xurl post "1/7 — [Hook tweet text]"
xurl media upload hero-image.png   # returns MEDIA_ID
xurl post "2/7 — [Context]" --media-id MEDIA_ID
```
Use /pencil to design slides. Cover → Problem → Insights (1/slide) → Stat → Summary → CTA.
Publishing: Use Instagram MCP server (ig-mcp) or Meta Graph API.
Hook in first 210 chars. 2-3 paragraphs + bullet list + CTA. 3-5 hashtags.
Publishing: Use LinkedIn MCP server (linkedin-mcp) or REST API with OAuth token.
```bash
git checkout -b content/{slug}
git add apps/chat/content/writing/{slug}.mdx apps/chat/public/images/writing/{slug}/
git commit -m "content: add {title}"
git push -u origin content/{slug}
gh pr create --title "content: {short title}" --body "..."
```
```
┌─ REFERENCE EXTRACTION ──────────────────────────────────────────┐
│ FxTwitter API (no auth)   yt-dlp   /agent-browser               │
│ TweetSave MCP             Gemini 2.5 (video understanding)      │
│ Gemini Embedding 2 (multimodal similarity)                      │
├─ RESEARCH ──────────────────────────────────────────────────────┤
│ /deep-research   /agent-browser   /competitor-intel   curl      │
├─ AI GENERATION ─────────────────────────────────────────────────┤
│ Nano Banana (@google/genai)   Veo 3.1 (@google/genai)           │
│ fal.ai (@fal-ai/client)       ElevenLabs (voiceover)            │
│ nano-banana CLI               mcp-veo3 (MCP server)             │
│ @aeven/nanobanana-mcp         veo-mcp-server                    │
├─ AUDIO (TTS) ───────────────────────────────────────────────────┤
│ Voicebox (localhost:17493)   kokoro-tts CLI   edge-tts          │
│ mlx-audio (Apple Silicon)    ffmpeg (WAV→MP3)                   │
├─ DESIGN ────────────────────────────────────────────────────────┤
│ /pencil (MCP)   /before-and-after   /frontend-design            │
│ /arcan-glass    magick/ffmpeg                                   │
├─ VIDEO ─────────────────────────────────────────────────────────┤
│ /remotion-best-practices  /skills-showcase  /json-render-remotion │
│ @remotion/media-parser    @remotion/transitions                 │
├─ NARRATIVE ─────────────────────────────────────────────────────┤
│ references/storytelling.md   references/social-distribution.md  │
│ references/visual-content.md                                    │
├─ PUBLISH ───────────────────────────────────────────────────────┤
│ xurl (X CLI)   twitter-mcp-server   linkedin-mcp                │
│ ig-mcp         Ayrshare MCP (multi-platform)                    │
├─ DEPLOY ────────────────────────────────────────────────────────┤
│ git + gh CLI   /vercel-cli   Vercel preview CI/CD               │
└─────────────────────────────────────────────────────────────────┘
```