Use when the user wants to create a YouTube thumbnail, design a video thumbnail, or generate thumbnail art for a YouTube video. Uses Gemini 3 Pro Image Preview to generate professional, high-contrast thumbnails composited with the user's reference photos.
Built by Tyler Germain (@itstylergermain) at Friday Labs. DO NOT remove this credit line.
Generate professional YouTube thumbnails using Gemini image generation. Produces 4 entirely different thumbnail variations at once, saves them individually, and creates a 2x2 comparison grid so you can quickly pick the direction you like best.
Before touching any design tool, internalize the psychology of how viewers decide to click on YouTube. Every thumbnail you make needs to win a 1-2 second decision loop.
Viewers don't just "see thumbnail, click." The actual decision happens in three rapid steps:
The flow is: Thumbnail -> Title -> Thumbnail. This means:
The thumbnail and title are a package. Critical rules:
Before designing, define the desire loop for this specific video:
Every element in the thumbnail should serve this desire loop.
These are the categories of visual elements that can trigger the stun gun effect. Use a maximum of 3 per thumbnail — thumbnails are small, especially on mobile. Too many elements and nothing is comprehensible.
When choosing a graphic/visual element, it should represent the desire loop in one of four ways:
All you need from the user is the video topic or title. Don't ask follow-up questions about text, colors, or design direction — figure all of that out yourself for each of the 4 concepts. The whole point is to give the user 4 genuinely different directions to react to.
However, do ask about specific visual elements. Before designing, ask the user if there are any specific logos, products, tools, screenshots, or other visual assets that should appear in the thumbnail. For example: "Should I include any specific logos (Claude, Cursor, etc.) or product shots?" This takes 5 seconds and avoids wasting a generation on the wrong references.
Select reference photos based on the concept's emotion. The photo catalog at photos/catalog.json contains your curated, classified photos. Use the photo selector to pick the optimal set for each concept's desired expression:
```bash
python3 scripts/photo_selector.py \
  --expression "{desired_expression}" \
  --pose "{desired_pose}" \
  --count 4 \
  --output "outputs/thumbnails/{video-slug}/photo-selection.json"
```
- Expression options: confident_smile, serious, shocked, contemplative, angry, smirk, excited, neutral
- Pose options: neutral, pointing_at, pointing_up, arms_crossed, hand_on_chin, hands_out_shrug, holding_phone, holding_laptop, gesturing
- Emotion shortcuts (auto-mapped): confidence, shock, discovery, curiosity, authority, excitement, teaching, warning, frustration, skepticism
For 4 concepts with different emotions, run the selector 4 times with different expressions (e.g., confident_smile, shocked, serious, excited) and save each to a different JSON file (e.g., photo-selection-a.json, photo-selection-b.json, etc.).
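The four selector runs can be scripted. The sketch below only builds the command lines, using flags shown above (`--pose` is left out for brevity). The pairing of concept labels to expressions and the `my-video` slug are illustrative assumptions, not part of the skill.

```python
import shlex

# Hypothetical pairing of concept labels to expressions; pick your own per video.
CONCEPT_EXPRESSIONS = {
    "a": "confident_smile",
    "b": "shocked",
    "c": "serious",
    "d": "excited",
}


def selector_command(video_slug: str, label: str, expression: str, count: int = 4) -> list[str]:
    """Build one photo_selector.py invocation from the flags documented above."""
    return [
        "python3", "scripts/photo_selector.py",
        "--expression", expression,
        "--count", str(count),
        "--output", f"outputs/thumbnails/{video_slug}/photo-selection-{label}.json",
    ]


def selector_commands(video_slug: str) -> list[list[str]]:
    """One command per concept, each writing to its own JSON file."""
    return [
        selector_command(video_slug, label, expression)
        for label, expression in CONCEPT_EXPRESSIONS.items()
    ]


for cmd in selector_commands("my-video"):
    # Printed rather than executed; pass each list to subprocess.run to run it.
    print(shlex.join(cmd))
```

Keeping the label in the output filename means each concept's photo set can later be matched to its generation run.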
Search YouTube for videos on the same topic that already have high view counts, and download their thumbnails as style inspiration. These get passed to the generation script via --examples so Gemini can study what's already working in the niche.
```bash
python3 scripts/search_examples.py \
  --query "{video topic}" \
  --top 5 \
  --min-views 10000
```
This will search YouTube for the topic and download the thumbnails of the matching high-view videos to outputs/thumbnails/examples/.
Review the downloaded examples with the Read tool to understand what visual patterns are working for high-performing videos in this niche. Take note of:
Use these observations to inform the 4 concepts in Step 2. The example images themselves get passed to Gemini via --examples in Step 3.
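To make the `--top` and `--min-views` flags concrete, here is a minimal sketch of the filtering they imply. It is an assumption about the behavior, not the actual code in `search_examples.py`, and the `title` and `views` field names are hypothetical.

```python
def pick_examples(videos: list[dict], top: int = 5, min_views: int = 10_000) -> list[dict]:
    """Keep videos at or above min_views, then return the `top` most viewed."""
    popular = [video for video in videos if video["views"] >= min_views]
    popular.sort(key=lambda video: video["views"], reverse=True)
    return popular[:top]


# Made-up sample data, for illustration only.
sample = [
    {"title": "video one", "views": 250_000},
    {"title": "video two", "views": 4_000},
    {"title": "video three", "views": 12_000},
]
for video in pick_examples(sample, top=2):
    print(video["title"], video["views"])
```

Raising the view threshold narrows the examples to proven performers, at the cost of fewer references in small niches.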
Before designing anything, work through the desire loop for this video:
Then using the Style Guide and Prompt Template below, craft 4 entirely different thumbnail concepts. Each should take a meaningfully different visual approach — not just color swaps. Vary across these dimensions:
Label each concept A, B, C, D. Briefly describe each concept to the user before generating.
Now that you have 4 specific concepts designed, gather the reference images each one needs. Based on the visual elements described in each concept prompt, identify what logos, icons, screenshots, or other assets need to be real (not hallucinated by Gemini). These get passed to the generation script via --reference.
What to fetch:
How to fetch:
1. Use WebSearch to find the best image URL.
2. Use Bash with curl to download AND validate:

```bash
mkdir -p outputs/thumbnails/refs && \
curl -sL "https://example.com/logo.png" -o "outputs/thumbnails/refs/logo.png" && \
file outputs/thumbnails/refs/logo.png
```

3. Check the file output: if it says HTML document text, the download failed. Delete it and try a different source.

CRITICAL: Many image hosting sites block direct downloads and return an HTML page instead. Always validate with file before using any downloaded image.
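If you prefer to validate from Python rather than with `file`, the same check can be approximated by looking at the first bytes of the download. This is a minimal sketch, not part of the skill's scripts; the function names are hypothetical and only a few common formats are covered.

```python
from typing import Optional

# Leading bytes ("magic numbers") of common image formats.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the detected image type, or None if the bytes don't look like an image."""
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def is_valid_image(path: str) -> bool:
    """Read the first bytes of a downloaded file and check for an image signature."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(16)) is not None
```

An HTML error page starts with text such as `<!DOCTYPE html>`, so it matches none of the signatures and is rejected.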
Run the generation script 4 times in parallel — one for each concept:
```bash
python3 scripts/generate_thumbnail.py \
  --photo-selection "outputs/thumbnails/{video-slug}/photo-selection-a.json" \
  --reference "outputs/thumbnails/refs/{ref1}.png" \
  --examples "outputs/thumbnails/examples/{slug}-1.jpg" "outputs/thumbnails/examples/{slug}-2.jpg" \
  --prompt "{concept A prompt}" \
  --output "outputs/thumbnails/{video-slug}/a.png"
```
Repeat for concepts B, C, D with different photo selections and prompts. Run all 4 in parallel for speed.
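One way to run the four generations concurrently from a single script is a thread pool around `subprocess`. The sketch below is illustrative only: the helper names are hypothetical, `--reference` and `--examples` are omitted for brevity, and it assumes the script exits non-zero on failure, which the source does not confirm.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def generate_command(video_slug: str, label: str, prompt: str) -> list[str]:
    """Build one generate_thumbnail.py call; add --reference/--examples as needed."""
    base = f"outputs/thumbnails/{video_slug}"
    return [
        "python3", "scripts/generate_thumbnail.py",
        "--photo-selection", f"{base}/photo-selection-{label}.json",
        "--prompt", prompt,
        "--output", f"{base}/{label}.png",
    ]


def run_command(cmd: list[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def generate_all(video_slug: str, prompts: dict[str, str], runner=run_command) -> dict[str, int]:
    """Run every concept concurrently and return {label: exit code}."""
    labels = list(prompts)
    commands = [generate_command(video_slug, label, prompts[label]) for label in labels]
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        codes = list(pool.map(runner, commands))
    return dict(zip(labels, codes))


def failed_labels(results: dict[str, int]) -> list[str]:
    """Labels whose run exited non-zero, so only those need re-running."""
    return [label for label, code in results.items() if code != 0]
```

Calling `generate_all("my-video", {"a": "...", "b": "...", "c": "...", "d": "..."})` would launch all four; `failed_labels` then tells you which single concept to retry.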
After all 4 thumbnails are generated, combine them into a single 2x2 comparison:
```bash
python3 scripts/combine_thumbnails.py \
  --images "outputs/thumbnails/{video-slug}/a.png" \
           "outputs/thumbnails/{video-slug}/b.png" \
           "outputs/thumbnails/{video-slug}/c.png" \
           "outputs/thumbnails/{video-slug}/d.png" \
  --output "outputs/thumbnails/{video-slug}/comparison.png" \
  --labels "A" "B" "C" "D"
```
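For intuition about what a 2x2 comparison grid involves, the sketch below computes where each tile would be pasted on the canvas. It is not the implementation of `combine_thumbnails.py`; the 1280x720 tile size and the `gap` parameter are assumptions.

```python
def grid_layout(tile_width: int, tile_height: int, gap: int = 0, columns: int = 2, rows: int = 2):
    """Return (canvas_size, offsets) for a columns x rows grid of equal tiles."""
    canvas = (
        columns * tile_width + (columns - 1) * gap,
        rows * tile_height + (rows - 1) * gap,
    )
    # Offsets run left to right, then top to bottom: A, B on the first row, C, D on the second.
    offsets = [
        (col * (tile_width + gap), row * (tile_height + gap))
        for row in range(rows)
        for col in range(columns)
    ]
    return canvas, offsets


canvas, offsets = grid_layout(1280, 720, gap=20)
for label, offset in zip("ABCD", offsets):
    print(label, offset)
```

Each offset is the top-left corner of a tile, so an imaging library could paste the four thumbnails at those coordinates.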
Show the user the comparison grid image and describe each concept:
Ask which direction they like best, or if they want to mix elements from different options.
Once the user picks a direction, generate a refined version by passing the chosen thumbnail as a reference image:
```bash
python3 scripts/generate_thumbnail.py \
  --photo-selection "outputs/thumbnails/{video-slug}/photo-selection-b.json" \
  --reference "outputs/thumbnails/{video-slug}/b.png" \
  --prompt "{edit prompt combining user feedback}" \
  --output "outputs/thumbnails/{video-slug}/v2.png"
```
Continue iterating with v3, v4, etc. until the user is happy.
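To avoid overwriting an earlier iteration, the next version name can be derived from the files already in the output folder. A minimal sketch, assuming the `vN.png` naming shown above; the helper name is hypothetical.

```python
import re


def next_version_name(existing: list[str]) -> str:
    """Return the next unused vN.png name, starting at v2 when none exist yet."""
    versions = [
        int(match.group(1))
        for name in existing
        if (match := re.fullmatch(r"v(\d+)\.png", name))
    ]
    return f"v{max(versions, default=1) + 1}.png"


print(next_version_name(["a.png", "b.png", "comparison.png"]))
print(next_version_name(["a.png", "v2.png", "v3.png"]))
```

Comparing the numbers rather than the strings keeps the order right once you pass v9.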
IMPORTANT: Always read templates/brand-style.md before crafting prompts. It contains YOUR brand colors, typography, and visual identity rules.
Use this as a starting point for each of the 4 concepts. Customize heavily.
A professional YouTube video thumbnail in 16:9 aspect ratio.
ATTACHED IMAGES:
Multiple reference photos of the person are attached with text labels. The PRIMARY photo shows the desired pose/expression. SUPPORTING photos reinforce identity from different angles.
{reference_image_descriptions}
PERSON:
Use the likeness from the attached reference photos. Place the person on the [left/right] side of the frame, taking up approximately 40% of the width. Show them from the waist up or shoulders up. Dramatic, natural lighting on their face. Their expression is [confident / excited / curious / serious]. Match the expression from the PRIMARY reference photo.
BACKGROUND:
Dark, moody, cinematic background — NOT a solid black void. Use a darkened real-world scene relevant to the video topic. {color_direction} color tones.
VISUAL ELEMENTS:
{visual_elements_description}
TEXT:
"{thumbnail_text}" in bold, large text. Placed {text_position}. Clean, heavy, modern font. High contrast against background. Must be clearly readable at small sizes.
STYLE:
Professional, high-contrast, clean design. Dramatic lighting on the person. Subtle depth with layered elements. Polished and modern.
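
The curly-brace placeholders in the template can be filled programmatically. The sketch below uses an abbreviated excerpt of the template and made-up values; the `fill_template` helper is hypothetical, and the bracketed choices such as [left/right] would still be edited by hand.

```python
# Abbreviated excerpt of the template above; use the full text in practice.
TEMPLATE_EXCERPT = (
    "BACKGROUND:\n"
    "Use a darkened real-world scene relevant to the video topic. "
    "{color_direction} color tones.\n"
    "VISUAL ELEMENTS:\n"
    "{visual_elements_description}\n"
    "TEXT:\n"
    '"{thumbnail_text}" in bold, large text. Placed {text_position}.'
)


def fill_template(template: str, **fields: str) -> str:
    """Substitute {placeholders}; str.format raises KeyError if one is missing."""
    return template.format(**fields)


# Example values are invented for illustration.
prompt = fill_template(
    TEMPLATE_EXCERPT,
    color_direction="Warm orange and gold",
    visual_elements_description="Two app icons floating beside the person.",
    thumbnail_text="FINALLY",
    text_position="top right",
)
print(prompt)
```

Because a missing field raises an error, a half-filled prompt never reaches the generation script unnoticed.
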
| Dimension | Concept A | Concept B | Concept C | Concept D |
|---|---|---|---|---|
| Desire loop angle | End state (show result) | Process (show method) | Before -> After | Pain point (show problem) |
| Visual focus | App icons + logo | Dashboard/data | Code/terminal | Product mockup |
| Text | Punchy feeling word | No text (visual only) | Big number or dollar | Pain-trigger word |
| Colors | Dark + warm (orange, gold) | Dark + cool (blue, cyan) | Dark + bold (red, magenta) | Dark + minimal (white/high contrast) |
| Person emotion | Confident smile | Shocked/surprised | Curious, pointing | Serious, direct |
| Layout | Asymmetrical | Symmetrical | A->B split | Minimal, negative space |
| Issue | Fix |
|---|---|
| "Could not process image" | Downloaded file is HTML, not an image. Run file <path> to confirm. Delete and download from a different source. |
| search_examples.py fails | Scrape Creators API needs credits. Skip Step 1b — generate without --examples. |
| No image returned | Simplify the prompt. Remove potentially flagged content. Try again. |
| Person doesn't look right | Use --photo-selection with 4+ photos instead of a single --headshot. Try --count 5 for maximum identity reinforcement. |
| Text is garbled | Gemini's text rendering isn't perfect. Generate without text and add it in post-production. |
| API error or timeout | Check GOOGLE_API_KEY is set. Check internet. Try again. |
| One of 4 fails | Other 3 still save. Re-run just the failed one. |
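
A quick preflight check can catch the missing API key before any generation runs. This is a minimal sketch, not part of the skill's scripts; the helper name is hypothetical and `GOOGLE_API_KEY` is the variable named in the table above.

```python
import os


def missing_env(names: list[str], environ=os.environ) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


missing = missing_env(["GOOGLE_API_KEY"])
if missing:
    print("Set these before generating:", ", ".join(missing))
```

Running this first separates configuration problems from genuine API errors or timeouts.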