Generate and edit images using Google's Gemini API (Nano Banana). Supports conversational image generation - iterative refinement through dialogue. Use when users want to create images, edit existing images, refine generations, or mention Gemini, Imagen, Nano Banana, or image generation.
Conversational image generation using Google's Gemini API.
Philosophy: Image generation works best as a dialogue, not a one-shot prompt. Nano Banana enables iterative refinement where the model "remembers" previous generations.
API key: set in a .env file as gemini_api = YOUR_KEY

IMPORTANT: Follow this 4-step workflow unless the user explicitly bypasses it (see Bypass Mode).
When a user makes a request, identify which capability matches. See capabilities.md for the full reference.
| Images | Likely Capabilities |
|---|---|
| 0 | Text-to-Image, Text Rendering |
| 1 | Image Editing, Object Removal/Addition, Background Replacement, Outpainting |
| 2+ | Face Swap, Pose Transfer, Style Transfer |
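The table above can be read as a simple lookup on the number of attached images. The following is a minimal sketch of that lookup; the names `LIKELY_CAPABILITIES` and `likely_capabilities` are hypothetical and are not part of the skill's scripts.

```python
# Hypothetical lookup mirroring the "Images -> Likely Capabilities" table.
LIKELY_CAPABILITIES = {
    0: ["Text-to-Image", "Text Rendering"],
    1: ["Image Editing", "Object Removal/Addition",
        "Background Replacement", "Outpainting"],
    2: ["Face Swap", "Pose Transfer", "Style Transfer"],  # stands for "2+"
}


def likely_capabilities(image_count: int) -> list[str]:
    """Return candidate capabilities for a request with `image_count` images."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    # Any count of two or more falls into the "2+" row.
    return LIKELY_CAPABILITIES[min(image_count, 2)]
```

The result is only a shortlist; the clarifying questions still decide which capability applies.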
Always ask clarifying questions, even if the request seems clear. Each capability has specific required information.
Questions should be conversational, not form-like. Examples:
Reference capabilities.md for capability-specific questions.
For multi-image operations, default to preserving the target/base image's aspect ratio.
Default behavior by capability:
| Capability | Default Aspect Source |
|---|---|
| Face Swap | Body/target image |
| Pose Transfer | Pose reference image |
| Style Transfer | Content image |
| Background Replacement | Subject image |
Workflow:
Auto-detect dimensions of the appropriate source image:
file "/path/to/body_image.png"
# Returns: ... 1080x799 ...
Calculate ratio (width ÷ height) and map to closest supported:
| Calculated | Closest | Flag |
|---|---|---|
| ~1.0 | Square | --aspect 1:1 |
| ~1.33 | Landscape | --aspect 4:3 |
| ~0.75 | Portrait | --aspect 3:4 |
| ~1.78 | Widescreen | --aspect 16:9 |
| ~0.56 | Vertical | --aspect 9:16 |
| ~2.33 | Ultrawide | --aspect 21:9 |
Include the result in the structured breakdown: show the detected ratio and confirm it with the user.
Example for face swap:
Body image: astronaut.jpg (1920x1080) → 16:9 widescreen
→ Using --aspect 16:9 to match body image dimensions
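The ratio-mapping step can be sketched in a few lines of Python. This is an illustration only: `closest_aspect` is a hypothetical helper, and it assumes "closest" means the smallest absolute difference between width ÷ height and each supported ratio.

```python
# Supported --aspect values and their numeric ratios (width / height).
SUPPORTED_ASPECTS = {
    "1:1": 1 / 1,
    "4:3": 4 / 3,
    "3:4": 3 / 4,
    "16:9": 16 / 9,
    "9:16": 9 / 16,
    "21:9": 21 / 9,
}


def closest_aspect(width: int, height: int) -> str:
    """Map pixel dimensions to the nearest supported --aspect value."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = width / height
    # Pick the supported ratio with the smallest absolute difference.
    return min(SUPPORTED_ASPECTS,
               key=lambda name: abs(SUPPORTED_ASPECTS[name] - ratio))


print(closest_aspect(1920, 1080))  # 16:9
print(closest_aspect(1080, 799))   # 4:3
```

The second call uses the 1080x799 dimensions from the `file` example: 1080 ÷ 799 ≈ 1.35, which is nearest to 4:3.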
User can override with:
After gathering information, present a structured breakdown for approval:
## Request Analysis
**Capability**: [detected capability]
### Images Identified
| Role | File | Purpose |
|------|------|---------|
| [role] | `[filename]` | [what it provides] |
### Generated Prompt
[The full prompt that will be sent to the API]
### Parameters
- Model: `[model name]`
- Aspect: [ratio]
- [other relevant params]
### Command Preview
python3 .claude/skills/gemini/scripts/generate_image.py "[prompt]" [flags]
---
**Ready to generate?** Reply "yes" to proceed, or tell me what to change.
If user says "yes": Execute the command.
If user requests changes: Update the breakdown and present again. Common refinements:
Continue refining until the user approves.
Users can skip the structured approval workflow by including bypass phrases in their request.
If the user's message includes any of these (case-insensitive), execute directly without the structured breakdown:
When bypassed:
Bypassed: "Just generate a cute robot holding a coffee cup" → Execute directly, show result, offer refinements
Not bypassed: "Generate a robot" (ambiguous, needs clarification) → Follow normal workflow, ask about style/details
Bypassed: "Quick headshot of this photo, skip approval" + image → Execute with sensible defaults (Pro model for headshots, appropriate aspect)
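A case-insensitive phrase check like the one described could look like the sketch below. The phrase tuple is an assumption: it contains only the two phrases visible in the examples above, not the skill's full list, and `is_bypassed` is a hypothetical name.

```python
# Illustrative phrases only, taken from the examples above.
# The skill's complete bypass list is not reproduced here.
BYPASS_PHRASES = ("just generate", "skip approval")


def is_bypassed(message: str) -> bool:
    """Return True if the message contains a bypass phrase (case-insensitive)."""
    lowered = message.casefold()
    return any(phrase in lowered for phrase in BYPASS_PHRASES)


print(is_bypassed("Just generate a cute robot holding a coffee cup"))  # True
print(is_bypassed("Generate a robot"))                                  # False
```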
User: "Do a face swap with these two photos"
Claude: "I can help with that face swap! A few quick questions:
User: "First image is my face, second is the astronaut. No accessories."
Claude:
## Request Analysis
**Capability**: Face Swap
### Images Identified
| Role | File | Purpose |
|------|------|---------|
| Face source | `photo1.jpg` | Face to transfer |
| Body target | `astronaut.jpg` | Body/pose to use |
### Generated Prompt
{
"task": "Transfer face from Image 1 onto astronaut body in Image 2",
"from_image_1": {
"extract": ["face", "facial features", "expression", "skin tone"]
},
"from_image_2": {
"keep": ["astronaut suit", "pose", "space background", "helmet framing"]
},
"integration": {
"neck_blend": "seamless",
"lighting": "match harsh space lighting from Image 2"
}
}
### Parameters
- Model: `gemini-3-pro-image-preview` (recommended for face swaps)
- Aspect: 16:9 (detected from body image `astronaut.jpg`, 1920x1080)
### Command Preview
python3 .claude/skills/gemini/scripts/generate_image.py '{...}' --image photo1.jpg --image astronaut.jpg --model gemini-3-pro-image-preview --aspect 16:9
---
**Ready to generate?** Reply "yes" to proceed, or tell me what to change.
User: "Add more thinking budget"
Claude: Updates breakdown with --thinking 16384 and presents again
User: "yes"
Claude: Executes the command
# Generate new image
python3 .claude/skills/gemini/scripts/generate_image.py "a cute robot"
# Edit existing image (conversational)
python3 .claude/skills/gemini/scripts/generate_image.py "add a red hat" --image ./generated_images/robot.png
# High quality with Pro model
python3 .claude/skills/gemini/scripts/generate_image.py "detailed portrait" --model gemini-3-pro-image-preview --thinking 16384
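When these commands are assembled programmatically, building an argument list avoids shell-quoting problems with prompts that contain quotes or JSON. The sketch below is an assumption about how a caller might do this; `build_command` is a hypothetical helper, and only flags documented in this file are used.

```python
import shlex

SCRIPT = ".claude/skills/gemini/scripts/generate_image.py"


def build_command(prompt, images=(), model=None, aspect=None, thinking=None):
    """Build the argument list for generate_image.py (hypothetical helper)."""
    cmd = ["python3", SCRIPT, prompt]
    for image in images:
        cmd += ["--image", image]  # --image may be repeated
    if model:
        cmd += ["--model", model]
    if aspect:
        cmd += ["--aspect", aspect]
    if thinking is not None:
        cmd += ["--thinking", str(thinking)]
    return cmd


cmd = build_command("add a red hat", images=["./generated_images/robot.png"])
# shlex.join renders a copy-pasteable preview for the approval step.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)` once the user approves, while the `shlex.join` string serves as the Command Preview.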
This is the core workflow. Treat image generation as a dialogue with a creative partner.
Note: All generations must go through the Interactive Workflow above. The steps below describe the conversation after the user has approved execution.
When user asks to modify a recent generation:
# User: "Make it brighter"
python3 .claude/skills/gemini/scripts/generate_image.py "make the image brighter with more sunlight" \
--image ./generated_images/previous_image.png
# User: "Add a coffee cup"
python3 .claude/skills/gemini/scripts/generate_image.py "add a steaming coffee cup to the scene" \
--image ./generated_images/previous_image.png
Example conversation flow:
User: Generate a robot
Claude: [generates robot, saves to generated_images/gemini_image_20251231_xxx.png]
Here's your robot! Would you like any changes?
User: Make it hold a coffee cup
Claude: [uses --image flag with previous image]
python3 ... "add a coffee cup to the robot's hands" --image ./generated_images/gemini_image_20251231_xxx.png
Done! The robot now has a coffee cup. Anything else?
User: Change background to sunrise
Claude: [uses --image with the updated image]
Updated with a sunrise background!
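The flow above depends on remembering the most recent output so that each edit is applied to the latest image. A minimal sketch of that bookkeeping follows; `RefinementSession` is a hypothetical class, not something the skill provides.

```python
class RefinementSession:
    """Track the latest generated image so edits build on it (hypothetical)."""

    def __init__(self):
        self.last_image = None

    def args_for(self, prompt, edit):
        """Return script arguments: new generation, or edit of the last image."""
        args = [prompt]
        if edit:
            if self.last_image is None:
                raise ValueError("nothing to edit yet; generate an image first")
            args += ["--image", self.last_image]
        return args

    def record(self, image_path):
        """Remember the path reported by the most recent generation."""
        self.last_image = image_path


session = RefinementSession()
print(session.args_for("a robot", edit=False))
session.record("./generated_images/robot.png")
print(session.args_for("add a coffee cup to the robot's hands", edit=True))
```

Calling `record` after every successful run keeps the chain intact, so "Change background to sunrise" is applied to the coffee-cup version rather than the original robot.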
| User Says | Mode | Command |
|---|---|---|
| "Generate a cat" | New | python3 ... "a cat" |
| "Create an image of..." | New | python3 ... "..." |
| "Make it brighter" | Edit | python3 ... "brighter" --image <previous> |
| "Add a hat" | Edit | python3 ... "add hat" --image <previous> |
| "Change the background" | Edit | python3 ... "change bg" --image <previous> |
| "Try again" / "New version" | New | python3 ... "<same prompt>" |
| "Make A look like B" | Reference | python3 ... "..." --image A.png --image B.png |
Pass multiple images with --image flags and describe the relationship in the prompt:
# Pose/expression transfer
python3 .claude/skills/gemini/scripts/generate_image.py \
"Make the person in the first image adopt the pose and expression of the person in the second image" \
--image subject.png --image reference.png
# Style transfer
python3 .claude/skills/gemini/scripts/generate_image.py \
"Apply the artistic style of the second image to the first image" \
--image photo.png --image artwork.png
# Identity swap
python3 .claude/skills/gemini/scripts/generate_image.py \
"Put person A's face on person B's body" \
--image personA.png --image personB.png
Tips for reference mode:
- Pro model (--model gemini-3-pro-image-preview) works better for complex transfers

For comprehensive face swap techniques (4 swap types, 6 failure fixes, source image selection), see examples/advanced-techniques.md#face--identity-swaps.
Quick reference for common applications. See examples/prompts.md for full templates.
| Category | Examples | Best Model |
|---|---|---|
| Photorealism | Headshots, film aesthetics, era-specific portraits | Pro |
| E-commerce | Product shots, virtual try-on, lifestyle photography | Pro |
| Social Media | Thumbnails, viral covers, meme generation | Flash → Pro |
| Interior Design | Floor plan → renders, room visualization | Pro |
| Education | Infographics, memory palace, concept visualization | Flash |
| Photo Editing | Outpainting, crowd removal, background replacement | Pro |
| Creative | Recursive images, aging effects, Droste effect | Pro |
| Workplace | Whiteboard → flowchart, UI sketch → prototype | Flash → Pro |
| Face Swaps | Face-on-body, identity transfer, head swap | Pro |
| Translation | Sign translation, comic localization | Pro |
| Avatars | 3D blind box, pet memes, stylized portraits | Flash |
Reference: See config.md for current model IDs, rate limits, and parameters.
Choose based on use case:
| Use Case | Model | Why |
|---|---|---|
| Rapid iterations, drafts | gemini-2.5-flash-image | Fast (2-5s), lower cost |
| Final output, quality | gemini-3-pro-image-preview | Superior quality, 2K |
| Text-heavy images | gemini-3-pro-image-preview | Best typography |
| Complex compositions | gemini-3-pro-image-preview | Better reasoning with --thinking |
| High volume | gemini-2.5-flash-image | Lower cost, faster |
Default: gemini-2.5-flash-image (Nano Banana) - good for most cases
Choose aspect ratio based on intended use:
| Ratio | Resolution | Best For | Token Cost |
|---|---|---|---|
| 1:1 | 1024x1024 | Icons, Instagram, squares | Lowest |
| 16:9 | 1344x768 | YouTube thumbnails, widescreen | Medium |
| 9:16 | 768x1344 | TikTok, Reels, Stories | Medium |
| 4:3 | 1184x864 | Presentations | Medium |
| 3:4 | 864x1184 | Portraits | Medium |
| 21:9 | 1536x672 | Cinematic, ultra-wide | Higher |
Tip: Use 1:1 for lowest token cost; use 21:9 for cinematic shots.
The --thinking parameter controls reasoning depth for gemini-3-pro-image-preview:
| Budget | Use Case | When to Use |
|---|---|---|
| 4096 | Quick | Simple prompts, fast iterations |
| 8192 | Balanced | Default, most use cases |
| 16384 | Complex | Detailed compositions, multiple subjects |
| 32768 | Maximum | Challenging requests, precise text rendering |
# Complex scene with a larger thinking budget (16384; the maximum is 32768)
python3 .claude/skills/gemini/scripts/generate_image.py \
"A detailed medieval marketplace with merchants, customers, and goods" \
--model gemini-3-pro-image-preview \
--thinking 16384
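Because the thinking budget applies only to the Pro model and has a documented range of 4096-32768, a caller may want to check both before adding the flag. The sketch below is a client-side check under those assumptions; `thinking_args` is a hypothetical helper, and how the script itself handles out-of-range values is not stated here.

```python
PRO_MODEL = "gemini-3-pro-image-preview"
MIN_THINKING, MAX_THINKING = 4096, 32768


def thinking_args(model, budget):
    """Return ['--thinking', N] for the Pro model, or [] when it does not apply."""
    if model != PRO_MODEL or budget is None:
        return []  # the flag is documented as Pro only
    if not MIN_THINKING <= budget <= MAX_THINKING:
        raise ValueError(
            f"thinking budget must be between {MIN_THINKING} and {MAX_THINKING}"
        )
    return ["--thinking", str(budget)]


print(thinking_args(PRO_MODEL, 16384))                 # ['--thinking', '16384']
print(thinking_args("gemini-2.5-flash-image", 16384))  # []
```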
Free Tier:
| Model | Requests/Day (RPD) | Requests/Min (RPM) |
|---|---|---|
| gemini-2.5-flash-image | ~100 | ~15 |
| gemini-3-pro-image-preview | ~10 | ~5-10 |
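As a rough worked example, the per-minute limits translate into a minimum spacing between requests. The helper below is a hypothetical sketch, and the figures it is called with are the approximate free-tier values from the table, which may change.

```python
def min_interval_seconds(requests_per_minute: float) -> float:
    """Smallest gap between requests that stays within an RPM limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


print(min_interval_seconds(15))  # 4.0  (Flash at ~15 RPM)
print(min_interval_seconds(5))   # 12.0 (Pro at the low end of ~5-10 RPM)
```

Spacing batch requests by at least this interval should reduce how often the script has to fall back on its rate-limit retries.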
Paid Tier (varies by spend):
Cost Estimates:
Cost Optimization:
- Use 1:1 aspect ratio when possible (lowest tokens)

| Parameter | Values | Default | Description |
|---|---|---|---|
| --model | gemini-2.5-flash-image, gemini-3-pro-image-preview | gemini-2.5-flash-image | Model |
| --image | file path or URL | none | Image(s) for editing/reference. Supports local paths and URLs. Use multiple times. |
| --aspect | 1:1, 16:9, 9:16, 4:3, 3:4, 21:9 | 1:1 | Aspect ratio |
| --count | 1-4 | 1 | Number of images |
| --output | directory | ./generated_images | Output base directory |
| --project | string | none | Project subfolder. Creates: output/YYYY-MM-DD/project/ |
| --name | string | gemini_image | Filename prefix |
| --retries | number | 3 | Rate limit retry attempts |
| --thinking | 4096-32768 | 8192 | Thinking budget (Pro only) |
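To locate files written with --project, the output/YYYY-MM-DD/project/ layout can be reconstructed as shown below. This is a sketch under the assumption that the date segment is an ISO-formatted calendar date; `project_dir` is a hypothetical helper, not a function of the script.

```python
from datetime import date
from pathlib import Path


def project_dir(output: str, project: str, day: date) -> Path:
    """Rebuild the output/YYYY-MM-DD/project/ directory for a given day."""
    # date.isoformat() yields YYYY-MM-DD, matching the documented layout.
    return Path(output) / day.isoformat() / project


print(project_dir("./generated_images", "robots", date(2025, 12, 31)))
```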
The script returns JSON:
{
"success": true,
"prompt": "the prompt used",
"model": "gemini-2.5-flash-image",
"mode": "generate | edit | reference",
"source_images": [],
"aspect_ratio": "1:1",
"thinking_budget": null | 8192,
"images": ["path/to/output.png"],
"text_response": "optional model commentary",
"count": 1
}
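A caller can parse this output to decide what to show the user. The sketch below assumes the script prints a single JSON object with the fields listed above; the shape of a failed result is not documented here, so the failure branch checks only the `success` flag. `summarize_result` is a hypothetical name.

```python
import json


def summarize_result(raw: str) -> str:
    """Turn the script's JSON output into a one-line summary."""
    result = json.loads(raw)
    if not result.get("success"):
        # Assumption: failures set "success" to false; other fields may differ.
        return "generation failed"
    images = result.get("images", [])
    return f"{result['mode']}: {len(images)} image(s) -> {', '.join(images)}"


sample = '{"success": true, "mode": "generate", "images": ["path/to/output.png"], "count": 1}'
print(summarize_result(sample))  # generate: 1 image(s) -> path/to/output.png
```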
Mode values:
- generate: No input images (0)
- edit: Single input image (1)
- reference: Multiple input images (2+)

Always offer to enhance basic prompts:
| Original | Enhanced |
|---|---|
| "a cat" | "A fluffy orange tabby cat lounging on a velvet cushion, soft window light, shallow depth of field, professional pet photography, warm tones" |
| "robot" | "A friendly humanoid robot with glowing blue eyes, sleek white and silver design, soft studio lighting, futuristic but approachable, 8K detail" |
Enhancement techniques:
For complex generations, use JSON to organize your prompt. This improves consistency and makes prompts reusable:
{
"subject": {
"description": "A young professional woman",
"age": "early 30s",
"expression": "confident, approachable",
"clothing": "navy business suit, white shirt"
},
"photography": {
"style": "professional headshot",
"camera": "Sony A7III, 85mm f/1.4",
"lighting": "three-point studio lighting with soft key light"
},
"background": {
"setting": "solid dark gray studio backdrop",
"effects": "subtle vignette, slightly lighter behind subject"
},
"quality": "8K, natural skin texture with visible pores, catchlights in eyes"
}
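A structured prompt like this can be kept as a Python dictionary and serialized just before the call, which makes it easy to reuse and tweak. The sketch below is an assumption about how a caller might do that; `json_prompt_command` is a hypothetical helper, and the dictionary is a shortened version of the prompt above.

```python
import json

SCRIPT = ".claude/skills/gemini/scripts/generate_image.py"

prompt = {
    "subject": {
        "description": "A young professional woman",
        "expression": "confident, approachable",
    },
    "photography": {
        "style": "professional headshot",
        "lighting": "three-point studio lighting with soft key light",
    },
    "background": {"setting": "solid dark gray studio backdrop"},
}


def json_prompt_command(prompt: dict, model: str) -> list[str]:
    """Serialize a structured prompt and build the script's argument list."""
    prompt_text = json.dumps(prompt, ensure_ascii=False)
    # As a list element, the JSON text needs no shell quoting.
    return ["python3", SCRIPT, prompt_text, "--model", model]


cmd = json_prompt_command(prompt, "gemini-3-pro-image-preview")
print(cmd[2])
```

Changing one field, for example `prompt["background"]["setting"]`, and rebuilding the command gives a controlled variation of the same shot.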
When to use JSON:
When plain text works fine:
Critical for transformations that should maintain identity:
{
"face": {
"preserve_original": true,
"instruction": "Keep facial features 100% accurate from reference image"
},
"changes": {
"clothing": "professional navy suit with white shirt",
"background": "clean studio backdrop",
"lighting": "soft professional lighting"
},
"do_not_modify": ["facial features", "expression", "face shape"]
}
Reference specific time periods for authentic looks:
| Era | Key Elements |
|---|---|
| 1990s flash | Harsh direct flash, slight overexposure, party vibes, disposable camera look |
| 2000s digital | Early digital artifacts, slight noise, MySpace aesthetic, harsh flash |
| Kodak Portra 400 | Warm skin tones, soft grain, nostalgic, natural colors |
| Fuji Superia | Cooler tones, punchy colors, subtle green cast |
Expand images to different aspect ratios while maintaining visual coherence:
python3 .claude/skills/gemini/scripts/generate_image.py \
"Expand to 16:9 aspect ratio. Seamlessly extend the scenery on both left and right sides. Match the original lighting, weather, and texture perfectly. Complete any cut-off objects naturally." \
--image ./cropped_photo.png \
--aspect 16:9
Remove unwanted elements and fill with contextually appropriate content:
python3 .claude/skills/gemini/scripts/generate_image.py \
"Remove all tourists/people in the background behind the main subject. Replace them with realistic background elements that fit the scene. Ensure no blur artifacts remain." \
--image ./crowded_photo.png
Create images containing themselves infinitely:
python3 .claude/skills/gemini/scripts/generate_image.py \
"Recursive image of an orange cat sitting in an office chair holding an iPad. On the iPad is the same cat in the same scene holding the same iPad. Repeated on each iPad." \
--model gemini-3-pro-image-preview --thinking 16384
Dress a model in specific garments while preserving fabric details:
python3 .claude/skills/gemini/scripts/generate_image.py \
'{
"task": "Using the garment from Image 1 and the model from Image 2, create a realistic full-body fashion photo",
"fit_details": "Garment must drape naturally, creating realistic folds and wrinkles",
"preservation": "Preserve original fabric texture, color, and logos with extreme accuracy",
"integration": "Match ambient lighting, color temperature, and shadow direction",
"style": "Clean e-commerce lookbook, Canon EOS R5, 50mm f/1.8"
}' \
--image garment.png --image model.png \
--model gemini-3-pro-image-preview
Convert hand-drawn sketches to polished business graphics:
python3 .claude/skills/gemini/scripts/generate_image.py \
"Convert this hand-drawn whiteboard sketch into a professional corporate flowchart. Use minimalist McKinsey-style aesthetic: clean lines, ample whitespace, blue-and-gray palette. Align all boxes to a strict grid. Connect with straight orthogonal arrows. Transcribe labels into bold sans-serif font." \
--image whiteboard_sketch.png
For images requiring legible text (posters, infographics):
python3 .claude/skills/gemini/scripts/generate_image.py \
'{
"layout": "promotional poster",
"text_elements": {
"title": {"content": "Autumn Special", "style": "elegant gold serif, top center"},
"offer": {"content": "Buy One Get One Free", "style": "modern badge/sticker"},
"footer": {"content": "Limited Time Only", "style": "small clean text"}
},
"background": "cinematic close-up of steaming cappuccino, rustic wooden table, autumn leaves",
"quality": "ensure all text is perfectly spelled, centered, and integrated"
}' \
--model gemini-3-pro-image-preview --thinking 16384
Transfer faces between images with four approaches:
| Type | Use Case |
|---|---|
| Face-on-Body | Put your face on a template/model |
| Body-on-Face | Keep identity, change outfit/pose |
| Full Identity Transfer | Person B in Person A's scenario |
| Head Swap | Different hairstyle (includes hair) |
python3 .claude/skills/gemini/scripts/generate_image.py \
'{
"task": "Place face from Image 2 onto body in Image 1",
"from_image_1": {"keep": ["body", "pose", "clothing", "scene"]},
"from_image_2": {"extract": ["face", "expression", "skin tone"]},
"integration": {"neck_blend": "seamless", "lighting": "match Image 1"}
}' \
--image body_template.png --image face_source.png \
--model gemini-3-pro-image-preview
For detailed techniques, failure fixes, and source image selection, see examples/advanced-techniques.md#face--identity-swaps.
See examples/advanced-techniques.md for detailed guides on all techniques.
gemini-2.5-flash-image or gemini-3-pro-image-preview

| Error | Solution |
|---|---|
| "No .env file found" | Create .env with gemini_api = YOUR_KEY |
| "Rate limited" | Script auto-retries; wait if persists. Check quotas above. |
| "Bad request" | Revise prompt (content policy) |
| "Image not found" | Check --image path exists |
See examples/prompts.md for prompt inspiration.
When asked to update Nano Banana / Gemini capabilities:
Check these sources for updates:
Check for changes in:
| What Changed | Update |
|---|---|
| Model names, limits, params | config.md (primary), then generate_image.py if endpoints changed |
| New capabilities | capabilities.md |
| Workflow changes | SKILL.md |
After updating, run a test generation to confirm the changes work:
python3 .claude/skills/gemini/scripts/generate_image.py "test image" --model [new-model-name]
| File | Purpose | When to Read |
|---|---|---|
| SKILL.md | Workflow, operations | Always (skill trigger) |
| config.md | Models, limits, params | When checking current values |
| capabilities.md | Capability detection, failure fixes | Complex operations (face swap, style transfer) |
| examples/prompts.md | JSON prompt templates | When user needs specific prompt structure |
| examples/advanced-techniques.md | Deep-dive guides | Human learning, complex troubleshooting |