Generate character art and image variations using AI image generation (Google Gemini) with reference images for style and character consistency. Use this skill when the user asks to generate new character poses, mascot variations, art assets, illustrations, or any AI-generated images — especially when maintaining consistency with an existing character or style.
Generate image variations using Google's Gemini image generation model with reference images for style and character consistency. The model supports up to 14 reference images per request and can maintain consistency across multiple characters.
Clarify the subject, pose, expression, context, and where the asset will be used (app screen, social media, website, etc.). This context helps craft the right prompt and choose the right aspect ratio.
Always use 1-2 reference images for consistency:
Primary reference (always first): The most canonical image of the character/subject. This anchors identity — face shape, color palette, defining features.
Style/pose reference (second, optional): Pick the closest existing approved asset to the target pose. This anchors proportions and art style.
The primary reference anchors identity; the style reference anchors proportions. Both together produce the most consistent results.
Write a detailed prompt that describes the exact pose, expression, and style:
Prompt template:
[CHARACTER_DESCRIPTION]. [POSE_AND_EXPRESSION]. [STYLE_DIRECTIVES]. [BACKGROUND]. [VIEW/FRAMING].
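The template above can be filled in mechanically. The following is a minimal sketch, assuming a hypothetical `buildPrompt` helper whose field names simply mirror the template slots; it is not part of the bundled script.

```typescript
// Hypothetical helper: field names mirror the template slots, not any real API.
interface PromptParts {
  character: string;
  poseAndExpression: string;
  style: string;
  background: string;
  framing: string;
}

function buildPrompt(parts: PromptParts): string {
  return [
    parts.character,
    parts.poseAndExpression,
    parts.style,
    parts.background,
    parts.framing,
  ]
    .map((s) => s.trim().replace(/\.+$/, "")) // drop trailing periods so they are not doubled
    .filter((s) => s.length > 0) // skip slots that were left empty
    .map((s) => s + ".")
    .join(" ");
}
```

Keeping each slot as a separate field makes it easy to change only the pose or framing between runs while the character description stays fixed.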
Tips:
Run the bundled generation script:
deno run --allow-env --allow-read --allow-write --allow-net \
.claude/skills/image-gen/scripts/generate.ts \
--prompt "your prompt here" \
--ref path/to/primary-reference.png \
--ref path/to/style-reference.png \
--output-dir /tmp/image-gen \
--variants 4 \
--aspect "<choose based on use case>" \
--size "2K"
Parameters:
| Flag | Default | Options |
|---|---|---|
| --variants | 4 | 1-8 (each is a separate API call) |
| --aspect | 1:1 | 1:1, 3:4, 4:3, 9:16, 16:9, 2:3, 3:2 |
| --size | 1K | 512, 1K, 2K, 4K |
Always pass --size "2K" explicitly rather than relying on the script default of 1K. Higher resolution gives better quality and can always be downscaled.
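The flag limits in the parameter table can be checked before the script is invoked. The sketch below assumes a hypothetical `buildGenerateArgs` wrapper. The allowed values are copied from the table, the 1-2 reference limit follows the guidance earlier in this document, and the wrapper passes 2K when no size is given. None of this is the script's own validation logic.

```typescript
// Allowed values copied from the parameter table; the wrapper itself is hypothetical.
const ASPECTS = ["1:1", "3:4", "4:3", "9:16", "16:9", "2:3", "3:2"];
const SIZES = ["512", "1K", "2K", "4K"];

interface GenerateOptions {
  prompt: string;
  refs: string[]; // primary reference first, optional style reference second
  outputDir: string;
  variants?: number; // script default: 4
  aspect?: string; // script default: 1:1
  size?: string; // script default is 1K; this wrapper passes 2K instead
}

function buildGenerateArgs(opts: GenerateOptions): string[] {
  const variants = opts.variants ?? 4;
  const aspect = opts.aspect ?? "1:1";
  const size = opts.size ?? "2K";

  if (!Number.isInteger(variants) || variants < 1 || variants > 8) {
    throw new Error(`--variants must be an integer from 1 to 8, got ${variants}`);
  }
  if (!ASPECTS.includes(aspect)) throw new Error(`unsupported --aspect: ${aspect}`);
  if (!SIZES.includes(size)) throw new Error(`unsupported --size: ${size}`);
  if (opts.refs.length < 1 || opts.refs.length > 2) {
    throw new Error("pass 1 or 2 reference images");
  }

  return [
    "--prompt", opts.prompt,
    ...opts.refs.flatMap((ref) => ["--ref", ref]), // one --ref flag per image, in order
    "--output-dir", opts.outputDir,
    "--variants", String(variants),
    "--aspect", aspect,
    "--size", size,
  ];
}
```

The returned array is meant to be appended after the script path in the `deno run` command shown above.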
Choose aspect ratio based on use case:
| Use Case | Aspect Ratio |
|---|---|
| Full-body character poses | 3:4 |
| App icons, avatars, social profiles | 1:1 |
| Mobile screens, in-app cards | 9:16 or 3:4 |
| Banner/header images, OG images | 16:9 or 4:3 |
| Bust/upper-body portraits | 1:1 or 4:3 |
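The table above can be encoded as a lookup. This is a sketch only: the use-case keys are hypothetical names, and treating the first listed ratio as the default is an assumption where the table offers two options.

```typescript
// Hypothetical keys for the rows of the use-case table.
type UseCase =
  | "full-body"
  | "icon-avatar"
  | "mobile-screen"
  | "banner"
  | "bust-portrait";

// Ratios in the order the table lists them.
const ASPECT_BY_USE_CASE: Record<UseCase, string[]> = {
  "full-body": ["3:4"],
  "icon-avatar": ["1:1"],
  "mobile-screen": ["9:16", "3:4"],
  "banner": ["16:9", "4:3"],
  "bust-portrait": ["1:1", "4:3"],
};

// Assumption: when two ratios are listed, prefer the first.
function aspectFor(useCase: UseCase): string {
  return ASPECT_BY_USE_CASE[useCase][0];
}
```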
Cost: roughly $0.10 per image at 2K, so about $0.40 for a run of 4 variants.
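The arithmetic behind that estimate, as a small sketch. The 10-cent figure is the approximate 2K price quoted above, and `estimateCostCents` is a hypothetical helper; working in integer cents avoids floating-point rounding.

```typescript
// Approximate per-image price at 2K, in cents, taken from the estimate above.
const CENTS_PER_IMAGE_2K = 10;

// Each variant is a separate API call, so cost scales linearly with the count.
function estimateCostCents(variants: number, centsPerImage: number = CENTS_PER_IMAGE_2K): number {
  return variants * centsPerImage;
}

function formatDollars(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}
```

For example, `formatDollars(estimateCostCents(4))` gives `"$0.40"`, and a full run of 8 variants would be about `"$0.80"`.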
Use the Read tool to visually inspect all generated images. Score each on:
Consistency (most important):
Quality (tiebreaker):
Pick the single best variant and copy it to the project's assets directory with a descriptive name. Briefly explain why you picked it.
If none are good enough, explain what went wrong and offer to regenerate with prompt adjustments.
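The selection rule above (consistency first, quality as the tiebreaker, and no pick if nothing is good enough) can be sketched as follows. The 1-5 scoring scale, the `minConsistency` threshold, and the `pickBest` name are all assumptions for illustration; the scores themselves still come from visual inspection.

```typescript
// Hypothetical score record; a 1-5 scale is assumed for both fields.
interface VariantScore {
  file: string;
  consistency: number; // most important criterion
  quality: number; // tiebreaker only
}

// Returns the best variant, or null when none meets the consistency bar.
function pickBest(scores: VariantScore[], minConsistency: number = 4): VariantScore | null {
  const ranked = [...scores].sort(
    // Higher consistency wins; quality only decides ties.
    (a, b) => b.consistency - a.consistency || b.quality - a.quality,
  );
  const best = ranked[0];
  return best !== undefined && best.consistency >= minConsistency ? best : null;
}
```

A `null` result corresponds to the "none are good enough" case, where the right move is to explain the failure and offer to regenerate.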
After picking the best variant:
If some variants fail with 429 (rate limit) errors: wait 60 seconds, then rerun with --variants set to only the number still missing. Do not regenerate the variants that already succeeded.
If all fail with 429: wait 60 seconds and try again. If it keeps failing, the daily quota may be exhausted. Try again later or enable billing for higher limits.
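The fill-in-the-gaps retry described above might look like this. It is a sketch under stated assumptions: `runBatch` is a hypothetical callback that runs the script for a given variant count and reports how many images succeeded, and the cap of three attempts is arbitrary.

```typescript
// Hypothetical callback: runs the script for `count` variants, resolves to the number that succeeded.
type RunBatch = (count: number) => Promise<number>;

async function generateWithRetry(
  wanted: number,
  runBatch: RunBatch,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts: number = 3, // assumption: give up after a few rounds
): Promise<number> {
  let done = 0;
  for (let attempt = 1; attempt <= maxAttempts && done < wanted; attempt++) {
    if (attempt > 1) await sleep(60_000); // wait 60 seconds before retrying
    done += await runBatch(wanted - done); // request only the missing variants
  }
  return done; // fewer than `wanted` suggests the quota may be exhausted
}
```

Injecting `sleep` keeps the function testable without real delays. If the returned count is still short after the last attempt, that matches the exhausted-quota case.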