Generate images and videos using Gemini CLI's vision extension. Use for image generation with Nano Banana (gemini-2.5-flash-image), video generation with Veo 3, webcam capture, and image-to-image transformations. Invokes Gemini CLI commands and returns file paths.
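Generate AI images and videos by invoking Gemini CLI's vision extension. This skill provides access to image generation with Nano Banana, video generation with Veo 3, webcam capture, and image-to-image transformations.

It requires:
- The vision extension, installed with `gemini extensions install vision`
- The `GEMINI_API_KEY` environment variable

Use this skill when the user asks to generate or transform images, create short videos, or capture from a webcam.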
Generate images from text prompts or transform existing images.
Command Pattern:
gemini -p "/vision:banana prompt=\"Your creative prompt here\" n=1 out_dir=./output"
Parameters:
| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Creative description of desired image |
| n | 1 | Number of images to generate |
| out_dir | "." | Output directory for images |
| model | gemini-2.5-flash-image | Image generation model |
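
The parameters above can be assembled programmatically. The following is a minimal sketch, not part of the extension: `build_banana_command` is a hypothetical helper, and the backslash escaping of quotes inside the prompt is an assumption about how the extension parses `prompt="..."`.

```python
import shlex


def build_banana_command(prompt, n=1, out_dir=".", model=None):
    """Return an argv list for a /vision:banana invocation (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: backslash-escaped quotes are accepted inside prompt="...".
    escaped = prompt.replace("\\", "\\\\").replace('"', '\\"')
    slash_command = f'/vision:banana prompt="{escaped}" n={n} out_dir={out_dir}'
    if model is not None:
        # Only pass model when overriding the documented default.
        slash_command += f" model={model}"
    return ["gemini", "-p", slash_command]


argv = build_banana_command("A red fox in snow", n=2, out_dir="./output")
# shlex.join renders the argv list as a single shell-safe command line.
print(shlex.join(argv))
```

Returning an argv list rather than one shell string leaves the shell quoting to `shlex.join` or `subprocess`, so only the inner `prompt="..."` quoting has to be handled by hand.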
Models Available:
- gemini-2.5-flash-image (default, recommended)
- gemini-3-pro-image-preview (newer, experimental)

Generate short videos from images or prompts.
Command Pattern:
gemini -p "/vision:veo prompt=\"Animate this scene\" aspect_ratio=16:9 out_dir=./output"
Parameters:
| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Animation/motion description |
| aspect_ratio | "16:9" | Video aspect ratio (16:9 or 9:16) |
| resolution | auto | Video resolution (e.g., "1080p") |
| negative_prompt | "" | What to avoid in video |
| veo_model | veo-3.0-generate-001 | Video model |
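
As a rough illustration of the video parameters, the sketch below builds a `/vision:veo` command and rejects aspect ratios other than the two listed in the table. `build_veo_command` is a hypothetical helper, and quoting `negative_prompt` the same way as `prompt` is an assumption.

```python
VALID_ASPECT_RATIOS = ("16:9", "9:16")  # the two values listed in the table


def _quote(text):
    # Assumption: values containing spaces are wrapped in double quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_veo_command(prompt, aspect_ratio="16:9", resolution=None,
                      negative_prompt="", out_dir=None):
    """Return an argv list for a /vision:veo invocation (hypothetical helper)."""
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {VALID_ASPECT_RATIOS}")
    parts = [f"/vision:veo prompt={_quote(prompt)}", f"aspect_ratio={aspect_ratio}"]
    # Optional parameters are omitted so the extension's defaults apply.
    if resolution:
        parts.append(f"resolution={resolution}")
    if negative_prompt:
        parts.append(f"negative_prompt={_quote(negative_prompt)}")
    if out_dir:
        parts.append(f"out_dir={out_dir}")
    return ["gemini", "-p", " ".join(parts)]
```

Validating the aspect ratio before invoking the CLI surfaces a typo immediately instead of after a generation attempt.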
Capture from webcam and process with AI.
# Start camera
gemini -p "/vision:start"
# Capture and transform
gemini -p "/vision:banana prompt=\"Transform into oil painting\""
# Stop camera
gemini -p "/vision:stop"
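
When scripting the start, capture, and stop sequence, it may help to guarantee that the camera is stopped even if generation fails. This is a minimal sketch under that assumption; `run_gemini` and `capture_and_transform` are hypothetical names, and the `runner` parameter exists only so the sequence can be exercised without the CLI installed.

```python
import subprocess


def run_gemini(slash_command, runner=subprocess.run):
    """Run one Gemini CLI prompt and return the completed process."""
    return runner(["gemini", "-p", slash_command],
                  capture_output=True, text=True, check=True)


def capture_and_transform(style_prompt, runner=subprocess.run):
    """Start the camera, transform a capture, and always stop the camera."""
    run_gemini("/vision:start", runner)
    try:
        return run_gemini(f'/vision:banana prompt="{style_prompt}"', runner)
    finally:
        # Runs on success and on failure, so the camera is not left on.
        run_gemini("/vision:stop", runner)
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError`, which propagates to the caller after the `finally` clause has issued `/vision:stop`.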
When the user requests image or video generation:
Determine the operation type:
- Image generation: `/vision:banana`
- Image transformation: `/vision:banana` with input image
- Video generation: `/vision:veo`
- Webcam capture: `/vision:capture` or `/vision:banana`

Construct the Gemini CLI command:
gemini -p "/vision:<command> prompt=\"<user prompt>\" <params>"
Execute via Bash tool:
Handle output:
- Images: `banana_*.png` or `banana_*.jpg`
- Videos: `veo_*.mp4`

User: "Generate an image of a cyberpunk city at sunset"
Action:
gemini -p "/vision:banana prompt=\"A sprawling cyberpunk city at sunset, neon lights reflecting off wet streets, flying cars in the distance, highly detailed, cinematic\" n=1 out_dir=."
User: "Make this photo look like a Studio Ghibli scene" (with image attached)
Action:
gemini -p "/vision:banana prompt=\"Transform into Studio Ghibli animation style, soft colors, whimsical atmosphere\" input_paths=['/path/to/image.jpg']"
User: "Create a video of ocean waves"
Action:
gemini -p "/vision:veo prompt=\"Calm ocean waves gently rolling onto a sandy beach, golden hour lighting, peaceful atmosphere\" aspect_ratio=16:9"
User: "Take a photo of me and make it look like a Renaissance painting"
Action:
# Capture and transform in one step
gemini -p "/vision:banana prompt=\"Transform into a Renaissance oil painting, dramatic lighting, classical composition\""
Always report results in this format:
## Generated Content
**Type:** Image/Video
**Files:**
- `/path/to/banana_20251227_123456_000.png`
**Prompt Used:** [the prompt]
**Model:** gemini-2.5-flash-image
To view: Open the file path above or use `open /path/to/file`
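
One way to produce the report above is to look for the documented output filename patterns and fill in the template. This is a sketch only: `find_outputs` and `format_report` are hypothetical helpers, and it assumes the generated files land directly in the output directory.

```python
from pathlib import Path

# Filename patterns for generated output, as documented for this skill.
OUTPUT_PATTERNS = {
    "Image": ["banana_*.png", "banana_*.jpg"],
    "Video": ["veo_*.mp4"],
}


def find_outputs(out_dir, kind):
    """Return sorted absolute paths of generated files of the given kind."""
    root = Path(out_dir)
    return sorted(p.resolve() for pattern in OUTPUT_PATTERNS[kind]
                  for p in root.glob(pattern))


def format_report(kind, files, prompt, model):
    """Render the result report in the format shown above."""
    lines = ["## Generated Content", f"**Type:** {kind}", "**Files:**"]
    lines += [f"- `{path}`" for path in files]
    lines += [f"**Prompt Used:** {prompt}", f"**Model:** {model}"]
    if files:
        lines.append(f"To view: Open the file path above or use `open {files[0]}`")
    return "\n".join(lines)
```

Note that `find_outputs` returns every matching file in the directory, including ones from earlier runs; filtering by modification time would be needed to report only new files.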
Common issues and solutions:
| Error | Solution |
|---|---|
| "Camera not found" | Run /vision:devices to list cameras |
| "GEMINI_API_KEY not set" | Export the API key in environment |
| "Model not available" | Check model ID spelling |
| "Generation failed" | Try simpler prompt or different model |
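
The table above can be turned into a simple lookup for automated retries or user-facing hints. The sketch below assumes the CLI output contains these messages verbatim (case-insensitively), which this document does not confirm; `suggest_fix` is a hypothetical helper.

```python
# Error messages and solutions copied from the table above.
SOLUTIONS = {
    "Camera not found": "Run /vision:devices to list cameras",
    "GEMINI_API_KEY not set": "Export the API key in environment",
    "Model not available": "Check model ID spelling",
    "Generation failed": "Try simpler prompt or different model",
}


def suggest_fix(cli_output):
    """Return the documented solution for the first known error found, else None."""
    lowered = cli_output.lower()
    for message, solution in SOLUTIONS.items():
        # Assumption: the CLI prints the error text as a substring of its output.
        if message.lower() in lowered:
            return solution
    return None
```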
For programmatic access, use the helper script:
python ~/.claude/skills/gemini-vision/scripts/gemini_vision.py \
--operation banana \
--prompt "Your prompt here" \
--output-dir ./output \
--count 1
Options:
- `--operation`: banana, veo, capture, devices
- `--prompt`: The generation prompt
- `--output-dir`: Where to save files
- `--count`: Number of images (for banana)
- `--aspect-ratio`: For veo (16:9 or 9:16)
- `--model`: Override default model