Gemini Vision

Name: Gemini Vision
Author: hiapplyco

Generate images and videos using Gemini CLI's vision extension. Use for image generation with Nano Banana (gemini-2.5-flash-image), video generation with Veo 3, webcam capture, and image-to-image transformations. Invokes Gemini CLI commands and returns file paths.

hiapplyco · 0 stars · 30 Dec 2025

Occupation
Categories: Media

Skill content

Gemini Vision Skill

Generate AI images and videos by invoking Gemini CLI's vision extension. This skill provides access to:

- Nano Banana (gemini-2.5-flash-image) - Image generation and transformation
- Veo 3 (veo-3.0-generate-001) - Video generation from images
- Webcam capture - Live frame capture for AI processing

Prerequisites

- Gemini CLI: Must be installed and configured
- Vision Extension: Install via:
```
gemini extensions install vision
```
- API Key: Set the GEMINI_API_KEY environment variable
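
The prerequisites above can be checked before any command is issued. The following is a minimal Python sketch, assuming the CLI is on PATH under the name `gemini` (as in the commands in this document). The helper name `missing_prerequisites` and its injectable arguments are hypothetical, and the sketch does not verify that the vision extension itself is installed.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment (both parameter names are illustrative).
    """
    env = os.environ if env is None else env
    problems = []
    # Every command in this document invokes a binary called `gemini`.
    if which("gemini") is None:
        problems.append("Gemini CLI not found on PATH")
    # The skill expects the key in the GEMINI_API_KEY environment variable.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY not set")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the real environment; an empty result suggests the first and third prerequisites are met.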

When to Use This Skill

Use this skill when the user asks to:

Available Operations

1. Image Generation (Nano Banana)

```
gemini -p "/vision:banana prompt=\"Your creative prompt here\" n=1 out_dir=./output"
```

2. Video Generation (Veo 3)

```
gemini -p "/vision:veo prompt=\"Animate this scene\" aspect_ratio=16:9 out_dir=./output"
```

| Parameter | Default | Description |
| --- | --- | --- |
| `prompt` | Required | Animation/motion description |
| `aspect_ratio` | "16:9" | Video aspect ratio (16:9 or 9:16) |
| `resolution` | auto | Video resolution (e.g., "1080p") |
| `negative_prompt` | "" | What to avoid in video |
| `veo_model` | veo-3.0-generate-001 | Video model |
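
The parameter table can be turned into a small validating builder for the text passed to `gemini -p`. This is a sketch under assumptions: the function name `veo_slash_command` is hypothetical, the `key=value` rendering of `resolution` and `negative_prompt` is inferred from the other parameters rather than confirmed, and embedded double quotes are rejected because the extension's escaping rules are not described here.

```python
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")  # the two values the table lists


def veo_slash_command(prompt, aspect_ratio="16:9", resolution=None,
                      negative_prompt="", out_dir=None):
    """Render the text passed to `gemini -p` for a /vision:veo call."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    for text in (prompt, negative_prompt):
        if '"' in text:
            # Escaping inside the slash command is undocumented here,
            # so this sketch refuses embedded double quotes.
            raise ValueError("double quotes are not supported in this sketch")
    parts = [f'/vision:veo prompt="{prompt}"', f"aspect_ratio={aspect_ratio}"]
    if resolution is not None:
        # Leaving it out keeps the "auto" default from the table.
        parts.append(f"resolution={resolution}")
    if negative_prompt:
        parts.append(f'negative_prompt="{negative_prompt}"')  # assumed syntax
    if out_dir is not None:
        parts.append(f"out_dir={out_dir}")
    return " ".join(parts)
```

The result can be handed to the CLI as a single argument, for example `subprocess.run(["gemini", "-p", veo_slash_command("Animate this scene")])`, which avoids the shell-level `\"` escaping used in the command above.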

3. Webcam Capture + AI

```
# Start camera
gemini -p "/vision:start"

# Capture and transform
gemini -p "/vision:banana prompt=\"Transform into oil painting\""

# Stop camera
gemini -p "/vision:stop"
```
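
Because the camera is started and stopped with separate commands, it helps to guarantee that the stop command runs even when the transformation fails. One way to sketch this in Python is a context manager wrapped around the three commands above; `run_gemini` and `camera_session` are hypothetical helper names, and the sketch assumes each `gemini -p` call is an independent process, as in the shell version.

```python
import subprocess
from contextlib import contextmanager


def run_gemini(slash_command):
    """Run one slash command through the CLI, mirroring `gemini -p "..."`."""
    return subprocess.run(["gemini", "-p", slash_command],
                          capture_output=True, text=True, check=True)


@contextmanager
def camera_session(run=run_gemini):
    """Start the camera, yield the runner, and always stop the camera."""
    run("/vision:start")
    try:
        yield run
    finally:
        # Executes even if the body raises, so the camera is released.
        run("/vision:stop")


# Usage (requires the CLI and a camera, so it is left as a comment):
# with camera_session() as run:
#     run('/vision:banana prompt="Transform into oil painting"')
```

The `run` parameter is injectable so the ordering of start, work, and stop can be checked without a camera attached.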

Instructions for Claude

1. Determine the operation type:
   - Text-to-image → Use /vision:banana
   - Image transformation → Use /vision:banana with input image
   - Image-to-video → Use /vision:veo
   - Webcam capture → Use /vision:capture or /vision:banana
2. Construct the Gemini CLI command:

   ```
   gemini -p "/vision:<command> prompt=\"<user prompt>\" <params>"
   ```

3. Execute via Bash tool:
   - Run the command
   - Capture the output paths
   - Report success and file locations to user
4. Handle output:
   - Images saved as banana_*.png or banana_*.jpg
   - Videos saved as veo_*.mp4
   - Return the file paths to the user
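
The steps above can be sketched end to end in Python. All function names here (`build_argv`, `find_outputs`, `generate`) are hypothetical, and webcam capture is left out. Since the format of the CLI's printed output is not shown in this document, the sketch swaps in a different technique for "capture the output paths": it compares the output directory before and after the run, using the file name patterns listed above.

```python
import subprocess
from pathlib import Path

# Operation type -> slash command, following the list above.
COMMANDS = {
    "text-to-image": "/vision:banana",
    "image-transformation": "/vision:banana",
    "image-to-video": "/vision:veo",
}
# File name patterns the document gives for each command's output.
OUTPUT_PATTERNS = {
    "/vision:banana": ("banana_*.png", "banana_*.jpg"),
    "/vision:veo": ("veo_*.mp4",),
}


def build_argv(operation, prompt, **params):
    """Build the argv equivalent of: gemini -p "<command> prompt=... <params>"."""
    parts = [COMMANDS[operation], f'prompt="{prompt}"']
    parts += [f"{key}={value}" for key, value in params.items()]
    return ["gemini", "-p", " ".join(parts)]


def find_outputs(command, out_dir):
    """List files in out_dir that match the command's naming pattern, sorted."""
    found = []
    for pattern in OUTPUT_PATTERNS[command]:
        found.extend(Path(out_dir).glob(pattern))
    return sorted(found)


def generate(operation, prompt, out_dir=".", **params):
    """Run the command and return the files that newly appeared in out_dir."""
    command = COMMANDS[operation]
    before = set(find_outputs(command, out_dir))
    subprocess.run(build_argv(operation, prompt, out_dir=out_dir, **params),
                   check=True)
    return [p for p in find_outputs(command, out_dir) if p not in before]
```

Passing the command as an argv list sidesteps shell quoting, so the prompt needs no `\"` escapes; prompts that themselves contain double quotes are not handled by this sketch.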

Example Workflows

Generate a Single Image

```
gemini -p "/vision:banana prompt=\"A sprawling cyberpunk city at sunset, neon lights reflecting off wet streets, flying cars in the distance, highly detailed, cinematic\" n=1 out_dir=."
```

Transform an Image

```
gemini -p "/vision:banana prompt=\"Transform into Studio Ghibli animation style, soft colors, whimsical atmosphere\" input_paths=['/path/to/image.jpg']"
```
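
The `input_paths` value in the command above uses a bracketed, single-quoted list. A short Python sketch of rendering that form follows; the function name `banana_transform_command` is hypothetical, and only the single-path form appears in this document, so the comma-separated rendering of several paths is an assumption.

```python
def banana_transform_command(prompt, input_paths):
    """Render a /vision:banana call that transforms existing images."""
    paths = [str(p) for p in input_paths]
    if not paths:
        raise ValueError("at least one input path is required")
    if any("'" in p for p in paths):
        # Quoting rules for paths are not documented here.
        raise ValueError("single quotes in paths are not supported in this sketch")
    # One path yields ['/path/to/image.jpg'], the form shown in the example.
    # Joining several paths with commas is an assumption.
    rendered = "[" + ",".join(f"'{p}'" for p in paths) + "]"
    return f'/vision:banana prompt="{prompt}" input_paths={rendered}'
```

As with the other commands, the returned text is meant to be passed as the single argument after `gemini -p`.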

Generate a Video

```
gemini -p "/vision:veo prompt=\"Calm ocean waves gently rolling onto a sandy beach, golden hour lighting, peaceful atmosphere\" aspect_ratio=16:9"
```

Webcam to Art

```
# Capture and transform in one step
gemini -p "/vision:banana prompt=\"Transform into a Renaissance oil painting, dramatic lighting, classical composition\""
```

Output Format

```
## Generated Content

**Type:** Image/Video
**Files:**
- `/path/to/banana_20251227_123456_000.png`

**Prompt Used:** [the prompt]
**Model:** gemini-2.5-flash-image

To view: Open the file path above or use `open /path/to/file`
```
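
The report template above can be filled in programmatically once the output paths are known. This is a small Python sketch; `format_report` is a hypothetical helper, and the closing "To view" line substitutes the first generated file for the placeholder path.

```python
def format_report(kind, files, prompt, model):
    """Render the report template for a list of generated file paths."""
    files = [str(path) for path in files]
    if not files:
        raise ValueError("at least one generated file is required")
    lines = ["## Generated Content", "", f"**Type:** {kind}", "**Files:**"]
    lines += [f"- `{path}`" for path in files]
    lines += ["", f"**Prompt Used:** {prompt}", f"**Model:** {model}", ""]
    lines.append(f"To view: Open the file path above or use `open {files[0]}`")
    return "\n".join(lines)
```

For example, `format_report("Image", ["/path/to/banana_20251227_123456_000.png"], "a prompt", "gemini-2.5-flash-image")` reproduces the layout of the template with the real values filled in.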

Error Handling

| Error | Solution |
| --- | --- |
| "Camera not found" | Run `/vision:devices` to list cameras |
| "GEMINI_API_KEY not set" | Export the API key in environment |
| "Model not available" | Check model ID spelling |
| "Generation failed" | Try simpler prompt or different model |

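The error table lends itself to a simple lookup that turns CLI output into a suggested next step. This Python sketch assumes the quoted error texts appear (case-insensitively) somewhere in the CLI's output, which this document does not confirm; `suggest_fix` is a hypothetical helper name.

```python
# Known error text -> suggested fix, copied from the error table.
SOLUTIONS = {
    "Camera not found": "Run `/vision:devices` to list cameras",
    "GEMINI_API_KEY not set": "Export the API key in environment",
    "Model not available": "Check model ID spelling",
    "Generation failed": "Try simpler prompt or different model",
}


def suggest_fix(cli_output):
    """Return the solution for the first known error found, else None."""
    lowered = cli_output.lower()
    for message, solution in SOLUTIONS.items():
        if message.lower() in lowered:
            return solution
    return None
```

Returning `None` for unrecognized output lets the caller fall back to showing the raw CLI message instead of guessing at a fix.
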
Script Usage (Alternative)

```
python ~/.claude/skills/gemini-vision/scripts/gemini_vision.py \
  --operation banana \
  --prompt "Your prompt here" \
  --output-dir ./output \
  --count 1
```

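Parameters for /vision:banana (image generation):

| Parameter | Default | Description |
| --- | --- | --- |
| `prompt` | Required | Creative description of desired image |
| `n` | 1 | Number of images to generate |
| `out_dir` | "." | Output directory for images |
| `model` | gemini-2.5-flash-image | Image generation model |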
