Analyze images/audio/video with Gemini API. Generate images (Imagen 4), videos (Veo 3). Use for vision analysis, transcription, OCR, design extraction, multimodal AI.
Process audio, images, videos, documents, and generate images/videos using Google Gemini's multimodal API.
export GEMINI_API_KEY="your-key" # https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow
Verify: python .agent/skills/ai-multimodal/scripts/check_setup.py
# Analyze image or media file
python .agent/skills/ai-multimodal/scripts/gemini_batch_process.py \
--files <file> --task analyze --prompt "Describe this"
# Transcribe audio/video
python .agent/skills/ai-multimodal/scripts/gemini_batch_process.py \
--files audio.mp3 --task transcribe
# Generate image
python .agent/skills/ai-multimodal/scripts/gemini_batch_process.py \
--task generate --prompt "A minimal logo for a tech startup"
# Generate video
python .agent/skills/ai-multimodal/scripts/gemini_batch_process.py \
--task generate-video --prompt "A wave crashing on shore"
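The same commands can be driven from Python. This is a minimal sketch, not part of the skill itself: `build_command` and `run_task` are hypothetical helper names, and the only flags used are the ones shown above (`--task`, `--files`, `--prompt`). Passing several paths after `--files` is an assumption based on the flag's plural name.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

# Script path as used in the commands above.
SCRIPT = ".agent/skills/ai-multimodal/scripts/gemini_batch_process.py"


def build_command(
    task: str,
    files: Sequence[str] = (),
    prompt: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for gemini_batch_process.py (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, "--task", task]
    if files:
        # Assumption: --files accepts one or more paths.
        cmd += ["--files", *files]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    return cmd


def run_task(
    task: str,
    files: Sequence[str] = (),
    prompt: Optional[str] = None,
) -> subprocess.CompletedProcess:
    """Run the script and capture its output; raises CalledProcessError on failure."""
    return subprocess.run(
        build_command(task, files, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_task("analyze", files=["photo.jpg"], prompt="Describe this")` mirrors the first command above. Building the argument list separately from running it keeps the quoting out of the shell and makes the command easy to inspect before execution.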
Tip: If gemini CLI is available, prefer:
echo "<prompt>" | gemini -y -m gemini-2.5-flash
| Task | Model |
|---|---|
| Analysis | gemini-2.5-flash (fast), gemini-2.5-pro (quality) |
| Image generation | imagen-4.0-generate-001 |
| Image generation (quality) | imagen-4.0-ultra-generate-001 |
| Video generation | veo-3.1-generate-preview (8s clips with audio) |
| Script | Purpose |
|---|---|
| gemini_batch_process.py | Main CLI for all tasks |
| media_optimizer.py | Compress/resize media before upload |
| document_converter.py | Convert PDFs/Office docs to markdown |
| check_setup.py | Verify setup and API key |
Transcript output format: [HH:MM:SS -> HH:MM:SS] transcript content
For high-volume usage, set multiple keys:
export GEMINI_API_KEY="key1"
export GEMINI_API_KEY_2="key2"
export GEMINI_API_KEY_3="key3"
Keys rotate automatically on 429/RESOURCE_EXHAUSTED errors, with a 60s cooldown per key.
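The rotation behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the skill's actual implementation: `load_keys` and `KeyRotator` are hypothetical names, and the numbered-suffix scan simply follows the `GEMINI_API_KEY`, `GEMINI_API_KEY_2`, `GEMINI_API_KEY_3` pattern shown in the exports.

```python
import os
import time
from typing import Callable, Dict, List, Mapping, Optional

COOLDOWN_SECONDS = 60.0  # per-key cooldown, matching the 60s stated above


def load_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Collect GEMINI_API_KEY, GEMINI_API_KEY_2, ... until the first gap."""
    keys: List[str] = []
    first = env.get("GEMINI_API_KEY")
    if first:
        keys.append(first)
    n = 2
    while env.get(f"GEMINI_API_KEY_{n}"):
        keys.append(env[f"GEMINI_API_KEY_{n}"])
        n += 1
    return keys


class KeyRotator:
    """Hands out keys in order, skipping any that are cooling down."""

    def __init__(
        self,
        keys: List[str],
        cooldown: float = COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = list(keys)
        self._cooldown = cooldown
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}  # key -> time it is usable again
        self._index = 0

    def current(self) -> Optional[str]:
        """Return the next usable key, or None if every key is cooling down."""
        now = self._clock()
        for offset in range(len(self._keys)):
            i = (self._index + offset) % len(self._keys)
            key = self._keys[i]
            if self._blocked_until.get(key, 0.0) <= now:
                self._index = i
                return key
        return None

    def mark_rate_limited(self, key: str) -> None:
        """Call after a 429/RESOURCE_EXHAUSTED response for this key."""
        self._blocked_until[key] = self._clock() + self._cooldown
```

A caller would request `current()`, make the API call, and on a 429/RESOURCE_EXHAUSTED response call `mark_rate_limited(key)` before retrying with the next key. The injectable `clock` exists only so the cooldown can be exercised without waiting.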
- references/vision-understanding.md: Image analysis patterns
- references/image-generation.md: Imagen 4 generation guide
- references/audio-processing.md: Transcription and audio analysis
- references/video-analysis.md: Video analysis patterns
- references/video-generation.md: Veo generation guide