Name: Image Handling
Author: fabioc-aloha


# Image Handling

## Conversion Commands

```bash
# SVG to PNG using sharp-cli (recommended)
# --density sets the DPI used to rasterize the vector (150 gives crisp text)
npx sharp-cli -i input.svg -o output-folder/ --density 150 -f png

# Note: -o takes a directory; the output file keeps the input's base name
npx sharp-cli -i banner.svg -o assets/ --density 150 -f png
# Creates: assets/banner.png
```
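The naming rule in the note can be modeled as a small helper. This is a sketch that mirrors only the behavior the note describes: the base name is kept and the extension becomes the `-f` format. `sharpOutputPath` is a hypothetical helper and is not part of sharp-cli.

```javascript
const path = require("node:path");

// Hypothetical helper: predicts where `sharp-cli -i <input> -o <dir>/ -f <format>`
// writes its file, assuming the base name is kept and only the extension changes.
// POSIX paths are used so the result is the same on every OS.
function sharpOutputPath(inputFile, outputDir, format) {
  const { name } = path.posix.parse(inputFile); // "banner.svg" -> "banner"
  return path.posix.join(outputDir, `${name}.${format}`);
}

console.log(sharpOutputPath("banner.svg", "assets/", "png")); // assets/banner.png
```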

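```powershell
# ImageMagick (if installed)
# -density (placed before the input) sets the SVG rasterization DPI; -background none keeps transparency
magick -background none -density 300 input.svg -resize 512x512 output.png
# JPEG has no alpha channel, so flatten transparent areas onto white first
magick input.png -background white -flatten -quality 85 output.jpg

# Multiple sizes (PowerShell)
foreach ($size in 16,32,64,128,256,512) {
  magick -background none -density 300 input.svg -resize "${size}x${size}" "icon-$size.png"
}
```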
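When driving the same conversion from Node, the per-size naming pattern of the loop can be expressed as a list of argument arrays. This is a sketch only: `iconResizeArgs` is a hypothetical helper, it executes nothing, and it leaves out any extra rasterization flags you may want before the input.

```javascript
// Hypothetical helper: one ImageMagick argument list per icon size, following the
// loop's "-resize NxN" / "icon-N.png" pattern. Nothing is executed here; each array
// could be passed to child_process.execFile("magick", args).
function iconResizeArgs(input, sizes) {
  return sizes.map((size) => [input, "-resize", `${size}x${size}`, `icon-${size}.png`]);
}

const jobs = iconResizeArgs("input.svg", [16, 32, 64, 128, 256, 512]);
console.log(jobs[0]); // [ 'input.svg', '-resize', '16x16', 'icon-16.png' ]
```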

## GitHub README Images

```markdown
<!-- Absolute URL (works anywhere, as long as the repository is public) -->
![Banner](https://raw.githubusercontent.com/user/repo/main/assets/banner.svg)

<!-- Relative path (resolves when viewed inside the repository on GitHub) -->
![Banner](./assets/banner.png)

<!-- Dark/light variants: the <img> is the light-mode fallback -->
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="banner-dark.svg">
  <img src="banner-light.svg" alt="Banner">
</picture>
```
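If banners are generated by a script, the dark/light `<picture>` snippet can be emitted from a pair of paths. This is a minimal sketch: `themedPicture` and `escapeAttr` are hypothetical helpers. The output follows the markup above, and attribute values are escaped so that quotes in paths or alt text cannot break the HTML.

```javascript
// Hypothetical helper: HTML-escape a value for use inside a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: emits the dark/light <picture> pattern for a README.
function themedPicture({ dark, light, alt }) {
  return [
    "<picture>",
    `  <source media="(prefers-color-scheme: dark)" srcset="${escapeAttr(dark)}">`,
    `  <img src="${escapeAttr(light)}" alt="${escapeAttr(alt)}">`,
    "</picture>",
  ].join("\n");
}

console.log(themedPicture({ dark: "banner-dark.svg", light: "banner-light.svg", alt: "Banner" }));
```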

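## Optimization

```bash
# PNG optimization (lossy palette reduction, quality range 65-80)
pngquant --quality=65-80 input.png -o output.png

# JPEG optimization (rewrites input.jpg in place)
jpegoptim --max=85 input.jpg

# SVG optimization
npx svgo input.svg -o output.svg
```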
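A build script can choose among the three optimizers by file extension. The sketch below is a hypothetical dispatcher that only returns the commands shown above, as a program plus its argument list. Nothing is executed. Note that `jpegoptim` ignores the output path because it optimizes in place.

```javascript
const path = require("node:path");

// Hypothetical dispatcher: returns [program, args] from the optimization commands
// above for a given file, or null if the extension is not handled.
// Pass the result to child_process.execFile(program, args) to actually run it.
function optimizerCommand(input, output) {
  switch (path.extname(input).toLowerCase()) {
    case ".png":
      return ["pngquant", ["--quality=65-80", input, "-o", output]];
    case ".jpg":
    case ".jpeg":
      return ["jpegoptim", ["--max=85", input]]; // in place; `output` is unused
    case ".svg":
      return ["npx", ["svgo", input, "-o", output]];
    default:
      return null;
  }
}
```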

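## Batch Processing

```powershell
# Convert all SVGs in the current folder to PNGs that fit within 256x256
Get-ChildItem *.svg | ForEach-Object {
  $out = $_.BaseName + ".png"
  magick -background none -density 300 $_.FullName -resize 256x256 $out
}
```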
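The file-selection and naming part of the batch loop can be planned in Node before anything runs. This sketch assumes case-insensitive matching of `.svg`, which is how `Get-ChildItem *.svg` behaves on Windows file systems. `planSvgToPng` is a hypothetical helper that takes plain file names.

```javascript
const path = require("node:path");

// Hypothetical sketch: keep only .svg names (any letter case) and pair each one
// with BaseName + ".png", like the PowerShell loop above.
function planSvgToPng(fileNames) {
  return fileNames
    .filter((name) => path.extname(name).toLowerCase() === ".svg")
    .map((name) => ({
      input: name,
      output: path.basename(name, path.extname(name)) + ".png",
    }));
}
```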

## Replicate Model Selection

| Model | Replicate ID | Cost | Best For | Trigger Words |
|---|---|---|---|---|
| Flux Schnell | `black-forest-labs/flux-schnell` | $0.003 | Fast iteration, prototyping | "flux schnell", "quick image", "fast generation" |
| Flux Dev | `black-forest-labs/flux-dev` | $0.025 | High-quality images without text | "flux dev", "high quality image" |
| Flux 1.1 Pro | `black-forest-labs/flux-1.1-pro` | $0.04 | Production, photorealistic | "flux pro", "flux 1.1", "production image" |
| Flux 2 Pro | `black-forest-labs/flux-2-pro` | ~$0.05+ | High quality with reference images (up to 8 refs), text rendering | "flux 2", "flux-2-pro", "high quality refs" |
| Flux 2 Max | `black-forest-labs/flux-2-max` | higher | Highest-fidelity BFL output | "flux 2 max", "highest quality" |
| Flux Kontext Pro | `black-forest-labs/flux-kontext-pro` | $0.04 | Text-based image editing, style transfer, outfit changes | "edit image", "kontext", "change background", "outfit" |
| Flux Kontext Max | `black-forest-labs/flux-kontext-max` | $0.08 | Premium editing + improved typography in edited images | "kontext max", "premium edit" |
| Ideogram v2 | `ideogram-ai/ideogram-v2` | $0.08 | Banner typography (proven, stable API) | "ideogram v2", "banner with text" |
| Ideogram v3 Turbo | `ideogram-ai/ideogram-v3-turbo` | $0.03 | Fast typography generation | "ideogram turbo", "fast text image", "ideogram v3" |
| Ideogram v3 Balanced | `ideogram-ai/ideogram-v3-balanced` | $0.06 | Balanced quality/speed typography | "ideogram balanced" |
| Ideogram v3 Quality | `ideogram-ai/ideogram-v3-quality` | $0.09 | Highest-quality typography | "ideogram quality", "best ideogram" |
| Nano-Banana Pro | `google/nano-banana-pro` | $0.025 | Face-consistent portraits with reference photos (up to 14 refs), 4K | "nano-banana", "face consistency", "portrait", "reference photo" |
| Nano-Banana 2 | `google/nano-banana-2` | $0.067/1K | Faster alternative to nano-banana-pro, same 14-ref API | "nano-banana-2", "fast portrait", "gemini flash image" |
| SDXL | `stability-ai/sdxl` | $0.009 | Classic diffusion, LoRA styles | "sdxl", "stable diffusion", "stable diffusion xl" |
| Seedream 5 Lite | `bytedance/seedream-5-lite` | varies | 2K/3K with built-in reasoning, example-based editing | "seedream", "bytedance", "high resolution" |
| Recraft v4 | `recraft-ai/recraft-v4` | varies | Design taste, strong composition, text rendering | "recraft", "design image", "art directed" |
| Recraft v4 SVG | `recraft-ai/recraft-v4-svg` | varies | Production-ready SVG vector images | "recraft svg", "vector", "generate svg" |
| Recraft v4 Pro SVG | `recraft-ai/recraft-v4-pro-svg` | $0.30 | High-quality SVG with detailed paths | "recraft pro svg", "detailed svg" |
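One way to apply this table in code is a small router driven by a few requirements. The rules and their order below are an illustrative assumption distilled from the "Best For" column, not official guidance. `pickImageModel` and its option names are hypothetical.

```javascript
// Hypothetical router based on the "Best For" column above.
// refs: number of reference images; text: needs rendered typography;
// svg: needs vector output; draft: prefer the cheaper/faster option.
function pickImageModel({ text = false, refs = 0, svg = false, draft = false } = {}) {
  if (refs > 14) throw new RangeError("At most 14 reference images are supported");
  if (svg) return "recraft-ai/recraft-v4-svg";
  if (refs > 0) {
    // Flux 2 Pro accepts up to 8 refs; Nano-Banana Pro accepts up to 14 and costs less.
    return refs <= 8 && !draft ? "black-forest-labs/flux-2-pro" : "google/nano-banana-pro";
  }
  if (text) return draft ? "ideogram-ai/ideogram-v3-turbo" : "ideogram-ai/ideogram-v3-quality";
  return draft ? "black-forest-labs/flux-schnell" : "black-forest-labs/flux-1.1-pro";
}
```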

## LoRA Support (Flux Dev / SDXL)

`extra_lora` accepts any of these formats:

```javascript
// Replicate format
extra_lora: "fofr/flux-pixar-cars"
// HuggingFace format
extra_lora: "huggingface.co/owner/model-name"
// CivitAI format
extra_lora: "civitai.com/models/<id>"
// Direct URL
extra_lora: "https://example.com/weights.safetensors"
```
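Before passing a user-supplied value to `extra_lora`, it can help to check which of the four formats it matches. This is a hypothetical classifier based only on the patterns listed above. Full URLs are checked first, so any `https://` link counts as a direct URL.

```javascript
// Hypothetical classifier for the extra_lora formats listed above.
function loraSourceType(ref) {
  if (/^https?:\/\//i.test(ref)) return "url";
  if (ref.startsWith("huggingface.co/")) return "huggingface";
  if (ref.startsWith("civitai.com/models/")) return "civitai";
  if (/^[\w.-]+\/[\w.-]+$/.test(ref)) return "replicate"; // owner/model
  return "unknown";
}
```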

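## Face Reference Models

### Nano-Banana Pro (Recommended for Portraits)

```javascript
import Replicate from "replicate";

const replicate = new Replicate(); // reads REPLICATE_API_TOKEN from the environment

const output = await replicate.run("google/nano-banana-pro", {
  input: {
    prompt: "Description of desired scene",
    image_input: referenceImageURIs, // array of data URIs (up to 14)
    aspect_ratio: "3:4",
    output_format: "png",
  },
});
```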
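`referenceImageURIs` has to be built from local photos first. The following is a minimal sketch that reads files with Node and encodes them as base64 data URIs, enforcing the 14-image limit stated above. The helper names and the extension-to-MIME map are assumptions, and the map covers only common photo formats.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed MIME map for common photo formats.
const MIME_BY_EXT = { ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png", ".webp": "image/webp" };

// Hypothetical helper: encode raw bytes as a data URI.
function toDataURI(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical helper: read one image file and encode it.
function fileToDataURI(filePath) {
  const mime = MIME_BY_EXT[path.extname(filePath).toLowerCase()];
  if (!mime) throw new Error(`Unsupported image type: ${filePath}`);
  return toDataURI(fs.readFileSync(filePath), mime);
}

// Hypothetical helper: build the image_input array, enforcing the reference limit.
function referenceImageURIs(filePaths, max = 14) {
  if (filePaths.length > max) throw new RangeError(`At most ${max} reference images allowed`);
  return filePaths.map((p) => fileToDataURI(p));
}
```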

### Flux 2 Pro (Higher Quality Alternative)

```javascript
const output = await replicate.run("black-forest-labs/flux-2-pro", {
  input: {
    prompt: "Description of desired scene",
    input_images: referenceImageURIs, // array of data URIs (up to 8)
    aspect_ratio: "3:4",
    output_format: "png",
  },
});
```
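The two reference-image models take the same kind of data, but under different input keys (`image_input` vs `input_images`) and with different limits (14 vs 8). A small builder keeps this in one place. `buildRefInput` is a hypothetical helper, and the mapping only restates the two snippets above.

```javascript
// Input key and reference limit per model, as given in the snippets above.
const REF_INPUT = {
  "google/nano-banana-pro": { key: "image_input", max: 14 },
  "black-forest-labs/flux-2-pro": { key: "input_images", max: 8 },
};

// Hypothetical helper: returns the options object for replicate.run(model, options).
function buildRefInput(model, prompt, refs, extra = {}) {
  const spec = REF_INPUT[model];
  if (!spec) throw new Error(`No reference-image mapping for ${model}`);
  if (refs.length > spec.max) {
    throw new RangeError(`${model} accepts at most ${spec.max} reference images`);
  }
  return { input: { prompt, [spec.key]: refs, ...extra } };
}
```

Usage would look like `replicate.run(model, buildRefInput(model, prompt, refs, { aspect_ratio: "3:4" }))`.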

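## Preparing Reference Photos

```powershell
# Fit within 512x512 px (aspect ratio preserved) at JPEG quality 85 for faster API uploads
magick input.jpg -resize 512x512 -quality 85 output.jpg

# Convert to a base64 data URI (for embedding in visual memory) and copy it to the clipboard
$bytes = [IO.File]::ReadAllBytes((Resolve-Path "photo.jpg").Path)
"data:image/jpeg;base64," + [Convert]::ToBase64String($bytes) | Set-Clipboard
```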
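The `512x512` geometry without flags means "fit inside the box, keeping the aspect ratio". It also enlarges images smaller than the box, because no `>` flag is given. This sketch computes the resulting size. Rounding to the nearest pixel is an assumption, so ImageMagick's result may differ by a pixel in edge cases.

```javascript
// Sketch of "-resize WxH" (no flags): scale so the image fits inside the box,
// preserving aspect ratio. Smaller images are enlarged too.
function fitWithin(width, height, maxW = 512, maxH = 512) {
  const scale = Math.min(maxW / width, maxH / height);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

console.log(fitWithin(4000, 3000)); // { width: 512, height: 384 }
```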

## Video Generation Models

| Model | Replicate ID | Cost | Duration | Audio | Best For |
|---|---|---|---|---|---|
| Veo-3 | `google/veo-3` | $0.50/video | 4, 6, or 8s only | ✅ Auto | Short clips with synced audio |
| Veo-3.1-fast | `google/veo-3.1-fast` | lower | 4-8s | ✅ Context-aware audio | Newer/faster Veo 3, last-frame support |
| Veo-3.1 | `google/veo-3.1` | higher | 4-8s | ✅ Context-aware audio | Highest-fidelity successor to Veo 3 |
| Grok Video | `xai/grok-imagine-video` | $0.05/sec | 1-15s | ✅ Auto (music, SFX, lip-sync) | Longer videos, best audio |
| Kling v3 | `kwaivgi/kling-v3-video` | $0.22/sec | 3-15s | ✅ Native | Cinematic quality, 1080p, multi-shot |
| Kling v3 Omni | `kwaivgi/kling-v3-omni-video` | varies | 3-15s | ✅ Native | Multi-modal: text, ref image, editing |
| Sora-2 | `openai/sora-2` | varies | flexible | ✅ Synced | Home-video realism, flexible prompting |
| WAN 2.5 fast | `wan-video/wan-2.5-t2v-fast` | low | 5-10s | ❌ | Open-source, fast, cost-effective |

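## Video Generation Pattern

```javascript
// Step 1: Generate a still image (refs: array of reference image data URIs)
const image = await replicate.run("google/nano-banana-pro", {
  input: { prompt: "Person smiling at camera", image_input: refs }
});

// URL of the generated still; assumes a single output file
// (replicate-js v1 returns a FileOutput, which converts to its URL string)
const imageUrl = String(image);

// Step 2: Animate the still into a video
const video = await replicate.run("google/veo-3", {
  input: {
    prompt: "Head turns slowly, smile widens, warm natural lighting",
    image: imageUrl,
    duration: 6 // Veo-3 accepts only 4, 6, or 8 seconds
  }
});
```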
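The two steps can be wrapped in one function. The model runner is injected, for example `(model, opts) => replicate.run(model, opts)`, so the flow can be exercised without network access. This is a sketch: `imageToVideo` and its option names are hypothetical, and it assumes step 1 yields a single file whose string form is its URL.

```javascript
// Hypothetical wrapper for the still-then-animate pattern above.
// `run` has the shape of replicate.run: (model, { input }) => Promise<output>.
async function imageToVideo(run, { refs, stillPrompt, motionPrompt, duration = 6 }) {
  const image = await run("google/nano-banana-pro", {
    input: { prompt: stillPrompt, image_input: refs },
  });
  const imageUrl = String(image); // assumes a single output file
  const video = await run("google/veo-3", {
    input: { prompt: motionPrompt, image: imageUrl, duration },
  });
  return { imageUrl, video };
}
```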

## Cloud TTS Models (Replicate)

| Model | Replicate ID | Cost | Voice Cloning | Languages | Best For |
|---|---|---|---|---|---|
| Speech Turbo | `minimax/speech-2.8-turbo` | $0.06/1k tokens | ❌ | 40+ | Fast, expressive, many voices |
| Chatterbox Turbo | `resemble-ai/chatterbox-turbo` | $0.025/1k chars | ✅ (5s sample) | English | Voice cloning, natural pauses |
| Qwen TTS | `qwen/qwen3-tts` | $0.02/1k chars | ✅ | 10 | Voice design from description |
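For the two models billed per character, the cost of a script can be estimated before generating audio. This sketch copies the table's prices as a snapshot, and those prices may change. Speech Turbo is left out because it is billed per token. `text.length` counts UTF-16 code units, so the result is only an approximation for non-Latin text.

```javascript
// Per-1k-character prices copied from the table above (snapshot, may change).
const PRICE_PER_1K_CHARS = {
  "resemble-ai/chatterbox-turbo": 0.025,
  "qwen/qwen3-tts": 0.02,
};

// Hypothetical estimator: approximate cost in USD for speaking `text`.
function estimateTtsCost(model, text) {
  const rate = PRICE_PER_1K_CHARS[model];
  if (rate === undefined) throw new Error(`No per-character price for ${model}`);
  return (text.length / 1000) * rate;
}
```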

## Voice Cloning (Chatterbox / Qwen)

```javascript
const output = await replicate.run("resemble-ai/chatterbox-turbo", {
  input: {
    text: "Content to speak in the cloned voice",
    audio_prompt: referenceAudioDataURI // data URI of a 5+ second WAV/MP3 sample
  }
});
```

## Voice Design (Qwen TTS)

```javascript
const output = await replicate.run("qwen/qwen3-tts", {
  input: {
    text: "Content to speak",
    tts_mode: "voice_design",
    voice_description: "A warm, friendly female voice with a slight British accent"
  }
});
```

## When to Use Cloud TTS vs Edge TTS

| Scenario | Recommended | Why |
|---|---|---|
| Read document in VS Code | Edge TTS (extension) | Free, instant, integrated |
| Create audiobook narration | Replicate TTS | Higher quality, voice cloning |
| Generate voice for video | Replicate TTS | Emotion control, voice design |
| Multi-language content creation | Either | Edge has 32 languages; Speech Turbo has 40+ |

## Format Selection

| Format | Best For | Key Properties |
|---|---|---|
| SVG | Icons, logos, diagrams | Infinite scale, animation |
| PNG | Screenshots, transparency | Lossless, alpha channel |
| JPEG | Photos, gradients | Small size, no transparency |
| WebP | Web images | Best compression; lossy or lossless, with transparency |
| ICO | Favicons | Multi-resolution |
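The format table can be encoded as a lookup for tooling that decides output formats automatically. This is a hypothetical sketch: the `kind` values are assumed labels that mirror the "Best For" column. The one rule not taken directly from a row is that photos needing transparency go to WebP, since JPEG has none.

```javascript
// Hypothetical chooser encoding the format table above.
function chooseFormat({ kind, transparency = false }) {
  switch (kind) {
    case "icon":
    case "logo":
    case "diagram":
      return "svg";
    case "screenshot":
      return "png";
    case "photo":
      return transparency ? "webp" : "jpeg"; // JPEG has no alpha channel
    case "web":
      return "webp";
    case "favicon":
      return "ico";
    default:
      throw new Error(`Unknown image kind: ${kind}`);
  }
}
```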

## Size Guidelines

| Use Case | Max Size | Recommended |
|---|---|---|
| README banner | 500KB | < 100KB |
| Documentation | 200KB | < 50KB |
| Icons | 50KB | < 10KB |
| Favicon | 10KB | < 5KB |
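A CI step could check generated assets against these limits. In this sketch the use-case keys are hypothetical labels, and treating 1 KB as 1024 bytes is an assumption. A file at or above the recommended size counts as over target, because the table's recommendation is strictly "less than".

```javascript
// Limits from the size table above, in KB (assumed 1 KB = 1024 bytes).
const SIZE_BUDGET_KB = {
  "readme-banner": { max: 500, target: 100 },
  documentation: { max: 200, target: 50 },
  icon: { max: 50, target: 10 },
  favicon: { max: 10, target: 5 },
};

// Hypothetical check: "ok", "over-target", or "over-max".
function checkImageSize(useCase, bytes) {
  const budget = SIZE_BUDGET_KB[useCase];
  if (!budget) throw new Error(`Unknown use case: ${useCase}`);
  const kb = bytes / 1024;
  if (kb > budget.max) return "over-max";
  if (kb >= budget.target) return "over-target";
  return "ok";
}
```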

## Aspect Ratio Reference

| Ratio | Models | Use Case |
|---|---|---|
| `21:9` | Flux (all) | Ultra-wide README banner |
| `3:1` | Ideogram | Wide banner with typography |
| `16:9` | All | Standard widescreen |
| `1:1` | All | Square, avatar, icon |
| `9:16` | All | Mobile, portrait |
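When a layout needs pixel dimensions for one of these ratios, the height follows from the width. The sketch below also encodes which model families the table lists for the restricted ratios. Ratios not in that map are assumed to work with all models, as the "All" rows state. Both helpers and the family labels are hypothetical.

```javascript
// Ratios the table restricts to specific model families; others are "All".
const RATIO_FAMILIES = { "21:9": ["flux"], "3:1": ["ideogram"] };

// Hypothetical helper: height in pixels for a width and a "W:H" ratio string.
function heightForRatio(ratio, width) {
  const match = /^(\d+):(\d+)$/.exec(ratio);
  if (!match) throw new Error(`Invalid ratio: ${ratio}`);
  const [w, h] = [Number(match[1]), Number(match[2])];
  return Math.round((width * h) / w);
}

// Hypothetical helper: whether a model family supports the ratio per the table.
function ratioSupported(ratio, family) {
  const only = RATIO_FAMILIES[ratio];
  return !only || only.includes(family);
}
```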

## Duration Constraints

| Model | Min | Max | Notes |
|---|---|---|---|
| Veo-3 | 4s | 8s | Only accepts 4, 6, or 8; other values are rejected |
| Grok Video | 1s | 15s | Flexible, any integer |
| Kling v3 | 3s | 15s | Modes: `standard` (720p), `pro` (1080p) |
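Checking `duration` before calling the API avoids a rejected request. This is a minimal sketch of the table's rules. For Kling v3 only the 3-15 s range is checked, because the table does not say whether fractional values are allowed. `isValidDuration` is a hypothetical helper.

```javascript
// Duration rules from the table above, keyed by Replicate ID.
const DURATION_RULES = {
  "google/veo-3": (s) => [4, 6, 8].includes(s),
  "xai/grok-imagine-video": (s) => Number.isInteger(s) && s >= 1 && s <= 15,
  "kwaivgi/kling-v3-video": (s) => s >= 3 && s <= 15,
};

// Hypothetical validator: true if `seconds` is allowed for `model`.
function isValidDuration(model, seconds) {
  const rule = DURATION_RULES[model];
  if (!rule) throw new Error(`No duration rule for ${model}`);
  return rule(seconds);
}
```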