Generate AI images from text prompts. Triggers on: "生成图片", "画一张", "AI图", "generate image", "配图", "create picture", "draw", "visualize", "generate an image".
For other tasks, use the corresponding skills instead: /podcast, /speech, /explainer, /content-parser.
Generate AI images using the Labnana API. Supports text prompts with optional reference images, multiple resolutions, and aspect ratios. Images are saved as local files.
- shared/authentication.md for API key and headers
- shared/common-patterns.md for error handling
- Base URL: https://api.labnana.com/openapi/v1
- Read shared/config-pattern.md before any interaction.
- Save output to .listenhub/image-gen/YYYY-MM-DD-{jobId}/, never ~/Downloads/
Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Follow shared/config-pattern.md Step 0.
If file doesn't exist — ask location, then create immediately:
CONFIG_PATH=".listenhub/image-gen/config.json"
# (or $HOME/.listenhub/image-gen/config.json for global)
mkdir -p "$(dirname "$CONFIG_PATH")"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > "$CONFIG_PATH"
Then run Setup Flow below.
If file exists — read config, display summary, and confirm:
Current configuration (image-gen):
Output mode: {inline / download / both}
Ask: "Use the saved configuration?" → Confirm and continue / Reconfigure
Follow shared/output-mode.md § Setup Flow Question.
Save immediately:
# Follow shared/output-mode.md § Save to Config
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Free text input. Ask the user:
Describe the image you want to generate.
If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
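The length check above can be sketched in shell. The helper name `is_short_prompt` is hypothetical, and `wc -w` counts whitespace-separated words, so a prompt written without spaces (for example in Chinese) would count as one word; treat this as a rough heuristic, not a fixed rule.

```shell
# Hypothetical helper: succeeds when the prompt has fewer than 10 words.
is_short_prompt() {
  # wc -w counts whitespace-separated words; tr strips the padding
  # that some platforms add around the number.
  word_count=$(printf '%s' "$1" | wc -w | tr -d ' ')
  [ "$word_count" -lt 10 ]
}

PROMPT="cyberpunk city at night"
if is_short_prompt "$PROMPT"; then
  echo "Prompt is short; offer to enrich it."
else
  echo "Use the prompt as-is."
fi
```

The verbatim-generation exception still has to be judged from the conversation; the helper only covers the word count.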
Ask:
Question: "Which model?"
Options:
- "pro (recommended)" — gemini-3-pro-image-preview, higher quality
- "flash" — gemini-3.1-flash-image-preview, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)
Ask both together (independent parameters):
Question: "What resolution?"
Options:
- "1K" — Standard quality
- "2K (recommended)" — High quality, good balance
- "4K" — Ultra high quality, slower generation
Question: "What aspect ratio?"
Options (all models):
- "16:9" — Landscape, widescreen
- "1:1" — Square
- "9:16" — Portrait, phone screen
- "Other" — 2:3, 3:2, 3:4, 4:3, 21:9
If flash model was selected, also offer: 1:4 (narrow portrait), 4:1 (wide landscape), 1:8 (extreme portrait), 8:1 (panoramic)
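A minimal sketch of the ratio rules above, useful for validating a free-text answer before building the request. The function name `is_ratio_allowed` and the short model labels `pro` / `flash` as arguments are assumptions for illustration.

```shell
# Hypothetical helper: succeeds if the aspect ratio is offered for the model.
# $1 = model label (pro | flash), $2 = aspect ratio (e.g. 16:9)
is_ratio_allowed() {
  model=$1
  ratio=$2
  case "$ratio" in
    # Ratios offered for all models
    16:9|1:1|9:16|2:3|3:2|3:4|4:3|21:9) return 0 ;;
    # Extreme ratios, offered only when flash was selected
    1:4|4:1|1:8|8:1) [ "$model" = "flash" ] ;;
    # Anything else is not in the list
    *) return 1 ;;
  esac
}
```

For example, `is_ratio_allowed pro 8:1` fails while `is_ratio_allowed flash 8:1` succeeds, so the agent can re-ask instead of sending a request that the model does not support.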
Question: "Any reference images for style guidance?"
Options:
- "Yes, I have URL(s)" — Provide reference image URLs
- "No references" — Generate from prompt only
If yes, collect URLs (comma-separated, max 14). For each URL, infer mimeType from suffix and build:
{ "fileData": { "fileUri": "<url>", "mimeType": "<inferred>" } }
Suffix mapping: .jpg/.jpeg → image/jpeg, .png → image/png, .webp → image/webp, .gif → image/gif
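The suffix mapping can be sketched as a small shell function. The names `infer_mime` and `build_reference_part` are hypothetical; stripping query strings and matching case-insensitively are assumptions not stated above. Where the resulting objects go in the request body is defined in shared/api-image.md and is not shown here.

```shell
# Hypothetical helper: print the MIME type for an image URL, or fail
# if the suffix is not one of the four mapped types.
infer_mime() {
  # Assumption: drop any ?query or #fragment, then lowercase the rest.
  path=$(printf '%s' "${1%%[?#]*}" | tr '[:upper:]' '[:lower:]')
  case "$path" in
    *.jpg|*.jpeg) echo "image/jpeg" ;;
    *.png)        echo "image/png" ;;
    *.webp)       echo "image/webp" ;;
    *.gif)        echo "image/gif" ;;
    *)            return 1 ;;
  esac
}

# Hypothetical helper: build one fileData object for a URL.
# jq handles JSON escaping of the URL.
build_reference_part() {
  mime=$(infer_mime "$1") || return 1
  jq -cn --arg uri "$1" --arg mime "$mime" \
    '{fileData: {fileUri: $uri, mimeType: $mime}}'
}
```

A URL with an unmapped suffix makes both helpers fail, which gives the agent a point to ask the user for a different link.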
Summarize all choices:
Ready to generate image:
Prompt: {prompt text}
Model: {pro / flash}
Resolution: {1K / 2K / 4K}
Aspect ratio: {ratio}
References: {yes (N URLs) / no}
Proceed?
Wait for explicit confirmation before calling the API.
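Once the user confirms, the request body can be assembled from the collected choices. The sketch below mirrors the body shape used in the curl example in this document; the helper names `model_id` and `build_body` are hypothetical, and reference images are left out because their placement is defined in shared/api-image.md. Building the JSON with jq rather than string concatenation keeps prompts containing quotes or newlines valid.

```shell
# Hypothetical helper: map the short model label to the model identifier.
model_id() {
  case "$1" in
    pro)   echo "gemini-3-pro-image-preview" ;;
    flash) echo "gemini-3.1-flash-image-preview" ;;
    *)     return 1 ;;
  esac
}

# Hypothetical helper: print the JSON request body.
# $1 = model label, $2 = prompt, $3 = resolution, $4 = aspect ratio
build_body() {
  model=$(model_id "$1") || return 1
  jq -cn --arg model "$model" --arg prompt "$2" \
         --arg size "$3" --arg ratio "$4" \
    '{provider: "google", model: $model, prompt: $prompt,
      imageConfig: {imageSize: $size, aspectRatio: $ratio}}'
}
```

Usage: `REQUEST_BODY=$(build_body pro "$PROMPT" 2K 16:9)`, then pass it to curl with `-d "$REQUEST_BODY"`.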
POST https://api.labnana.com/openapi/v1/images/generation with a timeout of 600s.
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Decode base64 to a temp file, then use the Read tool.
JOB_ID=$(date +%s)
echo "$BASE64_DATA" | base64 --decode > "/tmp/image-gen-${JOB_ID}.jpg"
Then use the Read tool on /tmp/image-gen-{jobId}.jpg. The image displays inline in the conversation.
Present:
Image generated!
download or both: Save to the artifact directory.
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"
Present:
Image generated!
Saved to .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/:
{jobId}.jpg
Base64 decoding (cross-platform):
# Linux
echo "$BASE64_DATA" | base64 -d > output.jpg
# macOS
echo "$BASE64_DATA" | base64 -D > output.jpg
# Either platform (long option)
echo "$BASE64_DATA" | base64 --decode > output.jpg
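A minimal sketch that wraps the portable form in a helper and guards against an empty payload. The name `decode_image` is hypothetical. The check for the literal string `null` is there because `jq -r` prints `null` when the requested field is missing from the response.

```shell
# Hypothetical helper: decode base64 image data into a file.
# $1 = base64 string, $2 = destination path
decode_image() {
  data=$1
  dest=$2
  # jq -r prints "null" for a missing field, so treat it like empty input.
  if [ -z "$data" ] || [ "$data" = "null" ]; then
    echo "No image data in response" >&2
    return 1
  fi
  # --decode is accepted by both GNU coreutils and macOS base64.
  printf '%s' "$data" | base64 --decode > "$dest"
}
```

Usage: `decode_image "$BASE64_DATA" "${JOB_DIR}/${JOB_ID}.jpg"`. A non-zero exit means nothing was written and the response should be inspected for an error message.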
Retry logic: On 429 (rate limit), wait 15 seconds and retry. Max 3 retries.
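The retry rule can be sketched as follows. The function names `send_request` and `request_with_retry` and the variables `RETRY_WAIT`, `MAX_RETRIES`, and `REQUEST_BODY` are hypothetical; the URL, headers, and timeout are taken from this document. Other error statuses are returned to the caller unchanged, to be handled per shared/common-patterns.md.

```shell
RETRY_WAIT=${RETRY_WAIT:-15}   # seconds to wait after a 429
MAX_RETRIES=3                  # retries after the first attempt

# Write the response body to $1 and print the HTTP status code.
send_request() {
  curl -sS -o "$1" -w '%{http_code}' --max-time 600 \
    -X POST "https://api.labnana.com/openapi/v1/images/generation" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$REQUEST_BODY"
}

# Call send_request, retrying only on 429. Prints the final status code.
# Fails if curl itself fails or if every attempt was rate limited.
request_with_retry() {
  out=$1
  retries=0
  while :; do
    status=$(send_request "$out") || return 1
    if [ "$status" != "429" ]; then
      echo "$status"
      return 0
    fi
    if [ "$retries" -ge "$MAX_RETRIES" ]; then
      echo "$status"
      return 1
    fi
    retries=$((retries + 1))
    sleep "$RETRY_WAIT"
  done
}
```

Usage: `STATUS=$(request_with_retry /tmp/response.json)`, then read the body from the file with jq. With 3 retries the request is sent at most 4 times.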
Default: Pass the user's prompt directly without modification.
When to offer optimization:
When to never modify:
Optimization techniques (if user agrees):
- shared/api-image.md
- shared/common-patterns.md § Error Handling
User: "Generate an image: cyberpunk city at night"
Agent workflow:
RESPONSE=$(curl -sS -X POST "https://api.labnana.com/openapi/v1/images/generation" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  --max-time 600 \
  -d '{
    "provider": "google",
    "model": "gemini-3-pro-image-preview",
    "prompt": "cyberpunk city at night",
    "imageConfig": {"imageSize": "2K", "aspectRatio": "16:9"}
  }')
BASE64_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[0].inlineData.data // .data')
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
echo "$BASE64_DATA" | base64 --decode > "${JOB_DIR}/${JOB_ID}.jpg"
Decode the base64 data per outputMode (see shared/output-mode.md).