Generate images from text prompts and edit existing images (img2img, inpainting, outpainting) via a local image generation API backed by FLUX.1 and SDXL, with automatic cloud fallback. Use when the user asks to create an image, make a quick visual concept, generate multiple images sequentially, modify an existing image, inpaint/remove/replace parts of an image, or extend an image.
Use the local image generation service exposed by IMAGES_API_URL.
Bundled wrappers live in scripts/ next to this file:
- scripts/generate_image_job.py — text-to-image helper
- scripts/generate_image_img2img_job.py — image-to-image helper
- scripts/generate_image_inpaint_job.py — inpainting/outpainting helper (FLUX Fill)
- scripts/generate_image_upscale_job.py — 4x upscale helper (UltraSharp)
- scripts/generate_image_video_job.py — image-to-video helper (Wan 2.2)

Prefer these Python wrappers over ad-hoc shell JSON parsing when you need a reliable local helper.
Defaults: model flux-dev, 1024x1024, seed -1 (random). Steps/guidance per model: flux-dev 20 / 3.5, flux-schnell 4, sdxl 25 / 7.0.

Choose the model based on the task:
| Task | Model | Why |
|---|---|---|
| Best quality, details, readable text, hands | flux-dev | 20 steps, best anatomy and text rendering |
| Fast preview / rough prototype | flux-schnell | 4 steps, <2s |
| Stylized, anime, LoRA ecosystems | sdxl | Huge LoRA ecosystem, negative_prompt support |
| Replace part of image (inpainting) | flux-fill | Best open inpainting model, mask-based |
| Extend image (outpainting) | flux-fill | Pad canvas, mask empty area |
| Light style/detail changes | flux-dev (img2img) | denoise 0.3-0.7 |
| Strong regeneration from reference | flux-dev (img2img) | denoise 0.7-0.9 |
| 4x upscale any image | upscale | Instant, UltraSharp neural upscaler, no prompt needed |
| Animate an image (image-to-video) | wan-video | Wan 2.2 14B, 5s clip at 16fps, ~10 min |
Model notes:
- FLUX models ignore negative_prompt; steer output with prompt and guidance_scale instead.
- For sdxl, negative_prompt is supported and useful.
- img2img uses flux-dev; inpainting/outpainting uses flux-fill with both input_image and mask_image.

Workflow: POST /jobs → GET /jobs/{job_id} → GET /jobs/{job_id}/result

Generation uses an async job queue. This is the canonical and expected flow for text-to-image requests: submit a job, poll for completion, download the result.
Do not assume a synchronous text-to-image endpoint exists. Do not bypass the job queue when POST /jobs is available.
JOB=$(curl -sf -X POST "${IMAGES_API_URL}/jobs" \
-H "Content-Type: application/json" \
-d '{
"prompt": "DESCRIPTION",
"model": "flux-dev",
"width": 1024,
"height": 1024,
"steps": 20,
"guidance_scale": 3.5,
"seed": -1
}')
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
echo "Job submitted: $JOB_ID"
Supported parameters:
- prompt — required
- model — flux-dev, flux-schnell, flux-fill, sdxl, upscale, wan-video
- width, height — 256..2048
- steps
- guidance_scale
- seed
- negative_prompt — SDXL only; ignored by FLUX
- input_image — uploaded filename for img2img or inpainting
- mask_image — uploaded filename for inpainting (required with flux-fill)
- denoise — strength for img2img/inpainting (0.0-1.0)

Response:
{"job_id": "abc123...", "status": "queued", "position": 1, "created_at": "..."}
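If you prefer a single Python helper over chained python3 -c one-liners, the response can be parsed defensively. A minimal sketch; parse_job_response is a hypothetical helper, not part of the API or the bundled scripts:

```python
import json

def parse_job_response(body: str) -> dict:
    """Extract job_id and status from a POST /jobs response body.

    Raises ValueError if the response is not the expected shape, so a
    malformed or error body fails loudly instead of silently.
    """
    data = json.loads(body)
    if "job_id" not in data:
        raise ValueError(f"unexpected response: {data}")
    return {"job_id": data["job_id"], "status": data.get("status", "unknown")}

# With the documented response shape:
job = parse_job_response('{"job_id": "abc123", "status": "queued", "position": 1}')
# → {'job_id': 'abc123', 'status': 'queued'}
```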
while true; do
R=$(curl -sf "${IMAGES_API_URL}/jobs/${JOB_ID}")
S=$(echo "$R" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
case "$S" in
completed) echo "Done"; break ;;
failed) echo "Job failed: $(echo "$R" | python3 -c "import sys,json; print(json.load(sys.stdin).get('error',''))")"; exit 1 ;;
cancelled) echo "Job cancelled"; exit 1 ;;
*) echo "Status: $S, waiting..."; sleep 5 ;;
esac
done
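The loop above never gives up, which matters for the slower img2img and video jobs. A bounded variant can be sketched in Python with an injected fetch_status callable standing in for the curl call; all names here are hypothetical:

```python
import time

def wait_for_job(fetch_status, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until the job finishes or the timeout elapses.

    fetch_status must return the JSON body of GET /jobs/{job_id} as a
    dict, e.g. {"status": "processing", ...}.
    """
    waited = 0.0
    status = None
    while waited <= timeout:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"job ended with status {status}: {job.get('error', '')}")
        sleep(interval)          # still queued/processing: wait and re-poll
        waited += interval
    raise TimeoutError(f"job still {status!r} after {timeout}s")
```

For wan-video jobs, pass interval=10 and timeout=900 to match the longer runtimes noted below.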
Observed in testing:
- flux-schnell text-to-image jobs can complete very quickly (single-digit seconds)
- flux-dev img2img jobs may take around 1-2 minutes, so do not assume a short poll loop is enough
- queued -> processing -> completed is the expected happy path

Job statuses: queued → processing → completed / failed / cancelled
curl -sf "${IMAGES_API_URL}/jobs/${JOB_ID}/result" -o /tmp/generated.png
Returns a PNG; the response headers include X-Source, X-Seed, and X-Model.
Observed in testing:
- GET /jobs/${JOB_ID}/result before completion returns 202 with JSON like {"job_id":"...","status":"queued","position":1}
- GET /jobs/${JOB_ID}/result for a cancelled job returns 410 with a message such as {"detail":"Job was cancelled"}

Download the result promptly — completed job results expire after 10 minutes.
IMG2IMG is only supported with the flux-dev model. Do not use flux-schnell or sdxl for img2img.
Two-step process: upload the source image, then submit a job referencing it.
UPLOAD=$(curl -sf -X POST "${IMAGES_API_URL}/upload" \
-F "image=@/path/to/photo.png")
FILENAME=$(echo "$UPLOAD" | python3 -c "import sys,json; print(json.load(sys.stdin)['filename'])")
JOB=$(curl -sf -X POST "${IMAGES_API_URL}/jobs" \
-H "Content-Type: application/json" \
-d "{
\"prompt\": \"DESCRIBE THE CHANGES\",
\"model\": \"flux-dev\",
\"input_image\": \"${FILENAME}\",
\"denoise\": 0.65
}")
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
Then poll and download as described above.
Recommended bundled wrapper:
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_img2img_job.py \
--input /path/to/source.jpg \
--prompt "DESCRIBE THE CHANGES" \
--output /tmp/edited.png
Denoise guidance:
- 0.3-0.5 — light edits
- 0.5-0.7 — moderate changes
- 0.7-0.9 — strong regeneration

Use flux-fill to replace or generate content in specific areas of an image using a mask.
The mask is a separate image, same dimensions as the source:
UPLOAD_IMG=$(curl -sf -X POST "${IMAGES_API_URL}/upload" \
-F "image=@/path/to/source.png")
IMG_FILENAME=$(echo "$UPLOAD_IMG" | python3 -c "import sys,json; print(json.load(sys.stdin)['filename'])")
UPLOAD_MASK=$(curl -sf -X POST "${IMAGES_API_URL}/upload" \
-F "image=@/path/to/mask.png")
MASK_FILENAME=$(echo "$UPLOAD_MASK" | python3 -c "import sys,json; print(json.load(sys.stdin)['filename'])")
JOB=$(curl -sf -X POST "${IMAGES_API_URL}/jobs" \
-H "Content-Type: application/json" \
-d "{
\"prompt\": \"DESCRIBE WHAT SHOULD APPEAR IN THE MASKED AREA\",
\"model\": \"flux-fill\",
\"input_image\": \"${IMG_FILENAME}\",
\"mask_image\": \"${MASK_FILENAME}\",
\"denoise\": 1.0
}")
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
Then poll and download as described above.
Inpainting parameters:
- denoise: 0.8-1.0 for full replacement, 0.5-0.7 for gentle correction
- guidance_scale: 3.5 (same as flux-dev)
- steps: 20
- prompt: describes what should be in the masked area, NOT the whole image

To extend an image beyond its borders (outpainting), pad the canvas, mask the empty area, and use flux-fill with a prompt describing what should appear beyond the edges.

The wrapper accepts either an existing mask file or one or more rectangles to build the mask inline (no PIL or other image tools required):
# (a) existing mask file
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_inpaint_job.py \
--input /path/to/source.png \
--mask /path/to/mask.png \
--prompt "a golden retriever sitting on the grass" \
--output /tmp/inpainted.png
# (b) inline rectangle mask (auto-detects size from --input)
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_inpaint_job.py \
--input /path/to/source.png \
--mask-rect 128,256,400,300 \
--prompt "a golden retriever sitting on the grass" \
--output /tmp/inpainted.png
--mask-rect x,y,w,h is repeatable to combine several rectangles. Use
--mask-size W,H to override the auto-detected size and --mask-invert
to flip foreground/background.
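What the inline rectangle mask amounts to can be sketched without PIL: white (255) marks pixels to regenerate, black (0) marks pixels to keep, and repeated rectangles merge. This is an illustration of the idea, not the wrapper's actual implementation:

```python
def build_mask(width, height, rects, invert=False):
    """Return a height x width 2D list of 0/255 values.

    rects is a list of (x, y, w, h) tuples, mirroring repeated
    --mask-rect flags; overlapping rectangles simply merge.
    """
    mask = [[0] * width for _ in range(height)]
    for x, y, w, h in rects:
        # Clamp each rectangle to the canvas so out-of-bounds rects are safe.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                mask[row][col] = 255
    if invert:  # analogous to --mask-invert: swap regenerate/keep regions
        mask = [[255 - v for v in row] for row in mask]
    return mask
```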
Use wan-video to animate a still image into a ~5 second video clip. This uses the Wan 2.2 14B model with two-stage denoising.
Important: Video generation takes ~10 minutes on RTX 4090. Do not submit multiple video jobs simultaneously. Output is animated WebP.
UPLOAD=$(curl -sf -X POST "${IMAGES_API_URL}/upload" \
-F "image=@/path/to/image.png")
FILENAME=$(echo "$UPLOAD" | python3 -c "import sys,json; print(json.load(sys.stdin)['filename'])")
JOB=$(curl -sf -X POST "${IMAGES_API_URL}/jobs" \
-H "Content-Type: application/json" \
-d "{
\"prompt\": \"DESCRIBE THE DESIRED MOTION\",
\"model\": \"wan-video\",
\"input_image\": \"${FILENAME}\",
\"width\": 768,
\"height\": 768
}")
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
Then poll and download as described above. Use a longer poll interval (10s) and timeout (900s).
Video parameters:
- prompt: describe the motion, not the scene ("camera slowly zooms in", "wind blows through hair")
- width, height: 768x768 default, up to 720p (1280x720) but slower
- steps: 20 (two-stage: 10+10)
- guidance_scale: 3.5
- seed: for reproducibility
- negative_prompt: has a good default (Chinese quality tags), rarely needs changing

Recommended bundled wrapper:

python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_video_job.py \
--input /path/to/image.png \
--prompt "camera slowly zooms in, gentle wind" \
--output /tmp/video.webp
Use upscale to increase image resolution by 4x using the UltraSharp neural network upscaler. No prompt needed — just upload the image.
A 1024x1024 image becomes 4096x4096. Fast (seconds, not minutes).
UPLOAD=$(curl -sf -X POST "${IMAGES_API_URL}/upload" \
-F "image=@/path/to/image.png")
FILENAME=$(echo "$UPLOAD" | python3 -c "import sys,json; print(json.load(sys.stdin)['filename'])")
JOB=$(curl -sf -X POST "${IMAGES_API_URL}/jobs" \
-H "Content-Type: application/json" \
-d "{\"model\": \"upscale\", \"input_image\": \"${FILENAME}\"}")
JOB_ID=$(echo "$JOB" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
Then poll and download as described above.
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_upscale_job.py \
--input /path/to/image.png \
--output /tmp/upscaled.png
For very fast previews, use flux-schnell with 4 steps:
JOB=$(curl -sf -X POST "${IMAGES_API_URL}/jobs" \
-H "Content-Type: application/json" \
-d '{"prompt":"DESCRIPTION","model":"flux-schnell","steps":4}')
If you need to cancel a queued job:
curl -sf -X DELETE "${IMAGES_API_URL}/jobs/${JOB_ID}"
Only works for jobs in queued status.
Observed in testing: cancelling a queued job returns HTTP 200 with JSON like {"job_id":"...","status":"cancelled"}.
If POST /jobs returns 503 — generator is unavailable (GPU paused for gaming, or ComfyUI offline) and no cloud fallback is configured. Do NOT retry. Tell the user generation is unavailable.
If POST /jobs returns 429 — queue is full (max 50 jobs). Wait 30 seconds and retry.
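This error policy (fail fast on 503, back off on 429) can be centralized in one helper. A sketch with a hypothetical submit callable returning (status_code, body); neither the callable nor submit_with_retry is part of the API:

```python
import time

def submit_with_retry(submit, max_retries=3, wait=30.0, sleep=time.sleep):
    """Retry 429 (queue full) after a wait; never retry 503 (unavailable)."""
    for attempt in range(max_retries + 1):
        code, body = submit()
        if code == 503:
            # Generator is down and no fallback exists: report, do not retry.
            raise RuntimeError("generator unavailable; do not retry")
        if code == 429:
            if attempt == max_retries:
                break
            sleep(wait)          # queue full (max 50 jobs): wait and retry
            continue
        return body              # success: the job JSON from POST /jobs
    raise RuntimeError("queue still full after retries")
```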
If you need to inspect available checkpoints or LoRAs:
curl -sf "${IMAGES_API_URL}/models"
Use the bundled wrappers when you want deterministic local execution without reimplementing the jobs flow.
Text-to-image:
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_job.py \
--prompt "night sky, stars, realistic astronomy photo" \
--output /tmp/generated.png
Image-to-image:
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_img2img_job.py \
--input /path/to/source.png \
--prompt "turn this into a cinematic night scene" \
--output /tmp/edited.png
Inpainting (use --mask-rect x,y,w,h to build the mask inline, or --mask /path/to/mask.png for a file):
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_inpaint_job.py \
--input /path/to/source.png \
--mask-rect 128,256,400,300 \
--prompt "a golden retriever sitting on the grass" \
--output /tmp/inpainted.png
Upscale:
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_upscale_job.py \
--input /path/to/image.png \
--output /tmp/upscaled.png
Video (image-to-video):
python3 /home/openclaw/.openclaw/workspace/skills/generate_image/scripts/generate_image_video_job.py \
--input /path/to/image.png \
--prompt "camera slowly zooms in" \
--output /tmp/video.webp
Keep these scripts as the canonical local wrappers for this skill. If you improve the flow, update the skill-bundled scripts first.
Before replying, verify that the output file exists and is not empty:
test -s /tmp/generated.png
Then: