The most comprehensive AI content creation platform, with unified access to leading models across images (SeeDream 4.5, Midjourney, Nano Banana 2, Nano Banana Pro), video (Wan 2.6, Kling O1, Ima Sevio 1.0/1.0-Fast aka IMA Video Pro/Pro Fast, Google Veo 3.1, Sora 2 Pro), music (Suno sonic v5, DouBao), and speech/TTS (text-to-speech). Provides intelligent model selection and cross-media workflow orchestration with knowledge-base support, and optionally integrates ima-knowledge-ai for workflow guidance and best practices. Use for any AI content creation task: images, videos, music, TTS (speech synthesis), multi-media projects, character consistency, product demos, social campaigns, and complete creative workflows. A better alternative to juggling multiple standalone skills (ai-image-generation + ai-video-gen + suno-music + ima-tts-ai) or separate APIs (DALL-E + Runway + Suno).
CRITICAL: When calling the script, you MUST use the exact model_id (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ nano-banana-pro is WRONG; ✅ gemini-3-pro-image is CORRECT).
Quick Reference Table:
Image models:
| Friendly Name | model_id | Notes |
|---|---|---|
| Nano Banana2 | gemini-3.1-flash-image | ❌ NOT nano-banana-2; budget choice, 4-13 pts |
| Nano Banana Pro | gemini-3-pro-image | ❌ NOT nano-banana-pro; high quality, 10-18 pts |
| SeeDream 4.5 | doubao-seedream-4.5 | ✅ Recommended default, 5 pts |
| Midjourney | midjourney | ✅ Same as friendly name, 8-10 pts |
Video models:
| Friendly Name | model_id (t2v) | model_id (i2v) | Notes |
|---|---|---|---|
| Wan 2.6 | wan2.6-t2v | wan2.6-i2v | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | ima-pro | ima-pro | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | ima-pro-fast | ✅ IMA native low-latency model |
| Kling O1 | kling-video-o1 | kling-video-o1 | ⚠️ Note video- prefix |
| Kling 2.6 | kling-v2-6 | kling-v2-6 | ⚠️ Note v prefix |
| Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | MiniMax-Hailuo-02 | MiniMax-Hailuo-02 | ⚠️ Note 02 not 2.0 |
| Google Veo 3.1 | veo-3.1-generate-preview | veo-3.1-generate-preview | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | sora-2-pro | sora-2-pro | ✅ Straightforward |
| Pixverse | pixverse | pixverse | ✅ Same as friendly name |
Music models:
| Friendly Name | model_id | Notes |
|---|---|---|
| Suno (sonic v4) | sonic | ⚠️ Simplified to sonic |
| DouBao BGM | GenBGM | ❌ NOT doubao-bgm |
| DouBao Song | GenSong | ❌ NOT doubao-song |
Speech/TTS models:
| Friendly Name | model_id | Notes |
|---|---|---|
| seed-tts-2.0 | seed-tts-2.0 | ✅ Same as friendly name (default) |
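The mappings in the tables above can be captured as a small lookup that fails loudly instead of guessing. This is a sketch built only from this document's tables; verify against `--list-models`, since live availability may differ:

```python
# Friendly name -> model_id, copied from the reference tables above.
MODEL_IDS = {
    # images
    "Nano Banana2": "gemini-3.1-flash-image",
    "Nano Banana Pro": "gemini-3-pro-image",
    "SeeDream 4.5": "doubao-seedream-4.5",
    "Midjourney": "midjourney",
    # video
    "Wan 2.6 (t2v)": "wan2.6-t2v",
    "Wan 2.6 (i2v)": "wan2.6-i2v",
    "IMA Video Pro": "ima-pro",
    "IMA Video Pro Fast": "ima-pro-fast",
    "Kling O1": "kling-video-o1",
    "Kling 2.6": "kling-v2-6",
    "Hailuo 2.3": "MiniMax-Hailuo-2.3",
    "Hailuo 2.0": "MiniMax-Hailuo-02",
    "Google Veo 3.1": "veo-3.1-generate-preview",
    "Sora 2 Pro": "sora-2-pro",
    "Pixverse": "pixverse",
    # music
    "Suno": "sonic",
    "DouBao BGM": "GenBGM",
    "DouBao Song": "GenSong",
    # speech
    "seed-tts-2.0": "seed-tts-2.0",
}

def resolve_model_id(friendly_name: str) -> str:
    """Never infer a model_id from the friendly name; look it up or fail."""
    try:
        return MODEL_IDS[friendly_name]
    except KeyError:
        raise ValueError(
            f"Unknown model {friendly_name!r}: confirm with --list-models"
        )
```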
How to get the correct model_id:
- Run the script with `--list-models --task-type <type>` to query available models.
Runtime truth source: `GET /open/v1/product/list` (or `--list-models`).
Any table in this document is guidance; actual availability depends on current product list.
Example:
# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro
# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
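The runtime truth source can also be consumed programmatically. The sketch below extracts model_ids for one task type from a product-list response; the response shape assumed here (a top-level "data" list of products with "model_id" and "task_type" fields) is an illustration only, so check the actual `GET /open/v1/product/list` payload or simply use `--list-models`:

```python
import json

def list_model_ids(response_text: str, task_type: str) -> list:
    """Return model_ids available for a task type (assumed payload shape)."""
    payload = json.loads(response_text)
    return [
        p["model_id"]
        for p in payload.get("data", [])
        if p.get("task_type") == task_type
    ]
```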
This skill is fully runnable as a standalone package.
If ima-knowledge-ai is installed, the agent may read its references for workflow decomposition and consistency guidance.
Recommended optional reads:
Check for workflow complexity — Read ima-knowledge-ai/references/workflow-design.md if:
Check for visual consistency needs — Read ima-knowledge-ai/references/visual-consistency.md if:
Check video modes — Read ima-knowledge-ai/references/video-modes.md if:
Check model selection — Read ima-knowledge-ai/references/model-selection.md if:
Why this matters:
Example multi-media workflow:
User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗" ("Make me a product promo MV with background music, starring Wangcai the puppy")
❌ Wrong:
1. Generate dog image (random look)
2. Generate video (different dog)
3. Generate music (unrelated)
✅ Right:
1. Read workflow-design.md + visual-consistency.md
2. Generate Master Reference: a Wangcai puppy image
3. Generate video shots using image_to_video with the Wangcai image as the first frame
4. Get video duration (e.g., 15s)
5. Generate BGM with matching duration and mood
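The correct sequencing above can be sketched as an ordered dependency plan, where each step consumes the previous step's output. This is purely illustrative; the real orchestration patterns live in workflow-design.md:

```python
def plan_promo_mv() -> list:
    """Ordered (task_type, depends_on) steps for the promo-MV workflow above."""
    return [
        ("text_to_image", None),              # master reference of the subject
        ("image_to_video", "text_to_image"),  # animate with reference as first frame
        ("text_to_music", "image_to_video"),  # BGM matched to video duration/mood
    ]
```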
How to check:
# Step 0: Determine media type first (image / video / music / speech)
# From user request: "画"/"生成图"/"image" → image; "视频"/"video" → video; "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section (image: text_to_image/image_to_image; video: text_to_video/...; music: text_to_music; speech: text_to_speech)
# Step 1: Read knowledge base based on task type
if multi_media_workflow:
read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")
if "same subject" or "series" or "character":
read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")
if video_generation:
read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")
# Step 2: Execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)
For simple single-media requests, you can proceed directly. For complex multi-media workflows, always read the knowledge base first; no exceptions.
Purpose: to ensure any agent parses user intent consistently. First determine the media type from the user's request, then choose task_type and model.
| User intent / keywords | Media type | task_type examples |
|---|---|---|
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | image | text_to_image, image_to_image |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | video | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | music | text_to_music |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | speech | text_to_speech |
If the request mixes media (e.g. "宣传片+配乐"), treat as multi-media workflow: read workflow-design.md, then plan image → video → music steps and use the correct task_type for each step.
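The keyword routing above can be sketched as a lookup that returns all detected media types in a stable order, so mixed requests fall out naturally as a multi-media plan. Keyword lists are copied from the table (plus "宣传片"/"配乐" from the mixed-media example); extend them as needed:

```python
# Media-type keywords, taken from the routing table above.
KEYWORDS = {
    "image": ["画", "生成图", "图片", "image", "图生图"],
    "video": ["视频", "video", "图生视频", "文生视频", "宣传片"],
    "music": ["音乐", "歌", "BGM", "背景音乐", "music", "作曲", "配乐"],
    "speech": ["语音", "朗读", "TTS", "语音合成", "配音", "speech", "text-to-speech"],
}

def detect_media_types(request: str) -> list:
    """Return every media type whose keywords appear in the request."""
    order = ["image", "video", "music", "speech"]
    return [m for m in order if any(k in request for k in KEYWORDS[m])]
```

More than one result means a multi-media workflow: read workflow-design.md and plan the steps in order.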
Image: For model name → model_id and size/aspect_ratio parsing, follow the same rules as in ima-image-ai skill (User Input Parsing section).
Video: For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow ima-video-ai skill (User Input Parsing section).
Sevio alias normalization in ima-all-ai:
- Ima Sevio 1.0 → ima-pro
- Ima Sevio 1.0-Fast / Ima Sevio 1.0 Fast → ima-pro-fast
Routing rule: choose the task_type first, then confirm the model_id with --list-models.
Music: Suno (sonic) vs DouBao BGM/Song — infer from "BGM"/"背景音乐" (background music) → BGM; "带歌词"/"人声" (with lyrics/vocals) → Suno or Song. Use model_id sonic, GenBGM, GenSong per the "Recommended Defaults" and "Music Generation" tables below.
Speech (TTS): Get model_id from GET /open/v1/product/list?category=text_to_speech or run script with --task-type text_to_speech --list-models. Map user intent to parameters using product form_config:
| User intent / phrasing | Parameter (if in form_config) | Notes |
|---|---|---|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |
If the user does not specify, use form_config defaults. Pass extra params via --extra-params '{"speed":1.0}'. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
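The parameter rule above (defaults from form_config, and only declared parameters sent) can be sketched as a filter. The form_config shape assumed here, a list of `{"key": ..., "default": ...}` entries, is an illustration; the real schema comes from the product list:

```python
def build_extra_params(requested: dict, form_config: list) -> dict:
    """Overlay user params onto form_config defaults, dropping unknown keys
    (mirroring what the script's retry reflection does)."""
    allowed = {f["key"]: f.get("default") for f in form_config}
    params = {k: v for k, v in allowed.items() if v is not None}  # defaults
    params.update({k: v for k, v in requested.items() if k in allowed})
    return params
```

Note that "female_01" in the test below is a hypothetical voice value, not a documented one.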
For transparency: This skill uses a bundled Python script (scripts/ima_create.py) to call the IMA Open API. The script:
- Uses --user-id only locally, as a key for storing your model preferences.
What gets sent to IMA servers:
What's stored locally:
- ~/.openclaw/memory/ima_prefs.json - your model preferences (< 1 KB)
- ~/.openclaw/logs/ima_skills/ - generation logs (auto-deleted after 7 days)
| Domain | Owner | Purpose | Data Sent | Privacy |
|---|---|---|---|---|
api.imastudio.com | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, your API key | Standard HTTPS, data processed for AI generation |
imapi.liveme.com | IMA Studio | Image/Video upload service (presigned URL generation) | Your API key, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
*.aliyuncs.com, *.esxscloud.com | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, NO API key) | IMA-managed OSS buckets, presigned URLs expire after 7 days |
Key Points:
- Music tasks (text_to_music) and TTS tasks (text_to_speech) only use api.imastudio.com.
- Image/video tasks also call imapi.liveme.com to obtain presigned URLs for uploading input images.
- Your API key is sent only to api.imastudio.com and imapi.liveme.com (both owned by IMA Studio).
- You can verify network traffic yourself: tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'.
- See "🌐 Network Endpoints Used" and "⚠️ Credential Security Notice" in this document for full disclosure.
Your API key is sent to both IMA-owned domains:
- Authorization: Bearer ima_xxx... → api.imastudio.com (main API)
- appUid=ima_xxx... → imapi.liveme.com (upload service)
Security best practices:
- Monitor https://imastudio.com/dashboard for unauthorized activity.
- Review ~/.openclaw/logs/ima_skills/ for unexpected API calls.
Why two domains? IMA Studio uses a microservices architecture:
- api.imastudio.com: core AI generation API
- imapi.liveme.com: specialized image/video upload service (shared infrastructure)
Both domains are operated by IMA Studio. The same API key grants access to both services.
Note for users: You can review the script source at
scripts/ima_create.pyanytime.
The agent uses this script to simplify API calls. Music tasks use only api.imastudio.com, while image/video tasks also call imapi.liveme.com for file uploads (see "Network Endpoints" above).
Use the bundled script internally for all task types — it ensures correct parameter construction:
# ─── Image Generation ──────────────────────────────────────────────────────────
# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
--user-id {user_id} --output-json
# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
--size 2k --user-id {user_id} --output-json
# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_image \
--model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# ─── Video Generation ──────────────────────────────────────────────────────────
# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
--user-id {user_id} --output-json
# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
--extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
--user-id {user_id} --output-json
# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_video \
--model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type first_last_frame_to_video \
--model-id kling-video-o1 --prompt "smooth transition between frames" \
--input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
--user-id {user_id} --output-json
# ─── Music Generation ──────────────────────────────────────────────────────────
# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
--user-id {user_id} --output-json
# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "pop ballad, emotional" \
--extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
--user-id {user_id} --output-json
# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id GenBGM --prompt "relaxing ambient music for meditation" \
--user-id {user_id} --output-json
# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────
# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech --list-models
# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech \
--model-id <model_id from list> --prompt "Text to be spoken here." \
--user-id {user_id} --output-json
The script outputs JSON with url, model_name, credit — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.
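Consuming the script's JSON output can be sketched as below. This assumes a flat JSON object with "url", "model_name", and "credit" keys, as described above; any other fields in the real output are ignored:

```python
import json

def parse_result(output: str) -> tuple:
    """Pull the fields used by the UX protocol from --output-json output."""
    data = json.loads(output)
    return data["url"], data["model_name"], data["credit"]
```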
Call IMA Open API to create AI-generated content. All endpoints require an ima_* API key. The core flow is: query products → create task → poll until done.
This skill is community-maintained and open for inspection.
Full transparency:
- You can review scripts/ima_create.py and ima_logger.py anytime.
- Music/TTS tasks use api.imastudio.com only; image/video tasks also use imapi.liveme.com (see "Network Endpoints" section).
- Local state is limited to ~/.openclaw/memory/ima_prefs.json and log files.
Configuration allowed:
- export IMA_API_KEY=ima_your_key_here
- Add IMA_API_KEY to the agent's environment configuration.
- Redirect log output to /dev/null if you prefer no local logs.
Data control:
- View preferences: cat ~/.openclaw/memory/ima_prefs.json
- Reset preferences: rm ~/.openclaw/memory/ima_prefs.json (resets to defaults)
- Delete logs: rm -rf ~/.openclaw/logs/ima_skills/ (auto-cleanup after 7 days anyway)
If you need to modify this skill for your use case:
Note: Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.
Actions that could compromise security:
Why this matters:
What this skill does with your data:
| Data Type | Sent to IMA? | Stored Locally? | User Control |
|---|---|---|---|
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ Never (local preference key only) | ✅ Yes (as prefs file key) | Change --user-id value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days |
Privacy recommendations:
- --user-id is never sent to IMA servers; it's only used locally as a key for storing preferences in ~/.openclaw/memory/ima_prefs.json.
- Review scripts/ima_create.py to verify network calls (search for the create_task function).
Get your IMA API key: visit https://imastudio.com to register and get started.
Version control:
File checksums (optional):
# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py
If users report issues, verify file integrity first.
User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.
~/.openclaw/memory/ima_prefs.jsonSingle file, shared across all IMA skills:
{
"user_{user_id}": {
"text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
}
}
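Reading a saved preference from the shared file above can be sketched as follows. Returning None when no entry exists keeps automatic model selection from masquerading as a saved preference:

```python
import json
from pathlib import Path

PREFS_PATH = Path.home() / ".openclaw" / "memory" / "ima_prefs.json"

def load_pref(user_id: str, task_type: str, path: Path = PREFS_PATH):
    """Return the user's saved preference dict for one task type, or None."""
    if not path.exists():
        return None
    prefs = json.loads(path.read_text())
    return prefs.get(f"user_{user_id}", {}).get(task_type)
```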
Step 1: Get knowledge-ai recommendation (if installed)
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "SeeDream 4.5"
Step 2: Check user preference
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "midjourney", ...}
Step 3: Decide which model to use
if user_pref exists:
    use_model = user_pref["model_id"]  # Highest priority
elif knowledge_recommended_model:
    use_model = knowledge_recommended_model  # Knowledge-base recommendation
else:
    use_model = task_type_default  # Fall back to the task type's default model