The most comprehensive AI content creation platform, with unified access to leading models across images (SeeDream 4.5, Midjourney, Nano Banana 2, Nano Banana Pro), video (Wan 2.6, Kling O1, Ima Sevio 1.0/1.0-Fast aka IMA Video Pro/Pro Fast, Google Veo 3.1, Sora 2 Pro), music (Suno sonic v5, DouBao), and speech/TTS (text-to-speech). Provides intelligent model selection and cross-media workflow orchestration with knowledge-base support, and optionally integrates ima-knowledge-ai for workflow guidance and best practices. Use for any AI content creation task: images, videos, music, TTS (speech synthesis), multi-media projects, character consistency, product demos, social campaigns, and complete creative workflows. A better alternative to juggling multiple standalone skills (ai-image-generation + ai-video-gen + suno-music + ima-tts-ai) or separate APIs (DALL-E + Runway + Suno).
CRITICAL: When calling the script, you MUST use the exact model_id (second column), NOT the friendly model name. Do NOT infer model_id from the friendly name (e.g., ❌ nano-banana-pro is WRONG; ✅ gemini-3-pro-image is CORRECT).
Quick Reference Table:
Image models:
| Friendly Name | model_id | Notes |
|---|---|---|
| Nano Banana2 | gemini-3.1-flash-image | ❌ NOT nano-banana-2; budget choice, 4-13 pts |
| Nano Banana Pro | gemini-3-pro-image | ❌ NOT nano-banana-pro; high quality, 10-18 pts |
| SeeDream 4.5 | doubao-seedream-4.5 | ✅ Recommended default, 5 pts |
| Midjourney | midjourney | ✅ Same as friendly name, 8-10 pts |
Video models:
| Friendly Name | model_id (t2v) | model_id (i2v) | Notes |
|---|---|---|---|
| Wan 2.6 | wan2.6-t2v | wan2.6-i2v | ⚠️ Note -t2v/-i2v suffix |
| IMA Video Pro (Sevio 1.0) | ima-pro | ima-pro | ✅ IMA native quality model |
| IMA Video Pro Fast (Sevio 1.0-Fast) | ima-pro-fast | ima-pro-fast | ✅ IMA native low-latency model |
| Kling O1 | kling-video-o1 | kling-video-o1 | ⚠️ Note video- prefix |
| Kling 2.6 | kling-v2-6 | kling-v2-6 | ⚠️ Note v prefix |
| Hailuo 2.3 | MiniMax-Hailuo-2.3 | MiniMax-Hailuo-2.3 | ⚠️ Note MiniMax- prefix |
| Hailuo 2.0 | MiniMax-Hailuo-02 | MiniMax-Hailuo-02 | ⚠️ Note 02 not 2.0 |
| Google Veo 3.1 | veo-3.1-generate-preview | veo-3.1-generate-preview | ⚠️ Note -generate-preview suffix |
| Sora 2 Pro | sora-2-pro | sora-2-pro | ✅ Straightforward |
| Pixverse | pixverse | pixverse | ✅ Same as friendly name |
Music models:
| Friendly Name | model_id | Notes |
|---|---|---|
| Suno (sonic v4) | sonic | ⚠️ Simplified to sonic |
| DouBao BGM | GenBGM | ❌ NOT doubao-bgm |
| DouBao Song | GenSong | ❌ NOT doubao-song |
Speech/TTS models:
| Friendly Name | model_id | Notes |
|---|---|---|
| seed-tts-2.0 | seed-tts-2.0 | ✅ Same as friendly name (default) |
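The mappings in the tables above can be captured as a small lookup that fails loudly instead of guessing. This is a sketch built only from this document's tables; verify against `--list-models`, since live availability may differ:

```python
# Friendly name -> model_id, copied from the reference tables above.
MODEL_IDS = {
    # images
    "Nano Banana2": "gemini-3.1-flash-image",
    "Nano Banana Pro": "gemini-3-pro-image",
    "SeeDream 4.5": "doubao-seedream-4.5",
    "Midjourney": "midjourney",
    # video
    "Wan 2.6 (t2v)": "wan2.6-t2v",
    "Wan 2.6 (i2v)": "wan2.6-i2v",
    "IMA Video Pro": "ima-pro",
    "IMA Video Pro Fast": "ima-pro-fast",
    "Kling O1": "kling-video-o1",
    "Kling 2.6": "kling-v2-6",
    "Hailuo 2.3": "MiniMax-Hailuo-2.3",
    "Hailuo 2.0": "MiniMax-Hailuo-02",
    "Google Veo 3.1": "veo-3.1-generate-preview",
    "Sora 2 Pro": "sora-2-pro",
    "Pixverse": "pixverse",
    # music
    "Suno": "sonic",
    "DouBao BGM": "GenBGM",
    "DouBao Song": "GenSong",
    # speech
    "seed-tts-2.0": "seed-tts-2.0",
}

def resolve_model_id(friendly_name: str) -> str:
    """Never infer a model_id from the friendly name; look it up or fail."""
    try:
        return MODEL_IDS[friendly_name]
    except KeyError:
        raise ValueError(
            f"Unknown model {friendly_name!r}: confirm with --list-models"
        )
```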
How to get the correct model_id:
- Run the script with `--list-models --task-type <type>` to query available models.
Runtime truth source: `GET /open/v1/product/list` (or `--list-models`).
Any table in this document is guidance; actual availability depends on current product list.
Example:
# ❌ WRONG: Inferring from friendly name
--model-id nano-banana-pro
# ✅ CORRECT: Using exact model_id from table
--model-id gemini-3-pro-image
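The runtime truth source can also be consumed programmatically. The sketch below extracts model_ids for one task type from a product-list response; the response shape assumed here (a top-level "data" list of products with "model_id" and "task_type" fields) is an illustration only, so check the actual `GET /open/v1/product/list` payload or simply use `--list-models`:

```python
import json

def list_model_ids(response_text: str, task_type: str) -> list:
    """Return model_ids available for a task type (assumed payload shape)."""
    payload = json.loads(response_text)
    return [
        p["model_id"]
        for p in payload.get("data", [])
        if p.get("task_type") == task_type
    ]
```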
This skill is fully runnable as a standalone package.
If ima-knowledge-ai is installed, the agent may read its references for workflow decomposition and consistency guidance.
Recommended optional reads:
Check for workflow complexity — Read ima-knowledge-ai/references/workflow-design.md if:
Check for visual consistency needs — Read ima-knowledge-ai/references/visual-consistency.md if:
Check video modes — Read ima-knowledge-ai/references/video-modes.md if:
Check model selection — Read ima-knowledge-ai/references/model-selection.md if:
Why this matters:
Example multi-media workflow:
User: "帮我做个产品宣传MV,有背景音乐,主角是旺财小狗" ("Make me a product promo MV with background music, starring Wangcai the puppy")
❌ Wrong:
1. Generate dog image (random look)
2. Generate video (different dog)
3. Generate music (unrelated)
✅ Right:
1. Read workflow-design.md + visual-consistency.md
2. Generate Master Reference: a Wangcai puppy image
3. Generate video shots using image_to_video with the Wangcai image as the first frame
4. Get video duration (e.g., 15s)
5. Generate BGM with matching duration and mood
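The correct sequencing above can be sketched as an ordered dependency plan, where each step consumes the previous step's output. This is purely illustrative; the real orchestration patterns live in workflow-design.md:

```python
def plan_promo_mv() -> list:
    """Ordered (task_type, depends_on) steps for the promo-MV workflow above."""
    return [
        ("text_to_image", None),              # master reference of the subject
        ("image_to_video", "text_to_image"),  # animate with reference as first frame
        ("text_to_music", "image_to_video"),  # BGM matched to video duration/mood
    ]
```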
How to check:
# Step 0: Determine media type first (image / video / music / speech)
# From user request: "画"/"生成图"/"image" → image; "视频"/"video" → video; "音乐"/"歌"/"music"/"BGM" → music; "语音"/"朗读"/"TTS"/"speech" → speech
# Then choose task_type and model from the corresponding section (image: text_to_image/image_to_image; video: text_to_video/...; music: text_to_music; speech: text_to_speech)
# Step 1: Read knowledge base based on task type
if multi_media_workflow:
read("~/.openclaw/skills/ima-knowledge-ai/references/workflow-design.md")
if "same subject" or "series" or "character":
read("~/.openclaw/skills/ima-knowledge-ai/references/visual-consistency.md")
if video_generation:
read("~/.openclaw/skills/ima-knowledge-ai/references/video-modes.md")
# Step 2: Execute with proper sequencing and reference images
# (see workflow-design.md for specific patterns)
For simple single-media requests, you can proceed directly. For complex multi-media workflows, always read the knowledge base first; no exceptions.
Purpose: to ensure any agent parses user intent consistently. First determine the media type from the user's request, then choose task_type and model.
| User intent / keywords | Media type | task_type examples |
|---|---|---|
| 画 / 生成图 / 图片 / image / 画一张 / 图生图 | image | text_to_image, image_to_image |
| 视频 / 生成视频 / video / 图生视频 / 文生视频 | video | text_to_video, image_to_video, first_last_frame_to_video, reference_image_to_video |
| 音乐 / 歌 / BGM / 背景音乐 / music / 作曲 | music | text_to_music |
| 语音 / 朗读 / TTS / 语音合成 / 配音 / speech / read aloud / text-to-speech | speech | text_to_speech |
If the request mixes media (e.g. "宣传片+配乐"), treat as multi-media workflow: read workflow-design.md, then plan image → video → music steps and use the correct task_type for each step.
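The keyword routing above can be sketched as a lookup that returns all detected media types in a stable order, so mixed requests fall out naturally as a multi-media plan. Keyword lists are copied from the table (plus "宣传片"/"配乐" from the mixed-media example); extend them as needed:

```python
# Media-type keywords, taken from the routing table above.
KEYWORDS = {
    "image": ["画", "生成图", "图片", "image", "图生图"],
    "video": ["视频", "video", "图生视频", "文生视频", "宣传片"],
    "music": ["音乐", "歌", "BGM", "背景音乐", "music", "作曲", "配乐"],
    "speech": ["语音", "朗读", "TTS", "语音合成", "配音", "speech", "text-to-speech"],
}

def detect_media_types(request: str) -> list:
    """Return every media type whose keywords appear in the request."""
    order = ["image", "video", "music", "speech"]
    return [m for m in order if any(k in request for k in KEYWORDS[m])]
```

More than one result means a multi-media workflow: read workflow-design.md and plan the steps in order.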
Image: For model name → model_id and size/aspect_ratio parsing, follow the same rules as in ima-image-ai skill (User Input Parsing section).
Video: For task_type (t2v / i2v / first_last / reference), model alias → model_id, and duration/resolution/aspect_ratio, follow ima-video-ai skill (User Input Parsing section).
Sevio alias normalization in ima-all-ai:
- Ima Sevio 1.0 → ima-pro
- Ima Sevio 1.0-Fast / Ima Sevio 1.0 Fast → ima-pro-fast
Routing rule: choose the task_type first, then confirm the model_id with --list-models.
Music: Suno (sonic) vs DouBao BGM/Song — infer from "BGM"/"背景音乐" (background music) → BGM; "带歌词"/"人声" (with lyrics/vocals) → Suno or Song. Use model_id sonic, GenBGM, GenSong per the "Recommended Defaults" and "Music Generation" tables below.
Speech (TTS): Get model_id from GET /open/v1/product/list?category=text_to_speech or run script with --task-type text_to_speech --list-models. Map user intent to parameters using product form_config:
| User intent / phrasing | Parameter (if in form_config) | Notes |
|---|---|---|
| 女声 / 女声朗读 / female voice | voice_id / voice_type | Use value from form_config options |
| 男声 / 男声朗读 / male voice | voice_id / voice_type | Use value from form_config options |
| 语速快/慢 / speed up/slow | speed | e.g. 0.8–1.2 |
| 音调 / pitch | pitch | If supported |
| 大声/小声 / volume | volume | If supported |
If the user does not specify, use form_config defaults. Pass extra params via --extra-params '{"speed":1.0}'. Only send parameters present in the product’s credit_rules/attributes or form_config (script reflection strips others on retry).
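The parameter rule above (defaults from form_config, and only declared parameters sent) can be sketched as a filter. The form_config shape assumed here, a list of `{"key": ..., "default": ...}` entries, is an illustration; the real schema comes from the product list:

```python
def build_extra_params(requested: dict, form_config: list) -> dict:
    """Overlay user params onto form_config defaults, dropping unknown keys
    (mirroring what the script's retry reflection does)."""
    allowed = {f["key"]: f.get("default") for f in form_config}
    params = {k: v for k, v in allowed.items() if v is not None}  # defaults
    params.update({k: v for k, v in requested.items() if k in allowed})
    return params
```

Note that "female_01" in the test below is a hypothetical voice value, not a documented one.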
For transparency: This skill uses a bundled Python script (scripts/ima_create.py) to call the IMA Open API. The script:
- Uses --user-id only locally, as a key for storing your model preferences.
What gets sent to IMA servers:
What's stored locally:
- ~/.openclaw/memory/ima_prefs.json - your model preferences (< 1 KB)
- ~/.openclaw/logs/ima_skills/ - generation logs (auto-deleted after 7 days)
| Domain | Owner | Purpose | Data Sent | Privacy |
|---|---|---|---|---|
api.imastudio.com | IMA Studio | Main API (product list, task creation, task polling) | Prompts, model IDs, generation params, your API key | Standard HTTPS, data processed for AI generation |
imapi.liveme.com | IMA Studio | Image/Video upload service (presigned URL generation) | Your API key, file metadata (MIME type, extension) | Standard HTTPS, used for image/video tasks only |
*.aliyuncs.com, *.esxscloud.com | Alibaba Cloud (OSS) | Image/video storage (file upload, CDN delivery) | Raw image/video bytes (via presigned URL, NO API key) | IMA-managed OSS buckets, presigned URLs expire after 7 days |
Key Points:
- Music tasks (text_to_music) and TTS tasks (text_to_speech) only use api.imastudio.com.
- Image/video tasks also call imapi.liveme.com to obtain presigned URLs for uploading input images.
- Your API key is sent only to api.imastudio.com and imapi.liveme.com (both owned by IMA Studio).
- You can verify network traffic yourself: tcpdump -i any -n 'host api.imastudio.com or host imapi.liveme.com'.
- See "🌐 Network Endpoints Used" and "⚠️ Credential Security Notice" in this document for full disclosure.
Your API key is sent to both IMA-owned domains:
- Authorization: Bearer ima_xxx... → api.imastudio.com (main API)
- appUid=ima_xxx... → imapi.liveme.com (upload service)
Security best practices:
- Monitor https://imastudio.com/dashboard for unauthorized activity.
- Review ~/.openclaw/logs/ima_skills/ for unexpected API calls.
Why two domains? IMA Studio uses a microservices architecture:
- api.imastudio.com: core AI generation API
- imapi.liveme.com: specialized image/video upload service (shared infrastructure)
Both domains are operated by IMA Studio. The same API key grants access to both services.
Note for users: You can review the script source at
scripts/ima_create.pyanytime.
The agent uses this script to simplify API calls. Music tasks use only api.imastudio.com, while image/video tasks also call imapi.liveme.com for file uploads (see "Network Endpoints" above).
Use the bundled script internally for all task types — it ensures correct parameter construction:
# ─── Image Generation ──────────────────────────────────────────────────────────
# Basic text-to-image (default model)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id doubao-seedream-4.5 --prompt "a cute puppy on grass, photorealistic" \
--user-id {user_id} --output-json
# Text-to-image with size override (Nano Banana2)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_image \
--model-id gemini-3.1-flash-image --prompt "city skyline at sunset, 4K" \
--size 2k --user-id {user_id} --output-json
# Image-to-image with input URL
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_image \
--model-id doubao-seedream-4.5 --prompt "turn into oil painting style" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# ─── Video Generation ──────────────────────────────────────────────────────────
# Basic text-to-video (default model, 5s 720P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "a puppy dancing happily, cinematic" \
--user-id {user_id} --output-json
# Text-to-video with extra params (10s 1080P)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_video \
--model-id wan2.6-t2v --prompt "dramatic ocean waves, sunset" \
--extra-params '{"duration":10,"resolution":"1080P","aspect_ratio":"16:9"}' \
--user-id {user_id} --output-json
# Image-to-video (animate static image)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type image_to_video \
--model-id wan2.6-i2v --prompt "camera slowly zooms in, gentle movement" \
--input-images https://example.com/photo.jpg --user-id {user_id} --output-json
# First-last frame video (two images)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type first_last_frame_to_video \
--model-id kling-video-o1 --prompt "smooth transition between frames" \
--input-images https://example.com/frame1.jpg https://example.com/frame2.jpg \
--user-id {user_id} --output-json
# ─── Music Generation ──────────────────────────────────────────────────────────
# Basic text-to-music (Suno default)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "upbeat electronic music, 120 BPM, no vocals" \
--user-id {user_id} --output-json
# Music with custom lyrics (Suno custom mode)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id sonic --prompt "pop ballad, emotional" \
--extra-params '{"custom_mode":true,"lyrics":"Your custom lyrics here...","vocal_gender":"female"}' \
--user-id {user_id} --output-json
# Background music (DouBao BGM)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_music \
--model-id GenBGM --prompt "relaxing ambient music for meditation" \
--user-id {user_id} --output-json
# ─── Text-to-Speech (TTS) ─────────────────────────────────────────────────────
# List TTS models first to get model_id, then generate speech
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech --list-models
# TTS: use model_id from list above (prompt = text to speak)
python3 {baseDir}/scripts/ima_create.py \
--api-key $IMA_API_KEY --task-type text_to_speech \
--model-id <model_id from list> --prompt "Text to be spoken here." \
--user-id {user_id} --output-json
The script outputs JSON with url, model_name, credit — use these values in the UX protocol messages below. The script internals (product list query, parameter construction, polling) are invisible to users.
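Consuming the script's JSON output can be sketched as below. This assumes a flat JSON object with "url", "model_name", and "credit" keys, as described above; any other fields in the real output are ignored:

```python
import json

def parse_result(output: str) -> tuple:
    """Pull the fields used by the UX protocol from --output-json output."""
    data = json.loads(output)
    return data["url"], data["model_name"], data["credit"]
```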
Call IMA Open API to create AI-generated content. All endpoints require an ima_* API key. The core flow is: query products → create task → poll until done.
This skill is community-maintained and open for inspection.
Full transparency:
- You can review scripts/ima_create.py and ima_logger.py anytime.
- Music/TTS tasks use api.imastudio.com only; image/video tasks also use imapi.liveme.com (see "Network Endpoints" section).
- Local state is limited to ~/.openclaw/memory/ima_prefs.json and log files.
Configuration allowed:
- export IMA_API_KEY=ima_your_key_here
- Add IMA_API_KEY to the agent's environment configuration.
- Redirect log output to /dev/null if you prefer no local logs.
Data control:
- View preferences: cat ~/.openclaw/memory/ima_prefs.json
- Reset preferences: rm ~/.openclaw/memory/ima_prefs.json (resets to defaults)
- Delete logs: rm -rf ~/.openclaw/logs/ima_skills/ (auto-cleanup after 7 days anyway)
If you need to modify this skill for your use case:
Note: Modified skills may break API compatibility or introduce security issues. Official support only covers the unmodified version.
Actions that could compromise security:
Why this matters:
What this skill does with your data:
| Data Type | Sent to IMA? | Stored Locally? | User Control |
|---|---|---|---|
| Prompts (image/video/music) | ✅ Yes (required for generation) | ❌ No | None (required) |
| API key | ✅ Yes (authentication header) | ❌ No | Set via env var |
| user_id (optional CLI arg) | ❌ Never (local preference key only) | ✅ Yes (as prefs file key) | Change --user-id value |
| Model preferences | ❌ No | ✅ Yes (~/.openclaw) | Delete anytime |
| Generation logs | ❌ No | ✅ Yes (~/.openclaw) | Auto-cleanup 7 days |
Privacy recommendations:
- --user-id is never sent to IMA servers; it's only used locally as a key for storing preferences in ~/.openclaw/memory/ima_prefs.json.
- Review scripts/ima_create.py to verify network calls (search for the create_task function).
Get your IMA API key: visit https://imastudio.com to register and get started.
Version control:
File checksums (optional):
# Verify skill integrity
sha256sum SKILL.md scripts/ima_create.py
If users report issues, verify file integrity first.
User preferences have highest priority when they exist. But preferences are only saved when users explicitly express model preferences — not from automatic model selection.
~/.openclaw/memory/ima_prefs.jsonSingle file, shared across all IMA skills:
{
"user_{user_id}": {
"text_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"image_to_image": { "model_id": "doubao-seedream-4.5", "model_name": "SeeDream 4.5", "credit": 5, "last_used": "2026-02-27T03:07:27Z" },
"text_to_speech": { "model_id": "<from product list>", "model_name": "...", "credit": 2, "last_used": "..." }
}
}
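Reading a saved preference from the shared file above can be sketched as follows. Returning None when no entry exists keeps automatic model selection from masquerading as a saved preference:

```python
import json
from pathlib import Path

PREFS_PATH = Path.home() / ".openclaw" / "memory" / "ima_prefs.json"

def load_pref(user_id: str, task_type: str, path: Path = PREFS_PATH):
    """Return the user's saved preference dict for one task type, or None."""
    if not path.exists():
        return None
    prefs = json.loads(path.read_text())
    return prefs.get(f"user_{user_id}", {}).get(task_type)
```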
Step 1: Get knowledge-ai recommendation (if installed)
knowledge_recommended_model = read_ima_knowledge_ai() # e.g., "SeeDream 4.5"
Step 2: Check user preference
user_pref = load_prefs().get(f"user_{user_id}", {}).get(task_type) # e.g., {"model_id": "midjourney", ...}
Step 3: Decide which model to use
if user_pref exists:
    use_model = user_pref["model_id"]  # Highest priority
elif knowledge_recommended_model:
    use_model = knowledge_recommended_model  # Knowledge-base recommendation
else:
    use_model = task_type_default  # Fall back to the task type's default model