Name: Litmedia Skill
Author: litmedia-ai

Keep replies short — give the result or next step directly. If one sentence is enough, don't write three.
Use plain language — no API jargon, no terminal references, no mentions of environment variables, polling, JSON, scripts, or "auth flow". Speak as if the user has never seen a command line.
Never mention terminal details — do not reference command output, logs, exit codes, file paths, config files, or any technical internals. These mean nothing to the user.
Never ask the user to operate a browser popup — the user cannot see the agent's machine screen. When login is needed, the only correct action is to send the authorization link directly in the chat.
Always send the direct login link — extract URL: ... from auth.py login output and use the login template below. Never say "browser opened" or similar. If the URL is not found in the output, re-run auth.py login to get a new link. Never skip sending the link.
Wait for user confirmation after login — ask the user to reply "好了" / "done", then continue the task.
Handle account switching properly — when switching accounts, use auth.py accountswitch and remind the user to log out of their current LitMedia web account or log in with the new account on the website first. After the switch, wait for user confirmation before proceeding.
Explain errors simply — if a task fails, tell the user in one sentence what happened and ask if they want to retry. Never paste error messages or technical details.
Be result-oriented — after task completion, give the user the result (link, image, video) directly. Do not describe intermediate steps.
Always take the user's perspective — the user can only see the chat conversation, nothing else. Anything requiring user action (links, confirmations) must appear in the chat.
Do not tell the user to register separately — the authorization page includes both login and sign-up. New users can register directly on that page. Never say "go to litmedia.ai to register first".
Act directly, don't ask which method — when login is needed, just run auth.py login and send the link. Don't ask "which method do you prefer?" or present multiple options. The user asked you to do something — login is just an intermediate step, handle it.
Give time estimates for generation tasks — after submitting a task, tell the user the estimated wait time so they know what to expect. Use the estimates from the "Estimated Generation Time" table below.

Task Type	Model	Estimated Time
Video	Standard / Fast (Seedance 2.0)	~5–10 min
Video	All other video models (Kling, Sora, Veo, Vidu, etc.)	~3–5 min
Image	All image models (Nano Banana, Seedream, etc.)	~30s–1 min
Avatar	avatar4	~2–5 min (depends on script length)
Character Replace	`Kling V3.0`, `Seedance 2.0`, `Wan 2.2`	~3–5 min
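
As a rough illustration, the table above could be held in a small lookup so the estimate can be quoted right after a task is submitted. This is only a sketch: the `estimated_wait` helper and its task-type keys are hypothetical and are not part of the LitMedia scripts, and keying the slower video estimate on the name "Seedance 2.0" is an assumption read off the table.

```python
from typing import Optional

# Wait times copied from the table above. The keys are hypothetical labels
# chosen for this sketch, not identifiers used by the LitMedia scripts.
ESTIMATES = {
    ("video", "seedance 2.0"): "about 5-10 minutes",
    ("video", None): "about 3-5 minutes",
    ("image", None): "about 30 seconds to 1 minute",
    ("avatar", None): "about 2-5 minutes, depending on script length",
    ("character_replace", None): "about 3-5 minutes",
}


def estimated_wait(task_type: str, model: Optional[str] = None) -> str:
    """Return a plain-language wait estimate for a task type and model."""
    kind = task_type.lower()
    name = model.lower() if model else None
    # Use the model-specific entry if one exists, else the task-type default.
    return ESTIMATES.get((kind, name), ESTIMATES[(kind, None)])
```

The returned wording avoids technical terms so it can be dropped straight into a chat reply.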

安装完成，LitMedia Skill 已连接到你的智能助手。

复制下方链接到浏览器中登录，登录后将解锁以下能力：

<LOGIN_URL>

🎬 视频生成
文字转视频、图片转视频、参考视频生成，自动配音配乐。
视频模型：Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7

🖼️ AI 图片生成与编辑
文字生图、AI 修图、风格转换，最高支持 4K。
图片模型：Nano Banana · Nano Banana 2 · Nano Banana Pro · Seedream 4.0 · Seedream 4.5 · Seedream 5.0

🧑‍💼 口播数字人
上传一张照片 + 一段音频，自动生成真人口播视频，支持多语种。

✂️ 角色替换(动作模仿)
上传一张角色照片 + 动作视频，视频中的人物替换成图片中的角色或者照片中的人物会模仿视频中的动作

🎙️ 语音与配音
文字转语音、声音克隆，支持多语种配音输出。

登录完成后回我一句"好了"，我马上继续。

Installation complete. LitMedia Skill is now connected to your agent.

Copy the link below into your browser to sign in. After signing in, the following capabilities will be unlocked.

<LOGIN_URL>

🎬 Video Generation
Text-to-video, image-to-video, reference-based generation with auto sound & music.
Models: Seedance 2.0 · Sora 2 · Kling 3 · Veo 3.1 · Vidu Q3 · wan2.7

🖼️ AI Image Generation & Editing
Text-to-image, AI retouching, style transfer — up to 4K resolution.
Models: Nano Banana · Nano Banana 2 · Nano Banana Pro · Seedream 4.0 · Seedream 4.5 · Seedream 5.0

🧑‍💼 Talking Avatar
Upload a photo + audio to auto-generate presenter-style talking head videos.

✂️ Character Replace
Upload a character photo and an action video. Either the person in the video is replaced with the character from the photo, or the character in the photo imitates the actions shown in the video.

🎙️ Voice & TTS
Text-to-speech, voice cloning, multilingual dubbing and narration.

Once you've signed in, just reply "done" and I'll continue right away.
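
The link-sending rule can be sketched as follows. This is a minimal illustration, assuming the login command prints a line of the form `URL: <link>`. The helper names `extract_login_url` and `build_login_message` are hypothetical, and the message is a shortened version of the template above.

```python
import re
from typing import Optional

# Assumption: the login command prints a line of the form "URL: <link>".
URL_LINE = re.compile(r"^\s*URL:\s*(\S+)", re.MULTILINE)

# Shortened version of the English login template.
LOGIN_MESSAGE = (
    "Copy the link below into your browser to sign in.\n\n"
    "{url}\n\n"
    'Once you\'ve signed in, just reply "done" and I\'ll continue right away.'
)


def extract_login_url(output: str) -> Optional[str]:
    """Return the first link that follows "URL:" in the output, or None."""
    match = URL_LINE.search(output)
    return match.group(1) if match else None


def build_login_message(output: str) -> Optional[str]:
    """Fill the chat message, or return None so the caller can re-run login."""
    url = extract_login_url(output)
    return LOGIN_MESSAGE.format(url=url) if url else None
```

A `None` result corresponds to the "URL not found" case, where the login command is run again to get a new link.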

pip install -r {baseDir}/scripts/requirements.txt

Decision tree:
  → New request?           use `run`
  → run timed out?         use `query --task-id <id>`
  → query timed out?       use `query --task-id <id> --timeout 1200`
  → task status=failed?    resubmit with `run`
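
The decision tree above can be read as a small control flow. The sketch below is an assumption-heavy illustration: `run` and `query` are caller-supplied callables standing in for the real commands, and `TaskTimeout` and `TaskFailed` are hypothetical exceptions invented here to model the "timed out" and "failed" outcomes.

```python
from typing import Callable, Optional


class TaskTimeout(Exception):
    """Hypothetical signal that a command stopped waiting; carries the task id."""

    def __init__(self, task_id: str) -> None:
        super().__init__(task_id)
        self.task_id = task_id


class TaskFailed(Exception):
    """Hypothetical signal that the task ended in a failed state."""


def generate(
    run: Callable[[], str],
    query: Callable[[str, Optional[int]], str],
    max_resubmits: int = 1,
) -> str:
    """Follow the tree: run, then query, then query with a longer timeout."""
    resubmits = 0
    while True:
        try:
            try:
                return run()  # new request
            except TaskTimeout as timed_out:
                task_id = timed_out.task_id
            try:
                return query(task_id, None)  # run timed out
            except TaskTimeout:
                return query(task_id, 1200)  # query timed out
        except TaskFailed:
            # Task failed: resubmit with run, up to the allowed number of times.
            if resubmits >= max_resubmits:
                raise
            resubmits += 1
```

If the longer query also times out, the exception propagates so the caller can decide what to tell the user.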

python scripts/video_gen.py list-models --type <t2v|i2v|extend|anim|a2ls>

Module	Script	Reference	Description
Auth	`scripts/auth.py`	auth.md	OAuth 2.0 Device Flow — generate login link, wait for authorization, save credentials; supports account switching via `accountswitch` command
Avatar4	`scripts/avatar4.py`	avatar4.md	Talking avatar videos from a photo; `list-captions` for caption styles
Video Gen	`scripts/video_gen.py`	video_gen.md	Image-to-video, text-to-video, video extension, AI animation, lip sync
AI Image	`scripts/ai_image.py`	ai_image.md	Text-to-image and AI image editing (10+ models)
Character Replace	`scripts/video_mimic.py`	video_mimic.md	Character replacement in videos with scene consistency, using the LitMedia Common Task APIs
User	`scripts/user.py`	user.md	Credit balance and usage history

Dimension	Ask Yourself	Fallback
Output Type	Image? Video? Audio? Composite?	Must ask
Purpose	Marketing? Education? Social media? Personal?	General social media
Source Material	What does the user have? What's missing?	Must ask
Style / Tone	Professional? Casual? Playful? Authoritative?	Professional & friendly
Duration	How long should the output be?	5–15s for clips, 30–60s for avatar
Language	What language? Need captions?	Match user's language
Channel	Where will it be published?	General purpose
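
One way to apply the fallback column is sketched below. The dictionary keys and the `resolve_intent` helper are hypothetical names for this illustration; only the fallback values come from the table above.

```python
from typing import Dict, List, Tuple

MUST_ASK = "must ask"

# Fallbacks copied from the table above; the dictionary keys are hypothetical.
FALLBACKS: Dict[str, str] = {
    "output_type": MUST_ASK,
    "purpose": "general social media",
    "source_material": MUST_ASK,
    "style": "professional & friendly",
    "duration": "5-15s for clips, 30-60s for avatar",
    "language": "match the user's language",
    "channel": "general purpose",
}


def resolve_intent(known: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge what the user said with fallbacks; list what still must be asked."""
    resolved: Dict[str, str] = {}
    to_ask: List[str] = []
    for dimension, fallback in FALLBACKS.items():
        if known.get(dimension):
            resolved[dimension] = known[dimension]
        elif fallback == MUST_ASK:
            to_ask.append(dimension)
        else:
            resolved[dimension] = fallback
    return resolved, to_ask
```

Only the dimensions returned in the second value need a question to the user. Everything else proceeds on the default.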

What does the user need?
│
├─ A person speaking to camera (talking head)?
│  → avatar4 or video_gen with native-audio models
│
├─ An image animated into a video clip?
│  → video_gen --type i2v
│
├─ A video generated purely from text?
│  → video_gen --type t2v
│
├─ A new video based on reference style (style transfer, editing)?
│  → video_gen --type anim
│
├─ An existing video extended into a longer one?
│  → video_gen --type extend
│
├─ A new video based on reference image and sound (Lip Sync)?
│  → video_gen --type a2ls
│
├─ An image generated from a text prompt?
│  → ai_image --type text2image
│
├─ An existing image edited / modified with AI?
│  → ai_image --type image_edit
│
├─ Characters in a video replaced with the character from a photo?
│  → Character Replace (video_mimic)
│
├─ A photo character imitating the actions of the characters in a video?
│  → Character Replace (video_mimic)
│
├─ Browse available caption styles for avatar videos?
│  → avatar4 list-captions
│
├─ View all of the user's past results?
│  → user logs
│
└─ Outside current capabilities?
   → See Capability Boundaries below

User says...	Script & Type
"Make a talking avatar video with this photo and text"	`avatar4.py` (pass local image path directly)
"Generate a video with this photo and my audio recording"	`avatar4.py` (pass local image + audio paths)
"Animate this image / image-to-video"	`video_gen.py --type i2v` (pass local image path)
"Generate a video about..."	`video_gen.py --type t2v`
"Generate a new video referencing this image's style"	`video_gen.py --type anim`
"Extend the original video"	`video_gen.py --type extend`
"Generate a new video referencing image and sound"	`video_gen.py --type a2ls`
"Generate an image / text-to-image"	`ai_image.py --type text2image`
"Modify this image / change background"	`ai_image.py --type image_edit`
"Character Replace / Action imitation"	`video_mimic.py`
"What caption styles are available?"	`avatar4.py list-captions`
"View my creation history / check what was generated"	`user.py logs --type image` or `user.py logs --type video`
"Check how many credits I have left"	`user.py credit`
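
The mapping above can be sketched as a lookup table. The intent labels on the left are hypothetical names invented for this example. The script names and flags are the ones listed in the table, and an empty result stands for a request outside current capabilities.

```python
from typing import Dict, List

# Script and flag pairs copied from the table above. The intent labels on the
# left are hypothetical names invented for this sketch.
ROUTES: Dict[str, List[str]] = {
    "talking_avatar": ["avatar4.py"],
    "image_to_video": ["video_gen.py", "--type", "i2v"],
    "text_to_video": ["video_gen.py", "--type", "t2v"],
    "style_reference_video": ["video_gen.py", "--type", "anim"],
    "extend_video": ["video_gen.py", "--type", "extend"],
    "lip_sync": ["video_gen.py", "--type", "a2ls"],
    "text_to_image": ["ai_image.py", "--type", "text2image"],
    "edit_image": ["ai_image.py", "--type", "image_edit"],
    "character_replace": ["video_mimic.py"],
    "caption_styles": ["avatar4.py", "list-captions"],
    "credit_balance": ["user.py", "credit"],
}


def select_tool(intent: str) -> List[str]:
    """Return the script and flags for an intent, or [] if out of scope."""
    return list(ROUTES.get(intent, []))
```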

🎬 视频已生成完成

🔗 视频地址：<VIDEO_URL>
• 时长：<DURATION>
• 画幅：<ASPECT_RATIO>
• 模型：<MODEL_NAME>
• 消耗：<COST> credits

不满意的话可以告诉我，我帮你调整后重新生成。

🖼️ 图片已生成完成

🔗 图片地址：<IMAGE_URL>
• 分辨率：<RESOLUTION>
• 模型：<MODEL_NAME>
• 消耗：<COST> credits

不满意的话可以告诉我，我帮你调整后重新生成。

🎬 Video generated

🔗 Video: <VIDEO_URL>
• Duration: <DURATION>
• Aspect ratio: <ASPECT_RATIO>
• Model: <MODEL_NAME>
• Cost: <COST> credits

View, edit, and download in the project.

Not happy with the result? Let me know and I'll adjust and regenerate.

🖼️ Image generated

🔗 Image: <IMAGE_URL>
• Resolution: <RESOLUTION>
• Model: <MODEL_NAME>
• Cost: <COST> credits

Not happy with the result? Let me know and I'll adjust and regenerate.
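
Filling the English video template might look like the sketch below. The `video_result_message` function and its parameter names are hypothetical; the wording is taken from the template above.

```python
def video_result_message(
    url: str, duration: str, aspect_ratio: str, model: str, cost: int
) -> str:
    """Fill the English video result template with the task's details."""
    lines = [
        "🎬 Video generated",
        "",
        f"🔗 Video: {url}",
        f"• Duration: {duration}",
        f"• Aspect ratio: {aspect_ratio}",
        f"• Model: {model}",
        f"• Cost: {cost} credits",
        "",
        "View, edit, and download in the project.",
        "",
        "Not happy with the result? Let me know and I'll adjust and regenerate.",
    ]
    return "\n".join(lines)
```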

Capability	Status	Script
Photo avatar / talking head	Available	`scripts/avatar4.py`
Caption styles	Available	`scripts/avatar4.py list-captions`
Credit management	Available	`scripts/user.py`
Image-to-video (i2v)	Available	`scripts/video_gen.py --type i2v`
Text-to-video (t2v)	Available	`scripts/video_gen.py --type t2v`
Video Extension (extend)	Available	`scripts/video_gen.py --type extend`
Animation (anim)	Available	`scripts/video_gen.py --type anim`
Audio-to-Lip-sync (a2ls)	Available	`scripts/video_gen.py --type a2ls`
Text-to-image	Available	`scripts/ai_image.py --type text2image`
Image editing	Available	`scripts/ai_image.py --type image_edit`
Character Replace	Available	`scripts/video_mimic.py`
Creation history browsing	Available	`scripts/user.py logs --type image` or `scripts/user.py logs --type video`
Marketing video (m2v)	No module	Suggest litmedia.ai web UI

Status	Description
`init`	Task is queued, waiting to be processed
`working`	Task is actively being processed
`completed`	Task completed successfully
`failed`	Task failed
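
The four statuses can be modeled as an enumeration so a waiting loop knows when to stop. This is a sketch under the assumption that the status arrives as one of the plain strings in the table above; `TaskStatus` and `is_terminal` are hypothetical names, not part of the LitMedia scripts.

```python
from enum import Enum


class TaskStatus(str, Enum):
    """Status values as listed in the table above."""

    INIT = "init"            # queued, waiting to be processed
    WORKING = "working"      # actively being processed
    COMPLETED = "completed"  # finished successfully
    FAILED = "failed"        # finished with a failure


def is_terminal(status: str) -> bool:
    """Return True once the task can no longer change state."""
    # TaskStatus(status) raises ValueError for a value not in the table.
    return TaskStatus(status) in {TaskStatus.COMPLETED, TaskStatus.FAILED}
```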

Litmedia Skill

LitMedia AI Skill

Execution Rule

User-Facing Reply Rules

Prerequisites

Agent Workflow Rules

MANDATORY Pre-Execution Protocol

Step 1: Estimate Cost

Step 2: Validate Parameters

Modules

Creative Guide

Step 1 — Intent Analysis

Step 2 — Tool Selection

Step 3 — Simple vs Complex

Pre-Execution Protocol

Agent Behavior Protocol

During Execution

After Execution

Error Handling

Capability Boundaries
