Turn text or images into AI-generated video clips with this kling-ai-video skill. It works with MP4, JPG, PNG, and WebM files up to 200MB. Content creators, marketers, and social media managers use it to generate short video clips from images or text prompts. Processing takes 1-3 minutes on cloud GPUs, and the output is a 1080p MP4 file.
Got text or images for me? Send them over and I'll handle the AI video generation. Or just describe what you need.
Try saying:
When a user first opens this skill, connect to the processing backend automatically. Briefly let them know (e.g. "Setting up...").
Authentication: Check if NEMO_TOKEN is set in the environment. If it is, skip to step 2.
1. Get a token: generate a UUID to use as the client identifier, then request https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with header X-Client-Id set to that UUID. The response data.token is your NEMO_TOKEN (100 free credits, valid 7 days).
2. Create a session: POST to https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with headers Authorization: Bearer <token> and Content-Type: application/json, and body {"task_name":"project","language":"<detected>"}. Keep the returned session_id for later requests.

Keep setup communication brief. Don't display raw API responses or token values to the user.
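The setup flow above could be sketched in Python roughly as follows. This is a minimal standard-library sketch, not the skill's actual implementation. The helper names, the `"en"` language default, and the use of POST for the token endpoint are assumptions (the text only names the URL and the X-Client-Id header).

```python
import json
import os
import urllib.request
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"


def build_token_request(client_id: str) -> urllib.request.Request:
    """Build the anonymous-token request. POST is an assumption."""
    return urllib.request.Request(
        f"{API_BASE}/api/auth/anonymous-token",
        headers={"X-Client-Id": client_id},
        method="POST",
    )


def build_session_request(token: str, language: str = "en") -> urllib.request.Request:
    """Build the create-session request with the documented body."""
    body = json.dumps({"task_name": "project", "language": language}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def get_token(opener=urllib.request.urlopen) -> str:
    """Reuse NEMO_TOKEN when it is set, otherwise fetch an anonymous token."""
    token = os.environ.get("NEMO_TOKEN")
    if token:
        return token
    client_id = str(uuid.uuid4())
    with opener(build_token_request(client_id)) as resp:
        # The token is documented to live at data.token in the response.
        return json.load(resp)["data"]["token"]
```

Passing `opener` as a parameter keeps the network call replaceable, so the flow can be exercised without contacting the backend.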
What does this do? It takes your text or images and runs AI video generation on a cloud backend. Nothing to install.
Real example: I threw in a product photo with a short text prompt, typed "generate a 5-second cinematic video clip from this image", and 1-3 minutes later had a clean MP4 file. Default output is 1080p.
Quick note: shorter prompts with clear motion descriptions produce more predictable results.
The skill checks your message against a few patterns and routes it to the right handler.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
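
The routing table above can be approximated with a small keyword matcher. This is only a sketch: the case-insensitive substring matching, the `route` function name, and the handler labels are assumptions, and the real skill may match patterns differently.

```python
from typing import Tuple

# Keyword groups copied from the routing table, checked in table order.
ROUTES = [
    ("export", ("export", "导出", "download", "send me the video")),
    ("credits", ("credits", "积分", "balance", "余额")),
    ("state", ("status", "状态", "show tracks")),
    ("upload", ("upload", "上传")),
]


def route(message: str, has_attachment: bool = False) -> Tuple[str, bool]:
    """Return (handler, skip_sse) for a user message."""
    if has_attachment:
        # "user sends file" routes to upload regardless of the text.
        return "upload", True
    text = message.lower()
    for handler, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return handler, True
    # Everything else (generate, edit, add BGM...) goes through SSE.
    return "sse", False
```

For example, `route("please export it")` selects the export handler and skips SSE, while `route("add BGM")` falls through to the SSE handler.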
Your kling ai video requests hit a cloud rendering backend with GPU acceleration. Files get processed server-side so your machine doesn't need any special hardware.
Include Authorization: Bearer <NEMO_TOKEN> and all attribution headers on every request — omitting them triggers a 402 on export.
Headers are derived from this file's YAML frontmatter. X-Skill-Source is kling-ai-video, X-Skill-Version comes from the version field, and X-Skill-Platform is detected from the install path (~/.clawhub/ = clawhub, ~/.cursor/skills/ = cursor, otherwise unknown).
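As an illustration of the header rules above, the following sketch builds the request headers from a token, a version string, and an install path. The function names are hypothetical, and reading the version from the YAML frontmatter is left to the caller.

```python
from pathlib import Path
from typing import Dict


def detect_platform(install_path: str) -> str:
    """Map the install path to a platform label, per the rules above."""
    path = Path(install_path).expanduser().as_posix()
    home = Path.home().as_posix()
    if path.startswith(f"{home}/.clawhub/"):
        return "clawhub"
    if path.startswith(f"{home}/.cursor/skills/"):
        return "cursor"
    return "unknown"


def attribution_headers(token: str, version: str, install_path: str) -> Dict[str, str]:
    """Headers to attach to every request."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "kling-ai-video",
        "X-Skill-Version": version,
        "X-Skill-Platform": detect_platform(install_path),
    }
```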
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple — returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
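The export step in the reference above (submit a render, then poll every 30 seconds) might look like the sketch below. Several details are assumptions: the timestamp in the render id is taken to be Unix seconds, `status` and `output.url` are assumed to sit at the top level of the poll response, and the 30-poll cap is an arbitrary safety limit. `fetch_status` is a hypothetical callable that performs the GET and returns the parsed JSON.

```python
import time
from typing import Any, Callable, Dict, Optional


def build_render_body(session_id: str, draft: Any, now: Optional[float] = None) -> Dict[str, Any]:
    """Body for POST /api/render/proxy/lambda."""
    ts = int(now if now is not None else time.time())  # assumed: Unix seconds
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }


def wait_for_render(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 30.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until status is "completed" and return the download URL."""
    for attempt in range(max_polls):
        result = fetch_status()
        if result.get("status") == "completed":
            return result["output"]["url"]
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError("render did not complete within the polling budget")
```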
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Wait 30s, then retry once |
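
One way to keep the error handling consistent is a lookup table keyed by code. The sketch below mirrors the table above. The action labels and the `report_unknown_error` fallback are illustrative names, not part of the backend API.

```python
# Codes from the error table mapped to short action labels.
ERROR_ACTIONS = {
    0: "continue",
    1001: "reauth_anonymous_token",
    1002: "create_new_session",
    2001: "prompt_for_credits",
    4001: "show_supported_formats",
    4002: "suggest_compress_or_trim",
    400: "generate_client_id_and_retry",
    402: "prompt_register_or_upgrade",
    429: "retry_once_after_30s",
}


def action_for(code: int) -> str:
    """Return the action label for a code, with a fallback for unlisted codes."""
    return ERROR_ACTIONS.get(code, "report_unknown_error")
```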
The backend responds as if there's a visual interface. Map its instructions to API calls:
Text events go straight to the user (after GUI translation). Tool calls stay internal. Heartbeats and empty data: lines mean the backend is still working — show "⏳ Still working..." every 2 minutes.
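A rough sketch of the stream handling described above follows. The event payload schema is not documented here, so the parser only separates empty `data:` lines (treated as heartbeats) from lines that carry a payload, and it leaves interpretation of text events and tool calls to the caller. Both names below are hypothetical.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per data: line, marking empty ones as heartbeats."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        body = line[len("data:"):].strip()
        if not body:
            yield {"type": "heartbeat"}
            continue
        try:
            payload: Any = json.loads(body)
        except json.JSONDecodeError:
            payload = body  # keep non-JSON payloads as raw text
        yield {"type": "data", "payload": payload}


class StillWorkingNotifier:
    """Return a progress message at most once per interval (2 minutes by default)."""

    def __init__(self, interval_s: float = 120.0, clock: Callable[[], float] = time.monotonic):
        self._interval = interval_s
        self._clock = clock
        self._last = clock()

    def poll(self) -> Optional[str]:
        now = self._clock()
        if now - self._last >= self._interval:
            self._last = now
            return "⏳ Still working..."
        return None
```

Calling `poll()` on each heartbeat keeps the user informed without repeating the message more often than the interval allows.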
About 30% of edit operations close the stream without any text. When that happens, poll /api/state to confirm the timeline changed, then tell the user what was updated.
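The fallback for silent streams could be sketched as a before/after comparison of the draft. This assumes a snapshot of the draft was taken before the edit and that the state response exposes it at data.state.draft, as listed in the API reference. `fetch_state` is a hypothetical callable that performs the GET and returns parsed JSON.

```python
from typing import Any, Callable, Dict, List


def changed_fields(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """List the top-level draft keys whose values differ."""
    keys = set(before) | set(after)
    return sorted(key for key in keys if before.get(key) != after.get(key))


def confirm_silent_edit(
    draft_before: Dict[str, Any],
    fetch_state: Callable[[], Dict[str, Any]],
) -> List[str]:
    """Fetch the latest state and report which draft fields changed."""
    draft_after = fetch_state()["data"]["state"]["draft"]
    return changed_fields(draft_before, draft_after)
```

An empty result means the timeline did not change, which is worth telling the user rather than assuming the edit succeeded.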
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Timeline (3 tracks):

1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
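To illustrate the draft field mapping, the sketch below turns a draft into a short track summary. The nesting is an assumption: it treats `t` as a list of tracks, each with a `tt` type and an `sg` list of segments carrying `d` in milliseconds. The real draft layout may differ, so treat this as a starting point only.

```python
from typing import Any, Dict, List

# Track type codes from the draft field mapping.
TRACK_TYPES = {0: "video", 1: "audio", 7: "text"}


def summarize_draft(draft: Dict[str, Any]) -> List[str]:
    """Return one summary line per track (assumed layout, see lead-in)."""
    lines = []
    for index, track in enumerate(draft.get("t", []), start=1):
        kind = TRACK_TYPES.get(track.get("tt"), "unknown")
        segments = track.get("sg", [])
        total_ms = sum(segment.get("d", 0) for segment in segments)
        lines.append(f"{index}. {kind}: {len(segments)} segment(s), {total_ms / 1000:g}s")
    return lines
```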
How long does processing take? Depends on length: a 30-second clip finishes in 1-3 minutes, while a 10-minute video might need 3-5 minutes.
What formats work? MP4, JPG, PNG, WebM on input. Output is always MP4.
Is there a file size limit? Yeah, 200MB. Compress or trim if you're over.
Do I need an account? Nope. You grab 100 free credits on first use, no signup.
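The format and size limits can be checked locally before uploading, which avoids a round trip that would end in a 4001 or 4002 error. This sketch uses the supported-formats list from the API reference. Reading 200MB as 200 × 1024 × 1024 bytes is an assumption, and the function name and return labels are hypothetical.

```python
import os
from typing import Optional

SUPPORTED_EXTENSIONS = {
    "mp4", "mov", "avi", "webm", "mkv",
    "jpg", "png", "gif", "webp",
    "mp3", "wav", "m4a", "aac",
}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumed meaning of "200MB"


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return a problem label, or None when the file looks acceptable."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported_format"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "too_large"
    return None
```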
From scratch: Describe what you want and the AI generates a draft. You refine from there.
Polish existing content: Upload your text or images and ask for specific changes, such as "generate a 5-second cinematic video clip from this image", adjusting colors, or swapping music. The backend handles rendering.
Export ready: Once you're happy, export at 1080p in MP4. File lands in your downloads.