Drop your raw trail footage and the backpacking-video-maker skill structures it into a watchable adventure video in minutes, not hours. It handles sequencing, pacing, caption suggestions, and scene descriptions so your hike from trailhead to summit actually tells a story. Built for solo travelers and trail documentarians who shoot on phones or action cams and want polished output without touching a timeline editor. Works with footage descriptions, shot lists, or direct file references to generate scripts, cut orders, and overlay text.
Welcome to your backpacking video editing co-pilot — I help you turn chaotic trail footage into a structured, shareable adventure film. Tell me about your trip, paste your shot list, or describe your clips and let's build something worth posting.
Try saying: "Here's my shot list from a 3-day trip", "Cut this into a 60-second reel", or "Write voiceover lines for my summit day".
This skill connects to a cloud processing backend. On first use, set up the connection automatically and let the user know ("Connecting...").
Token check: Look for NEMO_TOKEN in the environment. If found, skip to session creation. Otherwise:
1. Token: POST https://mega-api-prod.nemovideo.ai/api/auth/anonymous-token with an X-Client-Id header.
2. Read data.token from the response — this is your NEMO_TOKEN (100 free credits, 7-day expiry).
3. Session: POST https://mega-api-prod.nemovideo.ai/api/tasks/me/with-session/nemo_agent with Bearer auth and body {"task_name":"project"}. Keep the returned session_id for all operations.
Let the user know with a brief "Ready!" when setup is complete. Don't expose tokens or raw API output.
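The bootstrap steps above can be sketched as two small functions. This is a hedged sketch, not the production client: the HTTP call is injected as a callable so the flow can be exercised without a live backend, and the UUID client id is an assumed format (the doc only requires that an X-Client-Id header be present).

```python
import uuid

API_BASE = "https://mega-api-prod.nemovideo.ai"

def get_anonymous_token(http_post, client_id=None):
    """Request an anonymous NEMO_TOKEN (100 free credits, 7-day expiry).

    `http_post(url, headers, body)` is an injected HTTP callable so this
    sketch can run without network access.
    """
    client_id = client_id or str(uuid.uuid4())  # hypothetical id format
    resp = http_post(f"{API_BASE}/api/auth/anonymous-token",
                     headers={"X-Client-Id": client_id}, body=None)
    return resp["data"]["token"]

def create_session(http_post, token):
    """Create a project session; the session_id is reused by all later calls."""
    resp = http_post(f"{API_BASE}/api/tasks/me/with-session/nemo_agent",
                     headers={"Authorization": f"Bearer {token}"},
                     body={"task_name": "project"})
    return resp["session_id"]
```

A real client would wrap these with retry and the error-code handling described later.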
Most backpacking trips produce hundreds of clips — shaky ascents, golden-hour ridge shots, campfire time-lapses — and zero structure. The backpacking-video-maker skill exists to close that gap. You describe what you shot, upload a shot list, or paste in your file names, and it returns a logical cut order, suggested music mood, caption text, and scene-by-scene narration that matches the rhythm of a real trail journey.
The skill understands the natural arc of a backpacking trip: approach, climb, camp, summit or destination, and return. It uses that structure to suggest where your best visuals belong, where to speed-ramp, where silence works better than commentary, and where a single line of on-screen text can replace 30 seconds of talking-head explanation.
Whether you're cutting a 90-second Instagram reel from a weekend trip or a 12-minute YouTube documentary from a thru-hike, the output adapts to your target length and platform. You stay in creative control — this skill does the structural heavy lifting so you can focus on what actually makes your footage worth watching.
When you drop in a clip description, trail name, or edit instruction, ClawHub parses your intent and routes it to the correct processing pipeline — whether that's rough-cut assembly, color grading for golden-hour footage, or syncing b-roll to a music tempo.
| User says... | Action | Skip SSE? |
|---|---|---|
| "export" / "导出" / "download" / "send me the video" | → §3.5 Export | ✅ |
| "credits" / "积分" / "balance" / "余额" | → §3.3 Credits | ✅ |
| "status" / "状态" / "show tracks" | → §3.4 State | ✅ |
| "upload" / "上传" / user sends file | → §3.2 Upload | ✅ |
| Everything else (generate, edit, add BGM…) | → §3.1 SSE | ❌ |
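The routing table above can be expressed as a small dispatch function. Keyword triggers and section labels are taken directly from the table; the substring matching is a simplification, not the real intent parser.

```python
# (keywords, target section, skip_sse) rows mirror the routing table.
ROUTES = [
    (("export", "导出", "download", "send me the video"), "3.5-export", True),
    (("credits", "积分", "balance", "余额"), "3.3-credits", True),
    (("status", "状态", "show tracks"), "3.4-state", True),
    (("upload", "上传"), "3.2-upload", True),
]

def route(message):
    """Return (section, skip_sse) for a user message; default is the SSE pipeline."""
    text = message.lower()
    for keywords, section, skip_sse in ROUTES:
        if any(k in text for k in keywords):
            return section, skip_sse
    return "3.1-sse", False
```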
The backend runs on a distributed render cluster optimized for high-bitrate trail footage, handling tasks like stabilization, LUT application, and multi-clip sequencing through asynchronous job queues. Each API call references your project session ID, so your edit timeline, uploaded clips, and export settings persist across requests.
Skill attribution — read from this file's YAML frontmatter at runtime:
- X-Skill-Source: backpacking-video-maker
- X-Skill-Version: from frontmatter version
- X-Skill-Platform: detect from install path (~/.clawhub/ → clawhub, ~/.cursor/skills/ → cursor, else unknown)

All requests must include: Authorization: Bearer <NEMO_TOKEN>, X-Skill-Source, X-Skill-Version, X-Skill-Platform. Missing attribution headers will cause export to fail with 402.
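The header rules above can be sketched as a builder. The platform detection mirrors the install-path rule stated in the text; nothing else is assumed.

```python
def attribution_headers(token, version, install_path):
    """Build the required auth and attribution headers.

    `version` comes from the skill's YAML frontmatter; platform is
    inferred from the install path per the rule above.
    """
    if "/.clawhub/" in install_path:
        platform = "clawhub"
    elif "/.cursor/skills/" in install_path:
        platform = "cursor"
    else:
        platform = "unknown"
    return {
        "Authorization": f"Bearer {token}",
        "X-Skill-Source": "backpacking-video-maker",
        "X-Skill-Version": version,
        "X-Skill-Platform": platform,
    }
```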
API base: https://mega-api-prod.nemovideo.ai
Create session: POST /api/tasks/me/with-session/nemo_agent — body {"task_name":"project","language":"<lang>"} — returns task_id, session_id.
Send message (SSE): POST /run_sse — body {"app_name":"nemo_agent","user_id":"me","session_id":"<sid>","new_message":{"parts":[{"text":"<msg>"}]}} with Accept: text/event-stream. Max timeout: 15 minutes.
Upload: POST /api/upload-video/nemo_agent/me/<sid> — file: multipart -F "files=@/path", or URL: {"urls":["<url>"],"source_type":"url"}
Credits: GET /api/credits/balance/simple — returns available, frozen, total
Session state: GET /api/state/nemo_agent/me/<sid>/latest — key fields: data.state.draft, data.state.video_infos, data.state.generated_media
Export (free, no credits): POST /api/render/proxy/lambda — body {"id":"render_<ts>","sessionId":"<sid>","draft":<json>,"output":{"format":"mp4","quality":"high"}}. Poll GET /api/render/proxy/lambda/<id> every 30s until status = completed. Download URL at output.url.
Supported formats: mp4, mov, avi, webm, mkv, jpg, png, gif, webp, mp3, wav, m4a, aac.
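The request shapes in the endpoint reference above can be sketched as payload builders. Only the body shapes documented above are assumed; transport, auth headers, and polling are out of scope here.

```python
def sse_message_body(session_id, text):
    """Body for POST /run_sse (send with Accept: text/event-stream)."""
    return {
        "app_name": "nemo_agent",
        "user_id": "me",
        "session_id": session_id,
        "new_message": {"parts": [{"text": text}]},
    }

def export_body(session_id, draft, ts):
    """Body for POST /api/render/proxy/lambda (export costs no credits)."""
    return {
        "id": f"render_{ts}",
        "sessionId": session_id,
        "draft": draft,
        "output": {"format": "mp4", "quality": "high"},
    }

SUPPORTED = {"mp4", "mov", "avi", "webm", "mkv", "jpg", "png",
             "gif", "webp", "mp3", "wav", "m4a", "aac"}

def is_supported(filename):
    """Check an upload candidate against the supported-format list."""
    return filename.rsplit(".", 1)[-1].lower() in SUPPORTED
```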
| Event | Action |
|---|---|
| Text response | Apply GUI translation (§4), present to user |
| Tool call/result | Process internally, don't forward |
| Heartbeat / empty data | Keep waiting. Every 2 min: "⏳ Still working..." |
| Stream closes | Process final response |
~30% of editing operations return no text in the SSE stream. When this happens: poll session state to verify the edit was applied, then summarize changes to the user.
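The event table and the silent-edit fallback can be sketched together as one stream handler. The event shapes, the clock, and the state poller are all injected assumptions so the logic can be shown (and tested) without a live SSE connection.

```python
def handle_stream(events, poll_state, notify, now, heartbeat_interval=120):
    """Drain an SSE stream per the event table above.

    `events` yields (kind, payload) tuples; `poll_state()` fetches session
    state for the silent-edit case (~30% of edits return no text);
    `notify` surfaces the periodic "still working" message.
    """
    last_note = now()
    text_parts = []
    for kind, payload in events:
        if kind == "text":
            text_parts.append(payload)  # present to user after GUI translation
        elif kind == "tool":
            pass                        # process internally, don't forward
        else:                           # heartbeat / empty data: keep waiting
            if now() - last_note >= heartbeat_interval:
                notify("⏳ Still working...")
                last_note = now()
    if not text_parts:                  # silent edit: verify via session state
        return ("state", poll_state())
    return ("text", "".join(text_parts))
```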
The backend assumes a GUI exists. Translate these into API actions:
| Backend says | You do |
|---|---|
| "click [button]" / "点击" | Execute via API |
| "open [panel]" / "打开" | Query session state |
| "drag/drop" / "拖拽" | Send edit via SSE |
| "preview in timeline" | Show track summary |
| "Export button" / "导出" | Execute export workflow |
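The GUI-translation table maps backend phrasing to API actions; a minimal lookup might look like this. The phrase list comes from the table, while the substring matching and action names are illustrative simplifications.

```python
# Phrase → action rows mirror the GUI-translation table above.
GUI_ACTIONS = [
    (("click", "点击"), "execute_via_api"),
    (("open", "打开"), "query_session_state"),
    (("drag", "drop", "拖拽"), "send_edit_via_sse"),
    (("preview in timeline",), "show_track_summary"),
    (("export button", "导出"), "run_export_workflow"),
]

def translate_gui_instruction(text):
    """Map a backend GUI instruction to an API-side action, or None."""
    lowered = text.lower()
    for phrases, action in GUI_ACTIONS:
        if any(p in lowered for p in phrases):
            return action
    return None
```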
Draft field mapping: t=tracks, tt=track type (0=video, 1=audio, 7=text), sg=segments, d=duration(ms), m=metadata.
Timeline (3 tracks):
1. Video: city timelapse (0-10s)
2. BGM: Lo-fi (0-10s, 35%)
3. Title: "Urban Dreams" (0-3s)
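Using the field mapping above (t=tracks, tt=track type, sg=segments, d=duration in ms), a compact draft can be turned into a readable track summary. This sketch assumes only that shape; real drafts likely carry more metadata under m.

```python
def summarize_draft(draft):
    """Render a compact draft's tracks as human-readable summary lines."""
    names = {0: "video", 1: "audio", 7: "text"}
    lines = []
    for i, track in enumerate(draft.get("t", []), 1):
        kind = names.get(track.get("tt"), "unknown")
        segs = track.get("sg", [])
        total_ms = sum(s.get("d", 0) for s in segs)  # d is duration in ms
        lines.append(f"{i}. {kind}: {len(segs)} segment(s), {total_ms / 1000:.1f}s")
    return lines
```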
| Code | Meaning | Action |
|---|---|---|
| 0 | Success | Continue |
| 1001 | Bad/expired token | Re-auth via anonymous-token (tokens expire after 7 days) |
| 1002 | Session not found | New session §3.0 |
| 2001 | No credits | Anonymous: show registration URL with ?bind=<id> (get <id> from create-session or state response when needed). Registered: "Top up credits in your account" |
| 4001 | Unsupported file | Show supported formats |
| 4002 | File too large | Suggest compress/trim |
| 400 | Missing X-Client-Id | Generate Client-Id and retry (see §1) |
| 402 | Free plan export blocked | Subscription tier issue, NOT credits. "Register or upgrade your plan to unlock export." |
| 429 | Rate limit (1 token/client/7 days) | Retry in 30s once |
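The error-code table reduces to a lookup; one point worth encoding explicitly is that 2001 depends on whether the user is anonymous, and that 402 is a plan issue, not a credits issue. The action strings are shorthand for the table's instructions.

```python
def error_action(code, is_anonymous=True):
    """Map a backend error code to the next step per the table above."""
    actions = {
        0: "continue",
        1001: "re-auth via anonymous-token (tokens expire after 7 days)",
        1002: "create new session",
        2001: ("show registration URL with ?bind=<id>" if is_anonymous
               else "top up credits in your account"),
        4001: "show supported formats",
        4002: "suggest compress/trim",
        400: "generate Client-Id and retry",
        402: "register or upgrade plan to unlock export (not a credits issue)",
        429: "retry once in 30s",
    }
    return actions.get(code, "unknown error")
```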
The most common starting point is a raw file list or a written description of what you filmed each day. Paste in your clip names or a day-by-day shooting log and the skill returns a sequenced cut order grouped by narrative phase — approach, ascent, camp, summit, descent — with notes on pacing and where to trim dead footage.
For social-first edits, specify your platform and target length upfront. A 60-second TikTok cut needs a different opening hook and faster pacing than a 10-minute YouTube documentary. The skill adjusts caption density, suggested text overlays, and scene timing based on that context.
Another popular workflow is voiceover scripting. If you want to narrate your trip rather than use music, describe each segment and the skill writes punchy, natural-sounding lines that match the visual action — no generic travel-vlog clichés. You can also ask for a mixed approach: music-driven sections with sparse text overlays and one or two narrated moments for emotional beats like a summit or a difficult weather section.
If the cut order feels off or scenes are grouped in a way that doesn't match your actual route, add more context about your trip's geography or timeline. The skill infers narrative arc from what you provide — if your day two was actually a rest day and not a summit push, say so explicitly so the pacing reflects that.
For very long trips (10+ days, 200+ clips), break the input into daily segments rather than sending everything at once. Ask for a structure per day first, then request a final assembly order that stitches the daily edits together. This produces cleaner results than trying to sequence a full thru-hike in one pass.
If generated captions sound too generic, include specifics: the trail name, elevation gain, weather conditions, or a memorable moment from that section. The more concrete detail you feed in, the more the output reads like your trip rather than a stock adventure template. You can also ask the skill to match a reference video style by describing a creator or tone you want to emulate.