When to Use

User wants to generate original AI music from a prompt
User wants to create a cover from reference audio
User says "音乐", "music", "生成音乐", "generate music", "翻唱", "cover", "作曲", "compose", "create a song", or "做一首歌"

When NOT to Use

User wants text-to-speech reading (use /speech)
User wants a podcast discussion (use /podcast)
User wants an explainer video with narration (use /explainer)
User wants to transcribe audio to text (use /asr)

Purpose

Generate original AI music from text prompts, or create cover versions from reference audio. Two modes:

Generate (original): Create a new song from a text prompt. Style, title, and an instrumental-only setting are optional.
Cover: Transform a reference audio file into a new version, with optional style modifications.
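
The two modes can be pictured as a single request record with mode-specific required fields. The following is a minimal sketch only: `MusicRequest`, its field names, and the validation rules are hypothetical illustrations and are not taken from the actual CLI. It assumes generate mode needs a prompt and cover mode needs reference audio.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MusicRequest:
    """Hypothetical container for the parameters collected from the user."""
    mode: str                              # "generate" or "cover"
    prompt: Optional[str] = None           # text prompt (generate mode)
    reference_audio: Optional[str] = None  # source audio location (cover mode)
    style: Optional[str] = None            # optional style or style modification
    title: Optional[str] = None            # optional title
    instrumental: bool = False             # instrumental-only option

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request is usable."""
        problems = []
        if self.mode not in ("generate", "cover"):
            problems.append(f"unknown mode: {self.mode!r}")
        if self.mode == "generate" and not self.prompt:
            problems.append("generate mode needs a text prompt")
        if self.mode == "cover" and not self.reference_audio:
            problems.append("cover mode needs reference audio")
        return problems
```

Checking the record before anything is sent keeps a missing prompt or missing reference file from reaching the generation step.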

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any CLI command until the user has explicitly confirmed. </HARD-GATE>
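
The gate above amounts to a small control flow: ask one question, wait, repeat, summarize, and proceed only on explicit confirmation. The sketch below models that flow under stated assumptions. The `ask` callable is a hypothetical stand-in for the AskUserQuestion tool, and the function name, question tuples, and "Confirm"/"Cancel" labels are illustrative, not part of the real tool.

```python
from typing import Callable, Dict, List, Tuple

# (parameter name, question text, allowed options); all names are hypothetical.
Question = Tuple[str, str, List[str]]


def collect_and_confirm(
    questions: List[Question],
    ask: Callable[[str, List[str]], str],
) -> Tuple[Dict[str, str], bool]:
    """Ask one question at a time, then require an explicit confirmation."""
    answers: Dict[str, str] = {}
    for name, text, options in questions:
        # One question per call; the next one is not asked until this returns.
        answers[name] = ask(text, options)

    summary = ", ".join(f"{k}={v}" for k, v in answers.items())
    decision = ask(f"Proceed with: {summary}?", ["Confirm", "Cancel"])
    # Anything other than an explicit "Confirm" keeps the gate closed.
    return answers, decision == "Confirm"
```

A caller would run any CLI command only when the second return value is `True`. If it is `False`, no command is issued.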

Music

When to Use

When NOT to Use

Purpose

Hard Constraints

Step -1: CLI Auth Check

Step 0: Config Setup

Setup Flow (user-initiated reconfigure only)

Interaction Flow

Step 1: Mode

Step 2a: Prompt (generate mode)

Step 2b: Reference Audio (cover mode)

Step 3: Style (optional)

Step 4: Title (optional)

Step 5: Instrumental

Step 6: Confirm & Generate

Workflow

After Successful Generation

Resources

Composability

Examples
