Open-source AI music generation with permissive-license models: Riffusion (MIT, spectrogram-to-audio music from text), YuE (Apache 2.0, m-a-p's 2025 full-song generation with vocals), Stable Audio Open (Stability Community License, commercial use up to $1M ARR). Text-to-music, genre-conditioned generation, full songs with structure, continuation from existing audio, stem splits via media-demucs. Use when the user asks to generate music from text, create AI songs, make background music for video, generate a jingle, produce royalty-free music with open-source models, make a full song with vocals, or continue an existing musical idea.
Context: $ARGUMENTS
- `scripts/musicgen.py generate --model riffusion --prompt "lo-fi hip-hop beat" --duration 10 --out out.wav` → Step 3
- `scripts/musicgen.py generate --model yue --prompt "emotional indie rock, male vocalist, 80 bpm" --duration 90 --out song.wav` → Step 3
- `scripts/musicgen.py generate --model stable-audio-open --prompt "ambient cinematic strings" --duration 47 --out out.wav` → Step 3 (read LICENSES.md first)
- `scripts/musicgen.py continue --in sketch.wav --prompt "build into chorus" --duration 20 --out extended.wav` → Step 4
- `scripts/musicgen.py stems --in out.wav --out-dir stems/` → Step 5 (hands off to media-demucs)

Do NOT use this skill for:
| Task | Recommended | License | Commercial use? | Vocals? |
|---|---|---|---|---|
| Any, lowest friction, fully permissive | Riffusion | MIT | Yes, unconditionally | No |
| Full song with vocals, best 2025 quality | YuE | Apache 2.0 | Yes, unconditionally | Yes |
| Highest audio quality, instrumental | Stable Audio Open | Stability AI Community License | Yes, up to $1M annual revenue | No |
Full license audit (including excluded models and Stable Audio's $1M ARR cap) in references/LICENSES.md.
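The selection rules above can be encoded as a small helper (a sketch only: the function name is hypothetical and the logic mirrors the table, not any real API in this skill):

```python
def pick_model(need_vocals: bool, annual_revenue_usd: float) -> str:
    """Pick a backend per the selection table (hypothetical helper).

    - Vocals are only available from YuE (Apache 2.0).
    - Stable Audio Open has the highest instrumental quality, but its
      Community License caps commercial use at $1M annual revenue.
    - Riffusion (MIT) is the unconditional fallback.
    """
    if need_vocals:
        return "yue"
    if annual_revenue_usd <= 1_000_000:
        return "stable-audio-open"
    return "riffusion"
```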
# Riffusion (MIT, diffusers-based)
pip install diffusers transformers torch accelerate soundfile
# YuE (Apache 2.0) — HuggingFace transformers based
pip install transformers torch soundfile
# YuE models: m-a-p/YuE-s1-7B-anneal-en-cot (English, chain-of-thought)
# m-a-p/YuE-s1-7B-anneal-zh-cot (Mandarin)
# Stable Audio Open (Stability Community License)
pip install stable-audio-tools soundfile
# (HuggingFace: stabilityai/stable-audio-open-1.0)
Or let the script help:
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py install riffusion
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py install yue
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py install stable-audio-open
Check what's available:
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py check
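The `check` subcommand's logic amounts to an importability probe; a minimal sketch (the dependency mapping mirrors the pip installs above, and the real script may test more than module presence):

```python
import importlib.util

# Per-backend Python dependencies (assumed mapping, taken from the
# install commands above -- not the script's actual internals).
BACKEND_DEPS = {
    "riffusion": ["diffusers", "transformers", "torch", "soundfile"],
    "yue": ["transformers", "torch", "soundfile"],
    "stable-audio-open": ["stable_audio_tools", "soundfile"],
}

def available_backends():
    """Return the backends whose dependencies are all importable."""
    return [
        name
        for name, deps in BACKEND_DEPS.items()
        if all(importlib.util.find_spec(d) is not None for d in deps)
    ]
```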
# Riffusion — fast, MIT, spectrogram diffusion
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model riffusion \
--prompt "dreamy lo-fi hip-hop beat with mellow piano" \
--duration 10 \
--out beat.wav
# YuE — full song with vocals
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model yue \
--prompt "emotional indie rock, male vocalist, 80 bpm, acoustic guitar, simple drums" \
--duration 90 \
--out song.wav
# Stable Audio Open — 44.1 kHz stereo, instrumental
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model stable-audio-open \
--prompt "ambient cinematic strings, slow rise, film score" \
--duration 47 \
--out cue.wav
Per-model limits:
- Riffusion: ~5 s native window; longer durations go through the script's crossfade loop.
- Stable Audio Open: up to 47 s per generation.
- YuE: full songs on the minute scale (the examples here use 90 s).
Feed an existing clip as seed; the model continues in style.
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py continue \
--in sketch.wav \
--model stable-audio-open \
--prompt "add energy, build into chorus" \
--duration 20 \
--out extended.wav
Only Stable Audio Open and Riffusion support continuation reliably; YuE's continuation API is less stable.
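The seeding step behind `continue` can be sketched as taking the tail of the existing clip as conditioning context (a pure-Python illustration; the real script's context length is an assumption):

```python
def continuation_context(samples, sample_rate, context_seconds=10.0):
    """Return the last `context_seconds` of a clip to condition the
    continuation on. `samples` is a flat list of floats; the 10 s
    default is a hypothetical value, not the script's actual setting."""
    n = int(context_seconds * sample_rate)
    # Short clips are used whole.
    return list(samples[-n:]) if len(samples) > n else list(samples)
```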
Delegates to the media-demucs skill.
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py stems \
--in song.wav \
--out-dir stems/
# -> stems/vocals.wav, stems/drums.wav, stems/bass.wav, stems/other.wav
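The handoff is essentially a subprocess call to the demucs CLI; a minimal sketch (the wrapper functions are assumptions about the script's internals, but the demucs flags and output layout follow the CLI's documented behavior):

```python
import subprocess
from pathlib import Path

def build_demucs_cmd(in_wav: str, out_dir: str, model: str = "htdemucs") -> list:
    """Command line for 4-stem separation with the demucs CLI."""
    return ["demucs", "-n", model, "-o", out_dir, in_wav]

def stem_paths(in_wav: str, out_dir: str, model: str = "htdemucs") -> dict:
    """Where demucs writes stems: <out_dir>/<model>/<track>/<stem>.wav."""
    track = Path(in_wav).stem
    return {s: Path(out_dir) / model / track / f"{s}.wav"
            for s in ("vocals", "drums", "bass", "other")}

def split_stems(in_wav: str, out_dir: str) -> dict:
    """Run demucs (must be on PATH) and return the stem file paths."""
    subprocess.run(build_demucs_cmd(in_wav, out_dir), check=True)
    return stem_paths(in_wav, out_dir)
```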
Hand off and export:
- Loudness normalization: media-ffmpeg-normalize skill (Spotify -14 LUFS, YouTube -14 LUFS, broadcast -23 LUFS).
- MP3 export: ffmpeg -i song.wav -c:a libmp3lame -b:a 320k song.mp3
- Mux under a video: ffmpeg -i video.mp4 -i music.wav -map 0:v -map 1:a -c:v copy -shortest out.mp4
- Fades and joins: ffmpeg-audio-filter skill (afade + concat demuxer).
- Commercial use: references/LICENSES.md.

Tips:
- YuE takes structured lyrics as "[Verse 1]\n...lyrics...\n[Chorus]\n...lyrics...". See the YuE HuggingFace model card.
- Pass --seed N for reproducibility. Default is a random seed each run.
- Model weights are cached under ~/.cache/huggingface/.
- Force stereo output with ffmpeg's -ac 2.
- Do not use --chain-of-thought in prompts for non-YuE models. Prompts that reference "Verse 1 / Chorus" etc. confuse Riffusion and Stable Audio Open.

uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model riffusion \
--prompt "upbeat corporate bright synth pad, inspirational" \
--duration 10 \
--out intro_music.wav
ffmpeg -i intro.mp4 -i intro_music.wav \
-map 0:v -map 1:a -c:v copy -shortest intro_scored.mp4
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model yue \
--prompt "[Verse 1] acoustic guitar picking, soft male vocals about a road trip. [Chorus] full band, big drums, uplifting harmonies, 80 bpm. [Verse 2] same as verse 1 with harmony vocal." \
--duration 90 \
--seed 42 \
--out indie_song.wav
# Generate
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model stable-audio-open \
--prompt "ambient cinematic strings, slow build, film score, Johann Johannsson style" \
--duration 45 \
--out cue.wav
# Split stems for further work
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py stems --in cue.wav --out-dir cue_stems/
# -> cue_stems/{vocals,drums,bass,other}.wav (other will dominate for instrumental)
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py continue \
--in sketch_8bars.wav \
--model stable-audio-open \
--prompt "build into a climax, more energy, bigger drums" \
--duration 15 \
--out sketch_extended.wav
ModuleNotFoundError: No module named 'diffusers'
Cause: Riffusion backend isn't installed.
Solution: pip install diffusers transformers torch accelerate soundfile, or scripts/musicgen.py install riffusion.
CUDA out of memory on YuE
Cause: YuE 7B needs ~16 GB VRAM in fp16.
Solution: load in int4 / bnb-4bit (--load-in-4bit flag; requires bitsandbytes), shorten the duration, or generate on CPU (slow: 30+ minutes).
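What --load-in-4bit plausibly maps to (a configuration sketch: the exact model class and the script's settings are assumptions, but the fields follow the standard bitsandbytes 4-bit recipe):

```python
# Assumed bnb-4bit settings: cuts a 7B model from ~16 GB (fp16) to
# roughly 5-6 GB of VRAM at some quality cost. These mirror common
# practice, not necessarily the script's exact configuration.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",        # normal-float 4-bit
    "bnb_4bit_compute_dtype": "float16",
}

# With transformers + bitsandbytes installed, loading would look like:
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#   model = AutoModelForCausalLM.from_pretrained(
#       "m-a-p/YuE-s1-7B-anneal-en-cot",
#       quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
#       device_map="auto",
#   )
```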
Riffusion output loops or has audible seams past ~5 s
Cause: Riffusion's native window is 5 s; longer output uses a crossfade loop, which the script normally handles automatically.
Solution: confirm --duration exceeds 5 and that the script version is up to date. Try --model stable-audio-open for native long-form.
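The crossfade loop itself is simple enough to sketch (a pure-Python illustration of the idea, not the script's actual code; `window` is a flat list of float samples):

```python
def crossfade_loop(window, target_len, fade_len):
    """Extend a short audio window to target_len samples by looping it,
    with a linear crossfade at each seam -- the technique used to
    stretch Riffusion's ~5 s native window to longer durations."""
    out = list(window)
    while len(out) < target_len:
        tail = out[-fade_len:]      # old ending, ramped down
        head = window[:fade_len]    # new beginning, ramped up
        for i in range(fade_len):
            t = i / fade_len
            out[len(out) - fade_len + i] = tail[i] * (1 - t) + head[i] * t
        out.extend(window[fade_len:])
    return out[:target_len]
```

Crossfading a constant signal leaves it unchanged, which makes the seam behavior easy to sanity-check.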
YuE output has no vocals or ignores song structure
Cause: the prompt lacks section markers or vocal cues.
Solution: use [Verse 1] / [Chorus] tags and mention vocals explicitly ("male vocalist", "female vocalist", "harmonies").
Output ignores an artist or song named in the prompt
Cause: overly specific artist / song reference in the prompt.
Solution: describe the style instead of the source (e.g. "melancholic 90s grunge with distorted guitars" rather than "a Nirvana song").
401/403 or gated-repo error when downloading Stable Audio Open
Cause: the model is access-gated on HuggingFace under the Stability Community License.
Solution: accept the model license at https://huggingface.co/stabilityai/stable-audio-open-1.0, then run huggingface-cli login.
references/LICENSES.md — full commercial-safety audit, including Stable Audio Open's $1M ARR cap and the explicit exclusion of MusicGen / Suno / Udio / Jukebox.