Open-source AI music generation with permissive-license models: Riffusion (MIT, spectrogram-to-audio music from text), YuE (Apache 2.0, m-a-p's 2025 full-song generation with vocals), Stable Audio Open (Stability Community License, commercial use up to $1M ARR). Text-to-music, genre-conditioned generation, full songs with structure, continuation from existing audio, stem splits via media-demucs. Use when the user asks to generate music from text, create AI songs, make background music for video, generate a jingle, produce royalty-free music with open-source models, make a full song with vocals, or continue an existing musical idea.
Context: $ARGUMENTS
- `scripts/musicgen.py generate --model riffusion --prompt "lo-fi hip-hop beat" --duration 10 --out out.wav` → Step 3
- `scripts/musicgen.py generate --model yue --prompt "emotional indie rock, male vocalist, 80 bpm" --duration 90 --out song.wav` → Step 3
- `scripts/musicgen.py generate --model stable-audio-open --prompt "ambient cinematic strings" --duration 47 --out out.wav` → Step 3 (read LICENSES.md first)
- `scripts/musicgen.py continue --in sketch.wav --prompt "build into chorus" --duration 20 --out extended.wav` → Step 4
- `scripts/musicgen.py stems --in out.wav --out-dir stems/` → Step 5 (hands off to media-demucs)

Do NOT use this skill for:
| Task | Recommended | License | Commercial use? | Vocals? |
|---|---|---|---|---|
| Any, lowest friction, fully permissive | Riffusion | MIT | Yes, unconditionally | No |
| Full song with vocals, best 2025 quality | YuE | Apache 2.0 | Yes, unconditionally | Yes |
| Highest audio quality, instrumental | Stable Audio Open | Stability AI Community License | Yes, up to $1M annual revenue | No |
Full license audit (including excluded models and Stable Audio's $1M ARR cap) in references/LICENSES.md.
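The selection rules above can be encoded as a small helper (a sketch only: the function name is hypothetical and the logic mirrors the table, not any real API in this skill):

```python
def pick_model(need_vocals: bool, annual_revenue_usd: float) -> str:
    """Pick a backend per the selection table (hypothetical helper).

    - Vocals are only available from YuE (Apache 2.0).
    - Stable Audio Open has the highest instrumental quality, but its
      Community License caps commercial use at $1M annual revenue.
    - Riffusion (MIT) is the unconditional fallback.
    """
    if need_vocals:
        return "yue"
    if annual_revenue_usd <= 1_000_000:
        return "stable-audio-open"
    return "riffusion"
```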
# Riffusion (MIT, diffusers-based)
pip install diffusers transformers torch accelerate soundfile
# YuE (Apache 2.0) — HuggingFace transformers based
pip install transformers torch soundfile
# YuE models: m-a-p/YuE-s1-7B-anneal-en-cot (English, chain-of-thought)
# m-a-p/YuE-s1-7B-anneal-zh-cot (Mandarin)
# Stable Audio Open (Stability Community License)
pip install stable-audio-tools soundfile
# (HuggingFace: stabilityai/stable-audio-open-1.0)
Or let the script help:
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py install riffusion
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py install yue
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py install stable-audio-open
Check what's available:
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py check
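The `check` subcommand's logic amounts to an importability probe; a minimal sketch (the dependency mapping mirrors the pip installs above, and the real script may test more than module presence):

```python
import importlib.util

# Per-backend Python dependencies (assumed mapping, taken from the
# install commands above -- not the script's actual internals).
BACKEND_DEPS = {
    "riffusion": ["diffusers", "transformers", "torch", "soundfile"],
    "yue": ["transformers", "torch", "soundfile"],
    "stable-audio-open": ["stable_audio_tools", "soundfile"],
}

def available_backends():
    """Return the backends whose dependencies are all importable."""
    return [
        name
        for name, deps in BACKEND_DEPS.items()
        if all(importlib.util.find_spec(d) is not None for d in deps)
    ]
```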
# Riffusion — fast, MIT, spectrogram diffusion
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model riffusion \
--prompt "dreamy lo-fi hip-hop beat with mellow piano" \
--duration 10 \
--out beat.wav
# YuE — full song with vocals
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model yue \
--prompt "emotional indie rock, male vocalist, 80 bpm, acoustic guitar, simple drums" \
--duration 90 \
--out song.wav
# Stable Audio Open — 44.1 kHz stereo, instrumental
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model stable-audio-open \
--prompt "ambient cinematic strings, slow rise, film score" \
--duration 47 \
--out cue.wav
Per-model limits:
- Riffusion: ~5 s native window; longer durations go through the script's crossfade loop.
- Stable Audio Open: up to 47 s per generation.
- YuE: full songs on the minute scale (the examples here use 90 s).
Feed an existing clip as seed; the model continues in style.
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py continue \
--in sketch.wav \
--model stable-audio-open \
--prompt "add energy, build into chorus" \
--duration 20 \
--out extended.wav
Only Stable Audio Open and Riffusion support continuation reliably; YuE's continuation API is less stable.
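The seeding step behind `continue` can be sketched as taking the tail of the existing clip as conditioning context (a pure-Python illustration; the real script's context length is an assumption):

```python
def continuation_context(samples, sample_rate, context_seconds=10.0):
    """Return the last `context_seconds` of a clip to condition the
    continuation on. `samples` is a flat list of floats; the 10 s
    default is a hypothetical value, not the script's actual setting."""
    n = int(context_seconds * sample_rate)
    # Short clips are used whole.
    return list(samples[-n:]) if len(samples) > n else list(samples)
```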
Delegates to the media-demucs skill.
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py stems \
--in song.wav \
--out-dir stems/
# -> stems/vocals.wav, stems/drums.wav, stems/bass.wav, stems/other.wav
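The handoff is essentially a subprocess call to the demucs CLI; a minimal sketch (the wrapper functions are assumptions about the script's internals, but the demucs flags and output layout follow the CLI's documented behavior):

```python
import subprocess
from pathlib import Path

def build_demucs_cmd(in_wav: str, out_dir: str, model: str = "htdemucs") -> list:
    """Command line for 4-stem separation with the demucs CLI."""
    return ["demucs", "-n", model, "-o", out_dir, in_wav]

def stem_paths(in_wav: str, out_dir: str, model: str = "htdemucs") -> dict:
    """Where demucs writes stems: <out_dir>/<model>/<track>/<stem>.wav."""
    track = Path(in_wav).stem
    return {s: Path(out_dir) / model / track / f"{s}.wav"
            for s in ("vocals", "drums", "bass", "other")}

def split_stems(in_wav: str, out_dir: str) -> dict:
    """Run demucs (must be on PATH) and return the stem file paths."""
    subprocess.run(build_demucs_cmd(in_wav, out_dir), check=True)
    return stem_paths(in_wav, out_dir)
```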
Hand off and export:
- Loudness normalization: media-ffmpeg-normalize skill (Spotify -14 LUFS, YouTube -14 LUFS, broadcast -23 LUFS).
- MP3 export: ffmpeg -i song.wav -c:a libmp3lame -b:a 320k song.mp3
- Mux under a video: ffmpeg -i video.mp4 -i music.wav -map 0:v -map 1:a -c:v copy -shortest out.mp4
- Fades and joins: ffmpeg-audio-filter skill (afade + concat demuxer).
- Commercial use: references/LICENSES.md.

Tips:
- YuE takes structured lyrics as "[Verse 1]\n...lyrics...\n[Chorus]\n...lyrics...". See the YuE HuggingFace model card.
- Pass --seed N for reproducibility. Default is a random seed each run.
- Model weights are cached under ~/.cache/huggingface/.
- Force stereo output with ffmpeg's -ac 2.
- Do not use --chain-of-thought in prompts for non-YuE models. Prompts that reference "Verse 1 / Chorus" etc. confuse Riffusion and Stable Audio Open.

uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model riffusion \
--prompt "upbeat corporate bright synth pad, inspirational" \
--duration 10 \
--out intro_music.wav
ffmpeg -i intro.mp4 -i intro_music.wav \
-map 0:v -map 1:a -c:v copy -shortest intro_scored.mp4
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model yue \
--prompt "[Verse 1] acoustic guitar picking, soft male vocals about a road trip. [Chorus] full band, big drums, uplifting harmonies, 80 bpm. [Verse 2] same as verse 1 with harmony vocal." \
--duration 90 \
--seed 42 \
--out indie_song.wav
# Generate
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py generate \
--model stable-audio-open \
--prompt "ambient cinematic strings, slow build, film score, Johann Johannsson style" \
--duration 45 \
--out cue.wav
# Split stems for further work
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py stems --in cue.wav --out-dir cue_stems/
# -> cue_stems/{vocals,drums,bass,other}.wav (other will dominate for instrumental)
uv run ${CLAUDE_SKILL_DIR}/scripts/musicgen.py continue \
--in sketch_8bars.wav \
--model stable-audio-open \
--prompt "build into a climax, more energy, bigger drums" \
--duration 15 \
--out sketch_extended.wav
ModuleNotFoundError: No module named 'diffusers'
Cause: Riffusion backend isn't installed.
Solution: pip install diffusers transformers torch accelerate soundfile, or scripts/musicgen.py install riffusion.
CUDA out of memory on YuE
Cause: YuE 7B needs ~16 GB VRAM in fp16.
Solution: load in int4 / bnb-4bit (--load-in-4bit flag; requires bitsandbytes), shorten the duration, or generate on CPU (slow: 30+ minutes).
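What --load-in-4bit plausibly maps to (a configuration sketch: the exact model class and the script's settings are assumptions, but the fields follow the standard bitsandbytes 4-bit recipe):

```python
# Assumed bnb-4bit settings: cuts a 7B model from ~16 GB (fp16) to
# roughly 5-6 GB of VRAM at some quality cost. These mirror common
# practice, not necessarily the script's exact configuration.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",        # normal-float 4-bit
    "bnb_4bit_compute_dtype": "float16",
}

# With transformers + bitsandbytes installed, loading would look like:
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#   model = AutoModelForCausalLM.from_pretrained(
#       "m-a-p/YuE-s1-7B-anneal-en-cot",
#       quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
#       device_map="auto",
#   )
```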
Riffusion output loops or has audible seams past ~5 s
Cause: Riffusion's native window is 5 s; longer output uses a crossfade loop, which the script normally handles automatically.
Solution: confirm --duration exceeds 5 and that the script version is up to date. Try --model stable-audio-open for native long-form.
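The crossfade loop itself is simple enough to sketch (a pure-Python illustration of the idea, not the script's actual code; `window` is a flat list of float samples):

```python
def crossfade_loop(window, target_len, fade_len):
    """Extend a short audio window to target_len samples by looping it,
    with a linear crossfade at each seam -- the technique used to
    stretch Riffusion's ~5 s native window to longer durations."""
    out = list(window)
    while len(out) < target_len:
        tail = out[-fade_len:]      # old ending, ramped down
        head = window[:fade_len]    # new beginning, ramped up
        for i in range(fade_len):
            t = i / fade_len
            out[len(out) - fade_len + i] = tail[i] * (1 - t) + head[i] * t
        out.extend(window[fade_len:])
    return out[:target_len]
```

Crossfading a constant signal leaves it unchanged, which makes the seam behavior easy to sanity-check.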
YuE output has no vocals or ignores song structure
Cause: the prompt lacks section markers or vocal cues.
Solution: use [Verse 1] / [Chorus] tags and mention vocals explicitly ("male vocalist", "female vocalist", "harmonies").
Output ignores an artist or song named in the prompt
Cause: overly specific artist / song reference in the prompt.
Solution: describe the style instead of the source (e.g. "melancholic 90s grunge with distorted guitars" rather than "a Nirvana song").
401/403 or gated-repo error when downloading Stable Audio Open
Cause: the model is access-gated on HuggingFace under the Stability Community License.
Solution: accept the model license at https://huggingface.co/stabilityai/stable-audio-open-1.0, then run huggingface-cli login.
references/LICENSES.md — full commercial-safety audit, including Stable Audio Open's $1M ARR cap and the explicit exclusion of MusicGen / Suno / Udio / Jukebox.