Audio deconstruction and composition via Strudel live-coding. Decompose any audio into stems, extract samples, compose with the vocabulary, render offline to WAV/MP3.
⚠️ Legal Notice: This tool processes audio you provide. You are responsible for ensuring you have the rights to use the source material. The authors make no claims about fair use, copyright, or derivative works regarding your use of this tool with copyrighted material.
Compose, render, deconstruct, and remix music using code. Takes natural language prompts → writes Strudel patterns → renders offline through real Web Audio synthesis → posts audio or streams to Discord VC (via the OpenClaw gateway — no separate credentials needed). Can also reverse-engineer any audio track into stems, samples, and generative programs.
New here? Read docs/ONBOARDING.md for a ground-up introduction.
Rendering MUST run as a sub-agent or background process, never inline in your main session.
The offline renderer (chunked-render.mjs / offline-render-v2.mjs) runs a tight audio-processing loop that blocks the Node.js event loop. If you run it in your main OpenClaw session, it will kill the gateway after ~30 seconds (the heartbeat timeout).
✅ Correct: spawn a sub-agent or use background exec
❌ Wrong: run the renderer inline in your main conversation
Always do this:
# Background exec with timeout
exec background:true timeout:120 command:"node src/runtime/chunked-render.mjs src/compositions/my-track.js output/my-track.wav 20"
Or spawn a sub-agent:
sessions_spawn task:"Render strudel-music composition: node src/runtime/chunked-render.mjs ..."
This is the #1 way to break things. Don't skip this.
# 1. Setup
cd ~/.openclaw/workspace/strudel-music
npm run setup # installs deps + downloads samples (~11MB)
# 2. Verify
npm test # 12-point smoke test
# 3. Render
node src/runtime/chunked-render.mjs assets/compositions/fog-and-starlight.js output/fog.wav 16
ffmpeg -i output/fog.wav -codec:a libmp3lame -b:a 192k output/fog.mp3
| Invocation | What it does |
|---|---|
| /strudel <prompt> | Compose from natural language — mood, scene, genre, instruments |
| /strudel play <name> | Stream a saved composition into Discord VC |
| /strudel list | Show available compositions with metadata |
| /strudel samples | Manage sample packs (list, download, add) |
| /strudel concert <tracks...> | Play a setlist in Discord VC |
- Pick mood parameters (see references/mood-parameters.md)
- Write a .js composition using Strudel pattern syntax
- Render: node src/runtime/chunked-render.mjs <file> <output.wav> <cycles> [chunkSize]
ffmpeg -i output.wav -codec:a libmp3lame -b:a 192k output.mp3
node src/runtime/offline-render-v2.mjs assets/compositions/combat-assault.js /tmp/track.wav 12 140
ffmpeg -i /tmp/track.wav -ar 48000 -ac 2 /tmp/track-48k.wav -y
node scripts/vc-play.mjs /tmp/track-48k.wav
WSL2 users: enable mirrored networking (networkingMode=mirrored in .wslconfig) or VC streaming will fail silently (NAT breaks Discord's UDP voice protocol).
Samples live in samples/. Any directory of WAV files is auto-discovered.
samples/
├── strudel.json ← sample map (pitch info, paths)
├── kick/
│ └── kick.wav
├── hat/
│ └── hat.wav
├── bass_Cs1/
│ └── bass_Cs1.wav ← pitched sample (root: C#1)
├── synth_lead/
│ └── synth_lead.wav ← pitched sample (root: C#3, declared in strudel.json)
└── bloom_kick/
└── bloom_kick.wav ← from audio deconstruction
Maps sample names to files with optional root note declarations. The renderer uses this as the authoritative source for pitch detection.
{
"_base": "./",
"kick": { "0": "kick/kick.wav" },
"bass_Cs1": { "cs1": "bass_Cs1/bass_Cs1.wav" },
"synth_lead": { "cs3": "synth_lead/synth_lead.wav" }
}
Pitched sample folders use a name suffix (_Cs1, _D2) to declare the root pitch; unpitched samples use "0" as the key.

bash scripts/samples-manage.sh list # show installed packs
bash scripts/samples-manage.sh add <url> # download from URL
bash scripts/samples-manage.sh add ~/my-samples/ # add local directory
Ships with dirt-samples (153 WAVs, CC-licensed). Security: downloads enforce size limits (STRUDEL_MAX_DOWNLOAD_MB, default 10GB), MIME validation, optional host allowlist (STRUDEL_ALLOWED_HOSTS).
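The safety controls are plain environment variables. A hedged sketch of tightening them before fetching an untrusted pack — the size cap and host list here are example values, not the defaults:

```shell
# Example values only; tighten limits before fetching an untrusted pack.
export STRUDEL_MAX_DOWNLOAD_MB=500                      # cap pack downloads at 500 MB
export STRUDEL_ALLOWED_HOSTS="github.com,archive.org"   # restrict download sources
# bash scripts/samples-manage.sh add <url>              # would now enforce both limits
echo "download cap: ${STRUDEL_MAX_DOWNLOAD_MB} MB"
```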
CC0 / Free packs (just download and drop in samples/):
Your own packs: Export from any DAW (Ableton, FL Studio, M8 tracker, etc.) as WAV directories. Strudel doesn't care where they came from — it's just WAV files in folders.
Named banks (Strudel built-in, requires CDN access):
sound("bd sd cp hh").bank("RolandTR909")
sound("bd sd hh oh").bank("LinnDrum")
If running on WSL2 and streaming to Discord VC, enable mirrored networking:
# %USERPROFILE%\.wslconfig
[wsl2]
networkingMode=mirrored
Then wsl --shutdown and relaunch. Without this, WSL2's NAT breaks Discord's UDP voice protocol — the bot joins the channel but no audio flows because IP discovery packets can't traverse the NAT return path. Mirrored mode eliminates the NAT by putting WSL2 directly on the host's network stack.
This only affects VC streaming. Offline rendering and file posting work in any networking mode.
Two tiers, depending on what you need:
- Compose + render: Node.js only (offline synthesis via OfflineAudioContext)
- Full deconstruction: everything above, plus the Python packages demucs, librosa, numpy, scipy, scikit-learn, torch

Install the Python deps:
pip install demucs librosa numpy scipy scikit-learn torch
If Python deps are missing, composition and rendering still work — you just can't do stem extraction. The skill should fail gracefully with a message, not a stack trace.
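A minimal sketch of such a graceful check from shell; it probes only two of the deps, on the assumption that the stack is installed together:

```shell
# Probe the optional Python stack; degrade instead of crashing.
if python3 -c "import demucs, librosa" 2>/dev/null; then
  pipeline_status="full"
  echo "full pipeline available (stem extraction enabled)"
else
  pipeline_status="degraded"
  echo "stem extraction unavailable (composition and rendering still work)"
fi
```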
If you have an MP3 and want to extract instruments from it, build sample racks, and compose with the extracted material — that's the full pipeline. It goes:
MP3 → Demucs (stem separation) → librosa (analysis) → sample slicing → Strudel composition → render → MP3
This is a 4–8 minute process for a typical track. See docs/pipeline.md for the complete stage-by-stage breakdown with commands, timings, and resource requirements.
# 1. Separate stems (Python/Demucs)
python -m demucs input.mp3 --out ./stems
# 2. Analyze + slice (see docs/pipeline.md for details)
# Currently semi-manual — analysis scripts in development
# 3. Write composition referencing sliced samples
# 4. Render
bash scripts/dispatch.sh render my-composition.js 16 120
# 5. Convert
ffmpeg -i output.wav -c:a libmp3lame -q:a 2 output.mp3 -y
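The numbered steps above can be chained with fail-fast handling. A hedged sketch with a dry-run guard — the file names are placeholders, and DRY_RUN is an assumption of this sketch, not a flag the scripts support:

```shell
# Chain the documented stages behind a dry-run guard so the flow can be
# inspected before committing 4-8 minutes of compute. DRY_RUN=0 runs for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@" || { echo "stage failed: $*" >&2; return 1; }
  fi
}
run python -m demucs input.mp3 --out ./stems                    # 1. separate stems
run bash scripts/dispatch.sh render my-composition.js 16 120    # 3-4. compose + render
run ffmpeg -i output.wav -c:a libmp3lame -q:a 2 output.mp3 -y   # 5. convert
```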
| Stage | CPU estimate | GPU estimate |
|---|---|---|
| Demucs stem separation | ~15s/min of audio | ~3s/min of audio |
| Audio analysis (per stem) | ~10–20s | ~10–20s |
| Sample slicing | ~5s | ~5s |
| Composition | instant (human/AI writes JS) | instant |
| Rendering | ~30–60s/min of output | ~30–60s/min of output |
| MP3 conversion | ~5s | ~5s |
Total (4-min track, CPU): 4–8 minutes. Compose + render only (no Demucs): 2–3 minutes.
DO NOT run this inline in a Discord channel interaction or primary OpenClaw session. The 30-second response timeout will kill the process mid-render. There is no supervisor to recover. The skill will appear broken — silence, no output, no error message.
From an OpenClaw agent (correct):
sessions_spawn({
task: "Render strudel composition: /strudel dark ambient tension, 65bpm",
mode: "run",
runTimeoutSeconds: 600 // 10 minutes — generous for full pipeline
})
Background process (also correct):
exec({ command: "bash scripts/dispatch.sh render ...", background: true })
Direct CLI (fine for testing):
bash scripts/dispatch.sh render assets/compositions/fog-and-starlight.js 16 72
What to tell the user: "Rendering takes a few minutes — I'll post the audio when it's ready." Don't leave them hanging with no feedback.
// WRONG — will timeout after 30s in Discord context
exec({ command: "bash scripts/dispatch.sh render ..." })
// WRONG — blocking the main session for minutes
// (anything inline that takes >30s)
Detailed documentation lives in docs/:
| Document | What it covers |
|---|---|
| docs/pipeline.md | Full pipeline stages, commands, timings, resource requirements, system dependencies |
| docs/composition-guide.md | Practical composition lessons — mini-notation pitfalls, the space-vs-angle-bracket rule, .slow() interactions, debugging hap explosions |
| docs/TESTING.md | Testing strategy — smoke tests, cross-platform validation, quality gates, naive install testing |
Start with composition-guide.md if you're writing patterns. The space-separated vs angle-bracket distinction is the #1 source of bugs (gain explosions, distortion, memory crashes). The guide covers it with real case studies.
The offline renderer uses node-web-audio-api (Rust-based Web Audio for Node.js) for real audio synthesis:
- @strudel/core + @strudel/mini + @strudel/tonal parse pattern code into timed "haps"
- OfflineAudioContext.startRendering() produces complete audio

Note on mini notation: The renderer explicitly calls setStringParser(mini.mini) after import because Strudel's npm dist bundles duplicate the Pattern class across modules. Same class of bug as openclaw#22790.
setcpm(120/4) // 120 BPM
stack(
s("bd sd [bd bd] sd").gain(0.4), // drums (samples)
s("[hh hh] [hh oh]").gain(0.2), // hats
note("c3 eb3 g3 c4") // melody
.s("sawtooth")
.lpf(sine.range(400, 2000).slow(8)) // filter sweep
.attack(0.01).decay(0.3).sustain(0.2) // ADSR envelope
.room(0.4).delay(0.2) // space
.gain(0.3)
)
| Syntax | Meaning |
|---|---|
| "a b c d" | Sequence (one step per beat) |
| "[a b]" | Subdivide (two events in one beat) |
| "<a b c>" | Alternate per cycle (slowcat) |
| "a*3" | Repeat three times within the step |
| "~" | Rest / silence |
| .slow(2) / .fast(2) | Time stretch (halve / double the speed) |
| .euclid(3,8) | Euclidean rhythm (3 hits spread over 8 steps) |
| Mood | Tempo | Key/Scale | Character |
|---|---|---|---|
| tension | 60-80 | minor/phrygian | Low cutoff, sparse, drones |
| combat | 120-160 | minor | Heavy drums, fast, distorted |
| peace | 60-80 | pentatonic/major | Warm, slow, ambient |
| mystery | 70-90 | whole tone | Reverb, sparse |
| victory | 110-130 | major | Bright, fanfare |
| ritual | 45-60 | dorian | Organ drones, chant |
Full tree: references/mood-parameters.md. Production techniques: references/production-techniques.md.
Use <> (slowcat) for sequential values, NOT spaces:
// ❌ WRONG — all values play simultaneously, causes clipping
s("kick").gain("0.3 0.3 0.5 0.3")
// ✅ RIGHT — one value per cycle
s("kick").gain("<0.3 0.3 0.5 0.3>")
Full list: docs/KNOWN-PITFALLS.md
Always check after rendering:
ffmpeg -i output.wav -af loudnorm=print_format=json -f null - 2>&1 | grep -E "input_i|input_tp"
Target: -16 to -10 LUFS, true peak below -1 dBTP. Above -5 LUFS = something is wrong.
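As a sketch, the check can be scripted; the JSON literal below is a hypothetical stand-in for the loudnorm stats you would capture from the ffmpeg command above:

```shell
# Hypothetical loudnorm stats; in practice, capture the ffmpeg output instead.
stats='{ "input_i" : "-14.2", "input_tp" : "-1.8" }'
# Pull the integrated loudness (input_i) out of the JSON.
lufs=$(printf '%s' "$stats" | grep -o '"input_i"[^,]*' | grep -oE -- '-?[0-9]+(\.[0-9]+)?')
# Flag anything hotter than -5 LUFS, per the guideline above.
if awk -v l="$lufs" 'BEGIN { exit !(l > -5) }'; then
  verdict="too hot: check your gain staging"
else
  verdict="OK"
fi
echo "$lufs LUFS -> $verdict"
```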
Full pipeline docs: references/integration-pipeline.md
Audio → Demucs (stems) → librosa (analysis) → strudel.json → Composition → Render
Extracted samples are registered in strudel.json with root notes.

Requires Python stack: uv init && uv add demucs librosa scikit-learn soundfile
src/runtime/
chunked-render.mjs — Chunked offline renderer (avoids OOM on long pieces)
offline-render-v2.mjs — Core offline renderer
smoke-test.mjs — 12-point smoke test
scripts/
download-samples.sh — Download dirt-samples (idempotent)
samples-manage.sh — Sample pack manager
vc-play.mjs — Stream audio to Discord VC
samples/ — Sample packs + strudel.json (gitignored)
assets/compositions/ — 15 original compositions
src/compositions/ — Audio deconstructions
references/ — Mood trees, techniques, architecture
docs/
KNOWN-PITFALLS.md — Critical composition pitfalls
ONBOARDING.md — Machine-actor onboarding guide
Uses node-web-audio-api (Rust-based Web Audio for Node.js). No browser, no Puppeteer.
The renderer calls setStringParser(mini.mini) after import because Strudel's npm dist bundles duplicate the Pattern class across modules — the mini notation parser registers on a different copy than the one used by note() and s().
All synthesis is local and offline via OfflineAudioContext: oscillators, biquad filters, ADSR envelopes, AudioBufferSourceNode for samples, dynamics compression, stereo panning. Output: 16-bit stereo WAV at 44.1kHz.
| Platform | Issue | Workaround |
|---|---|---|
| ARM64 (all) | PyTorch CPU-only, no CUDA | Expected — Demucs runs ~0.25× realtime |
| ARM64 (all) | torchaudio.save() fails | Patch demucs/audio.py to use soundfile.write() (see First-Time Setup) |
| ARM64 (all) | torchcodec build fails | Not needed — skip it, Demucs works without it |
| WSL2 | Discord VC silent (NAT blocks UDP) | Enable mirrored networking in .wslconfig |
| All | Strudel mini parser not registered | Renderer calls setStringParser(mini.mini) — already handled |
Strudel compositions are JavaScript files executed by Node.js. They have the same access as any Node.js script:
For untrusted compositions:
For your own compositions: No special precautions needed — you wrote the code.
This is the same trust model as any programming language skill. The renderer itself is safe; the risk is in what compositions you choose to run.
This skill uses OpenClaw's built-in Discord voice channel support for streaming. No separate BOT_TOKEN, DISCORD_TOKEN, or any Discord credentials are required. OpenClaw handles all Discord authentication and connection management. The skill simply produces audio files and hands them to OpenClaw's voice subsystem.
package.json contains no postinstall, preinstall, or lifecycle hooks. npm run setup runs npm install + scripts/download-samples.sh (downloads CC0 sample packs from known URLs).
What scripts/download-samples.sh fetches

The download script sparse-clones tidalcycles/Dirt-Samples from GitHub (CC-licensed) — specifically these directories: bd sd hh oh cp cr ride rim mt lt ht cb 808bd 808sd 808hc 808oh. This fetches ~153 WAV files (~11MB total). The script is idempotent (skips if samples already exist).
What scripts/samples-manage.sh does

The sample manager downloads additional packs from user-specified URLs with safety controls:
- STRUDEL_MAX_DOWNLOAD_MB (default: 10GB)
- STRUDEL_ALLOWED_HOSTS (comma-separated; empty = allow all)

Only one render should be active per session at a time. If a user requests /strudel clone while a previous render is in progress, check for an active render with subagents(action=list) before starting a new one.

Why: Concurrent renders with default output paths both write to output.wav, causing the second to overwrite the first. Even with explicit paths, two simultaneous OfflineAudioContext processes double memory usage. Sample loading is per-process (no shared cache), so there's no corruption risk — but disk I/O contention on the output write is real.
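Outside of sub-agent tracking, a lock directory is one way to enforce the one-render rule. A minimal sketch — the lock path is an assumption of this sketch, not part of the skill:

```shell
# mkdir is atomic, so it doubles as a test-and-set lock.
LOCK="${TMPDIR:-/tmp}/strudel-render.lock"
if mkdir "$LOCK" 2>/dev/null; then
  lock_state="acquired"
  # ...render would run here, e.g. bash scripts/dispatch.sh render ...
  rmdir "$LOCK"       # release when the render finishes (or trap EXIT)
else
  lock_state="busy"   # another render owns the lock; queue or refuse
fi
echo "render lock: $lock_state"
```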