Generate music tracks using CLI audio tools (ffmpeg + SoX). Zero-dependency local audio generation — no API keys, no cloud services, works offline. Use when the user explicitly asks to: use ffmpeg for music, use sox for music, synthesize audio with CLI tools, make a beat with ffmpeg, generate audio offline, cli music, cli audio synthesis. This skill produces lo-fi synthesized electronic music only — for professional quality tracks with vocals, use replicate-music, suno-music, or elevenlabs-music instead.
Generate music tracks locally using CLI audio tools. No API keys or cloud services are needed; the only requirements are ffmpeg and, optionally, SoX.
This skill produces lo-fi synthesized electronic music. It generates audio from raw waveform math (sine waves, noise): there are no vocals, no sampled instruments, and no AI generation. Output quality is comparable to chiptune or basic synthesizer demos. For professional-quality tracks with real vocals and full production, use replicate-music ($0.03/song, no minimum) or suno-music (30 free credits).
Related skills:
Generate a working lo-fi track in one shot. Estimated generation time: 10-30 seconds.
TMPDIR=$(mktemp -d)
BPM=85; BEAT=0.706; BAR=$(echo "$BEAT * 4" | bc); DUR=60
# Pad: detuned chord (SoX puts each synth voice on its own channel, so remix - sums them to mono)
sox -n -r 44100 "$TMPDIR/pad.wav" synth $DUR sawtooth 220 sawtooth 220.8 sawtooth 261.63 sawtooth 262.4 sawtooth 329.63 sawtooth 330.4 remix - chorus 0.6 0.9 40 0.4 0.25 2 lowpass 700 reverb 50 gain -12
# Bass — root note
sox -n -r 44100 "$TMPDIR/bass.wav" synth $DUR sawtooth 110 lowpass 350 gain -8
# Kick
sox -n -r 44100 "$TMPDIR/kick.wav" synth 0.3 sine 160:45 fade t 0.005 0.3 0.2 gain -3
# Texture: pink noise bed (anoisesrc takes the sample rate as r; s is the random seed)
ffmpeg -y -f lavfi -i "anoisesrc=color=pink:d=$DUR:r=44100" -af "lowpass=f=3000,highpass=f=200,volume=0.04" "$TMPDIR/texture.wav"
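# Pad the 0.3 s kick to one beat, repeat it to a 4-beat bar, then loop the bar to fill the duration
sox "$TMPDIR/kick.wav" "$TMPDIR/kick_bar.wav" pad 0 "$(echo "$BEAT - 0.3" | bc)" repeat 3
ffmpeg -y -stream_loop $(echo "$DUR / ($BAR)" | bc) -i "$TMPDIR/kick_bar.wav" -t $DUR -c copy "$TMPDIR/kick_full.wav"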
# Mix and export
ffmpeg -y -i "$TMPDIR/pad.wav" -i "$TMPDIR/bass.wav" -i "$TMPDIR/kick_full.wav" -i "$TMPDIR/texture.wav" \
-filter_complex "[0][1][2][3]amix=inputs=4:normalize=0,loudnorm=I=-16:TP=-1.5:LRA=11,afade=t=in:d=3,afade=t=out:st=57:d=3" \
-ar 44100 -ac 2 -codec:a libmp3lame -b:a 192k track.mp3
rm -rf "$TMPDIR"
ffprobe -v error -show_entries format=duration,bit_rate -show_entries stream=sample_rate,channels,codec_name -of default=noprint_wrappers=1 track.mp3
For better results, read the full guide below to add melody, drum patterns, and arrangement.
After generating your track, use the claw-fm skill for cover art, metadata, submission, and profile setup.
Output target: MP3, 192 kbps, 44100 Hz, stereo, 1–3 minutes.
Approach: Generate individual WAV layers (melody, bass, drums, pad), mix them together, then export as MP3.
Tools:
| Tool | Install | Use for |
|---|---|---|
| ffmpeg | Usually pre-installed | Mixing, filter graphs, MP3 encoding, aevalsrc expression synthesis |
| SoX | brew install sox / apt install sox | Waveform synthesis (native sawtooth/square/triangle/pluck), effects, sequencing |
| ffprobe | Bundled with ffmpeg | Verifying output (duration, bitrate, format) |
Prefer SoX for tone generation when available — native waveforms eliminate manual harmonic stacking. Fall back to ffmpeg aevalsrc if SoX isn't installed. Always use ffmpeg for final mix and MP3 export.
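The tool preference above can be sketched as a small check. `pick_synth` is a hypothetical helper name, not something ffmpeg or SoX provides, and treating a missing ffmpeg as "none" is an assumption that follows from ffmpeg being required for the final mix.

```shell
#!/usr/bin/env bash
# pick_synth (hypothetical helper): report which tool to use for tone generation.
# Prints "none" when ffmpeg is missing, because ffmpeg is always needed
# for the final mix and MP3 export.
pick_synth() {
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo none
  elif command -v sox >/dev/null 2>&1; then
    echo sox
  else
    echo ffmpeg
  fi
}

echo "tone generator: $(pick_synth)"
```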
Temp directory workflow:
TMPDIR=$(mktemp -d)
# Generate layers into $TMPDIR/
# Mix layers
# Export final MP3
# Clean up: rm -rf "$TMPDIR"
All ffmpeg commands use -y to overwrite without prompting.
BPM to beat duration:
beat_duration = 60 / BPM
Note values at common BPMs:
| BPM | Whole (s) | Half (s) | Quarter (s) | Eighth (s) | Sixteenth (s) |
|---|---|---|---|---|---|
| 80 | 3.000 | 1.500 | 0.750 | 0.375 | 0.1875 |
| 90 | 2.667 | 1.333 | 0.667 | 0.333 | 0.167 |
| 100 | 2.400 | 1.200 | 0.600 | 0.300 | 0.150 |
| 110 | 2.182 | 1.091 | 0.545 | 0.273 | 0.136 |
| 120 | 2.000 | 1.000 | 0.500 | 0.250 | 0.125 |
| 130 | 1.846 | 0.923 | 0.462 | 0.231 | 0.115 |
| 140 | 1.714 | 0.857 | 0.429 | 0.214 | 0.107 |
Measure duration (4/4 time): 4 * beat_duration
Loop math: To fill N seconds, loop a bar ceil(N / measure_duration) times.
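The timing formulas above can be wrapped in a few helpers. This is a minimal sketch using awk for the floating-point math; the function names are hypothetical and the three-decimal rounding is an assumption chosen to match the table.

```shell
#!/usr/bin/env bash
# Timing helpers (hypothetical names, not part of ffmpeg or SoX).

# Seconds per quarter-note beat: 60 / BPM.
beat_duration() {
  awk -v bpm="$1" 'BEGIN { printf "%.3f\n", 60 / bpm }'
}

# Seconds per 4/4 bar: 4 * beat duration.
bar_duration() {
  awk -v bpm="$1" 'BEGIN { printf "%.3f\n", 240 / bpm }'
}

# Bar plays needed to cover SECONDS at BPM: ceil(SECONDS / bar duration).
bars_needed() {
  awk -v secs="$1" -v bpm="$2" 'BEGIN {
    bars = secs * bpm / 240
    whole = int(bars)
    if (whole < bars) whole++
    print whole
  }'
}

beat_duration 85    # prints 0.706
bar_duration 85     # prints 2.824
bars_needed 60 85   # prints 22
```

Both `-stream_loop` and SoX `repeat` count additional plays, so pass one less than the number printed by `bars_needed`.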
| Note | Oct 2 | Oct 3 | Oct 4 | Oct 5 | Oct 6 |
|---|---|---|---|---|---|
| C | 65.41 | 130.81 | 261.63 | 523.25 | 1046.50 |
| C# | 69.30 | 138.59 | 277.18 | 554.37 | 1108.73 |
| D | 73.42 | 146.83 | 293.66 | 587.33 | 1174.66 |
| D# | 77.78 | 155.56 | 311.13 | 622.25 | 1244.51 |
| E | 82.41 | 164.81 | 329.63 | 659.26 | 1318.51 |
| F | 87.31 | 174.61 | 349.23 | 698.46 | 1396.91 |
| F# | 92.50 | 185.00 | 369.99 | 739.99 | 1479.98 |
| G | 98.00 | 196.00 | 392.00 | 783.99 | 1567.98 |
| G# | 103.83 | 207.65 | 415.30 | 830.61 | 1661.22 |
| A | 110.00 | 220.00 | 440.00 | 880.00 | 1760.00 |
| A# | 116.54 | 233.08 | 466.16 | 932.33 | 1864.66 |
| B | 123.47 | 246.94 | 493.88 | 987.77 | 1975.53 |
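The values in this table follow twelve-tone equal temperament with A4 = 440 Hz. As a sketch, any note can be computed from its MIDI number; the MIDI numbering (C4 = 60, A4 = 69) and the helper name `note_freq` are assumptions introduced here, not part of the original guide.

```shell
#!/usr/bin/env bash
# note_freq MIDI_NUMBER (hypothetical helper): equal-tempered frequency in Hz,
# using f = 440 * 2^((n - 69) / 12) with A4 = MIDI 69.
note_freq() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", 440 * 2 ^ ((n - 69) / 12) }'
}

note_freq 60   # C4, prints 261.63
note_freq 45   # A2, prints 110.00
```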
C Major: 261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88
A Minor (natural): 440.00, 493.88, 523.25, 587.33, 659.26, 698.46, 783.99
C Minor: 261.63, 293.66, 311.13, 349.23, 392.00, 415.30, 466.16
A Minor Pentatonic: 440.00, 523.25, 587.33, 659.26, 783.99
C Major Pentatonic: 261.63, 293.66, 329.63, 392.00, 440.00
| Type | Intervals | Example (C4) |
|---|---|---|
| Major triad | root, M3, P5 | 261.63, 329.63, 392.00 |
| Minor triad | root, m3, P5 | 261.63, 311.13, 392.00 |
| Major 7th | root, M3, P5, M7 | 261.63, 329.63, 392.00, 493.88 |
| Minor 7th | root, m3, P5, m7 | 261.63, 311.13, 392.00, 466.16 |
| Dominant 7th | root, M3, P5, m7 | 261.63, 329.63, 392.00, 466.16 |
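The interval column maps to semitone offsets from the root (m3 = 3, M3 = 4, P5 = 7, m7 = 10, M7 = 11). A minimal sketch that turns a root plus offsets into frequencies, assuming MIDI note numbers with C4 = 60; `chord_freqs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# chord_freqs ROOT_MIDI SEMITONES... (hypothetical helper):
# print the frequency of each chord tone, space-separated.
chord_freqs() {
  local root="$1"
  shift
  local out=() semis
  for semis in "$@"; do
    # Equal temperament: f = 440 * 2^((n - 69) / 12)
    out+=("$(awk -v n="$((root + semis))" 'BEGIN { printf "%.2f", 440 * 2 ^ ((n - 69) / 12) }')")
  done
  echo "${out[*]}"
}

chord_freqs 60 0 4 7       # major triad on C4: 261.63 329.63 392.00
chord_freqs 60 0 3 7 10    # minor 7th on C4:   261.63 311.13 392.00 466.16
```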
I–V–vi–IV in C Major: C, G, Am, F
ii–V–I in C Major: Dm, G, C
i–iv–v in A Minor: Am, Dm, Em
Practical rules that directly affect output quality:
CRITICAL: Never use bare sin() tones. A single sine wave sounds like a test tone, not music. Always use harmonic stacking (sawtooth or square approximations below) for any melodic or bass content. The waveform recipe is what makes ffmpeg synthesis sound like actual instruments vs. a hearing test.
Sawtooth approximation (warm, buzzy — bass, pads, leads):
saw(f,t) = sin(f*2*PI*t) - sin(2*f*2*PI*t)/2 + sin(3*f*2*PI*t)/3 - sin(4*f*2*PI*t)/4 + sin(5*f*2*PI*t)/5
More harmonics = brighter/buzzier. 5–6 harmonics is a good default. Divide by ~3 to normalize amplitude.
Square approximation (hollow, reedy — melody, leads, chiptune):
sq(f,t) = sin(f*2*PI*t) + sin(3*f*2*PI*t)/3 + sin(5*f*2*PI*t)/5 + sin(7*f*2*PI*t)/7
Only odd harmonics. 4 terms is a good default. Divide by ~2 to normalize.
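Writing these series by hand is error-prone, so the expressions can be generated instead. The sketch below builds the sawtooth and square approximations as aevalsrc expression strings; `saw_expr` and `sq_expr` are hypothetical helper names, and the fixed divisors (3 and 2) simply follow the normalization hints above.

```shell
#!/usr/bin/env bash
# saw_expr FREQ N (hypothetical helper): N harmonics with alternating sign, divided by 3.
saw_expr() {
  local f="$1" n="$2" k sign
  local expr="sin($f*2*PI*t)"
  for ((k = 2; k <= n; k++)); do
    if ((k % 2 == 0)); then sign="-"; else sign="+"; fi
    expr+="${sign}sin($k*$f*2*PI*t)/$k"
  done
  echo "($expr)/3"
}

# sq_expr FREQ N (hypothetical helper): N odd-harmonic terms, divided by 2.
sq_expr() {
  local f="$1" n="$2" i k
  local expr="sin($f*2*PI*t)"
  for ((i = 1; i < n; i++)); do
    k=$((2 * i + 1))
    expr+="+sin($k*$f*2*PI*t)/$k"
  done
  echo "($expr)/2"
}

saw_expr 110 5
sq_expr 440 4
# Possible use (requires ffmpeg):
#   ffmpeg -y -f lavfi -i "aevalsrc='$(saw_expr 110 5)':s=44100:d=2" out.wav
```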
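Vibrato via phase modulation (adds life to sustained notes):
sin(f*2*PI*t + (d*f/r)*sin(r*2*PI*t))
Here d is the relative depth and r is the vibrato rate in Hz, so the pitch swings between f*(1-d) and f*(1+d). Depth 0.003–0.005 is subtle, and a 5–6 Hz rate sounds natural. Avoid the tempting form sin(f*(1+d*sin(r*2*PI*t))*2*PI*t): the modulated factor multiplies t, so the pitch swing keeps widening the longer the note runs.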
Detuned pair (width, movement — pads, ambient):
(sin(f*2*PI*t) + sin((f+0.8)*2*PI*t)) / 2
Offset 0.5–1.5 Hz creates slow beating. Apply to each note in a chord for lush pads.
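Applying the detuned pair to every chord tone is repetitive, so here is a sketch that generates the summed expression. `detuned_chord_expr` is a hypothetical helper name; dividing by the total number of oscillators is an assumption that keeps the sum within ±1.

```shell
#!/usr/bin/env bash
# detuned_chord_expr OFFSET_HZ FREQ... (hypothetical helper):
# one oscillator at each FREQ plus one at FREQ+OFFSET, normalized by oscillator count.
detuned_chord_expr() {
  local offset="$1"
  shift
  awk -v off="$offset" 'BEGIN {
    n = ARGC - 1
    for (i = 1; i <= n; i++) {
      pair = sprintf("sin(%s*2*PI*t)+sin((%s+%s)*2*PI*t)", ARGV[i], ARGV[i], off)
      expr = expr (i > 1 ? "+" : "") pair
    }
    printf "(%s)/%d\n", expr, 2 * n
  }' "$@"
}

# C major pad voices with a 0.8 Hz detune
detuned_chord_expr 0.8 261.63 329.63 392
```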
aevalsrc: expression-based synthesis (primary tool)
The aevalsrc filter evaluates a math expression over time variable t (seconds) and sample number n. Use it for melodies, bass lines, and complex tones.
Available functions: sin(), cos(), exp(), log(), abs(), mod(a,b), between(x,lo,hi) (returns 1 if lo<=x<=hi), gt(a,b), lt(a,b), gte(), lte(), if(cond,then,else), min(), max(), pow(), sqrt(), PI, E
Sawtooth bass note (use this as the baseline, not bare sine):
# A2 sawtooth bass with 5 harmonics
ffmpeg -y -f lavfi -i "aevalsrc='\
(sin(110*2*PI*t)-sin(220*2*PI*t)/2+sin(330*2*PI*t)/3-sin(440*2*PI*t)/4+sin(550*2*PI*t)/5)/3 \
':s=44100:d=2" out.wav
Square-wave melody note with vibrato:
# A4 square-ish lead with vibrato (phase modulation, index = 0.004*440/5 = 0.352)
ffmpeg -y -f lavfi -i "aevalsrc='\
(sin(440*2*PI*t+0.352*sin(5*2*PI*t))+sin(1320*2*PI*t)/3+sin(2200*2*PI*t)/5+sin(3080*2*PI*t)/7)/2 \
':s=44100:d=2" out.wav
Tone with amplitude envelope (attack + decay):
# Fast attack (0.01s), exponential decay
ffmpeg -y -f lavfi -i "aevalsrc='sin(440*2*PI*t) * min(t/0.01, 1) * exp(-3*t)':s=44100:d=2" out.wav
Chord (sum of detuned pairs for width):
# C major chord with detuned pairs — sounds full, not thin
ffmpeg -y -f lavfi -i "aevalsrc='\
(sin(261.63*2*PI*t)+sin(262.4*2*PI*t) + sin(329.63*2*PI*t)+sin(330.4*2*PI*t) + sin(392*2*PI*t)+sin(392.8*2*PI*t))/6 \
':s=44100:d=2" out.wav
Sequencing notes with between(mod(t,...), ...):
# Play 4 notes in sequence, each 0.5s, looping every 2s
# C4 -> E4 -> G4 -> C5
ffmpeg -y -f lavfi -i "aevalsrc='\
sin(261.63*2*PI*t)*between(mod(t,2),0,0.5) + \
sin(329.63*2*PI*t)*between(mod(t,2),0.5,1.0) + \
sin(392.00*2*PI*t)*between(mod(t,2),1.0,1.5) + \
sin(523.25*2*PI*t)*between(mod(t,2),1.5,2.0) \
':s=44100:d=8" out.wav
Note with per-note envelope (avoid clicks):
# Each note gets its own attack/release envelope
# mt = position within the current 0.5s slot
ffmpeg -y -f lavfi -i "aevalsrc='\
sin(261.63*2*PI*t) * between(mod(t,2),0,0.5) * min(mod(t,0.5)/0.01,1) * min((0.5-mod(t,0.5))/0.01,1) + \
sin(329.63*2*PI*t) * between(mod(t,2),0.5,1.0) * min((mod(t,2)-0.5)/0.01,1) * min((1.0-mod(t,2))/0.01,1) + \
sin(392.00*2*PI*t) * between(mod(t,2),1.0,1.5) * min((mod(t,2)-1.0)/0.01,1) * min((1.5-mod(t,2))/0.01,1) + \
sin(523.25*2*PI*t) * between(mod(t,2),1.5,2.0) * min((mod(t,2)-1.5)/0.01,1) * min((2.0-mod(t,2))/0.01,1) \
':s=44100:d=8" out.wav
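The per-note envelope pattern above is mechanical enough to generate. The following sketch builds the gated, enveloped sum for any list of frequencies; `seq_expr` is a hypothetical helper name, the 10 ms ramps mirror the example, and the plain sine is only a placeholder that could be swapped for a harmonic stack.

```shell
#!/usr/bin/env bash
# seq_expr SLOT_SECONDS FREQ... (hypothetical helper):
# gate each note into its own slot of a looping bar (bar = SLOT * note count),
# with 10 ms linear attack and release ramps to avoid clicks.
seq_expr() {
  local slot="$1"
  shift
  awk -v slot="$slot" 'BEGIN {
    n = ARGC - 1
    bar = slot * n
    for (i = 1; i <= n; i++) {
      lo = slot * (i - 1)
      hi = slot * i
      term = sprintf("sin(%s*2*PI*t)*between(mod(t,%g),%g,%g)*min((mod(t,%g)-%g)/0.01,1)*min((%g-mod(t,%g))/0.01,1)", ARGV[i], bar, lo, hi, bar, lo, hi, bar)
      expr = expr (i > 1 ? "+" : "") term
    }
    print expr
  }' "$@"
}

# C4 -> E4 -> G4 -> C5, each 0.5 s, looping every 2 s
seq_expr 0.5 261.63 329.63 392.00 523.25
# Possible use (requires ffmpeg):
#   ffmpeg -y -f lavfi -i "aevalsrc='$(seq_expr 0.5 261.63 329.63 392.00 523.25)':s=44100:d=8" out.wav
```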
anoisesrc: noise-based percussion
# White noise (hihats); anoisesrc sets the sample rate with r (s is the random seed)
ffmpeg -y -f lavfi -i "anoisesrc=color=white:d=0.1:r=44100" hihat.wav
# Pink noise (snare body)
ffmpeg -y -f lavfi -i "anoisesrc=color=pink:d=0.2:r=44100" snare_body.wav
Single-source drum hits sound thin and fake. Always layer multiple components:
Kick — sine sweep body + sub layer:
ffmpeg -y -f lavfi -i "aevalsrc='\
0.9*sin(2*PI*(160*exp(-8*t)+45)*t)*exp(-4*t) \
+ 0.4*sin(2*PI*(80*exp(-15*t)+30)*t)*exp(-3*t) \
':s=44100:d=0.4" -af "lowpass=f=200,volume=1.5" kick.wav
Snare — pitched tone body + noise crack (generate separately, then mix):
# Tone body (~200Hz, fast decay)
ffmpeg -y -f lavfi -i "aevalsrc='0.6*sin(200*2*PI*t)*exp(-15*t)':s=44100:d=0.25" snare_tone.wav
# Noise crack (pink noise, bandpassed, shaped)
ffmpeg -y -f lavfi -i "anoisesrc=color=pink:d=0.2:r=44100" \
-af "bandpass=f=1200:width_type=h:w=2000,afade=t=in:d=0.001,afade=t=out:st=0.05:d=0.15,volume=0.8" snare_noise.wav
# Layer them
ffmpeg -y -i snare_tone.wav -i snare_noise.wav -filter_complex "[0][1]amix=inputs=2:normalize=0" snare.wav
Closed hihat — white noise, tight envelope, high-passed:
ffmpeg -y -f lavfi -i "anoisesrc=color=white:d=0.08:r=44100" \
-af "highpass=f=7000,bandpass=f=10000:width_type=h:w=4000,afade=t=out:st=0.015:d=0.065,volume=0.35" hihat.wav
Open hihat (longer decay for accent/variation):
ffmpeg -y -f lavfi -i "anoisesrc=color=white:d=0.2:r=44100" \
-af "highpass=f=6000,bandpass=f=9000:width_type=h:w=5000,afade=t=out:st=0.05:d=0.15,volume=0.25" openhat.wav
SoX has native waveform types — no harmonic stacking needed. Prefer SoX for tone generation when it's available.
Check: which sox | Install: brew install sox (macOS) or apt install sox (Linux)
Available synth waveform types: sine, square, sawtooth, triangle, trapezium, pluck, whitenoise, pinknoise, brownnoise
# Sawtooth A4 for 2 seconds — one command vs 5-line aevalsrc expression
sox -n -r 44100 out.wav synth 2 sawtooth 440
# C major chord: each sawtooth lands on its own channel, so remix - sums the three to mono
sox -n -r 44100 out.wav synth 2 sawtooth 261.63 sawtooth 329.63 sawtooth 392 remix -
# 4-note melody: C4 → E4 → G4 → C5, each 0.5s
sox -n -r 44100 out.wav synth 0.5 sawtooth 261.63 : synth 0.5 sawtooth 329.63 : synth 0.5 sawtooth 392 : synth 0.5 sawtooth 523.25
# Sine sweep 160→45Hz with fade — replaces the complex aevalsrc kick recipe
sox -n -r 44100 kick.wav synth 0.3 sine 160:45 fade t 0.005 0.3 0.2 gain -3
# Karplus-Strong pluck — great for lo-fi guitar sounds, not possible with ffmpeg
sox -n -r 44100 pluck.wav synth 1.5 pluck 220 fade t 0.005 1.5 0.5
# Effects chain left-to-right after filename
sox in.wav out.wav reverb 50 chorus 0.6 0.9 40 0.4 0.25 2 lowpass 800 gain -2
| Effect | Usage | Example |
|---|---|---|
| reverb | Room ambience (30–80 useful range) | reverb 50 |
| chorus | Stereo widening | chorus 0.6 0.9 40 0.4 0.25 2 |
| overdrive | Harmonic saturation (5–10 subtle, 20+ aggressive) | overdrive 10 |
| lowpass | Warmth | lowpass 800 |
| highpass | Clarity for non-bass | highpass 200 |
| tremolo | Amplitude modulation | tremolo 5 60 |
| fade | Fade in/out | fade t 2 60 3 |
| gain | Volume in dB | gain -3 |
# Mix files (sum)
sox -m melody.wav bass.wav mix.wav
# Concatenate files
sox bar.wav bar.wav bar.wav long.wav
# Repeat a bar 30 times (29 additional plays)
sox in.wav out.wav repeat 29
# Trim to exact duration
sox in.wav out.wav trim 0 2.824
| Task | SoX | ffmpeg |
|---|---|---|
| Sawtooth/square/triangle tone | synth 2 sawtooth 440 | 5-line aevalsrc harmonic series |
| Kick drum (freq sweep) | synth 0.3 sine 160:45 | aevalsrc with exp() sweep |
| Plucked string sound | synth 1.5 pluck 220 | Not practical |
| Reverb | reverb 50 — simple, good | aecho — workable but not true reverb |
| Complex time-sequenced patterns | Clunky : concat syntax | between(mod(t,...)) — flexible |
| Place hits at exact ms positions | Not supported | adelay + amix |
| Mix 5+ layers with per-layer volume | Possible but awkward | amix=normalize=0 + volume per input |
| Final MP3 encode + loudnorm | No MP3 encoding | libmp3lame + loudnorm |
Recommended workflow: Use SoX to generate individual tones, chords, and drum hits (simpler, better waveforms). Use ffmpeg for assembling patterns (adelay + amix), mixing layers, and MP3 export.
adelay + amix
Place individual hits at precise times using adelay (in milliseconds):
BPM=120
BEAT_MS=500 # 60000/120
# Kick on beats 1 and 3 (0ms and 1000ms)
# Snare on beats 2 and 4 (500ms and 1500ms)
# Hihat on every eighth note (0, 250, 500, 750, 1000, 1250, 1500, 1750)
ffmpeg -y \
-f lavfi -i "aevalsrc='sin(2*PI*(150*exp(-10*t))*t)*exp(-5*t)':s=44100:d=0.3" \
-f lavfi -i "anoisesrc=color=pink:d=0.15:r=44100" \
-f lavfi -i "anoisesrc=color=white:d=0.05:r=44100" \
-filter_complex "\
[0]asplit=2[k1][k2]; \
[1]asplit=2[s1][s2]; \
[2]asplit=8[h1][h2][h3][h4][h5][h6][h7][h8]; \
[k1]adelay=0|0[dk1]; [k2]adelay=1000|1000[dk2]; \
[s1]adelay=500|500,highpass=f=200,bandpass=f=300:width_type=h:w=200[ds1]; \
[s2]adelay=1500|1500,highpass=f=200,bandpass=f=300:width_type=h:w=200[ds2]; \
[h1]adelay=0|0,highpass=f=8000[dh1]; \
[h2]adelay=250|250,highpass=f=8000[dh2]; \
[h3]adelay=500|500,highpass=f=8000[dh3]; \
[h4]adelay=750|750,highpass=f=8000[dh4]; \
[h5]adelay=1000|1000,highpass=f=8000[dh5]; \
[h6]adelay=1250|1250,highpass=f=8000[dh6]; \
[h7]adelay=1500|1500,highpass=f=8000[dh7]; \
[h8]adelay=1750|1750,highpass=f=8000[dh8]; \
[dk1][dk2][ds1][ds2][dh1][dh2][dh3][dh4][dh5][dh6][dh7][dh8]amix=inputs=12:normalize=0,volume=0.8,apad=whole_dur=2 \
" -t 2 drum_bar.wav
IMPORTANT: amix output duration = longest input stream. With adelay, the last hit's delayed sample plus its decay is shorter than a full bar, so the bar gets truncated. Always append apad=whole_dur=BAR_DURATION after amix to pad to the exact bar length.
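The millisecond offsets and the repeated adelay branches in the drum pattern can be generated rather than typed. This is a sketch under stated assumptions: `grid_ms` and `adelay_graph` are hypothetical helper names, offsets are rounded to whole milliseconds, and the `[h1]`/`[dh1]` label scheme simply mirrors the example above.

```shell
#!/usr/bin/env bash
# grid_ms BPM DIVISION (hypothetical helper): hit times in ms for one 4/4 bar,
# where DIVISION is the number of hits per beat (2 = eighth notes).
grid_ms() {
  awk -v bpm="$1" -v div="$2" 'BEGIN {
    step = 60000 / bpm / div
    for (i = 0; i < 4 * div; i++)
      printf "%s%d", (i ? " " : ""), int(i * step + 0.5)
    print ""
  }'
}

# adelay_graph LABEL MS... (hypothetical helper): one adelay branch per hit,
# e.g. [h2]adelay=250|250[dh2];
adelay_graph() {
  local label="$1" i=1 ms out=""
  shift
  for ms in "$@"; do
    out+="[${label}${i}]adelay=${ms}|${ms}[d${label}${i}];"
    i=$((i + 1))
  done
  echo "$out"
}

grid_ms 120 2                       # 0 250 500 750 1000 1250 1500 1750
adelay_graph h $(grid_ms 120 2)     # unquoted on purpose: one argument per hit
```

The generated branches still need a matching asplit in front and an amix plus apad=whole_dur behind them, as in the full example.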
-stream_loop
# Loop a 2-second bar to fill 60 seconds
ffmpeg -y -stream_loop 29 -i drum_bar.wav -t 60 -c copy drums_full.wav
aevalsrc
Use the between(mod(t, bar_duration), start, end) pattern to sequence notes within a looping bar. Always add per-note envelopes (short fade in/out of ~10ms) to avoid clicks at note boundaries.
Use square-wave harmonics (odd harmonics) for a clear, musical tone. Add vibrato for life, tremolo for movement. Always include per-note envelopes.
# Square-ish melody with vibrato, 4 notes at 120 BPM
# Pattern: C5, D5, E5, C5; each note has an attack/release envelope
# Vibrato is phase modulation with index 0.004*f/5, so its depth stays constant over time
ffmpeg -y -f lavfi -i "aevalsrc='\
(sin(523.25*2*PI*t+0.419*sin(5*2*PI*t))+sin(1569.75*2*PI*t)/3+sin(2616.25*2*PI*t)/5)*0.3 \
* between(mod(t,2),0,0.5) * min(mod(t,2)/0.015,1) * min((0.5-mod(t,2))/0.04,1) + \
(sin(587.33*2*PI*t+0.470*sin(5*2*PI*t))+sin(1761.99*2*PI*t)/3+sin(2936.65*2*PI*t)/5)*0.3 \
* between(mod(t,2),0.5,1.0) * min((mod(t,2)-0.5)/0.015,1) * min((1.0-mod(t,2))/0.04,1) + \
(sin(659.26*2*PI*t+0.527*sin(5*2*PI*t))+sin(1977.78*2*PI*t)/3+sin(3296.3*2*PI*t)/5)*0.3 \
* between(mod(t,2),1.0,1.5) * min((mod(t,2)-1.0)/0.015,1) * min((1.5-mod(t,2))/0.04,1) + \
(sin(523.25*2*PI*t+0.419*sin(5*2*PI*t))+sin(1569.75*2*PI*t)/3+sin(2616.25*2*PI*t)/5)*0.3 \
* between(mod(t,2),1.5,2.0) * min((mod(t,2)-1.5)/0.015,1) * min((2.0-mod(t,2))/0.04,1) \
':s=44100:d=2" -af "lowpass=f=2500,tremolo=f=3:d=0.3,volume=0.55" melody_bar.wav
Use sawtooth harmonics (all harmonics, alternating sign) for a full, warm bass. Lowpass removes harshness while keeping body.
# Sawtooth bass following chord roots at 120 BPM, with plucky envelope
ffmpeg -y -f lavfi -i "aevalsrc='\
(sin(130.81*2*PI*t)-sin(261.62*2*PI*t)/2+sin(392.43*2*PI*t)/3-sin(523.24*2*PI*t)/4+sin(654.05*2*PI*t)/5)/3 \
* between(mod(t,2),0,0.45) * min(mod(t,0.5)/0.008,1) * exp(-1.2*mod(t,0.5)) + \
(sin(130.81*2*PI*t)-sin(261.62*2*PI*t)/2+sin(392.43*2*PI*t)/3-sin(523.24*2*PI*t)/4+sin(654.05*2*PI*t)/5)/3 \
* between(mod(t,2),0.5,0.95) * min((mod(t,2)-0.5)/0.008,1) * exp(-1.2*(mod(t,2)-0.5)) + \
(sin(146.83*2*PI*t)-sin(293.66*2*PI*t)/2+sin(440.49*2*PI*t)/3-sin(587.32*2*PI*t)/4+sin(734.15*2*PI*t)/5)/3 \
* between(mod(t,2),1.0,1.45) * min((mod(t,2)-1.0)/0.008,1) * exp(-1.2*(mod(t,2)-1.0)) + \
(sin(130.81*2*PI*t)-sin(261.62*2*PI*t)/2+sin(392.43*2*PI*t)/3-sin(523.24*2*PI*t)/4+sin(654.05*2*PI*t)/5)/3 \
* between(mod(t,2),1.5,1.95) * min((mod(t,2)-1.5)/0.008,1) * exp(-1.2*(mod(t,2)-1.5)) \
':s=44100:d=2" -af "lowpass=f=350,volume=0.7" bass_bar.wav
See the "Drum sample recipes" section above for individual hit construction, then:
1. Place each hit with adelay (values in ms, both channels: adelay=500|500)
2. Mix with amix=inputs=N:normalize=0,apad=whole_dur=BAR_DURATION (the apad is critical)
3. Loop with -stream_loop to fill target duration

Detuned oscillator pairs with chorus and echo for width. The detune offset (0.5–1.5 Hz) creates slow beating that evolves over time. Effects are essential here; raw detuned sines still sound thin.
# Warm pad: detuned pairs + chorus + echo + lowpass
ffmpeg -y -f lavfi -i "aevalsrc='\
(sin(261.63*2*PI*t)+sin(262.4*2*PI*t) + sin(329.63*2*PI*t)+sin(330.4*2*PI*t) + sin(392*2*PI*t)+sin(392.8*2*PI*t)) / 6 \
* (0.6 + 0.4*sin(0.15*2*PI*t)) \
':s=44100:d=8" \
-af "lowpass=f=800,chorus=0.6:0.9:40|55|70:0.4|0.35|0.3:0.25|0.3|0.35:2|1.5|2.5,aecho=0.8:0.7:90:0.25,volume=0.3" pad.wav
A quiet noise bed fills the frequency spectrum and adds warmth. Without it, gaps between notes feel sterile.
# Lo-fi room noise: pink noise, bandpassed, very quiet
ffmpeg -y -f lavfi -i "anoisesrc=color=pink:d=60:r=44100" \
-af "lowpass=f=3000,highpass=f=200,volume=0.04" vinyl_texture.wav
Include this in every mix at low volume (0.03–0.05). It's subtle but fills the space between notes.
Effects are not optional. Raw synthesized waveforms sound sterile regardless of harmonic content. At minimum, apply lowpass (warmth), chorus (width), and volume (balance) to every layer. Pads need aecho or aecho + chorus to sound like pads. Melodies need tremolo or vibrato for life.
Essential (use on every track):
| Filter | Usage | Example |
|---|---|---|
| volume | Per-layer level balance | volume=0.6 |
| lowpass | Cut highs (warmth, taming harmonics) | lowpass=f=800 |
| highpass | Cut lows (clarity for non-bass) | highpass=f=200 |
| afade | Fade in/out (prevent abrupt start/end) | afade=t=in:d=2, afade=t=out:st=58:d=2 |
| amix | Mix multiple streams | amix=inputs=4:normalize=0 |
| apad | Pad with silence (fix amix truncation) | apad=whole_dur=2.824 |
| loudnorm | EBU R128 loudness normalization | loudnorm=I=-16:TP=-1.5:LRA=11 |
Strongly recommended (most layers benefit):
| Filter | Usage | Example |
|---|---|---|
| chorus | Stereo width / thickening (essential for pads) | chorus=0.6:0.9:40\|55\|70:0.4\|0.35\|0.3:0.25\|0.3\|0.35:2\|1.5\|2.5 |
| aecho | Space / depth (pads, ambient) | aecho=0.8:0.7:90\|250:0.25\|0.15 |
| tremolo | Amplitude movement (melody, pads) | tremolo=f=3:d=0.3 |
| bandpass | Isolate frequency band (drums, snare) | bandpass=f=1000:width_type=h:w=500 |
| adelay | Place drum hits in time (ms, per-channel) | adelay=500\|500 |
Situational:
| Filter | Usage | Example |
|---|---|---|
| flanger | Sweeping modulation (leads, synths) | flanger=delay=3:depth=2:speed=0.5 |
| vibrato | Pitch modulation (as filter, not in-expr) | vibrato=f=5:d=0.4 |
| areverse | Reverse audio (FX, risers) | areverse |
| atempo | Change speed | atempo=0.5 (half speed) |
Use amix with normalize=0 to prevent auto-gain (you control levels via per-layer volume).
Always include the vinyl/noise texture layer — it fills gaps between notes.
ffmpeg -y \
-i melody_full.wav \
-i bass_full.wav \
-i drums_full.wav \
-i pad_full.wav \
-i vinyl_texture.wav \
-filter_complex "\
[0][1][2][3][4]amix=inputs=5:normalize=0,\
loudnorm=I=-16:TP=-1.5:LRA=11\
" \
-ar 44100 -ac 2 mix.wav
Loudnorm targets by genre:
| Genre | Integrated (I) | True Peak (TP) | LRA |
|---|---|---|---|
| Ambient | -18 | -2.0 | 11 |
| Lo-fi / Chill | -16 | -1.5 | 11 |
| Electronic | -14 | -1.5 | 9 |
| Techno / EDM | -14 | -1.5 | 9 |
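The genre table can be turned into a lookup that returns a ready-to-use filter string. A minimal sketch; the function name `loudnorm_filter` and the lowercase genre keys are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# loudnorm_filter GENRE (hypothetical helper): loudnorm settings from the genre table.
loudnorm_filter() {
  case "$1" in
    ambient)               echo "loudnorm=I=-18:TP=-2.0:LRA=11" ;;
    lofi|chill)            echo "loudnorm=I=-16:TP=-1.5:LRA=11" ;;
    electronic|techno|edm) echo "loudnorm=I=-14:TP=-1.5:LRA=9" ;;
    *) echo "unknown genre: $1" >&2; return 1 ;;
  esac
}

loudnorm_filter lofi   # loudnorm=I=-16:TP=-1.5:LRA=11
# Possible use in the mix step (requires ffmpeg):
#   -filter_complex "[0][1]amix=inputs=2:normalize=0,$(loudnorm_filter lofi)"
```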
ffmpeg -y -i mix.wav -codec:a libmp3lame -b:a 192k -ar 44100 -ac 2 output.mp3
ffprobe -v error -show_entries format=duration,bit_rate,format_name -show_entries stream=sample_rate,channels,codec_name -of default=noprint_wrappers=1 output.mp3
Expected output should show: codec_name=mp3, sample_rate=44100, channels=2, bit_rate near 192000.
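The expected values can be checked automatically by piping the ffprobe output through a small parser. This is a sketch: `check_probe` is a hypothetical helper name, and the ±5% tolerance used for "near 192000" is an assumption, not a figure from the guide.

```shell
#!/usr/bin/env bash
# check_probe (hypothetical helper): read key=value lines from ffprobe on stdin and
# verify codec, sample rate, channel count, and bitrate (within 5% of 192000).
check_probe() {
  awk -F= '
    $1 == "codec_name"  { codec = $2 }
    $1 == "sample_rate" { rate = $2 }
    $1 == "channels"    { ch = $2 }
    $1 == "bit_rate"    { br = $2 }
    END {
      ok = (codec == "mp3" && rate == 44100 && ch == 2 && br >= 182400 && br <= 201600)
      print (ok ? "ok" : "mismatch")
      exit !ok
    }'
}

# Demo with sample text shaped like the ffprobe output:
printf 'codec_name=mp3\nsample_rate=44100\nchannels=2\nduration=60.0\nbit_rate=192000\n' | check_probe
# Possible use (requires ffprobe):
#   ffprobe -v error -show_entries format=duration,bit_rate \
#     -show_entries stream=sample_rate,channels,codec_name \
#     -of default=noprint_wrappers=1 output.mp3 | check_probe
```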
Build sections separately, then concatenate:
Typical structure:
# Create a file list
echo "file 'intro.wav'" > list.txt
echo "file 'verse.wav'" >> list.txt
echo "file 'chorus.wav'" >> list.txt
echo "file 'outro.wav'" >> list.txt
ffmpeg -y -f concat -safe 0 -i list.txt -c copy full_track.wav
# Fade in first 3 seconds, fade out last 3 seconds of a 60s track
ffmpeg -y -i full_track.wav -af "afade=t=in:d=3,afade=t=out:st=57:d=3" full_track_faded.wav
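The fade-out start time is the track duration minus the fade length, so it is worth computing rather than hard-coding. A minimal sketch; `fade_filter` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# fade_filter DURATION FADE (hypothetical helper): afade chain that fades in over the
# first FADE seconds and out over the last FADE seconds of a DURATION-second track.
fade_filter() {
  awk -v dur="$1" -v fade="$2" 'BEGIN {
    printf "afade=t=in:d=%g,afade=t=out:st=%g:d=%g\n", fade, dur - fade, fade
  }'
}

fade_filter 60 3   # afade=t=in:d=3,afade=t=out:st=57:d=3
# Possible use (requires ffmpeg):
#   ffmpeg -y -i full_track.wav -af "$(fade_filter 60 3)" full_track_faded.wav
```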
lowpass=f=350, plucky envelope with exp(-1.2*...), volume=0.7
freq*(1+0.004*sin(5*2*PI*t))), lowpass=f=1800, tremolo=f=3:d=0.3, sparse (leave rests), volume=0.55
lowpass=f=700, 3-voice chorus, aecho=0.8:0.7:90:0.25, slow LFO (0.6+0.4*sin(0.15*2*PI*t)), volume=0.35
lowpass=f=3000,highpass=f=200, volume=0.04
aecho=0.8:0.6:120|250:0.3|0.15, lowpass=f=900, very slow LFO (0.08Hz), volume=0.45. Each chord sustains a full bar with 0.2–0.3s crossfade envelopes.
lowpass=f=2000, aecho=0.8:0.5:200|400:0.2|0.1, volume=0.15
aecho for tail, volume=0.3
lowpass=f=120, slow swell envelopes, volume=0.3
volume=0.03.
lowpass=f=250, volume=1.8. This is the anchor.
bandpass=f=1500:width_type=h:w=2000, volume=0.6
lowpass=f=400, volume=0.7
bandpass=f=2000, flanger, volume=0.45
lowpass=f=500, chorus, volume=0.15
exp(-12*...)). Use between(mod(t,beat),...) to cycle 4 notes per beat. Build each bar separately (one per chord), then concat. Apply: bandpass=f=1500:width_type=h:w=2000, chorus, volume=0.4
lowpass=f=300, volume=0.7
lowpass=f=800, volume=0.25
lowpass=f=3500, volume=0.35
which sox: if available, prefer SoX synth for tones (native waveforms). ffmpeg is always required for mixing/export.
TMPDIR=$(mktemp -d)
adelay, mix with amix=normalize=0,apad=whole_dur=BAR. Build 2-bar patterns for variation.
lowpass, plucky envelope
chorus + aecho + lowpass (effects are mandatory)
volume=0.03–0.05
-stream_loop to extend each layer to target duration
adelay to stagger layer entries (pad first, then melody+bass, then drums). Use afade on individual layers for gradual intros. Don't start everything at once.
afade=t=in:d=N at start, afade=t=out:st=X:d=N at end
amix with normalize=0, then loudnorm with genre-appropriate target
libmp3lame -b:a 192k -ar 44100 -ac 2
ffprobe to check duration, bitrate, format, channels
rm -rf "$TMPDIR"

Clipping / distortion:
(sin(f1*...) + sin(f2*...)) / 2
volume values. Typical mix: melody=0.5, bass=0.7, drums=0.7, pad=0.3

Silence in output:
aevalsrc expression for syntax errors (ffmpeg silently produces silence on invalid expressions)
between() ranges don't have gaps

Clicks / pops at note boundaries:
min(elapsed/0.01, 1) * min(remaining/0.01, 1)

Timing drift:
mod(t, bar_duration) matches the actual bar duration precisely

File too large:

Expression syntax errors:
aevalsrc expressions (spaces can cause parsing issues)
\ for line continuation in bash, but keep the expression itself on logical lines
between() doesn't work, try gte(x,lo)*lt(x,hi) as equivalent

amix changes volume unexpectedly:
normalize=0 to disable amix's automatic normalization
volume filter on each input

amix truncates drum bars (bar too short):
amix output duration = longest input stream, not longest delayed position
apad=whole_dur=BAR_DURATION after amix in the filter chain
amix=inputs=8:normalize=0,apad=whole_dur=2.824

Track sounds like a test tone / bare sine wave:
sin(f*2*PI*t): this always sounds clinical
lowpass (warmth), chorus (width), tremolo/vibrato (movement)
volume=0.04

All layers start at once (wall of sound):
adelay on individual layers in the final mix to stagger entries
afade=t=in per-layer at the entry point for gradual intros

Mono output when stereo expected:
-ac 2 to the final export command
aevalsrc produces mono by default; this is fine for intermediates, and -ac 2 on export converts to stereo