Audio Engineering

Built on research from 30 Music cluster projects, deep dives into 360 PNW musicians (S36/SPS), Ableton Live research (ABL), Deep Audio (DAA), Dead Frequencies (DFQ), and High Fidelity amplifier analysis (HFR/HFE).

Expert-level audio engineering covering mastering, mixing, loudness standards, synthesis, podcast production, music theory, and spectrum analysis. Works alongside the ffmpeg-media skill for codec/format operations.

Loudness Standards

Target Levels by Platform

Platform	Target LUFS	True Peak	Standard
Spotify	-14 LUFS	-1 dBTP	AES streaming
Apple Music	-16 LUFS	-1 dBTP	Sound Check
YouTube	-14 LUFS	-1 dBTP	ITU-R BS.1770
Podcast (Apple)	-16 LUFS	-1 dBTP	Apple spec
Podcast (Spotify)
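These targets imply a simple gain rule: a normalizing player turns a track up or down by the difference between the platform target and the track's integrated loudness, and that static gain moves the true peak by the same amount. A minimal Python sketch of the arithmetic, assuming purely static gain (real services may limit, or skip, positive gain for quiet tracks). PLATFORM_TARGETS and normalization_gain are illustrative names, not any platform's API.

```python
# Simplified static-gain model of platform loudness normalization.
# Assumption: real services differ in how they treat quiet tracks and peaks.
PLATFORM_TARGETS = {  # (integrated LUFS target, true-peak ceiling in dBTP), from the table
    "Spotify": (-14.0, -1.0),
    "Apple Music": (-16.0, -1.0),
    "YouTube": (-14.0, -1.0),
    "Podcast (Apple)": (-16.0, -1.0),
}


def normalization_gain(integrated_lufs: float, true_peak_dbtp: float, platform: str) -> dict:
    """Return the static gain a player would apply and where the true peak lands."""
    target, ceiling = PLATFORM_TARGETS[platform]
    gain_db = target - integrated_lufs      # both values are in dB, so the gain is a difference
    new_peak = true_peak_dbtp + gain_db     # a static gain shifts the peak by the same amount
    return {
        "gain_db": round(gain_db, 2),
        "new_true_peak": round(new_peak, 2),
        "exceeds_ceiling": new_peak > ceiling,
    }


if __name__ == "__main__":
    # Loud master: simply turned down
    print(normalization_gain(-9.0, -0.3, "Spotify"))
    # Quiet master: +6 dB of gain would push peaks well above the ceiling
    print(normalization_gain(-20.0, -2.0, "Spotify"))
```

This is why heavily limited masters gain nothing on these platforms: the extra loudness is simply turned down again.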

Measurement Commands

# Measure integrated loudness (LUFS) with ffmpeg
ffmpeg -i input.wav -af loudnorm=print_format=json -f null - 2>&1 | grep -A20 "Parsed_loudnorm"

# Full EBU R128 scan
ffmpeg -i input.wav -af ebur128=peak=true -f null - 2>&1 | tail -20

# Loudness normalization to -14 LUFS (two-pass for accuracy)
# Pass 1: measure
# loudnorm prints its JSON report on stderr, so capture stderr (2>), not stdout
ffmpeg -hide_banner -i input.wav -af loudnorm=I=-14:LRA=11:TP=-1:print_format=json -f null - 2> /tmp/loudnorm.log
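# Pass 2: apply, substituting input_i, input_lra, input_tp and input_thresh from the pass-1 JSON (values below are examples)
# loudnorm resamples internally to 192 kHz, so set the output rate with -ar
ffmpeg -i input.wav -af loudnorm=I=-14:LRA=11:TP=-1:measured_I=-18.5:measured_LRA=9.2:measured_TP=-0.5:measured_thresh=-28.3:linear=true -ar 44100 output.wav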
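Copying pass-1 numbers into pass 2 by hand is error-prone. With print_format=json, loudnorm's report contains keys such as input_i, input_lra, input_tp, input_thresh and target_offset. A minimal Python sketch, assuming the pass-1 log holds one flat (non-nested) JSON block, that extracts those values and builds the pass-2 filter string; it also forwards target_offset as loudnorm's offset option. parse_loudnorm_log, second_pass_filter and SAMPLE_LOG are illustrative, and the sample values are a made-up, trimmed excerpt.

```python
import json
import re

# Made-up, trimmed excerpt of a pass-1 log (real logs contain more lines and keys)
SAMPLE_LOG = """[Parsed_loudnorm_0 @ 0x5581]
{
    "input_i" : "-18.50",
    "input_tp" : "-0.50",
    "input_lra" : "9.20",
    "input_thresh" : "-28.30",
    "target_offset" : "0.02"
}
"""


def parse_loudnorm_log(log_text: str) -> dict:
    """Return the last flat {...} JSON block found in an ffmpeg log."""
    blocks = re.findall(r"\{[^{}]*\}", log_text)   # loudnorm's report has no nested braces
    if not blocks:
        raise ValueError("no loudnorm JSON block found")
    return json.loads(blocks[-1])


def second_pass_filter(stats: dict, i: float = -14, lra: float = 11, tp: float = -1) -> str:
    """Build the pass-2 loudnorm filter string from pass-1 measurements."""
    return (
        f"loudnorm=I={i}:LRA={lra}:TP={tp}"
        f":measured_I={stats['input_i']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_TP={stats['input_tp']}"
        f":measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}"
        ":linear=true"
    )


if __name__ == "__main__":
    print(second_pass_filter(parse_loudnorm_log(SAMPLE_LOG)))
```

In practice you would read /tmp/loudnorm.log instead of SAMPLE_LOG, pass the returned string to ffmpeg's -af in pass 2, and set an output rate with -ar.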

With sox

# Normalize peak to -1 dBFS
sox input.wav output.wav gain -n -1

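# Compressor (threshold -20 dB, ratio 4:1, attack 5 ms, release 50 ms)
# Transfer points are in,out dB pairs: unity gain up to -20 dB, then 20 dB more input yields only 5 dB more output (4:1)
sox input.wav output.wav compand 0.005,0.05 -80,-80,-20,-20,0,-15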
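A compressor's threshold T and ratio R map directly onto compand's in,out dB points: above T, an input level x comes out at T + (x - T)/R. A minimal Python sketch that derives the points string for a hard-knee curve; compand_points is a hypothetical helper, and the -80 dB floor point is an arbitrary choice that only anchors the 1:1 region.

```python
def compand_points(threshold_db: float, ratio: float, floor_db: float = -80.0) -> str:
    """Build a sox compand transfer-function string for a hard-knee compressor.

    Below the threshold the curve is 1:1; above it, every `ratio` dB of input
    yields 1 dB of output, up to a 0 dBFS input.
    """
    out_at_0 = threshold_db + (0.0 - threshold_db) / ratio   # output level for a 0 dBFS input
    points = [(floor_db, floor_db), (threshold_db, threshold_db), (0.0, out_at_0)]
    return ",".join(f"{value:g}" for pair in points for value in pair)


if __name__ == "__main__":
    print(compand_points(-20, 4))   # -80,-80,-20,-20,0,-15
    print(compand_points(-18, 3))   # -80,-80,-18,-18,0,-12
```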

# 3-band EQ (low shelf +3dB at 200Hz, mid cut -2dB at 2kHz, high shelf +1dB at 8kHz)
sox input.wav output.wav bass +3 200 equalizer 2000 1q -2 treble +1 8000
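The equalizer effect is a peaking biquad. The sketch below uses the standard RBJ Audio EQ Cookbook peaking formulas (an assumption: sox's exact coefficient math may differ slightly) to show what equalizer 2000 1q -2 does: the full -2 dB at 2 kHz, tapering back toward 0 dB away from it. peaking_eq and response_db are illustrative helper names.

```python
import cmath
import math


def peaking_eq(fs: float, f0: float, q: float, gain_db: float):
    """RBJ cookbook peaking EQ; returns (b, a) normalized so that a[0] == 1."""
    A = 10 ** (gain_db / 40)            # amplitude term: the gain at f0 is A**2
    w0 = 2 * math.pi * f0 / fs          # center frequency in radians per sample
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)          # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


if __name__ == "__main__":
    b, a = peaking_eq(44100, 2000, 1.0, -2.0)
    for f in (200, 1000, 2000, 4000, 8000):
        print(f"{f:>5} Hz: {response_db(b, a, 44100, f):+.2f} dB")
```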

# Noise reduction (profile a noise-only stretch, here the first 0.5 s, then reduce)
sox noisy.wav -n trim 0 0.5 noiseprof /tmp/noise.prof
sox noisy.wav clean.wav noisered /tmp/noise.prof 0.21

# Generate tone (440Hz sine, 3 seconds)
sox -n -r 44100 -c 1 tone.wav synth 3 sine 440

# Spectrum analysis (generate spectrogram PNG)
sox input.wav -n spectrogram -o spectrum.png

Mastering Chain

Standard Mastering Signal Flow

Input → EQ (corrective) → Compression → EQ (tonal) → Stereo Width → Limiting → Dithering → Output

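With ffmpeg Filters

# Full mastering chain: EQ → compression → limiting → loudness normalization
# (alimiter's limit is a linear amplitude; ffmpeg converts a value written as -1dB)
ffmpeg -i mix.wav -af "\
  equalizer=f=80:t=h:w=100:g=2,\
  equalizer=f=3000:t=h:w=1000:g=-1.5,\
  equalizer=f=12000:t=h:w=2000:g=1,\
  acompressor=threshold=-18dB:ratio=3:attack=10:release=100:knee=6,\
  alimiter=limit=-1dB:level=false,\
  loudnorm=I=-14:LRA=11:TP=-1\
" -ar 44100 -c:a pcm_s24le master_24bit.wav
# Output stays 24-bit so that the reduction to 16-bit (with dither) happens once, as the final step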

# Dithering (reduce to 16-bit with triangular dither for CD; ffmpeg applies dither through the aresample filter)
ffmpeg -i master_24bit.wav -af "aresample=44100:osf=s16:dither_method=triangular" -c:a pcm_s16le cd_master.wav
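Triangular (TPDF) dither adds noise, the sum of two independent uniform random values, before rounding, so the quantization error stops tracking the signal. A minimal Python sketch of the idea, assuming float input in [-1, 1] (ffmpeg's and sox's implementations differ in scaling details and can add noise shaping); tpdf_to_int16 is a hypothetical helper. The usage shows why it matters: a steady 0.25 LSB signal rounds to silence without dither but survives on average with it.

```python
import random


def tpdf_to_int16(samples, seed=None):
    """Quantize floats in [-1.0, 1.0] to signed 16-bit integers with TPDF dither."""
    rng = random.Random(seed)
    out = []
    for x in samples:
        # Sum of two uniform +/-0.5 LSB values: triangular PDF spanning +/-1 LSB
        noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = round(x * 32767 + noise)
        out.append(max(-32768, min(32767, q)))   # clip to the int16 range
    return out


if __name__ == "__main__":
    quiet = [0.25 / 32767] * 100_000               # a constant 0.25 LSB signal
    plain = [round(x * 32767) for x in quiet]      # no dither: always rounds to 0
    dithered = tpdf_to_int16(quiet, seed=1)
    print(sum(plain) / len(plain))                 # 0.0
    print(sum(dithered) / len(dithered))           # close to 0.25
```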

EQ Reference

Frequency Bands and Characteristics

Band	Range	Character	Common Uses
Sub-bass	20-60 Hz	Felt, not heard	Kick fundamental, sub bass
Bass	60-250 Hz	Warmth, body	Bass guitar, kick punch, vocal warmth
Low-mid	250-500 Hz	Muddiness zone	Cut here to clean up mixes
Mid	500-2000 Hz	Body, presence	Vocal clarity, guitar body
Upper-mid	2-4 kHz	Presence, bite	Vocal intelligibility, guitar attack
Presence	4-6 kHz	Definition, edge	Consonant clarity, string attack
Brilliance	6-12 kHz	Air, shimmer	Cymbals, vocal air, acoustic sparkle
Ultra-high	12-20 kHz	Air, sparkle	Subtle sheen (careful: sibilance)

Compression Reference

Settings by Source

Source	Threshold	Ratio	Attack	Release	Knee
Vocals	-18 to -12 dB	2:1 to 4:1	5-15 ms	40-80 ms	Soft
Drums (bus)	-15 to -10 dB	3:1 to 6:1	10-30 ms	50-100 ms	Hard
Bass	-15 to -8 dB	3:1 to 8:1	10-30 ms	100-200 ms	Hard
Acoustic guitar	-20 to -12 dB	2:1 to 4:1	10-25 ms	100-150 ms	Soft
Mix bus	-20 to -15 dB	1.5:1 to 2:1	10-30 ms	100-300 ms	Soft
Podcast	-20 to -15 dB	3:1 to 5:1	5-10 ms	50-100 ms	Soft
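The threshold, ratio, knee, attack and release columns map onto the two halves of a typical feed-forward compressor: a static gain curve and a time-smoothing stage. A minimal Python sketch, assuming a quadratic soft knee and one-pole smoothing of the gain in dB (a common textbook formulation; it is not claimed to match acompressor, compand or any plugin exactly). gain_computer, smoothing_coeff and smooth_gain are illustrative names.

```python
import math


def gain_computer(x_db: float, threshold_db: float, ratio: float, knee_db: float = 0.0) -> float:
    """Static curve: output level (dB) for input level x_db, with an optional soft knee."""
    over = x_db - threshold_db
    if 2 * over < -knee_db:                        # below the knee: unity gain
        return x_db
    if knee_db > 0 and 2 * abs(over) <= knee_db:   # inside the knee: quadratic blend
        return x_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    return threshold_db + over / ratio             # above the knee: full ratio


def smoothing_coeff(time_ms: float, sample_rate: float) -> float:
    """One-pole coefficient: a step response reaches about 63% after time_ms."""
    return math.exp(-1.0 / (time_ms * 0.001 * sample_rate))


def smooth_gain(target_gain_db, sample_rate, attack_ms, release_ms):
    """Smooth per-sample gain targets (dB) with separate attack and release times."""
    att = smoothing_coeff(attack_ms, sample_rate)
    rel = smoothing_coeff(release_ms, sample_rate)
    g, out = 0.0, []
    for target in target_gain_db:
        c = att if target < g else rel             # moving toward more reduction = attack
        g = c * g + (1 - c) * target
        out.append(g)
    return out


if __name__ == "__main__":
    # Vocal-style settings from the table: -18 dB threshold, 3:1, soft knee
    for level in (-30, -18, -6):
        y = gain_computer(level, -18, 3, knee_db=6)
        print(f"in {level:>4} dB -> out {y:6.2f} dB (gain {y - level:+.2f} dB)")
```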

Synthesis Reference

Synthesis Types

Type	How It Works	Character	Classic Synths
Subtractive	Oscillator → Filter → Amplifier	Warm, analog, rich	Minimoog, Prophet-5, Juno-106
FM	Operators modulating each other's frequency	Metallic, bell-like, bright	DX7, FM8
Wavetable	Morphing between stored waveforms	Evolving, complex, modern	PPG Wave, Serum, Vital
Granular	Tiny audio grains layered and scattered	Atmospheric, textural, ambient	Granulator, Pigments
Additive	Sum of individual sine wave partials	Precise, organ-like	Kawai K5, Razor
Physical modeling	Mathematical model of physical instrument	Realistic, expressive	Chromaphone, Pianoteq
Sample-based	Recorded audio, pitch-shifted and layered	Realistic, natural	Kontakt, Sampler
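FM in the DX7 sense is usually implemented as phase modulation: a modulator sine is added to the carrier's phase, scaled by a modulation index I. A minimal Python sketch (standard library only; fm_tone and sideband_frequencies are illustrative names) that renders such a tone and lists where its sidebands fall, at |fc + k·fm|. Simple integer carrier:modulator ratios give harmonic spectra, while irrational ratios give the inharmonic, bell-like spectra the table mentions.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, seconds=1.0, sample_rate=44100):
    """Two-operator FM (phase-modulation form): sin(2*pi*fc*t + I*sin(2*pi*fm*t))."""
    n = int(seconds * sample_rate)
    return [
        math.sin(2 * math.pi * carrier_hz * t + index * math.sin(2 * math.pi * modulator_hz * t))
        for t in (i / sample_rate for i in range(n))
    ]


def sideband_frequencies(carrier_hz, modulator_hz, orders=3):
    """Component frequencies up to the given sideband order: |fc + k*fm|, k = -orders..orders."""
    return sorted({abs(carrier_hz + k * modulator_hz) for k in range(-orders, orders + 1)})


if __name__ == "__main__":
    print(sideband_frequencies(200, 100, 2))                  # fc:fm = 2:1 -> [0, 100, 200, 300, 400]
    print(sideband_frequencies(200, 200 * math.sqrt(2), 1))  # irrational ratio: inharmonic partials
    tone = fm_tone(200, 200 * math.sqrt(2), index=3, seconds=0.5)
    print(len(tone))                                          # 22050 samples
```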

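Music Theory Quick Reference

Circle of Fifths (Major Keys)

              C
         F         G
     Bb               D
    Eb                 A
     Ab               E
         Db        B
            F#/Gb

Clockwise, each key is a perfect fifth above the previous one and adds one sharp; counter-clockwise, each key is a perfect fourth above and adds one flat.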
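The circle can be generated by repeatedly stepping up seven semitones (a perfect fifth) modulo 12, and a key's position on it gives its number of sharps or flats. A minimal Python sketch; circle_of_fifths and key_signature are illustrative helpers, and spelling the black keys as flats (except F#/Gb) is a simplifying assumption.

```python
# Pitch classes 0-11 from C; black keys spelled as flats except F#/Gb (an assumption)
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "F#/Gb", "G", "Ab", "A", "Bb", "B"]


def circle_of_fifths(start: str = "C") -> list:
    """Clockwise walk: each step is up a perfect fifth (7 semitones), modulo 12."""
    root = NOTE_NAMES.index(start)
    return [NOTE_NAMES[(root + 7 * step) % 12] for step in range(12)]


def key_signature(major_key: str) -> str:
    """Accidentals in a major key's signature, from its position on the circle."""
    steps = circle_of_fifths().index(major_key)
    if steps == 0:
        return "no sharps or flats"
    if steps == 6:
        return "6 sharps (F#) or 6 flats (Gb)"
    return f"{steps} sharp(s)" if steps < 6 else f"{12 - steps} flat(s)"


if __name__ == "__main__":
    print(" -> ".join(circle_of_fifths()))
    for key in ("G", "Eb", "F#/Gb"):
        print(key, key_signature(key))
```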

Common Chord Progressions

Name	Numerals	Example in C	Use
Pop	I-V-vi-IV	C-G-Am-F	Extremely common in pop and rock
Blues	I-IV-V	C-F-G	Blues, rock
Jazz ii-V-I	ii-V-I	Dm7-G7-Cmaj7	Jazz standard
Andalusian	i-VII-VI-V	Am-G-F-E	Flamenco, dramatic
Canon	I-V-vi-iii-IV-I-IV-V	C-G-Am-Em-F-C-F-G	Pachelbel, ballads
Minor blues	i-iv-V	Am-Dm-E	Minor blues
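Roman numerals name scale degrees, so a progression moves to any key by offsetting each degree's semitone distance from the tonic. A minimal Python sketch for major keys only, assuming upper case means a major triad and lower case a minor triad (no sevenths, no minor-key progressions such as the Andalusian, and flat spellings only); chords_in_major_key is an illustrative helper.

```python
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]  # flat spellings only
MAJOR_DEGREES = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 7, "VI": 9, "VII": 11}  # semitones from tonic


def chords_in_major_key(key: str, numerals: list) -> list:
    """Map Roman numerals to triad names in a major key (upper = major, lower = minor)."""
    root = NOTES.index(key)
    chords = []
    for numeral in numerals:
        name = NOTES[(root + MAJOR_DEGREES[numeral.upper()]) % 12]
        chords.append(name if numeral.isupper() else name + "m")
    return chords


if __name__ == "__main__":
    pop = ["I", "V", "vi", "IV"]
    print(chords_in_major_key("C", pop))   # ['C', 'G', 'Am', 'F']
    print(chords_in_major_key("G", pop))   # ['G', 'D', 'Em', 'C']
```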

Podcast Production Workflow

Recording

# Record from default mic (sox)
sox -d -r 44100 -c 1 -b 16 recording.wav

# Record with ffmpeg (specify ALSA device on Linux)
ffmpeg -f alsa -i default -ar 44100 -ac 1 recording.wav

Processing Chain

# 1. Noise reduction (the profile comes from the first 0.5 s, which must contain only room tone)
sox recording.wav -n trim 0 0.5 noiseprof /tmp/noise.prof
sox recording.wav clean.wav noisered /tmp/noise.prof 0.21

# 2. Band-limit (high-pass/low-pass), add presence EQ, compress, then normalize to -16 LUFS for voice
ffmpeg -i clean.wav -af "\
  highpass=f=80,\
  lowpass=f=12000,\
  equalizer=f=3000:t=h:w=1000:g=2,\
  acompressor=threshold=-20dB:ratio=4:attack=5:release=50,\
  loudnorm=I=-16:TP=-1\
" -ar 44100 podcast_ready.wav

# 3. Export MP3 for distribution
ffmpeg -i podcast_ready.wav -c:a libmp3lame -b:a 128k \
  -metadata title="Episode Title" \
  -metadata artist="Show Name" \
  -metadata album="Podcast Name" \
  -metadata genre="Podcast" \
  episode.mp3
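For constant-bitrate MP3, file size is roughly bitrate × duration ÷ 8. A quick Python check; cbr_size_mb is an illustrative helper, and it ignores ID3 tags, cover art and frame overhead, so real files differ slightly.

```python
def cbr_size_mb(duration_s: float, bitrate_kbps: float) -> float:
    """Approximate constant-bitrate audio size in megabytes (10**6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_s   # kbps means 1000 bits per kilobit
    return bits / 8 / 1_000_000               # bits -> bytes -> megabytes


if __name__ == "__main__":
    print(cbr_size_mb(45 * 60, 128))   # 45-minute episode at 128k: 43.2
    print(cbr_size_mb(60 * 60, 64))    # one hour at 64k: 28.8
```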

# 4. Generate waveform for show notes
ffmpeg -i episode.mp3 -filter_complex "showwavespic=s=1920x200:colors=0x1a1a2e" -frames:v 1 waveform.png

ID3 Tags

# Set ID3 metadata without re-encoding (-c copy)
ffmpeg -i episode.mp3 -c copy \
  -metadata title="EP 42: The Memory Architecture" \
  -metadata artist="GSD Podcast" \
  -metadata album="Getting Shit Done" \
  -metadata track="42" \
  -metadata date="2026" \
  -metadata comment="LOD-tiered memory system deep dive" \
  tagged.mp3

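BPM and Key Detection

With ffmpeg/aubio

# Install aubio for beat/pitch detection
# apt install aubio-tools

# BPM estimate
aubio tempo input.wav

# Beat tracking (prints beat timestamps in seconds)
aubiotrack -i input.wav

# Pitch tracking (fundamental frequency over time; this is not key detection)
aubiopitch -i input.wav -p yinfft

# Onset detection (transient markers)
aubioonset -i input.wav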
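A beat tracker's timestamps convert to tempo as 60 divided by the typical inter-beat interval; using the median keeps one missed or doubled beat from skewing the result. A minimal Python sketch, assuming the tracker prints one beat time in seconds per line; parse_beat_times and bpm_from_beats are illustrative helpers, and the sample beat times are made up.

```python
import statistics


def parse_beat_times(text: str) -> list:
    """Parse whitespace-separated beat timestamps in seconds."""
    return [float(tok) for tok in text.split()]


def bpm_from_beats(beat_times: list) -> float:
    """Tempo = 60 / median inter-beat interval (robust to a few tracking errors)."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat times")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / statistics.median(intervals)


if __name__ == "__main__":
    sample = "0.000\n0.500\n1.000\n1.520\n2.000\n"              # made-up beat times
    print(round(bpm_from_beats(parse_beat_times(sample)), 1))   # 120.0
```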

With sox

# Generate stats (includes RMS, peak, DC offset)
sox input.wav -n stats 2>&1
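The core level figures are simple to compute. A minimal Python sketch of peak level, RMS level and DC offset for float samples, using the 20·log10 convention in which a full-scale sine has an RMS of about -3.01 dBFS; level_stats and to_dbfs are illustrative helpers, and sox's stats output includes further fields beyond these.

```python
import math


def to_dbfs(value: float) -> float:
    """Amplitude relative to full scale (1.0) in dB; silence is -inf."""
    return 20 * math.log10(value) if value > 0 else float("-inf")


def level_stats(samples: list) -> dict:
    """Peak level, RMS level (both dBFS) and DC offset of float samples in [-1, 1]."""
    n = len(samples)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    dc_offset = sum(samples) / n          # mean sample value; ideally close to 0
    return {"peak_db": to_dbfs(peak), "rms_db": to_dbfs(rms), "dc_offset": dc_offset}


if __name__ == "__main__":
    sine = [math.sin(2 * math.pi * i / 100) for i in range(10_000)]   # 100 full cycles
    print(level_stats(sine))   # peak about 0 dB, RMS about -3.01 dB, DC offset about 0
```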

Sample Rate / Bit Depth Reference

Format	Sample Rate	Bit Depth	Use
CD	44.1 kHz	16-bit	Consumer playback
DVD	48 kHz	24-bit	Video soundtrack
Hi-Res	96 kHz	24-bit	Audiophile streaming
Studio	96-192 kHz	32-bit float	Recording/mixing
Podcast	44.1 kHz	16-bit	Voice distribution
Phone/VoIP	8-16 kHz	16-bit	Voice calls
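Two formulas sit behind this table: an ideal N-bit quantizer's signal-to-noise ratio for a full-scale sine is about 6.02·N + 1.76 dB, and the highest representable frequency (the Nyquist frequency) is half the sample rate. A quick Python check; the helper names are illustrative, and the SNR formula does not apply to 32-bit float, whose range comes from its exponent.

```python
def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sample rate can represent without aliasing."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    # (format, sample rate in Hz, integer bit depth) rows from the table
    for name, rate, bits in [("CD", 44_100, 16), ("DVD", 48_000, 24),
                             ("Hi-Res", 96_000, 24), ("Phone/VoIP", 8_000, 16)]:
        print(f"{name:<10} Nyquist {nyquist_hz(rate) / 1000:5.2f} kHz, "
              f"SNR ~{quantization_snr_db(bits):.1f} dB")
```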

Conversion

# Downsample from 96 kHz/24-bit to 44.1 kHz/16-bit: resample first (rate -v, very high quality), then dither with noise shaping (dither -s)
sox input_96_24.wav -b 16 output_441_16.wav rate -v 44100 dither -s

# Same with ffmpeg (aresample resamples and applies dither while converting to 16-bit)
ffmpeg -i input_96_24.wav -af "aresample=44100:osf=s16:dither_method=triangular" -c:a pcm_s16le output.wav

Audio Engineering | Skills Pool
