Analyze audio files to extract synthesis parameters: fundamental frequency, pitch envelope, attack time, duration, harmonic content, filters, effects, and stereo positioning. Use when reverse-engineering sounds from sample libraries, comparing audio output against @web-kits/audio definitions, splitting audio sprites, or running FFT analysis on WAV/MP3 files.
Extracts synthesis parameters from audio files using FFT analysis, then maps the results to @web-kits/audio SoundDefinition properties. Use this to reverse-engineer sample libraries, validate @web-kits/audio output against reference sounds, or build definitions from existing audio.
Dependencies: Python 3, numpy, scipy, ffmpeg (CLI).
Acquire source → Extract individual sounds → Analyze → Map to SoundDefinition
If the source is an audio sprite (a single file containing multiple sounds) published in an npm package, download and unpack the package first:
npm pack <package-name> --pack-destination /tmp
tar -xzf /tmp/<package-name>-*.tgz -C /tmp
Look for MP3/WAV sprite files and any JSON manifest that maps sound names to time offsets.
Split the sprite into per-sound WAV files using ffmpeg. If a manifest provides start times and durations:
ffmpeg -i sprite.mp3 -ss <start_seconds> -t <duration_seconds> -acodec pcm_s16le -ar 44100 output/<name>.wav
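When a manifest lists many sounds, the ffmpeg command above can be generated per entry. The sketch below assumes a Howler-style layout in which each sound name maps to `[offset_ms, duration_ms]`; that layout, the `build_split_commands` name, and the `output` directory are illustrative assumptions, so adjust the parsing to whatever the actual manifest contains.

```python
from pathlib import Path

def build_split_commands(sprite_map, sprite_path, out_dir="output"):
    """Build one ffmpeg argument list per sound.

    sprite_map is ASSUMED to look like {"click": [0, 250], ...} with
    offsets and durations in milliseconds; real manifests may differ.
    """
    commands = []
    for name, entry in sprite_map.items():
        offset_ms, duration_ms = entry[0], entry[1]  # ignore any extra fields
        commands.append([
            "ffmpeg", "-i", str(sprite_path),
            "-ss", f"{offset_ms / 1000:.3f}",      # start time in seconds
            "-t", f"{duration_ms / 1000:.3f}",     # duration in seconds
            "-acodec", "pcm_s16le", "-ar", "44100",
            str(Path(out_dir) / f"{name}.wav"),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists.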
If no manifest exists, use silence detection:
ffmpeg -i sprite.mp3 -af silencedetect=noise=-40dB:d=0.05 -f null -
Then split at the detected boundaries, using each non-silent interval as the -ss/-t range of the ffmpeg split command.
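The silencedetect filter reports `silence_start` and `silence_end` values in ffmpeg's log output (written to stderr). A minimal sketch of turning that log into non-silent intervals follows; the `sound_segments` name and the requirement to pass the total duration explicitly are assumptions made for illustration.

```python
import re

def sound_segments(ffmpeg_log, total_duration):
    """Return (start, end) pairs, in seconds, for the regions between silences."""
    starts = [float(v) for v in re.findall(r"silence_start:\s*(-?[\d.]+)", ffmpeg_log)]
    ends = [float(v) for v in re.findall(r"silence_end:\s*(-?[\d.]+)", ffmpeg_log)]
    segments = []
    cursor = 0.0  # end of the most recent silence
    for i, silence_start in enumerate(starts):
        if silence_start > cursor:
            segments.append((cursor, silence_start))
        # A trailing silence may lack a silence_end line; treat it as running to the end.
        cursor = ends[i] if i < len(ends) else total_duration
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments
```

To obtain the log, run the silencedetect command with `subprocess.run(..., capture_output=True, text=True)` and pass its `.stderr` to the function. Each resulting pair gives `-ss start` and `-t end - start` for the split step.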
Run frequency-domain analysis using Python with scipy:
import numpy as np
from scipy.io import wavfile
from scipy.fft import rfft, rfftfreq

sample_rate, data = wavfile.read("sound.wav")
if data.ndim > 1:
    data = data[:, 0]  # analyze the first (left) channel of multichannel files

def analyze_slice(data, sample_rate, start_ms, window_ms=10):
    """Return the peak frequency (Hz) in a short window, or None if the window is out of range."""
    start = int(sample_rate * start_ms / 1000)
    end = start + int(sample_rate * window_ms / 1000)
    segment = data[start:end].astype(float)
    if len(segment) < 2:  # need at least one non-DC bin
        return None
    segment *= np.hanning(len(segment))  # taper the edges to reduce spectral leakage
    spectrum = np.abs(rfft(segment))
    freqs = rfftfreq(len(segment), 1 / sample_rate)
    peak_idx = np.argmax(spectrum[1:]) + 1  # skip the DC bin
    return freqs[peak_idx]
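A 10 ms window at 44.1 kHz holds 441 samples, so its FFT bins are 100 Hz apart and the raw peak bin is a coarse pitch estimate. One common refinement is to zero-pad the FFT, which interpolates the spectrum (it does not add true resolution). The sketch below applies that idea at several offsets to trace a pitch envelope; `peak_frequency`, `pitch_envelope`, and the `pad_factor` default are illustrative choices, not part of the original workflow.

```python
import numpy as np

def peak_frequency(segment, sample_rate, pad_factor=8):
    """Dominant frequency (Hz) of a short segment via a zero-padded FFT."""
    windowed = np.asarray(segment, dtype=float) * np.hanning(len(segment))
    n_fft = len(segment) * pad_factor            # zero-padding interpolates the spectrum
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1 / sample_rate)
    return freqs[np.argmax(spectrum[1:]) + 1]    # skip the DC bin

def pitch_envelope(data, sample_rate, offsets_ms=(0, 5, 10, 20, 50), window_ms=10):
    """Map each offset (ms) to its peak frequency, or None if the window does not fit."""
    size = int(sample_rate * window_ms / 1000)
    result = {}
    for offset in offsets_ms:
        start = int(sample_rate * offset / 1000)
        segment = data[start:start + size]
        result[offset] = peak_frequency(segment, sample_rate) if len(segment) == size else None
    return result
```

If the first and last values differ noticeably, they are candidates for the `start` and `end` of a swept `source.frequency`.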
| Parameter | Method | Maps to |
|---|---|---|
| Fundamental frequency | Peak FFT bin at onset (5-10ms in) | source.frequency |
| Pitch envelope | Compare frequency at 0ms, 5ms, 10ms, 20ms, 50ms | source.frequency: { start, end } |
| Active duration | Time above noise floor (-40dB threshold) | envelope.decay / envelope.release |
| Attack time | Onset to peak amplitude | envelope.attack |
| Sustain level | Amplitude ratio: sustain region vs peak | envelope.sustain |
| Harmonic content | Ratio of harmonic peaks to fundamental | Infer source.type |
| Peak amplitude | Max absolute sample value, normalized 0-1 | gain |
| Stereo balance | L/R channel amplitude ratio | pan (-1 to 1) |
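The amplitude-based rows of the table can be approximated directly from the sample values. The following is a rough sketch under stated assumptions: the -40 dB floor is taken relative to the peak, attack is measured from the first sample above that floor, and pan uses a simple linear balance `(R - L) / (R + L)` on RMS levels. The actual pan law used by @web-kits/audio is not specified here, so treat the pan value as a starting point.

```python
import numpy as np

def envelope_metrics(samples, sample_rate, floor_db=-40.0, full_scale=1.0):
    """Peak level, attack time, and active duration of a mono signal.

    Pass full_scale=32768 for 16-bit PCM so the peak lands in the 0-1 range.
    """
    x = np.abs(np.asarray(samples, dtype=float))
    peak = x.max()
    if peak == 0:
        return None                                # silent input
    threshold = peak * 10 ** (floor_db / 20)       # floor relative to the peak
    active = np.flatnonzero(x >= threshold)
    onset, last = active[0], active[-1]
    peak_idx = int(np.argmax(x))
    return {
        "peak": peak / full_scale,
        "attack_s": (peak_idx - onset) / sample_rate,
        "active_s": (last - onset + 1) / sample_rate,
    }

def estimate_pan(left, right):
    """Linear balance in [-1, 1] from RMS levels (an assumed mapping)."""
    l = np.sqrt(np.mean(np.asarray(left, dtype=float) ** 2))
    r = np.sqrt(np.mean(np.asarray(right, dtype=float) ** 2))
    return 0.0 if l + r == 0 else (r - l) / (l + r)
```

For stereo files, call `estimate_pan(data[:, 0], data[:, 1])` before reducing the data to a single channel.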
| Pattern | Source type |
|---|---|
| Fundamental only, harmonics < -40dB | sine |
| Odd harmonics rolling off as 1/n² | triangle |
| All harmonics rolling off as 1/n | sawtooth |
| Odd harmonics rolling off as 1/n | square |
| No clear harmonic structure, broadband energy | noise |
| Custom harmonic profile (none of the above) | wavetable |
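def classify_waveform(spectrum, freqs, fundamental_freq):
    """Guess the oscillator type from harmonics 2-8 relative to the fundamental."""
    harmonics = []
    for n in range(2, 9):
        target = fundamental_freq * n
        idx = np.argmin(np.abs(freqs - target))  # nearest FFT bin to the nth harmonic
        harmonics.append(spectrum[idx])
    fund_amp = spectrum[np.argmin(np.abs(freqs - fundamental_freq))]
    # ratios[0] is the 2nd harmonic, ratios[1] the 3rd, and so on.
    ratios = [h / fund_amp for h in harmonics]
    if all(r < 0.01 for r in ratios):
        return "sine"
    # Even harmonics (2nd, 4th, 6th) nearly absent means an odd-only spectrum.
    odd_only = all(ratios[i] < 0.05 for i in [0, 2, 4])
    if odd_only and ratios[1] > 0.05:
        # 3rd harmonic is about 1/3 for a square wave and about 1/9 for a triangle.
        return "square" if ratios[1] > 0.3 else "triangle"
    if all(r > 0.01 for r in ratios[:4]):
        return "sawtooth"
    return "wavetable"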
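To sanity-check thresholds like the ones above, it can help to measure harmonic ratios on reference waveforms with known spectra. The sketch below builds band-limited waveforms by additive synthesis and reads the ratios back from an FFT. The helper names are hypothetical, and the synthesis ignores harmonic signs because only magnitudes matter for this check.

```python
import numpy as np

def additive_wave(kind, f0, sample_rate, seconds=1.0, max_harmonic=20):
    """Sum sine harmonics with the textbook amplitude law for each waveform."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    wave = np.zeros_like(t)
    for n in range(1, max_harmonic + 1):
        if kind == "sawtooth":
            amp = 1 / n                          # all harmonics, 1/n
        elif kind == "square":
            amp = 1 / n if n % 2 else 0.0        # odd harmonics, 1/n
        elif kind == "triangle":
            amp = 1 / n**2 if n % 2 else 0.0     # odd harmonics, 1/n^2
        else:
            raise ValueError(f"unknown waveform: {kind}")
        wave += amp * np.sin(2 * np.pi * n * f0 * t)
    return wave

def harmonic_ratios(signal, sample_rate, f0, count=5):
    """Amplitudes of harmonics 2..count+1 relative to the fundamental."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1 / sample_rate)

    def amp_at(freq):
        return spectrum[np.argmin(np.abs(freqs - freq))]

    fundamental = amp_at(f0)
    return [amp_at(f0 * n) / fundamental for n in range(2, count + 2)]
```

With a one-second signal the bins are 1 Hz apart, so a 200 Hz fundamental and its harmonics fall exactly on bins and the measured ratios closely track the amplitude laws in the table.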
For broadband signals with no clear fundamental, classify noise color by spectral slope (power vs frequency on log-log scale):
| Slope (dB/octave) | Color | Maps to |
|---|---|---|
| ~0 (flat) | white | source: { type: "noise", color: "white" } |
| ~-3 | pink | source: { type: "noise", color: "pink" } |
| ~-6 | brown | source: { type: "noise", color: "brown" } |
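The slope in the table can be estimated with a straight-line fit of power (in dB) against log2 of frequency. This is a minimal sketch, assuming a fit band of 50 Hz up to one eighth of the sample rate and a nearest-match rule against 0, -3, and -6 dB/octave; both the band limits and the function names are illustrative choices.

```python
import numpy as np

def spectral_slope_db_per_octave(signal, sample_rate, f_lo=50.0, f_hi=None):
    """Least-squares slope of power (dB) versus log2(frequency)."""
    x = np.asarray(signal, dtype=float)
    x = (x - x.mean()) * np.hanning(len(x))      # remove DC, reduce leakage
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / sample_rate)
    if f_hi is None:
        f_hi = sample_rate / 8                   # assumed upper limit of the fit band
    band = (freqs >= f_lo) & (freqs <= f_hi) & (power > 0)
    slope, _intercept = np.polyfit(np.log2(freqs[band]), 10 * np.log10(power[band]), 1)
    return slope

def classify_noise_color(signal, sample_rate):
    """Pick the color whose nominal slope is closest to the measured one."""
    slope = spectral_slope_db_per_octave(signal, sample_rate)
    nominal = {"white": 0.0, "pink": -3.0, "brown": -6.0}
    return min(nominal, key=lambda color: abs(nominal[color] - slope))
```

Longer recordings give steadier estimates. A filtered noise source can shift the measured slope, so compare the result against the filter analysis before committing to a color.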
Analyze spectral rolloff to detect filtering. Compare the measured spectrum against the expected spectrum for the identified oscillator type:
| Observation | Filter type | Maps to |
|---|---|---|
| High-frequency rolloff steeper than source would produce | lowpass | filter: { type: "lowpass", frequency: <cutoff_hz> } |
| Low-frequency rolloff | highpass | filter: { type: "highpass", frequency: <cutoff_hz> } |
| Narrow frequency band passes through | bandpass | filter: { type: "bandpass", frequency: <center_hz> } |
| Narrow frequency notch removed | notch | filter: { type: "notch", frequency: <notch_hz> } |
| Resonant peak near cutoff | High Q | filter: { ..., resonance: <Q_value> } |
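One way to turn the lowpass comparison into a number is to find the first frequency at which the measured spectrum falls a fixed amount below the expected one. The sketch below assumes both spectra are already expressed in dB on the same frequency grid; `estimate_lowpass_cutoff` is a hypothetical helper, and real spectra usually need smoothing first because a single noisy bin can trigger an early crossing.

```python
import numpy as np

def estimate_lowpass_cutoff(freqs, measured_db, expected_db, drop_db=3.0):
    """First frequency where measured falls drop_db below expected, else None."""
    deficit = np.asarray(expected_db, dtype=float) - np.asarray(measured_db, dtype=float)
    below = np.flatnonzero(deficit >= drop_db)
    return float(freqs[below[0]]) if below.size else None
```

For example, a first-order lowpass at 2 kHz measured against a flat expected spectrum crosses the 3 dB line at roughly 2 kHz, while an unfiltered signal yields `None`.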
Estimate cutoff frequency as the point where amplitude drops 3dB below the expected level. If the spectral brightness changes over time (bright attack fading to dull), this indicates a filter envelope: