Quick Reference

DIALOGUE:       -12 to -6 dB peak  |  -16 to -14 LUFS integrated
MUSIC BED:      -30 to -20 dB peak (18-20 dB below dialogue)
SFX:            -18 to -12 dB peak (at least 6 dB below dialogue)
WHOOSH TIMING:  Start 10-20 ms before the visual, duration 400-500 ms
MUSIC BPM:      Calm 60-80 | Standard 90-110 | Upbeat 120-140
TRUE PEAK:      Never exceed -1.5 dBTP
VOICE EQ:       HPF at 80 Hz, cut 500 Hz, boost 2-5 kHz, cut 6-8 kHz
VOICE COMP:     3:1 ratio, 1-5 ms attack, 10-20 ms release
TARGET LUFS:    -14 LUFS (YouTube/TikTok/IG/Spotify) | -16 LUFS (Apple Podcasts)
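
The 3:1 ratio under VOICE COMP means that, above the threshold, every 3 dB of extra input produces only 1 dB of extra output. A minimal sketch of that static curve, assuming a hard knee and a hypothetical -18 dB threshold (the reference does not give a threshold, and `compress_db` is an illustrative name):

```python
def compress_db(level_db: float, threshold_db: float = -18.0, ratio: float = 3.0) -> float:
    """Static hard-knee compressor curve, all levels in dB.

    threshold_db is an illustrative assumption, not a value from the reference.
    """
    if level_db <= threshold_db:
        return level_db  # below the threshold the signal passes unchanged
    # above the threshold, the overshoot is divided by the ratio
    return threshold_db + (level_db - threshold_db) / ratio


print(compress_db(-6.0))   # 12 dB over threshold becomes 4 dB over: -14.0
print(compress_db(-24.0))  # below threshold: -24.0
```

Attack and release only control how quickly the compressor moves along this curve; they do not change the curve itself.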

Audio Ducking Levels

| Element | Peak Level | Notes |
| --- | --- | --- |
| Dialogue / Narration | -12 dB to -6 dB | Primary element |
| Background music (during speech) | -30 dB to -20 dB | 18-20 dB below dialogue |
| Sound effects | -18 dB to -12 dB | At least 6 dB below dialogue |
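
A level difference in dB corresponds to a linear gain factor of 10^(dB/20). As a rough sketch of what "18-20 dB below dialogue" means when setting a fader or a mix weight (the function names are illustrative, not part of any tool mentioned here):

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def music_peak_db(dialogue_peak_db: float, offset_db: float = 18.0) -> float:
    """Peak level for the music bed, placed offset_db below the dialogue peak."""
    return dialogue_peak_db - offset_db


print(round(db_to_gain(-18.0), 3))  # 0.126
print(round(db_to_gain(-20.0), 3))  # 0.1
print(music_peak_db(-12.0))         # -30.0
```

In other words, a music bed 18-20 dB under the voice sits at roughly one eighth to one tenth of the voice's amplitude.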

Music Selection by Content Type

| Content Type | BPM Range | Mood |
| --- | --- | --- |
| Calm explainer / tutorial | 60-80 | Contemplative, focused |
| Corporate / testimonial | 60-100 | Professional, calm |
| Standard explainer | 90-110 | Steady, engaging |
| Upbeat promo | 110-130 | Enthusiastic |
| High-energy / demo | 120-140 | Exciting, dynamic |
| Action / fast-paced | 140-200 | Adrenaline |
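
One practical use of a BPM figure is converting it to beat length, since a beat lasts 60/BPM seconds. The sketch below assumes 4/4 time and a track whose first beat falls at t = 0; both are assumptions for illustration, as are the function names:

```python
def beat_seconds(bpm: float) -> float:
    """Length of one beat in seconds."""
    return 60.0 / bpm


def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds (4/4 assumed by default)."""
    return beats_per_bar * beat_seconds(bpm)


def nearest_beat(t: float, bpm: float) -> float:
    """Snap a time in seconds to the nearest beat, assuming beat 0 is at t = 0."""
    step = beat_seconds(bpm)
    return round(t / step) * step


print(beat_seconds(120))            # 0.5
print(round(bar_seconds(90), 3))    # 2.667
print(nearest_beat(3.3, 120))       # 3.5
```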

Sound Effects (SFX) Placement

| SFX Type | Use Case | Duration | Level |
| --- | --- | --- | --- |
| Whoosh / Swish | Scene transitions | 400-500 ms | -18 to -12 dB |
| Pop / Pluck | Text appearing, bullet points | <200 ms | -15 to -12 dB |
| Click / Tap | UI interactions | <100 ms | -20 to -15 dB |
| Riser / Swell | Building to a reveal | 1-3 s | -18 to -12 dB |
| Impact / Hit | Key reveal, stat | <300 ms | -12 to -6 dB |
| Subtle whoosh | Element sliding in/out | 200-400 ms | -20 to -15 dB |
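
The Quick Reference asks for a whoosh to start 10-20 ms before the visual it accompanies. A small sketch of that placement arithmetic, using a 15 ms lead as the midpoint of that range and hypothetical helper names:

```python
def sfx_start_seconds(visual_time_s: float, lead_ms: float = 15.0) -> float:
    """Timeline position for an SFX clip that should lead the visual by lead_ms."""
    return max(0.0, visual_time_s - lead_ms / 1000.0)


def ms_to_frames(ms: float, fps: float) -> float:
    """Express a duration in milliseconds as a (possibly fractional) frame count."""
    return ms * fps / 1000.0


print(round(sfx_start_seconds(12.0), 3))  # 11.985
print(ms_to_frames(450, 30))              # 13.5
print(ms_to_frames(15, 30))               # 0.45
```

At 30 fps the 15 ms lead is under half a frame, so it usually has to be set with sub-frame audio positioning rather than by nudging the clip a whole frame.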

FFmpeg Audio Commands

Loudness Normalization

ffmpeg -i input.mp4 -af loudnorm=I=-14:LRA=11:TP=-1 -c:v copy output.mp4
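
The command above runs `loudnorm` in a single pass. The filter also accepts values measured on an earlier analysis pass through its `measured_I`, `measured_LRA`, `measured_TP`, `measured_thresh`, `offset`, and `linear` options. The sketch below only builds the filter string for such a second pass; `second_pass_filter` is a hypothetical helper, and the numbers in `example` are made up for illustration rather than measured from a real file.

```python
def second_pass_filter(measured: dict, target_i: float = -14.0,
                       target_lra: float = 11.0, target_tp: float = -1.0) -> str:
    """Build a loudnorm filter string for a second pass.

    `measured` is expected to hold the keys loudnorm reports with
    print_format=json: input_i, input_lra, input_tp, input_thresh, target_offset.
    """
    return (
        f"loudnorm=I={target_i}:LRA={target_lra}:TP={target_tp}"
        f":measured_I={measured['input_i']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
        ":linear=true"
    )


# Illustrative values only, not real measurements.
example = {"input_i": "-20.50", "input_lra": "6.00", "input_tp": "-3.20",
           "input_thresh": "-30.80", "target_offset": "0.30"}
print(second_pass_filter(example))
```

The resulting string would be passed to `-af` in place of the single-pass filter.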

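Audio Ducking with Sidechain

The music is the main input to sidechaincompress and the narration is the sidechain, so the music is pushed down whenever speech is present. The narration is split so that one copy drives the compressor and the other goes into the final mix.

ffmpeg -i narration.wav -i music.wav -filter_complex \
  "[0:a]asplit=2[voice][sc]; \
   [1:a][sc]sidechaincompress=threshold=0.02:ratio=9:attack=200:release=500[ducked]; \
   [voice][ducked]amix=inputs=2:weights='1 0.15'" \
  -c:a aac output.m4a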
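
The `threshold` and `weights` values in this command are linear amplitudes, not dB. Converting them with 20·log10(x) makes them easier to compare with the level tables; the helper name below is illustrative:

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude factor to dB."""
    return 20.0 * math.log10(gain)


print(round(gain_to_db(0.02), 1))  # sidechain threshold: -34.0
print(round(gain_to_db(0.15), 1))  # mix weight 0.15 relative to weight 1: -16.5
```

So the compressor starts acting once the sidechain signal rises above roughly -34 dBFS, and the input weighted 0.15 sits about 16.5 dB under the input weighted 1 before any ducking is applied.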

Measure Loudness

ffmpeg -i input.mp4 -af loudnorm=print_format=json -f null - 2>&1 | grep -A 20 "Parsed_loudnorm"
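
With `print_format=json`, `loudnorm` writes a flat JSON object at the end of ffmpeg's log. If the log is captured in a script rather than piped to `grep`, the object can be pulled out as sketched here. The `sample` text is hand-written and abbreviated to show the shape only; its values are not real measurements, and `parse_loudnorm_json` is a hypothetical helper.

```python
import json


def parse_loudnorm_json(log_text: str) -> dict:
    """Extract the last {...} object from ffmpeg's log output.

    Assumes the loudnorm report is a flat object with no nested braces.
    """
    start = log_text.rindex("{")
    end = log_text.rindex("}") + 1
    return json.loads(log_text[start:end])


# Hand-written, abbreviated sample; values are illustrative only.
sample = '''[Parsed_loudnorm_0 @ 0x0]
{
    "input_i" : "-20.50",
    "input_tp" : "-3.20",
    "input_lra" : "6.00",
    "input_thresh" : "-30.80",
    "target_offset" : "0.30"
}
'''

stats = parse_loudnorm_json(sample)
print(float(stats["input_i"]))  # -20.5
```

Note that the values arrive as strings, so they need `float()` before any arithmetic.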

Kolbo MCP Integration

| Task | Kolbo MCP Tool | Notes |
| --- | --- | --- |
| Generate narration | `generate_speech` | See `list_voices` for voice options |
| Generate music | `generate_music` | Use the BPM table above; always set `instrumental=true` |
| Generate SFX | `generate_sound` | Describe the sound physically: "door slam in stone hallway" |
| Transcribe audio | `transcribe_audio` | Word-level timestamps for sync |
| Voice discovery | `list_voices` | Filter by language, gender, provider |

Sound Design for Video Production | Skills Pool

Platform Loudness Targets

| Platform | Integrated LUFS | True Peak |
| --- | --- | --- |
| YouTube | -14 LUFS | -1 dBTP |
| TikTok | -14 LUFS | -1 dBTP |
| Instagram Reels | -14 LUFS | -1 dBTP |
| Spotify (podcast) | -14 LUFS | -1 dBTP |
| Apple Podcasts | -16 LUFS | -1 dBTP |
| Broadcast TV | -24 LUFS | -2 dBTP |
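
The table can be kept as a small lookup so that the `loudnorm` settings follow the delivery platform. This is a sketch only: the dictionary mirrors the rows above, `LRA=11` is carried over from the normalization command in this document, and the function names are hypothetical.

```python
# (integrated LUFS, true peak dBTP) per platform, copied from the table above
TARGETS = {
    "YouTube": (-14.0, -1.0),
    "TikTok": (-14.0, -1.0),
    "Instagram Reels": (-14.0, -1.0),
    "Spotify (podcast)": (-14.0, -1.0),
    "Apple Podcasts": (-16.0, -1.0),
    "Broadcast TV": (-24.0, -2.0),
}


def loudnorm_args(platform: str, lra: float = 11.0) -> str:
    """Single-pass loudnorm filter string for the given platform."""
    lufs, true_peak = TARGETS[platform]
    return f"loudnorm=I={lufs}:LRA={lra}:TP={true_peak}"


def gain_needed_db(measured_lufs: float, platform: str) -> float:
    """How far (in dB) a measured programme is from the platform target."""
    return TARGETS[platform][0] - measured_lufs


print(loudnorm_args("Broadcast TV"))      # loudnorm=I=-24.0:LRA=11.0:TP=-2.0
print(gain_needed_db(-20.5, "YouTube"))   # 6.5
```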
