Video Audio Design

Use this skill when adding audio to programmatic videos - generating narration with ElevenLabs TTS, sourcing royalty-free background music, creating SFX with FFmpeg, implementing audio ducking, or mixing multiple audio layers in Remotion. Triggers on ElevenLabs, text-to-speech, voice generation, background music, sound effects, audio mixing, and volume ducking.

Occupation
Categories: Media

When this skill is activated, always start your first response with the :speaker: emoji.

Video Audio Design

Video audio design is the practice of layering narration, sound effects, and background music into programmatic video compositions. Great audio transforms a slide-deck video into a polished production - narration guides the viewer, music sets the emotional tone, and SFX punctuate key moments. This skill covers generating speech with ElevenLabs and alternative TTS providers, creating synthetic sound effects with FFmpeg, sourcing royalty-free background music, implementing audio ducking so speech stays intelligible, and mixing all layers together in Remotion compositions with frame-accurate timing.

When to use this skill

Trigger this skill when the user:

Wants to add narration or voiceover to a programmatic video
Needs to generate speech with ElevenLabs, OpenAI TTS, or Edge TTS
Asks about voice selection, voice settings, or voice cloning
Wants to add background music or needs royalty-free music sources
Asks about creating sound effects programmatically
Wants to implement audio ducking (lowering music during speech)

Layer	Role	Base Volume	During Narration
Narration	Conveys information, drives pacing	0.8-1.0	N/A (top layer)
SFX	Accents transitions and actions	0.3-0.5	0.3-0.5 (unchanged)
Background Music	Sets emotional tone, fills silence	0.3-0.5	0.15-0.25 (ducked)
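The ducked column in the table above can be sketched as a pure function of the frame number. This is a minimal sketch, not the skill's own implementation: the `NarrationSpan` type, the constant names, and the 10-frame ramp are assumptions made for illustration, while the 0.4 and 0.2 levels are picked from inside the base and ducked ranges listed for background music.

```typescript
// Hypothetical shape: one narration clip on the timeline, measured in frames.
type NarrationSpan = { startFrame: number; endFrame: number };

const BASE_MUSIC_VOLUME = 0.4; // inside the 0.3-0.5 base range from the table
const DUCKED_MUSIC_VOLUME = 0.2; // inside the 0.15-0.25 ducked range
const DEFAULT_RAMP_FRAMES = 10; // assumed ramp length (about 0.33 s at 30 fps)

// Returns the background-music volume for one frame. The music sits at the
// ducked level while any narration span is active and ramps linearly back to
// the base level over `rampFrames` frames on either side of the span.
function musicVolumeAt(
  frame: number,
  spans: NarrationSpan[],
  rampFrames: number = DEFAULT_RAMP_FRAMES
): number {
  let duckAmount = 0; // 0 = base volume, 1 = fully ducked
  for (const span of spans) {
    let amount: number;
    if (frame >= span.startFrame && frame <= span.endFrame) {
      amount = 1;
    } else {
      const distance =
        frame < span.startFrame
          ? span.startFrame - frame
          : frame - span.endFrame;
      amount = Math.max(0, 1 - distance / rampFrames);
    }
    // Overlapping spans: the strongest ducking wins.
    duckAmount = Math.max(duckAmount, amount);
  }
  return BASE_MUSIC_VOLUME + (DUCKED_MUSIC_VOLUME - BASE_MUSIC_VOLUME) * duckAmount;
}
```

In a Remotion composition, a function like this could be passed to the `volume` prop of the music `<Audio>` element as `(f) => musicVolumeAt(f, spans)`. Remotion evaluates that callback with a frame number relative to where the audio starts, so the spans would need to be expressed in the same frame of reference.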

Setting	Range	Low value	High value	Recommended
stability	0-1	More expressive, variable	More consistent, monotone	0.4-0.6
similarity_boost	0-1	More creative	Closer to original voice	0.6-0.8
style	0-1	Neutral delivery	Exaggerated style	0.3-0.6
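As an illustration of how the settings above travel to ElevenLabs, the sketch below builds a text-to-speech request without sending it. The helper name `buildTtsRequest`, the `VoiceSettings` type, and the default values (chosen from inside the recommended ranges) are assumptions; the endpoint path, the `xi-api-key` header, and the `voice_settings` field names follow the public ElevenLabs REST API as commonly documented, and are worth checking against the current reference before use.

```typescript
// Hypothetical type mirroring the three settings in the table above.
type VoiceSettings = {
  stability: number;
  similarity_boost: number;
  style: number;
};

// Keeps a setting inside the 0-1 range the table lists.
const clamp01 = (value: number): number => Math.min(1, Math.max(0, value));

// Builds the URL and fetch options for one narration request.
// Defaults are assumptions picked from the "Recommended" column.
function buildTtsRequest(
  text: string,
  voiceId: string,
  apiKey: string,
  settings: Partial<VoiceSettings> = {}
) {
  const voice_settings: VoiceSettings = {
    stability: clamp01(settings.stability ?? 0.5),
    similarity_boost: clamp01(settings.similarity_boost ?? 0.7),
    style: clamp01(settings.style ?? 0.4),
  };
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey, // pass the key in from an environment variable
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text,
        model_id: "eleven_multilingual_v2",
        voice_settings,
      }),
    },
  };
}
```

A caller would typically do `const { url, init } = buildTtsRequest(...)` followed by `await fetch(url, init)` and write the returned audio bytes to a file. Keeping the request construction separate from the network call makes the settings easy to inspect and test.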

Mistake	Why it is wrong	What to do instead
Music same volume during narration	Speech becomes unintelligible	Implement audio ducking - drop music 50-60% during speech
Hardcoding ElevenLabs API key	Key leaks into version control	Use environment variables: `process.env.ELEVEN_LABS_API_KEY`
Using TTS without measuring duration	Scene timing wrong, narration cut off	Measure audio duration with ffprobe after generation
SFX louder than narration	Distracts from content	SFX at 0.3-0.5, narration at 0.8-1.0
No fade on music start/end	Abrupt start/stop sounds like a bug	Add 0.5-1s fade-in at start and fade-out at end
Using low-quality TTS model	Robotic voice undermines quality	Use eleven_multilingual_v2 (ElevenLabs) or tts-1-hd (OpenAI)
Ignoring audio file format	Some formats add silence padding (MP3 encoders typically add a few milliseconds at the start), which throws off tightly timed cues	Use MP3 for narration, WAV for SFX
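Two of the fixes in the table, measuring narration duration and fading the music, come down to small pieces of arithmetic. The sketch below is one possible way to express them. The function names are hypothetical, and the ffprobe invocation in the comment is a commonly used form rather than something this skill prescribes.

```typescript
// Parses the stdout of a duration query such as:
//   ffprobe -v error -show_entries format=duration \
//     -of default=noprint_wrappers=1:nokey=1 narration.mp3
// which prints the duration in seconds, for example "3.500000".
function parseFfprobeDuration(stdout: string): number {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new Error(`Could not parse duration from: ${JSON.stringify(stdout)}`);
  }
  return seconds;
}

// Converts a duration to a frame count, rounding up so the scene is never
// shorter than the narration it contains.
function secondsToFrames(seconds: number, fps: number): number {
  return Math.ceil(seconds * fps);
}

// Linear fade-in and fade-out multiplier in the range 0-1. Multiply the music
// volume by this value to avoid an abrupt start or stop.
function fadeEnvelope(frame: number, totalFrames: number, fadeFrames: number): number {
  if (fadeFrames <= 0) {
    return 1; // no fade requested
  }
  const fadeIn = Math.min(1, frame / fadeFrames);
  const fadeOut = Math.min(1, (totalFrames - frame) / fadeFrames);
  return Math.max(0, Math.min(fadeIn, fadeOut));
}
```

At 30 fps, the 0.5-1s fade suggested in the table corresponds to a `fadeFrames` value between 15 and 30.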

Key principles

Core concepts

3-layer audio architecture

ElevenLabs API model

Audio ducking concept

Frame-based audio sync in Remotion

Common tasks

1. Set up ElevenLabs API key and generate narration

2. Select and configure voice settings

3. Generate narration per scene from a script

4. Source background music

5. Generate SFX with FFmpeg

6. Implement audio ducking in Remotion

7. Mix 3 audio layers in a Remotion composition

8. Use alternative TTS providers

Anti-patterns / common mistakes

Gotchas

References

Companion check
