Produce delivery-ready TTS audio for video tasks. Use when asked to generate narration or voiceover, choose a TTS engine, clean artifacts, normalize loudness, and export segment-ready audio files.
Choose engine by need:
- espeak-ng for fast local iteration.
- scripts/chatterbox_tts.js for Replicate Chatterbox.

Processing goals:
Recommended workflow:
Engine selection guidance:
- Use espeak-ng for drafts, timing tests, and rapid iteration where quality is less important.
- Use --audio-ref in Chatterbox when you need style/voice steering; omit it for generic delivery.

Prototyping example:
espeak-ng -v en-us -s 165 -w draft.wav "Your script text"
Cloud example (requires REPLICATE_API_TOKEN):
REPLICATE_API_TOKEN=<token> node .agents/skills/generating-voiceover/scripts/chatterbox_tts.js --prompt "Your script text" --output narration.wav --audio-ref ref.wav
Cleanup guidance:
Use consistent cleanup filters across all segments. Recommended FFmpeg chain:
ffmpeg -y -i raw.wav -af "highpass=f=20,lowpass=f=16000,afade=t=in:st=0:d=0.05,areverse,afade=t=in:st=0:d=0.05,areverse" cleaned.wav
If low-pass is not needed, remove lowpass=f=16000.
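To keep the chain identical across segments, the filter string can be built once and reused for every file. A minimal shell sketch, assuming raw segments sit together in one directory as WAV files; `build_cleanup_filter` and `clean_segments` are hypothetical helper names, not part of the skill's scripts:

```shell
#!/bin/sh
# Hypothetical helper: print the cleanup filter chain from the guidance above.
# Pass "no" as $1 to drop the 16 kHz low-pass.
build_cleanup_filter() {
  # Fade in, reverse, fade in again, reverse back: a short fade at both ends.
  fade="afade=t=in:st=0:d=0.05,areverse,afade=t=in:st=0:d=0.05,areverse"
  if [ "${1:-yes}" = "no" ]; then
    printf '%s\n' "highpass=f=20,$fade"
  else
    printf '%s\n' "highpass=f=20,lowpass=f=16000,$fade"
  fi
}

# Hypothetical helper: apply one shared chain to every WAV in a directory.
# $1 = input dir, $2 = output dir, $3 = "no" to skip the low-pass (optional).
clean_segments() {
  filter=$(build_cleanup_filter "${3:-yes}")
  mkdir -p "$2" || return 1
  for f in "$1"/*.wav; do
    [ -e "$f" ] || continue  # no matches: the glob stays literal, so skip it
    ffmpeg -y -i "$f" -af "$filter" "$2/$(basename "$f")" || return 1
  done
}
```

For example, `clean_segments raw cleaned` would process every file in `raw/` with the full chain, and `clean_segments raw cleaned no` would do the same without the low-pass; both directory names are placeholders.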
Loudness guidance (BS.1770 / EBU R128 style):
- Target -23 LUFS integrated, true peak around -1.5 dBTP, and optional LRA=11.

Measure loudness (EBU R128 / BS.1770 style):
ffmpeg -hide_banner -i cleaned.wav -filter_complex ebur128 -f null -
Normalize as the final processing step, before export:
ffmpeg -y -i cleaned.wav -af "loudnorm=I=-23:TP=-1.5:LRA=11:print_format=summary" normalized.wav
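With print_format=summary, loudnorm writes its input and output measurements to stderr at the end of the run. The sketch below shows one possible way to pull the output integrated loudness out of that text so it can be recorded later; `output_lufs` is a hypothetical helper name, and the parsing assumes the usual "Output Integrated: <value> LUFS" summary line:

```shell
#!/bin/sh
# Hypothetical helper: read ffmpeg's loudnorm summary on stdin and print the
# "Output Integrated" value (number only, in LUFS). Prints nothing if the
# line is absent. The value is printed at end of input so the pipe is fully
# drained and ffmpeg never writes to a closed pipe.
output_lufs() {
  awk '/Output Integrated:/ { v = $3 } END { if (v != "") print v }'
}

# Possible usage (stderr is redirected into the pipe):
#   lufs=$(ffmpeg -y -i cleaned.wav \
#     -af "loudnorm=I=-23:TP=-1.5:LRA=11:print_format=summary" \
#     normalized.wav 2>&1 | output_lufs)
```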
Export delivery format (typical video-ready WAV):
ffmpeg -y -i normalized.wav -ar 48000 -ac 1 -c:a pcm_s16le delivery.wav
Final QC checklist:
If tempo or duration is changed after normalization, run normalization again.
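One way to decide whether a re-run is needed is to re-measure the file and compare the result with the target. A rough sketch of such a gate follows; `lufs_within_tolerance` is a hypothetical helper, and the default tolerance of 1.0 LU is an assumption for illustration, not a value taken from this skill:

```shell
#!/bin/sh
# Hypothetical QC gate: exit status 0 when the measured loudness is within
# a tolerance of the target, 1 otherwise.
# $1 = measured LUFS
# $2 = target LUFS (default -23, the target used above)
# $3 = tolerance in LU (default 1.0, an assumed value)
lufs_within_tolerance() {
  awk -v m="$1" -v t="${2:--23}" -v tol="${3:-1.0}" 'BEGIN {
    d = m - t
    if (d < 0) d = -d      # absolute difference in LU
    if (d <= tol) exit 0
    exit 1
  }'
}

# Possible usage:
#   lufs_within_tolerance "$lufs" || echo "re-run normalization" >&2
```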
Log to {project_dir}/logs/production.jsonl. See skills/lib/logging-guide.md for schema.
On invocation — key inputs: script_path, voice_style
On completion — key outputs: audio_path, duration_s, lufs (normalized loudness)
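As an illustration of the completion record, the sketch below appends one JSON line carrying the three key outputs. The authoritative schema lives in skills/lib/logging-guide.md; the flat field layout here is an assumption, and `log_completion` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: append one completion record as a JSON line.
# $1 = log file, $2 = audio_path, $3 = duration_s, $4 = lufs
# Assumption: the record holds only the three key outputs; check the logging
# guide for the real schema. No JSON escaping is done, so audio_path must not
# contain double quotes or backslashes.
log_completion() {
  mkdir -p "$(dirname "$1")" || return 1
  printf '{"audio_path":"%s","duration_s":%s,"lufs":%s}\n' "$2" "$3" "$4" >> "$1"
}

# Possible usage, with project_dir set by the caller:
#   log_completion "$project_dir/logs/production.jsonl" delivery.wav 12.5 "$lufs"
```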