Practical mastering steps for TTS audio: cleanup, loudness normalization, alignment, and delivery specs.
This skill focuses on producing clean, consistent, and delivery-ready TTS audio for video tasks. It covers speech cleanup, loudness normalization, segment boundaries, and export specs.
Choose a TTS engine based on deployment constraints and quality needs:
Key rule: Always confirm the native sample rate of the generated audio before resampling for video delivery.
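As a minimal sketch of the check above, assuming the TTS engine wrote a PCM WAV file, the native rate can be read from the file header with Python's standard `wave` module. The helper names `native_sample_rate` and `needs_resample` are hypothetical and used here only for illustration.

```python
import wave


def native_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, delivery_rate: int) -> bool:
    """True when the file's native rate differs from the delivery rate."""
    return native_sample_rate(path) != delivery_rate
```

Other containers (MP3, Opus, and so on) are not handled by `wave`; for those, a probe tool such as `ffprobe` would be needed instead.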
Apply lightweight processing to avoid common artifacts:
Recommended FFmpeg pattern (example):
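One possible shape for such a pattern, sketched in Python as a command builder so the arguments stay easy to inspect. The filter choices and values (an 80 Hz high-pass to remove rumble and a 10 ms fade-in to soften a click at the start) are illustrative assumptions, not settings prescribed by this skill, and the helper name `build_cleanup_command` is hypothetical.

```python
from typing import List


def build_cleanup_command(
    src: str,
    dst: str,
    highpass_hz: int = 80,       # assumed cutoff for low-frequency rumble
    fade_in_s: float = 0.01,     # assumed fade length to soften a start click
) -> List[str]:
    """Build an ffmpeg argument list that applies light cleanup filters."""
    filters = ",".join(
        [
            f"highpass=f={highpass_hz}",
            f"afade=t=in:st=0:d={fade_in_s}",
        ]
    )
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]


# Usage (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(build_cleanup_command("raw.wav", "clean.wav"), check=True)
```

Keeping the chain short makes it easier to confirm that cleanup itself is not introducing new artifacts.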
Target loudness depends on the benchmark/task spec. Targets are commonly expressed as integrated loudness measured per ITU-R BS.1770.
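To make the arithmetic concrete, the sketch below computes the gain needed to move a measured integrated loudness to a target, and formats an FFmpeg `loudnorm` filter string for that target. The -16 LUFS figure in the usage comment and the default true-peak and loudness-range values are assumptions for illustration; the real target should come from the task spec. The function names are hypothetical.

```python
def gain_to_target_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB that shifts the measured loudness onto the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def loudnorm_filter(
    target_lufs: float,
    true_peak_db: float = -1.5,  # assumed true-peak ceiling
    lra: float = 11.0,           # assumed loudness range target
) -> str:
    """Format an ffmpeg loudnorm filter string for the given target."""
    return f"loudnorm=I={target_lufs}:TP={true_peak_db}:LRA={lra}"


# Usage: a clip measured at -23 LUFS with an assumed -16 LUFS target
# needs gain_to_target_db(-23.0, -16.0) dB of gain.
```

A static gain is the simplest model; `loudnorm` itself can also adjust dynamics, so its output may not match a plain gain change exactly.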
Recommended workflow:
Measure loudness with ebur128 (or equivalent meter).
Apply normalization (e.g., loudnorm) as the final step after cleanup and timing edits.
When stitching segment-level TTS into a full track:
Sync guideline: keep end-to-end drift small (e.g., <= 0.2s) unless the task states otherwise.
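The guideline can be checked with a small timeline calculation. This sketch assumes each segment has a planned start time and a measured clip duration, that silence pads any gap before a segment, and that an overrunning clip pushes the next one later. "End-to-end drift" is interpreted here as the difference between where the stitched track ends and the intended total duration; that interpretation, and the names `Segment`, `place_segments`, and `end_to_end_drift`, are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    target_start: float  # seconds, planned start in the video timeline
    duration: float      # seconds, measured length of the TTS clip


def place_segments(segments: List[Segment]) -> List[float]:
    """Return actual start times: pad with silence when early,
    start late when the previous clip overruns."""
    cursor = 0.0
    starts = []
    for seg in segments:
        start = max(cursor, seg.target_start)
        starts.append(start)
        cursor = start + seg.duration
    return starts


def end_to_end_drift(segments: List[Segment], target_total_s: float) -> float:
    """Absolute difference between the stitched track end and the target length."""
    if not segments:
        return abs(target_total_s)
    starts = place_segments(segments)
    track_end = starts[-1] + segments[-1].duration
    return abs(track_end - target_total_s)


def within_tolerance(drift_s: float, tolerance_s: float = 0.2) -> bool:
    """Apply the <= 0.2s guideline unless the task gives another limit."""
    return drift_s <= tolerance_s
```

For example, if the second of three clips overruns its slot by half a second, the final clip starts late and the track ends half a second past the target, which fails the default tolerance.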