Generate character voices using TTS, voice cloning, and lip-sync tools. Supports Chatterbox, F5-TTS, TTS Audio Suite, RVC, and ElevenLabs. Use when creating speech audio for characters or syncing audio to video.
Creates character voices through TTS/voice cloning and synchronizes them with generated video.
VOICE REQUEST
|
|-- Have reference audio of target voice?
| |-- Yes (5+ seconds) → Chatterbox (MIT, paralinguistic tags)
| |-- Yes (10-15 seconds) → F5-TTS (fastest zero-shot)
| |-- Yes (10+ minutes) → RVC training (highest fidelity)
| |-- Yes (any length, budget) → ElevenLabs (production quality)
|
|-- No reference audio?
| |-- Need emotion control → IndexTTS-2 (8-emotion vectors)
| |-- Need multi-language → TTS Audio Suite (23 languages)
| |-- Need voice design → ElevenLabs Voice Design (describe voice)
| |-- Quick prototype → Any TTS with default voice
|
|-- Need multi-speaker dialog?
| |-- Chatterbox (4 voices) or TTS Audio Suite (character switching)
|
|-- Need lip-sync?
| |-- Best accuracy → Wav2Lip + CodeFormer
| |-- Need head movement → SadTalker
| |-- Full expression control → LivePortrait
| |-- Unlimited length → InfiniteTalk
Chatterbox
Strengths: MIT license, preferred over ElevenLabs in 63.8% of blind tests, clones from a 5-second sample, emotion control, sub-200ms latency.
Paralinguistic tags:
[laugh] [chuckle] [sigh] [gasp] [cough] [clear throat]
[whisper] [excited] [sad] [angry] [surprised]
Key parameter: exaggeration (0.25-2.0) controls expressiveness.
Limit: 40-second generation cap. Split longer content.
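A simple way to respect the 40-second cap is to split scripts at sentence boundaries using a rough duration estimate. The ~15 characters-per-second speaking rate is an assumption, not part of Chatterbox; tune it for your voice and pacing:

```python
import re

CHARS_PER_SECOND = 15   # assumed speaking rate; adjust per voice
MAX_SECONDS = 40        # Chatterbox generation cap

def split_for_tts(text: str, max_chars: int = CHARS_PER_SECOND * MAX_SECONDS) -> list[str]:
    """Split text at sentence boundaries into chunks under max_chars.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```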
F5-TTS
Strengths: Fastest zero-shot cloning, works from samples under 15 seconds, MIT license, multi-language.
Requirements: Each reference clip must be a .wav file paired with a .txt file containing its matching transcription.
Languages: English, German, Spanish, French, Japanese, Hindi, Thai, Portuguese.
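Since a missing transcription fails the run, a quick pre-flight check of the reference directory helps. The directory layout here is an assumption; point it at wherever you keep reference audio:

```python
from pathlib import Path

def check_reference_pairs(ref_dir: str) -> tuple[list[Path], list[Path]]:
    """Return (wavs with a matching .txt transcription, wavs missing one)."""
    complete, missing = [], []
    for wav in sorted(Path(ref_dir).glob("*.wav")):
        # F5-TTS expects foo.wav + foo.txt with the same stem.
        (complete if wav.with_suffix(".txt").exists() else missing).append(wav)
    return complete, missing
```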
TTS Audio Suite
Strengths: Unified multi-engine platform, 23 languages, character switching.
Special features:
[CharacterName] tags for character switching
[de:Alice], [fr:Bob] for per-character language selection
[pause:1s] for timed pauses
Integrates: F5-TTS, Chatterbox, Higgs Audio 2, VibeVoice, IndexTTS-2, RVC.
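A dialog script combining these tags might look like the following (character names and lines are illustrative):

```
[Alice] Ready when you are. [pause:1s]
[Bob] Give me a second.
[de:Alice] Of course, no problem.
[fr:Bob] Alright, let's go.
```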
IndexTTS-2
Strengths: 8-emotion vector control with per-segment parameters.
Emotions: happy, angry, sad, surprised, afraid, disgusted, calm, melancholic.
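One way to build per-segment emotion parameters is as a normalized weight over the eight emotions. The dict-of-weights representation below is an assumption for illustration; consult the engine's documentation for its actual parameter format:

```python
EMOTIONS = ("happy", "angry", "sad", "surprised",
            "afraid", "disgusted", "calm", "melancholic")

def emotion_vector(**weights: float) -> dict[str, float]:
    """Normalized weights over the 8 emotions; unset emotions get 0."""
    unknown = set(weights) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotions: {unknown}")
    total = sum(weights.values()) or 1.0
    return {e: weights.get(e, 0.0) / total for e in EMOTIONS}
```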
RVC (Retrieval-based Voice Conversion)
Use case: Train a model on the target voice (10+ minutes of audio), then convert any TTS output into that voice.
Pipeline: Text → Any TTS → Base Audio → RVC Model → Character Voice
Training: 300-500 epochs, RMVPE feature extraction.
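The pipeline above is a simple composition of two stages. The function names here (base_tts, rvc_convert) are placeholders, not real APIs; wire in your actual TTS engine and trained RVC model:

```python
# Stub sketch of the Text -> TTS -> RVC pipeline (placeholder functions).

def base_tts(text: str, out_path: str = "base.wav") -> str:
    """Placeholder: synthesize text with any TTS, return the audio path."""
    return out_path

def rvc_convert(wav_path: str, model: str) -> str:
    """Placeholder: run a trained RVC model over the base audio."""
    return wav_path.replace("base", model)

def character_line(text: str, character_model: str) -> str:
    """Chain the two stages: any TTS voice in, character voice out."""
    return rvc_convert(base_tts(text), model=character_model)
```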
ElevenLabs
Tiers:
For each character, establish a voice profile in projects/{project}/characters/{name}/profile.yaml:
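A profile might look like the following sketch; the field names are illustrative assumptions, not a fixed schema:

```yaml
# projects/{project}/characters/{name}/profile.yaml
name: alice
engine: chatterbox                  # chatterbox | f5-tts | rvc | elevenlabs
reference_audio: refs/alice.wav
language: en
exaggeration: 0.7                   # Chatterbox expressiveness, 0.25-2.0
emotion: {calm: 0.6, happy: 0.4}    # IndexTTS-2-style weights, if used
rvc_model: null                     # path to a trained RVC model, if any
```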