When to Use

User wants to convert text to spoken audio
User asks for "read aloud", "TTS", "text to speech", "voice narration"
User says "朗读" (read aloud), "配音" (voice-over), "语音合成" (speech synthesis)
User wants multi-speaker scripted audio or dialogue

When NOT to Use

User wants a podcast-style discussion with topic exploration (use /podcast)
User wants an explainer video with visuals (use /explainer)
User wants to generate an image (use /image-gen)

Purpose

Convert text into natural-sounding speech audio. Two paths:

Quick mode (--mode direct): Single voice, low-latency, synchronous. For casual chat, reading snippets, instant audio.
Script mode (--mode smart): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.

Hard Constraints

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation CLI command until the user has explicitly confirmed. </HARD-GATE>
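The gate above can be pictured as a small state holder that hands out one question at a time and refuses to build a command until the user has confirmed. This is a minimal sketch, not the skill's implementation: `GenerationGate`, its methods, and the `voice` parameter name are hypothetical. Only `listenhub tts create --mode direct|smart` is taken from this document.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GenerationGate:
    """Hypothetical helper: collect answers one by one, then require confirmation."""

    required: tuple[str, ...]
    answers: dict[str, str] = field(default_factory=dict)
    confirmed: bool = False

    def next_question(self) -> str | None:
        """Return the next unanswered parameter, or None when all are collected."""
        for name in self.required:
            if name not in self.answers:
                return name
        return None

    def answer(self, name: str, value: str) -> None:
        # Any new answer invalidates an earlier confirmation.
        self.answers[name] = value
        self.confirmed = False

    def summary(self) -> str:
        """Summarize the collected choices for the confirmation question."""
        return ", ".join(f"{key}={value}" for key, value in self.answers.items())

    def confirm(self) -> None:
        if self.next_question() is not None:
            raise RuntimeError("cannot confirm before all parameters are collected")
        self.confirmed = True

    def command(self) -> list[str]:
        """Build the CLI invocation; allowed only after explicit confirmation."""
        if not self.confirmed:
            raise PermissionError("user has not confirmed; do not call the CLI")
        # Only the --mode flag is shown here; other flags are not assumed.
        return ["listenhub", "tts", "create", "--mode", self.answers["mode"]]
```

In use, each value returned by `next_question()` would correspond to one AskUserQuestion call, and `summary()` would feed the final confirmation prompt.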

Mode Detection

Signal	Mode
"多角色" (multi-role), "脚本" (script), "对话" (dialogue), "script", "dialogue", "multi-speaker"	Script
Multiple characters mentioned by name or role	Script
Input contains structured segments (A: ..., B: ...)	Script
Single paragraph of text, no character markers	Quick
"读一下" (read this), "read this", "TTS", "朗读" (read aloud) with plain text	Quick
Ambiguous	Quick (default)
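The table above can be approximated with a few string checks. The sketch below is a rough heuristic under stated assumptions, not the skill's actual detection logic: `detect_mode`, the speaker-label pattern, and the "two or more labelled lines" threshold are all hypothetical. The row "Multiple characters mentioned by name or role" is left out because it needs language understanding rather than pattern matching.

```python
import re

# Chinese keywords from the table; matched as plain substrings.
SCRIPT_KEYWORDS_ZH = ("多角色", "脚本", "对话")

# English keywords from the table; word boundaries keep "description"
# from matching "script".
SCRIPT_KEYWORDS_EN = re.compile(r"\b(?:script|dialogue|multi-speaker)\b")

# Assumed shape of a structured segment such as "A: ..." or "Narrator: ...":
# a short label at the start of a line, then an ASCII or full-width colon.
SPEAKER_LINE = re.compile(r"^\s*[^\s::]{1,20}\s*[::]\s*\S", re.MULTILINE)


def detect_mode(request: str, text: str = "") -> str:
    """Return "smart" (Script mode) or "direct" (Quick mode)."""
    lowered = request.lower()
    if any(keyword in lowered for keyword in SCRIPT_KEYWORDS_ZH):
        return "smart"
    if SCRIPT_KEYWORDS_EN.search(lowered):
        return "smart"
    # Two or more labelled lines suggest structured dialogue segments.
    if len(SPEAKER_LINE.findall(text)) >= 2:
        return "smart"
    # Plain text and ambiguous requests fall back to Quick mode.
    return "direct"
```

Returning `"direct"` at the end mirrors the table's last row, where an ambiguous request defaults to Quick mode.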
