Text-to-speech and voice narration. Triggers on: "朗读这段", "配音", "TTS", "语音合成", "text to speech", "read this aloud", "convert to speech", "voice narration", "read aloud".
/podcast)/explainer)/image-gen)Convert text into natural-sounding speech audio. Two paths:
--mode direct): Single voice, low-latency, sync. For casual chat, reading snippets, instant audio.--mode smart): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.shared/cli-authentication.mdshared/cli-patterns.md for CLI execution, errors, and interaction patternsshared/speaker-selection.md as fallback only; fetch from the speakers CLI when the user wants to change voiceshared/config-pattern.md before any interactionshared/speaker-selection.md for speaker selection (text table + free-text input)~/Downloads/ or /tmp/ as primary output — save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)Determine the mode from the user's input automatically before asking any questions:
| Signal | Mode |
|---|---|
| "多角色", "脚本", "对话", "script", "dialogue", "multi-speaker" | Script |
| Multiple characters mentioned by name or role | Script |
| Input contains structured segments (A: ..., B: ...) | Script |
| Single paragraph of text, no character markers | Quick |
| "读一下", "read this", "TTS", "朗读" with plain text | Quick |
| Ambiguous | Quick (default) |
Follow shared/cli-authentication.md. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
mkdir -p ".listenhub/tts"
echo '{"outputMode":"inline","language":null,"defaultSpeakers":{}}' > ".listenhub/tts/config.json"
CONFIG_PATH=".listenhub/tts/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If file exists — read config silently and proceed:
CONFIG_PATH=".listenhub/tts/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/tts/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (tts):
输出方式:{inline / download / both}
语言偏好:{zh / en / 未设置}
默认主播:{speakerName / 使用内置默认}
Then ask:
outputMode: Follow shared/output-mode.md § Setup Flow Question.
Language (optional): "默认语言?"
nullAfter collecting answers, save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
# Save language if user chose one (not "每次手动选择")
if [ "$LANGUAGE" != "null" ]; then
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
listenhub tts create --mode directStep 1: Extract text
Get the text to convert. If the user hasn't provided it, ask:
"What text would you like me to read aloud?"
Step 2: Determine voice
config.defaultSpeakers.{language}[0] is set → use it silently (skip to Step 4)shared/speaker-selection.md for the detected language (skip to Step 4)Step 3: Save preference
After the user explicitly selects a new voice (not when using defaults):
Question: "Save {voice name} as your default voice for {language}?"
Options:
- "Yes" — update .listenhub/tts/config.json
- "No" — use for this session only
Step 4: Confirm
Ready to generate:
Text: "{first 80 chars}..."
Voice: {voice name}
Proceed?
Step 5: Generate
For short text, pass inline:
RESULT=$(listenhub tts create --text "{text}" --mode direct --speaker "{name}" --lang {lang} --json 2>/tmp/lh-err)
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
ERROR=$(cat /tmp/lh-err)
case $EXIT_CODE in
2) echo "Auth error: run 'listenhub auth login'" ;;
3) echo "Timeout: try --no-wait" ;;
*) echo "Error: $ERROR" ;;
esac
rm -f /tmp/lh-err
fi
rm -f /tmp/lh-err
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
For long text, write to a temp file first (see shared/cli-patterns.md § Long Text Input):
cat > /tmp/lh-content.txt << 'ENDCONTENT'
Long text content goes here...
ENDCONTENT
RESULT=$(listenhub tts create --text "$(cat /tmp/lh-content.txt)" --mode direct --speaker "{name}" --lang {lang} --json)
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
rm -f /tmp/lh-content.txt
Step 6: Present result
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Display the audioUrl as a clickable link.
Present:
Audio generated!
在线收听:{audioUrl}
download or both: Also download the file. Generate a topic slug from the text content following shared/config-pattern.md § Artifact Naming.
SLUG="{topic-slug}" # e.g. "server-maintenance-notice"
NAME="${SLUG}.mp3"
# Dedup: if file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
curl -sS -o "$NAME" "$AUDIO_URL"
Present:
Audio generated!
已保存到当前目录:
{NAME}
listenhub tts create --mode smartStep 1: Get scripts
Determine whether the user already has a scripts array:
Already provided (JSON or clear segments): parse and display for confirmation
Not yet provided: help the user structure segments. Ask:
"Please provide the script with speaker assignments. Format: each line as
SpeakerName: text content. I'll convert it."
Once the user provides the script, parse it into speaker-annotated text.
Step 2: Assign voices per character
For each unique character in the script:
config.defaultSpeakers.{language} has saved voices → auto-assign silently (one per character in order)shared/speaker-selection.md (Primary for first character, Secondary for second)Step 3: Save preferences
After all voices are assigned (if any were new):
Question: "Save these voice assignments for future sessions?"
Options:
- "Yes" — update defaultSpeakers in .listenhub/tts/config.json
- "No" — use for this session only
Step 4: Confirm
Ready to generate:
Characters:
{name}: {voice}
{name}: {voice}
Segments: {count}
Title: (auto-generated)
Proceed?
Step 5: Generate
Format the script text with speaker markers and submit. For multi-speaker scripts, include speaker names inline in the text. Run with run_in_background: true since script mode may take longer.
Submit (foreground) with --no-wait:
RESULT=$(listenhub tts create --text "{formatted script with speaker markers}" --mode smart --speaker "{name1}" --speaker "{name2}" --lang {lang} --no-wait --json)
ID=$(echo "$RESULT" | jq -r '.id')
echo "Submitted: $ID"
For long scripts, write to a temp file first:
cat > /tmp/lh-content.txt << 'ENDCONTENT'
SpeakerA: First line of dialogue
SpeakerB: Second line of dialogue
...
ENDCONTENT
RESULT=$(listenhub tts create --text "$(cat /tmp/lh-content.txt)" --mode smart --speaker "{name1}" --speaker "{name2}" --lang {lang} --no-wait --json)
ID=$(echo "$RESULT" | jq -r '.id')
rm -f /tmp/lh-content.txt
Poll (background) with run_in_background: true and timeout: 600000:
ID="<id-from-above>"
for i in $(seq 1 60); do
RESULT=$(listenhub creation get "$ID" --json 2>/dev/null)
STATUS=$(echo "$RESULT" | jq -r '.status // "processing"')
case "$STATUS" in
completed) echo "$RESULT"; exit 0 ;;
failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
*) sleep 10 ;;
esac
done
echo "TIMEOUT" >&2; exit 2
Step 6: Present result
When the background task completes, parse the result:
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
SUBTITLES_URL=$(echo "$RESULT" | jq -r '.subtitlesUrl // empty')
DURATION=$(echo "$RESULT" | jq -r '.audioDuration // empty')
CREDITS=$(echo "$RESULT" | jq -r '.credits // empty')
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Display the audioUrl and subtitlesUrl as clickable links.
Present:
Audio generated!
在线收听:{audioUrl}
字幕:{subtitlesUrl}
时长:{audioDuration / 1000}s
消耗积分:{credits}
download or both: Also download the file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.
SLUG="{topic-slug}" # e.g. "welcome-dialogue"
NAME="${SLUG}.mp3"
# Dedup: if file exists, append -2, -3, etc.
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
curl -sS -o "$NAME" "$AUDIO_URL"
Present:
已保存到当前目录:
{NAME}
When saving preferences, merge into .listenhub/tts/config.json — do not overwrite unchanged keys.
defaultSpeakers.{language}[0] to the selected speakerIddefaultSpeakers.{language} to the full array assigned this sessionlanguage if the user explicitly specifies itshared/cli-patterns.mdshared/cli-authentication.mdshared/cli-speakers.mdshared/speaker-selection.mdshared/config-pattern.mdshared/output-mode.mdQuick mode:
"TTS this: The server will be down for maintenance at midnight."
defaultSpeakers.en is emptycozy-man-english)RESULT=$(listenhub tts create --text "The server will be down for maintenance at midnight." --mode direct --speaker "Mars" --lang en --json)
AUDIO_URL=$(echo "$RESULT" | jq -r '.audioUrl')
audioUrl as link (inline mode)Script mode:
"帮我做一段双人对话配音,A说:欢迎大家,B说:谢谢邀请"
defaultSpeakers.zh emptyRESULT=$(listenhub tts create --text "A: 欢迎大家
B: 谢谢邀请" --mode smart --speaker "原野" --speaker "高晴" --lang zh --no-wait --json)
ID=$(echo "$RESULT" | jq -r '.id')
audioUrl, subtitlesUrl, duration