Name: Audio Tts
Author: second-state

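# Linux: put the bundled libtorch libraries on the loader path.
# The ${VAR:+...} form avoids a trailing ':' when LD_LIBRARY_PATH is unset.
export LD_LIBRARY_PATH="{baseDir}/scripts/libtorch/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"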

{baseDir}/scripts/tts \
  {baseDir}/scripts/models/Qwen3-TTS-12Hz-0.6B-CustomVoice \
  "<text>" \
  <speaker> \
  <language>

{baseDir}/scripts/tts \
  {baseDir}/scripts/models/Qwen3-TTS-12Hz-0.6B-CustomVoice \
  "Hello! Welcome to the Qwen3 text-to-speech system." \
  Vivian \
  english

{baseDir}/scripts/voice_clone \
  {baseDir}/scripts/models/Qwen3-TTS-12Hz-0.6B-Base \
  <reference_audio.wav> \
  "<text>" \
  <language> \
  "<reference_text>"

Parameter	Required	Description
model_path	Yes	Path to the Base model directory
reference_audio	Yes	Path to the reference WAV file (mono, 24 kHz, 16-bit)
text	Yes	The text to synthesize in the cloned voice
language	Yes	`english` or `chinese`
reference_text	Yes	Transcript of the reference audio

# Convert the source recording to the required reference format: mono, 24 kHz, 16-bit WAV
ffmpeg -i input.m4a -ac 1 -ar 24000 -sample_fmt s16 reference.wav
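
Before passing a file to voice_clone, it can help to confirm that it really is mono, 24 kHz, 16-bit. The sketch below reads those three fields straight from the WAV header with `od`. The function name `wav_format` is hypothetical and not part of this skill. It assumes the `fmt ` chunk directly follows the 12-byte RIFF header (the usual layout) and that the host is little-endian.

```shell
# Hypothetical helper (not shipped with the skill): print
# "<channels> <sample_rate> <bits_per_sample>" read from a WAV header.
# Assumes "fmt " is the first chunk and the host is little-endian.
wav_format() {
  # Offset 22: channel count (uint16)
  channels=$(od -An -tu2 -j22 -N2 "$1" | tr -d '[:space:]')
  # Offset 24: sample rate in Hz (uint32)
  rate=$(od -An -tu4 -j24 -N4 "$1" | tr -d '[:space:]')
  # Offset 34: bits per sample (uint16)
  bits=$(od -An -tu2 -j34 -N2 "$1" | tr -d '[:space:]')
  echo "$channels $rate $bits"
}
```

For a file that meets the reference audio requirements, `wav_format reference.wav` should print `1 24000 16`. Any other result suggests re-running the ffmpeg conversion.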

{baseDir}/scripts/voice_clone \
  {baseDir}/scripts/models/Qwen3-TTS-12Hz-0.6B-Base \
  reference.wav \
  "This is a voice cloning test with in-context learning." \
  english \
  "The transcript of what was said in the reference audio."

# Read the transcript
REF_TEXT=$(cat {baseDir}/scripts/reference_audio/trump.txt)

# Run voice clone with ICL mode
{baseDir}/scripts/voice_clone \
  {baseDir}/scripts/models/Qwen3-TTS-12Hz-0.6B-Base \
  {baseDir}/scripts/reference_audio/trump.wav \
  "Hello world" \
  english \
  "$REF_TEXT"

Qwen3 TTS — Text-to-Speech and Voice Cloning

Binaries

Models

Reference Audio

When to Use Which Tool

Linux Environment Setup

Text-to-Speech

Parameters

Available Speakers

Output

Example

Voice Cloning (ICL Mode)

Parameters

Reference Audio Requirements

Output

Example

Workflow

1. Determine the Task

2. Prepare Input

3. Run the Command

Example: Clone by Speaker Name
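
The one speaker-name example in this skill pairs a WAV file with a transcript of the same base name (`trump.wav` and `trump.txt`) in the reference_audio directory. A minimal sketch of that lookup follows. It assumes the `<name>.wav` / `<name>.txt` convention holds for every reference speaker, and the function name `find_reference` is hypothetical.

```shell
# Hypothetical helper (not shipped with the skill): given a reference
# directory and a speaker name, print the WAV path and the transcript
# path on separate lines, or fail if either file is missing.
find_reference() {
  ref_dir=$1
  name=$2
  wav="$ref_dir/$name.wav"
  txt="$ref_dir/$name.txt"
  if [ -f "$wav" ] && [ -f "$txt" ]; then
    printf '%s\n%s\n' "$wav" "$txt"
  else
    echo "no reference pair for speaker: $name" >&2
    return 1
  fi
}
```

A caller could run `find_reference {baseDir}/scripts/reference_audio trump`, then pass the first path as the reference audio and the contents of the second as the reference text. A non-zero exit status means the name cannot be resolved, and the user should supply their own audio and transcript.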

4. Return the Output

Parameter	Required	Description
model_path	Yes	Path to the model directory
text	Yes	The text to synthesize as speech
speaker	Yes	Speaker name (see Available Speakers below)
language	Yes	`english` or `chinese`
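
Since the parameter table lists only `english` and `chinese` for the language argument, a small guard can reject anything else before the model loads. This is a sketch, not part of the skill. The function name `validate_language` is hypothetical, and it assumes the binaries expect exactly these lowercase values.

```shell
# Hypothetical helper (not shipped with the skill): succeed only for the
# two language values listed in the parameter table.
validate_language() {
  case "$1" in
    english|chinese) return 0 ;;
    *) echo "unsupported language: $1" >&2; return 1 ;;
  esac
}
```

Chaining it in front of the real command, as in `validate_language "$lang" && {baseDir}/scripts/tts ...`, stops a mistyped language before the tts or voice_clone binary runs.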