Generate speech audio for multiple lines with per-line emotion control.
Args:
    segments: List of dicts, each with required fields "text", "emotion", and
        "emotion_sample". Each dict may optionally include "ref_emotion_voice",
        a local audio file path or remote URL; when omitted or empty, it
        defaults to ref_voice.
    ref_voice: Local audio file path or remote URL used as the reference for
        voice characteristics and timbre.
Returns:
    List of generated speech audio segments as HTTP URLs
    (http://host:port/file/path) in WAV format.
Example:
    segments = [
        {
            "text": "Hi there! It is great to meet you.",
            "emotion": "happy",
            "emotion_sample": "I am so glad we finally get to meet in person!",
            "ref_emotion_voice": "/path/to/emotion_voice.wav"
        },
        {
            "text": "This is serious, so please pay attention.",
            "emotion": "serious",
            "emotion_sample": "This is serious, so please pay attention.",
            "ref_emotion_voice": "/path/to/emotion_voice.wav"
        }
    ]
    audio = await text_segments_to_speech(segments, ref_voice="/path/to/reference_voice.wav")
Call the local MCP bridge shell wrapper:
    core/bin/tool-cli request '{"server_id": "media", "tool_name": "text_segments_to_speech", "arguments": {}}'
Do not use Python helper code to invoke core/bin/tool-cli; run it directly as a shell command with the arguments passed inline.
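As a sketch, a populated request could look like the following (the segment text and all file paths are placeholder values taken from the example above, not real files); the payload is sanity-checked as JSON before the wrapper is invoked:

```shell
# Hypothetical request payload; paths are placeholders, not real files.
REQUEST='{"server_id": "media", "tool_name": "text_segments_to_speech", "arguments": {"segments": [{"text": "Hi there! It is great to meet you.", "emotion": "happy", "emotion_sample": "I am so glad we finally get to meet in person!"}], "ref_voice": "/path/to/reference_voice.wav"}}'

# Confirm the payload is well-formed JSON before calling the bridge.
echo "$REQUEST" | python3 -c "import json, sys; json.load(sys.stdin); print('ok')"

# Invoke the wrapper directly as a shell command (no Python helper),
# guarded so the snippet is safe to run where the bridge is absent.
if [ -x core/bin/tool-cli ]; then
    core/bin/tool-cli request "$REQUEST"
fi
```

Because the top-level schema sets additionalProperties to false, any key other than "segments" and "ref_voice" inside "arguments" will be rejected.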
{
"additionalProperties": false,
"properties": {
"segments": {
"items": {
"additionalProperties": true,
"type": "object"
},
"type": "array"
},
"ref_voice": {
"default": "",
"type": "string"
}
},
"required": [
"segments"
],
"type": "object"
}
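For illustration, a minimal pre-flight check of an arguments object against the schema above might look like this. It is a hand-rolled sketch, not a full JSON Schema validator, and `validate_arguments` is a hypothetical helper, not part of the tool:

```python
def validate_arguments(args: dict) -> list:
    """Check an arguments dict against the tool's input schema above.

    Returns a list of error strings; an empty list means the payload
    passed every check. A minimal sketch, not a full JSON Schema
    validator.
    """
    errors = []
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    # "segments" is the only required top-level property.
    if "segments" not in args:
        errors.append("missing required field: segments")
    elif not isinstance(args["segments"], list):
        errors.append("segments must be an array")
    else:
        for i, seg in enumerate(args["segments"]):
            if not isinstance(seg, dict):
                errors.append(f"segments[{i}] must be an object")
                continue
            # text, emotion, and emotion_sample are required per segment.
            for field in ("text", "emotion", "emotion_sample"):
                if field not in seg:
                    errors.append(f"segments[{i}] missing field: {field}")
    # "ref_voice" is optional but must be a string when present.
    if "ref_voice" in args and not isinstance(args["ref_voice"], str):
        errors.append("ref_voice must be a string")
    # Extra per-segment keys are allowed (additionalProperties: true),
    # but extra top-level keys are not (additionalProperties: false).
    for key in args:
        if key not in ("segments", "ref_voice"):
            errors.append(f"unexpected top-level field: {key}")
    return errors
```

Running the check before calling the bridge surfaces schema problems locally instead of as a remote tool error.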