Multi-speaker podcast and dialogue generation with TTS voice cloning. Triggers: podcast, 播客, multi-speaker audio, 多人对话, 多人语音, radio show, talk show, 对话生成, 锵锵三人行, 两人对话, 三人对话, voice dialogue, 用XX的声音, 用XX和XX的风格, 声音模仿, 角色对话, 配音.
Generate professional multi-speaker podcasts from a topic or text. The pipeline collects speaker preferences, expands content into a scripted dialogue with emotion and music cues, lets you review before generation, then produces a final MP3.
Scripts are written in the [Speaker - voice, emotion] format.
Preset voices (built-in): vivian (default), serena, ryan, aiden, eric, dylan, uncle_fu, ono_anna, sohee
Custom/cloned voices: use fm_voice_save in mofa-fm to save cloned voices first, then reference them by name.
The generated script uses this format:
# My Podcast Title
**Genre**: talk-show | **Duration**: ~10 min | **Speakers**: 3
| Character | Voice | Type |
|-----------|-------|------|
| Host | vivian | built-in |
| Guest1 | ryan | built-in |
| Expert | clone:sarah | clone |
---
[BGM: Upbeat intro music — fade-in, 5s]
[Host - vivian, cheerful] Welcome to today's show!
[Guest1 - ryan, excited] Thanks for having me!
[BGM: Soft transition — crossfade, 3s]
[Expert - clone:sarah, serious] Let me share some insights...
[PAUSE: 2s]
[Host - vivian, warm] That's fascinating. Let's dig deeper...
[BGM: Outro music — fade-out, 5s]
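The script format above is regular enough to parse line by line. The following is a minimal sketch in Python, not the tool's actual parser: the regular expressions are inferred from the sample script only, and the names `parse_script`, `DIALOGUE_RE`, `BGM_RE`, and `PAUSE_RE` are hypothetical.

```python
import re

# Patterns inferred from the sample script; the grammar accepted by the
# real tool is an assumption and may be looser or stricter.
DIALOGUE_RE = re.compile(
    r"^\[(?P<speaker>[^\]]+?) - (?P<voice>[^,\]]+), (?P<emotion>[^\]]+)\]\s*(?P<text>.+)$"
)
BGM_RE = re.compile(
    r"^\[BGM: (?P<description>.+?) — (?P<effect>fade-in|fade-out|crossfade), (?P<seconds>\d+)s\]$"
)
PAUSE_RE = re.compile(r"^\[PAUSE: (?P<seconds>\d+)s\]$")


def parse_script(script: str) -> list[dict]:
    """Turn script text into an ordered list of dialogue, bgm, and pause events."""
    events = []
    for raw in script.splitlines():
        line = raw.strip()
        if m := PAUSE_RE.match(line):
            events.append({"kind": "pause", "seconds": int(m["seconds"])})
        elif m := BGM_RE.match(line):
            events.append({
                "kind": "bgm",
                "description": m["description"],
                "effect": m["effect"],
                "seconds": int(m["seconds"]),
            })
        elif m := DIALOGUE_RE.match(line):
            events.append({"kind": "dialogue", **m.groupdict()})
        # Anything else (title, metadata, cast table, separators) is skipped.
    return events


if __name__ == "__main__":
    sample = "[Host - vivian, cheerful] Welcome to today's show!\n[PAUSE: 2s]"
    for event in parse_script(sample):
        print(event)
```

Cue lines are checked before dialogue lines so that a `[BGM: ...]` or `[PAUSE: ...]` marker is never mistaken for a speaker tag.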
Supported emotions (mapped to TTS style prompts):
- calm: natural, composed tone
- excited: energetic, enthusiastic
- serious: formal, weighty
- warm: friendly, inviting
- angry: intense, forceful
- sad: somber, reflective
- cheerful: upbeat, positive
- dramatic: theatrical, intense
- curious: inquisitive, wondering
- thoughtful: contemplative, measured
Background music placeholders (actual music files are mixed in post-production):
- [BGM: description — fade-in, Ns]: music fades in over N seconds
- [BGM: description — fade-out, Ns]: music fades out
- [BGM: description — crossfade, Ns]: crossfade transition
- [PAUSE: Ns]: insert N seconds of silence (1-3s typical)
The podcast_generate tool:
- reads the [Character - voice, emotion] text lines
- numbers the segments (seg_001, seg_002, ...)
- writes seg_{NNN}_{voice}.wav files inside the output segments/ directory
- inserts silence for [PAUSE] cues
Output files:
- skill-output/mofa-podcast/script.md
- skill-output/mofa-podcast/segments/*.wav
- skill-output/mofa-podcast/podcast_<timestamp>.mp3 (or .wav fallback if MP3 conversion is unavailable)
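To make the segment naming and emotion mapping concrete, here is a small Python sketch that plans one output file and one style prompt per dialogue line. It is illustrative only and does not call podcast_generate. The helper names `segment_path` and `plan_segments` are hypothetical, using the emotion descriptions verbatim as style prompts is an assumption, and so are the fallback to calm and the replacement of characters such as `:` in voice names.

```python
import re
from pathlib import PurePosixPath

OUTPUT_DIR = PurePosixPath("skill-output/mofa-podcast")

# Emotion names come from the list above; treating each description as the
# literal TTS style prompt is an assumption.
STYLE_PROMPTS = {
    "calm": "natural, composed tone",
    "excited": "energetic, enthusiastic",
    "serious": "formal, weighty",
    "warm": "friendly, inviting",
    "angry": "intense, forceful",
    "sad": "somber, reflective",
    "cheerful": "upbeat, positive",
    "dramatic": "theatrical, intense",
    "curious": "inquisitive, wondering",
    "thoughtful": "contemplative, measured",
}


def segment_path(index: int, voice: str) -> PurePosixPath:
    """Build segments/seg_{NNN}_{voice}.wav for a 1-based segment index."""
    # Assumption: characters like ":" in "clone:sarah" become "_";
    # the real tool may name cloned-voice segments differently.
    safe_voice = re.sub(r"[^A-Za-z0-9_-]", "_", voice)
    return OUTPUT_DIR / "segments" / f"seg_{index:03d}_{safe_voice}.wav"


def plan_segments(lines: list[tuple[str, str]]) -> list[dict]:
    """Map (voice, emotion) pairs, in script order, to planned segments."""
    plan = []
    for index, (voice, emotion) in enumerate(lines, start=1):
        plan.append({
            "segment_id": f"seg_{index:03d}",
            "path": str(segment_path(index, voice)),
            # Falling back to "calm" for unknown emotions is an assumption.
            "style": STYLE_PROMPTS.get(emotion, STYLE_PROMPTS["calm"]),
        })
    return plan


if __name__ == "__main__":
    for item in plan_segments([("vivian", "cheerful"), ("ryan", "excited")]):
        print(item)
```

Numbering starts at 1 and is zero-padded to three digits to match the seg_001, seg_002 pattern, which keeps the files in playback order when sorted by name.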