Skill File

Story To Voice

Name: Story To Voice
Author: ziyu4huang

Analyze a story file or book directory with an LLM to produce a high-quality .story.json with fitting character voices, emotions, and pacing, then generate audio via the CLI or the WebUI. Supports single stories and multi-chapter books.

ziyu4huang | 0 stars | 2026-04-06

Job
Category: Game Development

Skill Content

Story-to-Voice Skill

Convert a plain story file or a multi-chapter book into structured .story.json audiobook projects, then produce the audio via the CLI or the WebUI.

Two Modes

Single Story Mode — argument is a .txt file:

/story-to-voice path/to/story.txt

Book Mode — argument is a directory:

/story-to-voice books/my_novel/
/story-to-voice books/my_novel/ --chapter 003

Pipeline

Single:   story.txt ──► [LLM] ──► .story.json ──► [produce] ──► .flac
Book:     books/my_novel/ ──► [LLM per chapter] ──► chapter-xxx.story.json ──► [produce-book] ──► chapter-xxx.flac
                           reads book.json for                          updates book.json
                           existing character voices                    with new characters

WebUI Servers

Server	File	Port	Purpose
TTS Studio	`webui.py`	7860	Simple TTS generation, voice/emotion preview, AI content generation, book browser at `/books`
Story Studio	`story_studio.py`	7861	Advanced multi-segment story producer with SSE progress, segment editor, import/export

Start servers

cd mlx_tts

# TTS Studio (simple TTS + book browser)
.venv/bin/python webui.py                          # http://localhost:7860

# Story Studio (advanced multi-character production)
.venv/bin/python story_studio.py                    # http://localhost:7861

Instructions

Sub-Commands

open browser [studio|books|tts]

Alias	Server	URL	What Opens
`open browser studio`	Story Studio	`http://localhost:7861`	Segment composer
`open browser books`	Story Studio	`http://localhost:7861/books`	Book browser
`open browser tts`	TTS Studio	`http://localhost:7860`	Simple TTS generator
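The alias table reduces to a lookup from alias to URL, plus the server script that has to be running on that port. A minimal sketch; `resolve_alias`, `ALIASES`, and `SERVER_SCRIPTS` are illustrative names, with values copied from the tables on this page.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Values copied from the server and alias tables; the names are illustrative.
ALIASES = {
    "studio": "http://localhost:7861",
    "books": "http://localhost:7861/books",
    "tts": "http://localhost:7860",
}
SERVER_SCRIPTS = {7860: "webui.py", 7861: "story_studio.py"}


def resolve_alias(alias: str) -> tuple[str, str, int]:
    """Return (url, server script, port) for an `open browser` alias."""
    url = ALIASES[alias]
    port = urlsplit(url).port  # 7860 or 7861
    return url, SERVER_SCRIPTS[port], port
```

Knowing the script for each port tells the start-up step which `<server>.py` to launch when the port check fails.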

1. Check whether the server is already running: lsof -ti:<port>
2. If it is not running, start it in the background:

   cd /Users/huangziyu/proj/dev_mlx/mlx_tts && .venv/bin/python <server>.py

3. Wait until it is ready: sleep 3 && curl -s -o /dev/null -w "%{http_code}" http://localhost:<port>/ (prints 200 once the server answers)
4. Verify the routes: curl -s http://localhost:<port>/openapi.json | python3 -c "import sys,json; ..."
5. Navigate to the URL in Playwright.
6. Take a screenshot and show it to the user.
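Steps 1 and 3 rely on `lsof` and a fixed `sleep 3`. When driving the servers from Python instead, a standard-library sketch can poll until the page actually answers. `is_port_open`, `wait_until_ready`, and the 30-second default timeout are assumptions, not part of the project; the TCP check is only a rough stand-in for `lsof -ti:<port>`.

```python
import http.client
import socket
import time
import urllib.request


def is_port_open(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def wait_until_ready(url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # refused, timed out, or HTTP error: not ready yet
        time.sleep(interval_s)
    return False
```

Polling avoids both failure modes of a fixed delay: waiting too long on a fast start and screenshotting an error page on a slow one.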

play <book_name> <chapter>

afplay /Users/huangziyu/proj/dev_mlx/mlx_tts/books/<book_name>/chapters/chapter-<NNN>.flac
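`play` only needs the zero-padded chapter path. A sketch of building it and handing it to macOS `afplay`; `chapter_audio_path` and `play_chapter` are hypothetical helpers, and accepting either `3` or `"003"` for the chapter is an assumption.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Project root used in the commands above; adjust for your own checkout.
PROJECT_ROOT = Path("/Users/huangziyu/proj/dev_mlx/mlx_tts")


def chapter_audio_path(book_name: str, chapter: int | str, root: Path = PROJECT_ROOT) -> Path:
    """books/<book_name>/chapters/chapter-<NNN>.flac, padding the number to 3 digits."""
    return root / "books" / book_name / "chapters" / f"chapter-{int(chapter):03d}.flac"


def play_chapter(book_name: str, chapter: int | str, root: Path = PROJECT_ROOT) -> None:
    """Play a produced chapter with macOS `afplay` (blocks until playback ends)."""
    path = chapter_audio_path(book_name, chapter, root)
    if not path.is_file():
        raise FileNotFoundError(f"not produced yet: {path}")
    if sys.platform != "darwin":
        raise RuntimeError("afplay ships with macOS only")
    subprocess.run(["afplay", str(path)], check=True)
```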

Single Story Mode

Scene Type	Emotion	Speed	Notes
Opening narration	`storytelling`	0.97	Sets the tone
Nostalgic flashback	`calm`	0.92	Warm, reflective
Dialogue (neutral)	`neutral`	1.0	Default for conversation
Dialogue (question)	`neutral`	1.0	Curious, inquiring
Dialogue (tension)	`serious`	0.95	Conflict, urgency
Dialogue (intimate)	`whispery`	0.88	Secrets, closeness
Action / climax	`excited`	1.08	Fast, energetic
Loss / grief	`sad`	0.85	Slow, heavy
Joy / reunion	`happy`	1.05	Warm, bright
Resolution / ending	`calm`	0.90	Peaceful closure
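Once the LLM has labeled each segment with a scene type, the table can be applied mechanically. A minimal sketch; the lowercase keys, `style_segment`, and the neutral fallback for unlisted scene types are illustrative choices, not documented behavior.

```python
from __future__ import annotations

# Scene-type presets copied from the table: (emotion, speed).
SCENE_STYLES = {
    "opening narration": ("storytelling", 0.97),
    "nostalgic flashback": ("calm", 0.92),
    "dialogue (neutral)": ("neutral", 1.0),
    "dialogue (question)": ("neutral", 1.0),
    "dialogue (tension)": ("serious", 0.95),
    "dialogue (intimate)": ("whispery", 0.88),
    "action / climax": ("excited", 1.08),
    "loss / grief": ("sad", 0.85),
    "joy / reunion": ("happy", 1.05),
    "resolution / ending": ("calm", 0.90),
}


def style_segment(segment: dict, scene_type: str) -> dict:
    """Return a copy of `segment` with emotion and speed taken from the scene preset."""
    emotion, speed = SCENE_STYLES.get(scene_type.strip().lower(), ("neutral", 1.0))
    return {**segment, "emotion": emotion, "speed": speed}
```

Returning a copy leaves the original segment untouched, so a preset can be re-applied if the scene label changes.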

3. Cast Voices

Mandarin Chinese voices (lang: zh)

Voice	Gender	Personality	Best For
`zm_yunjian`	Male	Deep, broadcast	Narrator, authoritative figures
`zm_yunxi`	Male	Natural, warm	Male protagonists, love interests
`zf_xiaobei`	Female	Lively, bright	Female protagonists, young women
`zf_xiaoni`	Female	Gentle, soft	Motherly figures, gentle characters

American English voices

Voice	Gender	Personality	Best For
`af_heart`	Female	Warm, emotional	Female protagonists
`af_sarah`	Female	Professional	Authority figures
`af_bella`	Female	Bright, energetic	Young characters
`af_nova`	Female	Confident	Strong female leads
`am_adam`	Male	Deep, resonant	Male protagonists
`am_michael`	Male	Friendly	Casual characters
`am_echo`	Male	Dramatic	Villains, authority

British English voices (lang: en-gb)

Voice	Gender	Personality	Best For
`bm_george`	Male	Classic, rich	Narrator (default for EN)
`bm_lewis`	Male	Calm, steady	Supporting male
`bm_daniel`	Male	Formal	Authority figures
`bf_emma`	Female	Elegant	Female leads
`bf_isabella`	Female	Warm	Supporting female
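A first-pass casting rule can be sketched from the zh and en-gb tables: the narrator gets the narrator voice, and every other character takes the next unused voice of their gender. This is a hypothetical heuristic (`cast_voices`, the pool order, and reusing voices once a pool runs out are all assumptions); the skill itself relies on the LLM's reading of each character.

```python
from __future__ import annotations

# Pools drawn from the zh and en-gb tables; list order = preference (an assumption).
VOICE_POOLS = {
    ("zh", "male"): ["zm_yunxi", "zm_yunjian"],
    ("zh", "female"): ["zf_xiaobei", "zf_xiaoni"],
    ("en-gb", "male"): ["bm_lewis", "bm_daniel", "bm_george"],
    ("en-gb", "female"): ["bf_emma", "bf_isabella"],
}
NARRATOR_VOICE = {"zh": "zm_yunjian", "en-gb": "bm_george"}


def cast_voices(characters: dict[str, str], lang: str) -> dict[str, str]:
    """Map each character name to a voice; `characters` maps name -> "male"/"female"."""
    narrator = NARRATOR_VOICE[lang]
    # Reserve the narrator voice so no other character shares it.
    used = {narrator} if "Narrator" in characters else set()
    cast: dict[str, str] = {}
    for name, gender in characters.items():
        if name == "Narrator":
            cast[name] = narrator
            continue
        pool = VOICE_POOLS[(lang, gender)]
        free = [v for v in pool if v not in used]
        voice = free[0] if free else pool[0]  # reuse once the pool is exhausted
        cast[name] = voice
        used.add(voice)
    return cast
```

For the three characters in the book.json example this reproduces the casting shown there; personality matching from the "Best For" column would still need the LLM.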

5. Write .story.json

JSON Format

{
  "version": "1.0",
  "title": "Story Title",
  "silence_ms": 500,
  "output_format": "flac",
  "metadata": {
    "source": "original_file.txt",
    "created": "ISO-8601 timestamp",
    "author": "",
    "language": "zh"
  },
  "segments": [
    {
      "id": "seg_1",
      "character": "Narrator",
      "text": "Once upon a time...",
      "voice": "bm_george",
      "lang": "en-gb",
      "emotion": "storytelling",
      "speed": 0.97
    }
  ]
}
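Before running `produce`, it helps to check the file's shape. A sketch that assembles and sanity-checks a story in the format above; `make_story`, `validate_story`, and `write_story` are hypothetical helpers, and treating every segment field as required is an assumption drawn from the template, not from the CLI's own validation.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path

REQUIRED_SEGMENT_KEYS = ("id", "character", "text", "voice", "lang", "emotion", "speed")


def make_story(title: str, source: str, language: str, segments: list[dict]) -> dict:
    """Assemble a story dict with the template's defaults."""
    return {
        "version": "1.0",
        "title": title,
        "silence_ms": 500,
        "output_format": "flac",
        "metadata": {
            "source": source,
            "created": datetime.now().isoformat(timespec="seconds"),
            "author": "",
            "language": language,
        },
        "segments": segments,
    }


def validate_story(story: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    segments = story.get("segments") or []
    problems = [] if segments else ["no segments"]
    for i, seg in enumerate(segments):
        missing = [k for k in REQUIRED_SEGMENT_KEYS if k not in seg]
        if missing:
            problems.append(f"segment {i}: missing {', '.join(missing)}")
    ids = [seg.get("id") for seg in segments]
    if len(ids) != len(set(ids)):
        problems.append("duplicate segment ids")
    return problems


def write_story(story: dict, path: Path) -> None:
    # ensure_ascii=False keeps Chinese text readable in the saved file
    path.write_text(json.dumps(story, ensure_ascii=False, indent=2), encoding="utf-8")
```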

6. Produce Audio

cd mlx_tts && .venv/bin/python story_to_voice.py produce stories/<slug>.story.json
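`produce` turns each segment into speech and joins the results into one file. The page does not say exactly how `silence_ms` is applied; assuming it is a pause inserted between consecutive segments, the joining step could be sketched as below. `stitch` and `duration_s` are hypothetical helpers, samples are plain Python floats for clarity (a real implementation would more likely use NumPy arrays), and the 24 kHz rate is Kokoro-82M's output rate, assumed to apply here.

```python
from __future__ import annotations

SAMPLE_RATE = 24_000  # Kokoro-82M output rate; assumed for this sketch


def stitch(segments: list[list[float]], silence_ms: int = 500, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Concatenate per-segment samples with `silence_ms` of zeros between them.

    Assumption: silence goes between segments only, not before the first
    or after the last.
    """
    gap = [0.0] * (sample_rate * silence_ms // 1000)
    out: list[float] = []
    for i, samples in enumerate(segments):
        if i:
            out.extend(gap)
        out.extend(samples)
    return out


def duration_s(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Length in seconds of `num_samples` at `sample_rate`."""
    return num_samples / sample_rate
```

With 500 ms of silence, a 1 s segment followed by a 0.5 s segment gives 2 s of audio.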

Available Emotions (Reference)

Key	Label	Speed Mult	Use For
`neutral`	Neutral	1.0x	Default dialogue
`happy`	Happy	1.08x	Joyful, warm moments
`excited`	Excited	1.18x	Action, surprise, energy
`sad`	Sad	0.85x	Loss, sorrow, melancholy
`calm`	Calm	0.92x	Peaceful, endings, reflection
`serious`	Serious	0.95x	Authority, tension, danger
`whispery`	Whispery	0.88x	Intimate, secrets, close
`storytelling`	Storytelling	0.97x	Narration, reading aloud
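These eight keys are the only emotion values listed, so a quick lint can catch typos before producing. The speed check is a hypothetical heuristic: it flags segments whose `speed` is far from the emotion's multiplier, using an arbitrary 0.15 tolerance (wide enough for the scene presets earlier on this page), and says nothing about how the generator actually combines the two.

```python
from __future__ import annotations

# Speed multipliers copied from the emotion table.
EMOTION_SPEED_MULT = {
    "neutral": 1.0,
    "happy": 1.08,
    "excited": 1.18,
    "sad": 0.85,
    "calm": 0.92,
    "serious": 0.95,
    "whispery": 0.88,
    "storytelling": 0.97,
}


def lint_emotions(segments: list[dict], tolerance: float = 0.15) -> list[str]:
    """Flag unknown emotion keys and speeds far from the emotion's multiplier."""
    warnings = []
    for seg in segments:
        emotion = seg.get("emotion")
        if emotion not in EMOTION_SPEED_MULT:
            warnings.append(f"{seg.get('id')}: unknown emotion {emotion!r}")
            continue
        expected = EMOTION_SPEED_MULT[emotion]
        speed = seg.get("speed", 1.0)
        if abs(speed - expected) > tolerance:
            warnings.append(f"{seg.get('id')}: speed {speed} is far from {emotion} ({expected}x)")
    return warnings
```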

Example: Well-Crafted Chinese Story JSON

{
  "version": "1.0",
  "title": "最後一盞燈",
  "silence_ms": 500,
  "output_format": "flac",
  "metadata": {
    "source": "chinese_story.txt",
    "created": "2026-04-06T00:00:00",
    "author": "",
    "language": "zh"
  },
  "segments": [
    {
      "id": "seg_1",
      "character": "Narrator",
      "text": "那是一九四九年的冬天，上海的街道像一條冰封的河。",
      "voice": "zm_yunjian",
      "lang": "zh",
      "emotion": "storytelling",
      "speed": 0.97
    },
    {
      "id": "seg_4",
      "character": "陳懷遠",
      "text": "我明日就要去了，去一個我不知道能不能回來的地方。但在離開之前，我必須讓妳知道：那棟弄堂裡的舊書店，每天下午三點半，當陽光斜照進來的那一刻，我都在想著妳。",
      "voice": "zm_yunxi",
      "lang": "zh",
      "emotion": "sad",
      "speed": 0.85
    },
    {
      "id": "seg_18",
      "character": "林書瑤",
      "text": "就是「帶我走」三個字。",
      "voice": "zf_xiaobei",
      "lang": "zh",
      "emotion": "whispery",
      "speed": 0.85
    }
  ]
}
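Each character in the example keeps a single voice across segments. A sketch of checking that over the `segments` array, plus a per-character line count that is handy when reporting results; `voice_map` and `lines_per_character` are hypothetical helpers, not part of `story_to_voice.py`.

```python
from __future__ import annotations

from collections import Counter


def voice_map(story: dict) -> dict[str, str]:
    """Return {character: voice}; raise if any character uses more than one voice."""
    voices: dict[str, str] = {}
    for seg in story["segments"]:
        name, voice = seg["character"], seg["voice"]
        if voices.setdefault(name, voice) != voice:
            raise ValueError(f"{name} uses both {voices[name]} and {voice} ({seg['id']})")
    return voices


def lines_per_character(story: dict) -> Counter:
    """Count segments per character."""
    return Counter(seg["character"] for seg in story["segments"])
```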

Book Mode (Multi-Chapter)

B3. Parse Chapters

cd mlx_tts && .venv/bin/python story_to_voice.py parse-chapter books/<name>/ [--chapter NNN]
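`parse-chapter` works on the text files under `books/<name>/chapters/`. Finding which of them still lack a `.story.json` might look like the sketch below, assuming three-digit chapter numbers as in the project layout and that `--chapter 003` selects a single file; `scan_chapters` is a hypothetical helper, not the CLI's code.

```python
from __future__ import annotations

import re
from pathlib import Path

CHAPTER_TXT = re.compile(r"chapter-(\d{3})\.txt")


def scan_chapters(book_dir: Path, chapter: str | None = None) -> list[tuple[int, Path, bool]]:
    """Return (number, text file, already parsed?) for chapters/chapter-NNN.txt.

    A chapter counts as parsed when chapter-NNN.story.json exists beside it.
    """
    found = []
    for txt in sorted((book_dir / "chapters").glob("chapter-*.txt")):
        m = CHAPTER_TXT.fullmatch(txt.name)
        if m is None:
            continue
        number = m.group(1)
        if chapter is not None and number != chapter.zfill(3):
            continue  # --chapter NNN limits the scan to one chapter
        parsed = txt.with_name(f"chapter-{number}.story.json").exists()
        found.append((int(number), txt, parsed))
    return found
```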

B5. Produce Audio

# All chapters:
cd mlx_tts && .venv/bin/python story_to_voice.py produce-book books/<name>/

# Specific chapter:
cd mlx_tts && .venv/bin/python story_to_voice.py produce-book books/<name>/ --chapter 003

# Re-produce already produced chapters:
cd mlx_tts && .venv/bin/python story_to_voice.py produce-book books/<name>/ --force
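The three invocations differ only in which chapters they select. A sketch of that selection over the `chapters` array in book.json, assuming `--chapter` matches the `number` field and that `--force` is what re-includes chapters whose `status` is already `produced`; `chapters_to_produce` is a hypothetical name.

```python
from __future__ import annotations


def chapters_to_produce(book: dict, chapter: str | None = None, force: bool = False) -> list[dict]:
    """Select chapter entries from book.json for produce-book (assumed rules)."""
    selected = []
    for entry in book.get("chapters", []):
        if chapter is not None and entry["number"] != int(chapter):
            continue  # --chapter NNN: only that chapter
        if entry.get("status") == "produced" and not force:
            continue  # already produced; --force re-produces it
        selected.append(entry)
    return selected
```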

book.json Format (Reference)

{
  "version": "1.0",
  "title": "Book Title",
  "author": "",
  "language": "zh",
  "genre": "literary fiction",
  "settings": {
    "silence_ms": 500,
    "output_format": "flac"
  },
  "characters": {
    "Narrator": {"gender": "male", "voice": "zm_yunjian", "lang": "zh", "role": "narrator"},
    "陳懷遠":   {"gender": "male", "voice": "zm_yunxi",    "lang": "zh", "role": "protagonist"},
    "林書瑤":   {"gender": "female", "voice": "zf_xiaobei", "lang": "zh", "role": "protagonist"}
  },
  "chapters": [
    {
      "number": 1,
      "title": "Chapter 1",
      "source": "chapter-001.txt",
      "story_json": "chapter-001.story.json",
      "audio": "chapter-001.flac",
      "status": "produced",
      "segments": 25,
      "duration_s": 182.6
    }
  ]
}
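The pipeline reads existing character voices from book.json and adds new characters after each chapter. A sketch of that merge, assuming an existing entry always wins so a character keeps the voice assigned in earlier chapters; `merge_characters` is a hypothetical helper.

```python
from __future__ import annotations


def merge_characters(book: dict, found: dict[str, dict]) -> list[str]:
    """Add newly found characters to book["characters"]; return the names added.

    Existing entries are never overwritten, which keeps voices consistent
    across chapters.
    """
    characters = book.setdefault("characters", {})
    added = []
    for name, info in found.items():
        if name not in characters:
            characters[name] = dict(info)
            added.append(name)
    return added
```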

Project File Structure (Reference)

mlx_tts/
├── webui.py              # TTS Studio server (port 7860) + book browser
├── story_studio.py       # Story Studio server (port 7861) + advanced production
├── story_to_voice.py     # CLI: parse, produce, init-book, produce-book
├── book_manager.py       # BookManager CRUD for book projects
├── book_browser.html     # Book browser UI (served by both servers)
├── mlx_tts/
│   ├── generator.py      # TTS engine (Kokoro-82M on MLX)
│   ├── voices.py         # Voice catalog, languages, emotions
│   └── cli.py            # Basic CLI interface
├── books/                # Book projects directory
│   └── my_novel/
│       ├── book.json
│       └── chapters/
│           ├── chapter-001.txt
│           ├── chapter-001.story.json
│           └── chapter-001.flac
├── stories/              # Single story projects
└── outputs/              # Generated audio files

Japanese voices

Voice	Gender	Personality	Best For
`jm_kumo`	Male	Calm	Narrator, male leads
`jf_alpha`	Female	Expressive	Female leads
`jf_gongitsune`	Female	Storyteller	Narrator, older women

Story To Voice

Story-to-Voice Skill

Two Modes

Pipeline

WebUI Servers

Start servers

WebUI Key Routes

Instructions

Sub-Commands

open browser [studio|books|tts]

play <book_name> <chapter>

Single Story Mode

1. Read & Analyze the Story

2. Decompose into Segments

3. Cast Voices

4. Determine Language Code

5. Write .story.json

JSON Format

6. Produce Audio

7. Report Results

Available Emotions (Reference)

Example: Well-Crafted Chinese Story JSON

Book Mode (Multi-Chapter)

B1. Initialize or Load Book

B2. Scan for Chapters

B3. Parse Chapters

B4. Voice Consistency Rule

B5. Produce Audio

B6. Report Results

book.json Format (Reference)

Project File Structure (Reference)
