Name: Text To Speech
Author: fabioc-aloha

| State | Display | Click Action |
|-------|---------|--------------|
| Connecting | `$(loading~spin) Connecting...` | - |
| Synthesizing | `$(loading~spin) Synthesizing...` | - |
| Streaming | `$(loading~spin) Receiving... 45KB` | - |
| Playing | `$(unmute) Playing 35%` | Stop |
| Paused | `$(unmute) Paused` | Stop |

┌─────────────────────────────────────────────────────────┐
│  Alex TTS Player                                    [×] │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ▶️ ⏹️   ═══════════●══════════   1:23 / 4:56          │
│                                                         │
│  🔊 ────────●────────                                   │
│                                                         │
└─────────────────────────────────────────────────────────┘

| Voice | Character | Best For |
|-------|-----------|----------|
| Default (GuyNeural) | Professional, clear | Technical docs, code review |
| Warm (ChristopherNeural) | Friendly, conversational | Tutorials, READMEs |
| British (RyanNeural) | Authoritative | Formal documents, presentations |
| Friendly (DavisNeural) | Casual, approachable | Chat logs, informal content |

┌─────────────────────────────────────────────────────────────┐
│                 Alex VS Code Extension                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Commands:                                                   │
│  • Alex: Read Aloud (Ctrl+Alt+R)                            │
│  • Alex: Read with Voice Selection                          │
│  • Alex: Save as Audio                                      │
│  • Alex: Stop Reading                                       │
│                     │                                        │
│                     ▼                                        │
│  ┌─────────────────────────────────────────────┐            │
│  │           ttsService.ts                       │            │
│  │   Native WebSocket to Edge TTS               │            │
│  │   • SSML generation                          │            │
│  │   • Markdown stripping                       │            │
│  │   • Progress callbacks                       │            │
│  └─────────────────┬───────────────────────────┘            │
│                    │                                         │
│                    ▼                                         │
│  ┌─────────────────────────────────────────────┐            │
│  │           audioPlayer.ts                      │            │
│  │   Webview-based playback                     │            │
│  │   • Cross-platform HTML5 Audio               │            │
│  │   • Play/pause/stop controls                 │            │
│  │   • Progress tracking                        │            │
│  └─────────────────────────────────────────────┘            │
│                                                              │
└──────────────────────┬──────────────────────────────────────┘
                       │ WebSocket (wss://)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│               Microsoft Edge TTS Endpoint                    │
│   wss://speech.platform.bing.com/consumer/speech/...        │
├─────────────────────────────────────────────────────────────┤
│  • 400+ neural voices, 90+ languages                        │
│  • Free, no API key required                                │
│  • MP3 output (24kHz, 48kbps)                               │
│  • SSML support for prosody control                         │
└─────────────────────────────────────────────────────────────┘

# Basic usage
say "Hello from Alex"

# Read a file aloud
say -f document.txt

# Save to audio file (AIFF)
say -o output.aiff "Synthesis complete"

# Save as AAC (smaller file)
say -o output.m4a --data-format=aac "Dream state finished"

# Choose a voice (the macOS "Alex" voice is a fun coincidence)
say -v Alex "I am Alex, reading your documentation"

# List all installed voices
say -v '?'

| Feature | Edge TTS (primary) | macOS `say` (fallback) |
|---------|--------------------|------------------------|
| Quality | 400+ neural voices | 30+ system voices |
| Cost | Free | Free |
| Offline | No (WebSocket) | Yes |
| Output formats | MP3 | AIFF, AAC, WAV |
| Languages | 90+ | ~20 |
| Best for | Production audio, MP3 export | Quick reads, notifications |

# After brain-qa completes
node .github/muscles/brain-qa.cjs --mode quick && say "Brain QA complete"

# After dream state
node .github/muscles/brain-qa.cjs --mode all && say "Dream maintenance finished"

| Preset | Voice ID | Character |
|--------|----------|-----------|
| Default | en-US-GuyNeural | Professional male, clear articulation |
| Warm | en-US-ChristopherNeural | Friendly, conversational |
| British | en-GB-RyanNeural | British accent, authoritative |
| Friendly | en-US-DavisNeural | Casual, approachable |
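
As a rough illustration of how a preset could feed SSML generation, the sketch below maps the preset names above to their voice IDs and wraps text in a minimal SSML envelope. The names `VOICE_PRESETS`, `escapeXml`, and `buildSsml` are hypothetical and are not taken from `ttsService.ts`; the exact SSML the extension sends may differ.

```typescript
// Preset names and voice IDs come from the presets table; everything else is illustrative.
const VOICE_PRESETS = {
  Default: "en-US-GuyNeural",
  Warm: "en-US-ChristopherNeural",
  British: "en-GB-RyanNeural",
  Friendly: "en-US-DavisNeural",
} as const;

type PresetName = keyof typeof VOICE_PRESETS;

// Escape the five XML special characters so arbitrary text is safe inside SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap text in a minimal SSML document for the chosen preset.
function buildSsml(text: string, preset: PresetName = "Default"): string {
  const voice = VOICE_PRESETS[preset];
  const lang = voice.slice(0, 5); // locale prefix of the voice ID, e.g. "en-US"
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voice}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

Calling `buildSsml("A & B", "British")` yields a document whose `<voice>` element names `en-GB-RyanNeural` and whose text is escaped as `A &amp; B`.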

```python
def hello():
    print("Hello")
```

Becomes: "Python code block. Definition hello. Print hello. End code block."

### Symbol-to-Speech Transformations

Symbols are converted to natural speech equivalents:

| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min → "about 2 minutes" |
| `&` | "and" | A & B → "A and B" |
| `@` | "at" | user@email → "user at email" |
| `%` | "percent" | 50% → "50 percent" |
| `+` | "plus" | +10% → "plus 10 percent" |
| `→` | "leads to" or "becomes" | A → B → "A becomes B" |
| `—` | (pause) | word—word → "word (pause) word" |
| `#` | (context-dependent) | #1 → "number 1"; ## → (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 → "x greater than 5" |
| `≥` / `≤` | "greater than or equal" / "less than or equal" | |
| `µ` | "micro" | µg → "microgram" |
| `°` | "degrees" | 37°C → "37 degrees celsius" |
| `±` | "plus or minus" | ±5% → "plus or minus 5 percent" |
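
The replacements in the table above can be sketched as an ordered list of regex rules. This is a minimal illustration rather than the extension's actual code: `SYMBOL_RULES` and `speakSymbols` are hypothetical names, only a subset of the symbols is covered, and where the table offers two readings (such as "leads to" or "becomes") the sketch simply picks one.

```typescript
// Ordered rules: compound symbols (±) run before the single-symbol rules they contain.
const SYMBOL_RULES: Array<[RegExp, string]> = [
  [/±\s*/g, "plus or minus "],
  [/~\s*(?=\d)/g, "about "],
  [/\+(?=\d)/g, "plus "],
  [/(\d)\s*%/g, "$1 percent"],
  [/(\d)\s*°C\b/g, "$1 degrees celsius"],
  [/\s*&\s*/g, " and "],
  [/\s*→\s*/g, " becomes "],
  [/\s*≥\s*/g, " greater than or equal "],
  [/\s*≤\s*/g, " less than or equal "],
];

// Apply every rule in order to the input text.
function speakSymbols(text: string): string {
  return SYMBOL_RULES.reduce((out, [pattern, spoken]) => out.replace(pattern, spoken), text);
}
```

Rule order matters here: `±5%` is first rewritten to `plus or minus 5%`, and only then does the percent rule turn it into "plus or minus 5 percent".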

### Time Duration Patterns (v2.1.0)

| Input | Spoken As |
|-------|----------|
| `4h` | "4 hours" |
| `30m` | "30 minutes" |
| `15s` | "15 seconds" |
| `2d` | "2 days" |
| `1w` | "1 week" |
| `90min` | "90 minutes" |
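
One way to handle the duration patterns above is a single regex with a unit lookup, sketched below. `UNIT_NAMES` and `speakDurations` are hypothetical names, and the singular/plural handling is an assumption based on the "1 week" row.

```typescript
// Unit suffixes from the table above mapped to their singular spoken form.
const UNIT_NAMES: Record<string, string> = {
  s: "second",
  m: "minute",
  min: "minute",
  h: "hour",
  d: "day",
  w: "week",
};

function speakDurations(text: string): string {
  // "min" is tried before the single-letter units so "90min" is matched whole.
  return text.replace(/\b(\d+)(min|[smhdw])\b/g, (_match, count: string, unit: string) => {
    const name = UNIT_NAMES[unit];
    return `${count} ${name}${count === "1" ? "" : "s"}`;
  });
}
```

A pattern this simple will also rewrite tokens such as `3d` in "3d model", so a real implementation would likely need extra context checks.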

### Emoji Pronunciation (v2.1.0)

| Emoji | Spoken As | Context |
|-------|-----------|--------|
| ✅ | "completed" | Status indicators |
| ❌ | "not done" | Status indicators |
| ⚠️ | "warning" | Alerts |
| 📋 | "planned" | Task status |
| 🔄 | "in progress" | Task status |
| ⏳ | "waiting" | Task status |
| 🔥 | "hot" or "high priority" | When followed by "High" |
| 🔓 | "unlocked" | Feature status |
| 💡 | "idea" | Suggestions |
| 🆕 | "new" | Version notes |

**Emoji-Text Deduplication**: When an emoji's meaning matches the text that follows it (e.g., `✅ Complete`), the meaning is spoken only once ("completed", not "completed Complete").
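
The deduplication rule can be sketched as follows. The source does not say how a "match" is detected, so the prefix comparison here is an assumption, as are the names `EMOJI_SPEECH` and `speakEmoji`. Only a subset of the table is included, and only the single word after the emoji is compared.

```typescript
// Subset of the emoji table above.
const EMOJI_SPEECH: Record<string, string> = {
  "✅": "completed",
  "❌": "not done",
  "📋": "planned",
  "🔄": "in progress",
  "⏳": "waiting",
  "💡": "idea",
  "🆕": "new",
};

function speakEmoji(text: string): string {
  let out = text;
  for (const [emoji, spoken] of Object.entries(EMOJI_SPEECH)) {
    // Match the emoji, an optional variation selector, and the next word if there is one.
    const pattern = new RegExp(`${emoji}\\uFE0F?(?:\\s+([A-Za-z]+))?`, "gu");
    out = out.replace(pattern, (_match: string, next?: string) => {
      if (next === undefined) return spoken;
      const a = next.toLowerCase();
      const b = spoken.toLowerCase();
      // Assumed heuristic: identical words, or one is a prefix of the other (4+ letters).
      const duplicate =
        a === b || (Math.min(a.length, b.length) >= 4 && (a.startsWith(b) || b.startsWith(a)));
      return duplicate ? spoken : `${spoken} ${next}`;
    });
  }
  return out;
}
```

With this heuristic, `✅ Complete` is read as "completed", while `🔄 Active` keeps both parts and is read as "in progress Active".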

### Table Reading (v2.1.0)

Markdown tables are converted to natural speech:

```markdown
| Name  | Status    |
|-------|-----------|
| Alice | ✅ Done   |
| Bob   | 🔄 Active |
```

This document is approximately 32 minutes long (~4800 words).
Would you like to:
- Read full content (~32 min)
- Summarize for speech (~3 min) ← Recommended
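
The figures in this prompt imply a speaking rate of about 150 words per minute (4800 words / 32 minutes). A minimal sketch of that estimate follows; `WORDS_PER_MINUTE` and `estimateMinutes` are hypothetical names, and the rate is inferred from the example rather than stated by the source.

```typescript
// Inferred from the example prompt: 4800 words / 32 minutes = 150 words per minute.
const WORDS_PER_MINUTE = 150;

// Estimate spoken duration in whole minutes from a whitespace-delimited word count.
function estimateMinutes(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.round(words / WORDS_PER_MINUTE);
}
```

A 4800-word document gives `estimateMinutes(...) === 32`, matching the prompt above.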

const SPEAKER_WARMUP_MS = 2000;
// Status shows "Preparing speakers..." during delay

Press Ctrl+Alt+R to read document aloud
Select text first to read only selection

Command Palette → "Alex: Save as Audio"
Choose output location → MP3 saved

Command Palette → "Alex: Read with Voice Selection"
Choose: Default | Warm | British | Friendly

| You Write | Alex Reads |
|-----------|------------|
| `# Heading` | "Heading." (pause) |
| `**bold text**` | "bold text" (slight emphasis) |
| `[link text](url)` | "link text" |
| `` `code` `` | "code" |
| `> blockquote` | "Quote: ..." |
| `---` | (long pause) |

| Symbol | Spoken As |
|--------|-----------|
| `~5 minutes` | "about 5 minutes" |
| `50%` | "50 percent" |
| `A → B` | "A leads to B" |
| `±5%` | "plus or minus 5 percent" |

| File | Purpose |
|------|---------|
| `ttsService.ts` | WebSocket connection, SSML generation, synthesis |
| `audioPlayer.ts` | Webview panel, playback controls, system fallback |
| `index.ts` | Module exports |

| Markdown | Speech Output |
|----------|---------------|
| `# Heading` | "Heading." (pause) |
| `**bold**` | "bold" (emphasis via prosody) |
| `*italic*` | "italic" |
| `` `code` `` | "code" |
| `[link](url)` | "link" |
| `- item` | "Item." |
| `> quote` | "Quote: ..." |
| `---` | (long pause) |
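
A line-by-line sketch of the conversions listed above is shown below. It is an illustration under stated assumptions, not the code in `ttsService.ts`: `lineToSpeech` and `markdownToSpeech` are hypothetical names, pauses and prosody-based emphasis are omitted, and nested or multi-line constructs are not handled.

```typescript
// Convert one Markdown line to plain speech text.
function lineToSpeech(line: string): string {
  // Horizontal rule: dropped here; a fuller version might emit an SSML break instead.
  if (/^\s*-{3,}\s*$/.test(line)) return "";

  const text = line
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [text](url) -> text
    .replace(/\*\*(.+?)\*\*/g, "$1") // **bold** -> bold
    .replace(/\*(.+?)\*/g, "$1") // *italic* -> italic
    .replace(/`([^`]+)`/g, "$1"); // `code` -> code

  const heading = text.match(/^#{1,6}\s+(.*)$/);
  if (heading) return `${heading[1]}.`;

  const item = text.match(/^\s*-\s+(.*)$/);
  if (item) {
    const body = item[1] ?? "";
    return `${body.charAt(0).toUpperCase()}${body.slice(1)}.`;
  }

  const quote = text.match(/^>\s?(.*)$/);
  if (quote) return `Quote: ${quote[1]}`;

  return text;
}

// Convert a whole document by mapping each line independently.
function markdownToSpeech(markdown: string): string {
  return markdown.split("\n").map(lineToSpeech).join("\n");
}
```

For example, `markdownToSpeech("# Heading\n- item")` returns `"Heading.\nItem."`.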

| Input | Spoken As | Why |
|-------|-----------|-----|
| `v4.2.9` | "version 4.2.9" | Standalone version |
| `Version: v4.2.9` | "Version: 4.2.9" | Already has "Version:" prefix |
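
The two rows above can be covered by one regex that optionally captures a preceding "Version" label. This is a sketch, assuming the label check is case-insensitive; `speakVersions` is a hypothetical name.

```typescript
function speakVersions(text: string): string {
  // Group 1: optional "version" / "version:" label. Group 2: dotted digits after the "v".
  return text.replace(
    /(\bversion:?\s+)?\bv(\d+(?:\.\d+)+)\b/gi,
    (_match, label: string | undefined, digits: string) =>
      // Keep an existing label as written; otherwise expand "v" to "version".
      label !== undefined ? `${label}${digits}` : `version ${digits}`,
  );
}
```

`speakVersions("v4.2.9")` returns "version 4.2.9", while `speakVersions("Version: v4.2.9")` returns "Version: 4.2.9" without repeating the word.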

| Setting | Value | Rationale |
|---------|-------|-----------|
| `MAX_CHUNK_CHARS` | 3000 | Safe limit before Edge TTS stalls |
| `CHUNK_TIMEOUT_MS` | 60000 | 60 seconds per chunk |
| `MAX_RETRIES` | 3 | Retry failed chunks |
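
A minimal sketch of chunking under `MAX_CHUNK_CHARS` follows. The source gives the limit but not the splitting strategy, so the sentence-boundary packing and the hard split for oversized sentences are assumptions, and `chunkText` is a hypothetical name.

```typescript
const MAX_CHUNK_CHARS = 3000; // value from the settings table

// Pack whole sentences into chunks no longer than maxChars.
function chunkText(text: string, maxChars: number = MAX_CHUNK_CHARS): string[] {
  // A sentence is text up to and including its end punctuation, or a trailing remainder.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // Flush the current chunk if adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    // A single sentence longer than the limit is split at fixed offsets.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current += rest;
  }

  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

With a small limit for demonstration, `chunkText("One. Two. Three.", 10)` returns `["One. Two.", "Three."]`.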

| Attempt | Delay | Formula |
|---------|-------|---------|
| 1 | ~1s | `1000 + jitter` |
| 2 | ~2s | `2000 + jitter` |
| 3 | ~4s | `4000 + jitter` |
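
The delays above double on each attempt, which is exponential backoff with jitter: delay = 1000 · 2^(attempt − 1) + jitter. The sketch below shows one way to compute and apply it. The jitter range is not stated in the source, so `MAX_JITTER_MS` is an assumption, as are the names `retryDelayMs` and `withRetry` and the reading of "attempt" as the retry number after an initial try.

```typescript
const MAX_RETRIES = 3; // value from the settings above
const BASE_DELAY_MS = 1000;
const MAX_JITTER_MS = 250; // assumption: the source does not give the jitter range

// Delay before retry number `attempt` (1-based): 1000, 2000, 4000 ms plus jitter.
function retryDelayMs(attempt: number, random: () => number = Math.random): number {
  return BASE_DELAY_MS * 2 ** (attempt - 1) + Math.floor(random() * MAX_JITTER_MS);
}

// Run a task, retrying up to MAX_RETRIES times with backoff between tries.
async function withRetry<T>(
  task: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let retries = 0; ; retries++) {
    try {
      return await task();
    } catch (error) {
      if (retries >= MAX_RETRIES) throw error; // out of retries: surface the last error
      await sleep(retryDelayMs(retries + 1));
    }
  }
}
```

Passing `random` and `sleep` as parameters keeps the timing deterministic in tests; the jitter itself spreads out retries so that failed chunks do not all reconnect at the same instant.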

Text-to-Speech Skill ⭐ Flagship

Why This is a Flagship Skill

User Experience

🎯 Quick Start: Read Any Document

📊 Status Bar Feedback

🎵 Webview Audio Player

🎤 Voice Selection

💾 Save as MP3

⏹️ Stop Reading

📝 Smart Markdown Processing

For Master Alex (Promotion Notes)

Overview

Architecture (v2.0)

macOS Offline Fallback: say

Alex Voice Presets

Voice Selection Rationale

VS Code Commands

Alex: Read Aloud

Alex: Read with Voice Selection

Alex: Save as Audio

Alex: Stop Reading

Implementation Details

Core Files (src/tts/)

Text Preprocessing

Code Block Handling

Version Pattern Intelligence (v2.1.0)

Reliability & Long Content Handling (v2.1.0)

The Problem

The Solution: Chunking with Retry

Long Content Summarization

Speaker Warmup Delay

Installation (v2.0)

Package Dependencies

Verification

Usage Patterns

Read Current Document

Generate Audio File

Voice Customization

Edge TTS Technical Reference

WebSocket Endpoint
