Alex's voice synthesis capability for reading documents aloud
Domain: AI Accessibility & Communication
Inheritance: inheritable (promote to Master Alex for all heirs)
Version: 2.5.0
Last Updated: 2026-02-09
Author: Alex (Master Alex)
Status: ⭐ Flagship Skill - Core Alex capability
Text-to-Speech gives Alex a voice. This transforms Alex from a text-only assistant into a multimodal companion that can:
Zero cost, zero dependencies - uses Microsoft Edge TTS (free, no API key) with native TypeScript.
Keyboard shortcut (fastest):
Ctrl+Alt+R (Windows/Linux) or Cmd+Alt+R (macOS)

Command palette:
Ctrl+Shift+P → "Alex: Read Aloud"

The status bar shows real-time progress during TTS operations:
| State | Display | Click Action |
|---|---|---|
| Connecting | $(loading~spin) Connecting... | - |
| Synthesizing | $(loading~spin) Synthesizing... | - |
| Streaming | $(loading~spin) Receiving... 45KB | - |
| Playing | $(unmute) Playing 35% | Stop |
| Paused | $(unmute) Paused | Stop |
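The state-to-label mapping in the table can be expressed as a small pure function. This is only a sketch: the `TtsState` type and the `statusBarText` and `isClickable` names are hypothetical, and the extension may build its labels differently.

```typescript
// Hypothetical state model mirroring the table above; names are illustrative.
type TtsState =
  | { kind: "connecting" }
  | { kind: "synthesizing" }
  | { kind: "streaming"; receivedBytes: number }
  | { kind: "playing"; percent: number }
  | { kind: "paused" };

/** Returns the status bar label for a given TTS state. */
function statusBarText(state: TtsState): string {
  switch (state.kind) {
    case "connecting":
      return "$(loading~spin) Connecting...";
    case "synthesizing":
      return "$(loading~spin) Synthesizing...";
    case "streaming":
      // Show the received audio size in whole kilobytes.
      return `$(loading~spin) Receiving... ${Math.round(state.receivedBytes / 1024)}KB`;
    case "playing":
      return `$(unmute) Playing ${Math.round(state.percent)}%`;
    case "paused":
      return "$(unmute) Paused";
  }
}

/** Only the playing and paused states have a click action (Stop). */
function isClickable(state: TtsState): boolean {
  return state.kind === "playing" || state.kind === "paused";
}
```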
A sleek panel opens with full playback controls:
┌─────────────────────────────────────────────────────────┐
│  Alex TTS Player                                    [×] │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ▶️  ⏹️  ━━━━━━━━━━━━━━━━━━━━━━  1:23 / 4:56           │
│                                                         │
│   🔊 ━━━━━━━━━━━━━━━━━                                  │
│                                                         │
└─────────────────────────────────────────────────────────┘
Features:
Choose Alex's voice before reading:
Ctrl+Shift+P → "Alex: Read with Voice Selection"

| Voice | Character | Best For |
|---|---|---|
| Default (GuyNeural) | Professional, clear | Technical docs, code review |
| Warm (ChristopherNeural) | Friendly, conversational | Tutorials, READMEs |
| British (RyanNeural) | Authoritative | Formal documents, presentations |
| Friendly (DavisNeural) | Casual, approachable | Chat logs, informal content |
Export any document to audio file:
Ctrl+Shift+P → "Alex: Save as Audio"

Use cases:
Multiple ways to stop playback:
- Click the status bar item ($(unmute) icon during playback)
- Press Escape when reading
- Ctrl+Shift+P → "Alex: Stop Reading"

Alex automatically strips markdown formatting for natural speech:
| You Write | Alex Reads |
|---|---|
| `# Heading` | "Heading." (pause) |
| `**bold text**` | "bold text" (slight emphasis) |
| `[link text](url)` | "link text" |
| `` `code` `` | "code" |
| `> blockquote` | "Quote: ..." |
| `---` | (long pause) |
Symbol conversion:
| Symbol | Spoken As |
|---|---|
| ~5 minutes | "about 5 minutes" |
| 50% | "50 percent" |
| A → B | "A leads to B" |
| ±5% | "plus or minus 5 percent" |
This skill gives Alex a voice. Version 2.0 uses native TypeScript WebSocket integration with Microsoft Edge TTS, eliminating external dependencies, and reads documents aloud with natural-sounding neural voices.
Version 2.0 Changes:
Why promote to Master:
Dependencies (v2.0):
- ws npm package (WebSocket client)

Alex's voice synthesis capability uses Microsoft Edge TTS via native TypeScript. It enables reading markdown documents, code files, and text aloud with natural-sounding voices, and it is fully integrated into the VS Code extension.
┌─────────────────────────────────────────────────────────────┐
│                   Alex VS Code Extension                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Commands:                                                  │
│    • Alex: Read Aloud (Ctrl+Alt+R)                          │
│    • Alex: Read with Voice Selection                        │
│    • Alex: Save as Audio                                    │
│    • Alex: Stop Reading                                     │
│                              │                              │
│                              ▼                              │
│       ┌─────────────────────────────────────────────┐       │
│       │                ttsService.ts                │       │
│       │        Native WebSocket to Edge TTS         │       │
│       │  • SSML generation                          │       │
│       │  • Markdown stripping                       │       │
│       │  • Progress callbacks                       │       │
│       └──────────────────────┬──────────────────────┘       │
│                              │                              │
│                              ▼                              │
│       ┌─────────────────────────────────────────────┐       │
│       │               audioPlayer.ts                │       │
│       │           Webview-based playback            │       │
│       │  • Cross-platform HTML5 Audio               │       │
│       │  • Play/pause/stop controls                 │       │
│       │  • Progress tracking                        │       │
│       └─────────────────────────────────────────────┘       │
│                                                             │
└──────────────────────────────┬──────────────────────────────┘
                               │ WebSocket (wss://)
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 Microsoft Edge TTS Endpoint                 │
│     wss://speech.platform.bing.com/consumer/speech/...      │
├─────────────────────────────────────────────────────────────┤
│  • 400+ neural voices, 90+ languages                        │
│  • Free, no API key required                                │
│  • MP3 output (24kHz, 48kbps)                               │
│  • SSML support for prosody control                         │
└─────────────────────────────────────────────────────────────┘
macOS ships 30+ built-in neural voices via the say command -- instant, offline, zero-cost. Use as a fallback when Edge TTS is unavailable (no internet, WebSocket blocked) or for quick, lightweight reads.
# Basic usage
say "Hello from Alex"
# Read a file aloud
say -f document.txt
# Save to audio file (AIFF)
say -o output.aiff "Synthesis complete"
# Save as AAC (smaller file)
say -o output.m4a --data-format=aac "Dream state finished"
# Choose a voice (the macOS "Alex" voice is a fun coincidence)
say -v Alex "I am Alex, reading your documentation"
# List all installed voices
say -v '?'
| Feature | Edge TTS (primary) | macOS say (fallback) |
|---|---|---|
| Quality | 400+ neural voices | 30+ system voices |
| Cost | Free | Free |
| Offline | No (WebSocket) | Yes |
| Output formats | MP3 | AIFF, AAC, WAV |
| Languages | 90+ | ~20 |
| Best for | Production audio, MP3 export | Quick reads, notifications |
Notification after long operations: Add say calls to signal completion of brain-qa, dream-state, or VSIX packaging:
# After brain-qa completes
node .github/muscles/brain-qa.cjs --mode quick && say "Brain QA complete"
# After dream state
node .github/muscles/brain-qa.cjs --mode all && say "Dream maintenance finished"
Note: The macOS "Alex" voice name is a happy coincidence -- it's one of the highest-quality built-in voices.
| Preset | Voice ID | Character |
|---|---|---|
| Default | en-US-GuyNeural | Professional male, clear articulation |
| Warm | en-US-ChristopherNeural | Friendly, conversational |
| British | en-GB-RyanNeural | British accent, authoritative |
| Friendly | en-US-DavisNeural | Casual, approachable |
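The presets above map naturally onto a lookup table that feeds SSML generation. The sketch below is an assumption about how that could look, not the contents of `ttsService.ts`: the `VOICE_PRESETS`, `escapeXml`, and `buildSsml` names are hypothetical, and the exact envelope the Edge TTS endpoint expects (including any prosody element) is not confirmed by this document. Only the voice IDs come from the table, and the `speak`/`voice` elements follow the W3C SSML vocabulary.

```typescript
// Preset names and voice IDs are taken from the table above.
const VOICE_PRESETS = {
  Default: "en-US-GuyNeural",
  Warm: "en-US-ChristopherNeural",
  British: "en-GB-RyanNeural",
  Friendly: "en-US-DavisNeural",
} as const;

type VoicePreset = keyof typeof VOICE_PRESETS;

/** Escapes the characters that are special in XML text and attributes. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Hypothetical helper: wraps plain text in an SSML envelope for one voice. */
function buildSsml(text: string, preset: VoicePreset = "Default"): string {
  const voiceId = VOICE_PRESETS[preset];
  // Derive xml:lang from the voice ID prefix, e.g. "en-GB" from "en-GB-RyanNeural".
  const lang = voiceId.split("-").slice(0, 2).join("-");
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voiceId}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

Escaping the text before embedding it matters because document content routinely contains `&` and `<`, which would otherwise produce malformed SSML.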
Alex's default voice (GuyNeural) was chosen for:
Command: alex.readAloud
Keybinding: Ctrl+Alt+R (Windows/Linux), Cmd+Alt+R (macOS)
Reads the current selection or entire document aloud using Alex's default voice.
Behavior:
Command: alex.readWithVoice
Quick pick to select a voice preset before reading.
Command: alex.saveAsAudio
Generate and save speech to an MP3 file. Opens a save dialog for output location.
Command: alex.stopReading
Keybinding: Escape (when reading)
Immediately stops current playback.
| File | Purpose |
|---|---|
| ttsService.ts | WebSocket connection, SSML generation, synthesis |
| audioPlayer.ts | Webview panel, playback controls, system fallback |
| index.ts | Module exports |
The prepareTextForSpeech() function strips markdown:
| Markdown | Speech Output |
|---|---|
| `# Heading` | "Heading." (pause) |
| `**bold**` | "bold" (emphasis via prosody) |
| `*italic*` | "italic" |
| `` `code` `` | "code" |
| `[link](url)` | "link" |
| `- item` | "Item." |
| `> quote` | "Quote: ..." |
| `---` | (long pause) |
```python
def hello():
    print("Hello")
```
Becomes: "Python code block. Definition hello. Print hello. End code block."
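The markdown mappings above can be approximated with a handful of line-oriented regular expressions. This is a minimal sketch, not the actual `prepareTextForSpeech()` implementation: the `stripMarkdownForSpeech` and `asSentence` names are hypothetical, pauses are represented only by an empty line, and fenced code block narration is left out.

```typescript
/** Capitalizes text and ends it with a period unless it already has end punctuation. */
function asSentence(text: string): string {
  const trimmed = text.trim();
  const capitalized = trimmed.charAt(0).toUpperCase() + trimmed.slice(1);
  return /[.!?]$/.test(capitalized) ? capitalized : `${capitalized}.`;
}

/** Illustrative approximation of the markdown-to-speech table above. */
function stripMarkdownForSpeech(markdown: string): string {
  return markdown
    .split("\n")
    .map((line) => {
      // Horizontal rule: emit an empty line to stand in for a long pause.
      if (/^\s*-{3,}\s*$/.test(line)) return "";

      // Inline markup: keep the visible text, drop the syntax.
      const text = line
        .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [link](url) -> link
        .replace(/\*\*([^*]+)\*\*/g, "$1") // **bold** -> bold
        .replace(/\*([^*]+)\*/g, "$1") // *italic* -> italic
        .replace(/`([^`]+)`/g, "$1"); // `code` -> code

      // Block-level markup, checked in order: heading, blockquote, list item.
      const heading = /^#{1,6}\s+(.*)$/.exec(text);
      if (heading) return asSentence(heading[1]);
      const quote = /^>\s?(.*)$/.exec(text);
      if (quote) return `Quote: ${quote[1]}`;
      const item = /^\s*-\s+(.*)$/.exec(text);
      if (item) return asSentence(item[1]);
      return text;
    })
    .join("\n");
}
```

Handling inline markup before block markup means a heading such as `# **Title**` is still reduced to plain text before the trailing period is added.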
### Symbol-to-Speech Transformations
Symbols are converted to natural speech equivalents:
| Symbol | Spoken As | Example |
|--------|-----------|--------|
| `~` | "approximately" or "about" | ~2 min → "about 2 minutes" |
| `&` | "and" | A & B → "A and B" |
| `@` | "at" | user@email → "user at email" |
| `%` | "percent" | 50% → "50 percent" |
| `+` | "plus" | +10% → "plus 10 percent" |
| `→` | "leads to" or "becomes" | A → B → "A becomes B" |
| `—` | (pause) | word—word → "word (pause) word" |
| `#` | (context-dependent) | #1 → "number 1"; ## → (heading marker) |
| `<` / `>` | "less than" / "greater than" | x > 5 → "x greater than 5" |
| `≥` / `≤` | "greater than or equal" / "less than or equal" | |
| `µ` | "micro" | µg → "microgram" |
| `°` | "degrees" | 37°C → "37 degrees celsius" |
| `±` | "plus or minus" | ±5% → "plus or minus 5 percent" |
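One way to implement a table like this is an ordered list of pattern and replacement pairs applied in sequence. The sketch below covers only part of the table and is an assumption about structure, not the extension's actual rule set: `SYMBOL_RULES` and `symbolsToSpeech` are hypothetical names, and the context rules (for example, requiring a digit after `~` or spaces around `>`) are illustrative choices.

```typescript
// Each entry is [pattern, spoken replacement]. Context guards keep markdown
// syntax such as a leading ">" or "##" from being read as a symbol.
const SYMBOL_RULES: Array<[RegExp, string]> = [
  [/±\s*(?=\d)/g, "plus or minus "],
  [/~\s*(?=\d)/g, "about "],
  [/\+(?=\d)/g, "plus "],
  [/(\d)\s*%/g, "$1 percent"],
  [/#(?=\d)/g, "number "],
  [/(\d)\s*°C\b/g, "$1 degrees celsius"],
  [/@/g, " at "],
  [/\s*&\s*/g, " and "],
  [/\s*→\s*/g, " leads to "],
  [/\s*≥\s*/g, " greater than or equal "],
  [/\s*≤\s*/g, " less than or equal "],
  [/\s+>\s+/g, " greater than "],
  [/\s+<\s+/g, " less than "],
];

/** Applies every rule in order and returns the speech-friendly text. */
function symbolsToSpeech(text: string): string {
  return SYMBOL_RULES.reduce((acc, [pattern, spoken]) => acc.replace(pattern, spoken), text);
}
```

For example, `symbolsToSpeech("±5%")` first expands the sign and then the percent, giving "plus or minus 5 percent".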
### Time Duration Patterns (v2.1.0)
| Input | Spoken As |
|-------|----------|
| `4h` | "4 hours" |
| `30m` | "30 minutes" |
| `15s` | "15 seconds" |
| `2d` | "2 days" |
| `1w` | "1 week" |
| `90min` | "90 minutes" |
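These duration patterns lend themselves to a single regular expression with a unit lookup. The following is a sketch under assumptions: `DURATION_UNITS` and `expandDurations` are hypothetical names, and the singular/plural handling is inferred from the "1 week" row rather than documented.

```typescript
// Unit suffix -> [singular, plural], covering the suffixes in the table above.
const DURATION_UNITS: Record<string, [string, string]> = {
  s: ["second", "seconds"],
  m: ["minute", "minutes"],
  min: ["minute", "minutes"],
  h: ["hour", "hours"],
  d: ["day", "days"],
  w: ["week", "weeks"],
};

/** Expands compact durations such as "4h" or "90min" into spoken form. */
function expandDurations(text: string): string {
  // The word boundaries leave tokens such as "30mm" or "v2d3" untouched.
  return text.replace(/\b(\d+)(min|[smhdw])\b/g, (_match: string, digits: string, unit: string) => {
    const [singular, plural] = DURATION_UNITS[unit];
    return `${digits} ${digits === "1" ? singular : plural}`;
  });
}
```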
### Emoji Pronunciation (v2.1.0)
| Emoji | Spoken As | Context |
|-------|-----------|--------|
| ✅ | "completed" | Status indicators |
| ❌ | "not done" | Status indicators |
| ⚠️ | "warning" | Alerts |
| 📋 | "planned" | Task status |
| 🔄 | "in progress" | Task status |
| ⏳ | "waiting" | Task status |
| 🔥 | "hot" or "high priority" | When followed by "High" |
| 🔓 | "unlocked" | Feature status |
| 💡 | "idea" | Suggestions |
| 🆕 | "new" | Version notes |
**Emoji-Text Deduplication**: When an emoji's meaning matches the text that follows it (e.g., `✅ Complete`), Alex says it only once ("completed", not "completed Complete").
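Deduplication needs some test for whether the spoken form and the next word mean the same thing. The sketch below uses a crude shared-prefix check, which is purely an assumption for illustration; the extension's real matching rule is not described here. `EMOJI_SPEECH` and `speakEmoji` are hypothetical names, and only a subset of the table is included.

```typescript
// Spoken forms for a subset of the emoji table above.
const EMOJI_SPEECH: Array<[string, string]> = [
  ["✅", "completed"],
  ["⚠️", "warning"],
  ["⏳", "waiting"],
  ["🔥", "hot"],
  ["💡", "idea"],
];

/** Replaces emoji with spoken words, dropping a following word that repeats the meaning. */
function speakEmoji(text: string): string {
  let out = text;
  for (const [emoji, spoken] of EMOJI_SPEECH) {
    const parts = out.split(emoji);
    if (parts.length === 1) continue; // this emoji does not occur
    let rebuilt = parts[0];
    for (let i = 1; i < parts.length; i++) {
      let tail = parts[i];
      // Look at the first word after the emoji, if there is one.
      const next = /^\s*([A-Za-z]+)/.exec(tail);
      if (next !== null) {
        const word = next[1].toLowerCase();
        // Assumed heuristic: one word is a prefix of the other ("complete" / "completed").
        const duplicate = (word.length >= 4 && spoken.startsWith(word)) || word.startsWith(spoken);
        if (duplicate) tail = tail.slice(next[0].length);
      }
      rebuilt += spoken + tail;
    }
    out = rebuilt;
  }
  return out;
}
```

With this heuristic, `✅ Complete` is spoken as "completed", while `⚠️ Disk almost full` keeps its following text and becomes "warning Disk almost full".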
### Table Reading (v2.1.0)
Markdown tables are converted to natural speech:
```markdown
| Name | Status |
|-------|----------|
| Alice | ✅ Done |
| Bob | 🔄 Active |
```
Becomes: "Table with 2 columns: Name, Status. Row 1: Name is Alice. Status is completed. Row 2: Name is Bob. Status is in progress."
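The row-by-row reading pattern can be sketched as a small parser that pairs each cell with its column header. This is an illustration under assumptions, not the extension's code: `parseRow` and `tableToSpeech` are hypothetical names, cells are assumed to have already passed through emoji and symbol conversion, and escaped pipes inside cells are not handled.

```typescript
/** Splits one markdown table row into trimmed cell strings. */
function parseRow(line: string): string[] {
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((cell) => cell.trim());
}

/** Illustrative sketch: converts a simple markdown table into spoken sentences. */
function tableToSpeech(markdownTable: string): string {
  const lines = markdownTable.split("\n").filter((line) => line.trim().length > 0);
  if (lines.length < 2) return "";
  const headers = parseRow(lines[0]);
  // lines[1] is the |---|---| separator row and carries no spoken content.
  const rows = lines.slice(2).map(parseRow);
  const intro = `Table with ${headers.length} columns: ${headers.join(", ")}.`;
  const body = rows.map((cells, i) => {
    const facts = headers.map((header, col) => `${header} is ${cells[col] ?? ""}.`);
    return `Row ${i + 1}: ${facts.join(" ")}`;
  });
  return [intro, ...body].join(" ");
}
```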
Versions are spoken naturally with context awareness:
| Input | Spoken As | Why |
|---|---|---|
| v4.2.9 | "version 4.2.9" | Standalone version |
| Version: v4.2.9 | "Version: 4.2.9" | Already has "Version:" prefix |
Uses negative lookbehind to prevent redundant "version version".
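A negative lookbehind for this case might look like the sketch below. The exact pattern used by the extension is not shown in this document, so the regular expressions and the `speakVersions` name are assumptions chosen to reproduce the two rows of the table.

```typescript
/** Expands "v4.2.9" to "version 4.2.9" without doubling an existing "Version:" label. */
function speakVersions(text: string): string {
  return text
    // Standalone version: the negative lookbehind skips any "v" that already
    // follows a "version" or "version:" label.
    .replace(/(?<!\bversion:?\s*)\bv(?=\d+(?:\.\d+)+\b)/gi, "version ")
    // Labelled version: drop the "v" so only the existing label is spoken.
    .replace(/(?<=\bversion:?\s*)v(?=\d+(?:\.\d+)+\b)/gi, "");
}
```

The lookahead requires at least one dotted component, so ordinary words starting with "v" and bare tokens like "v2" are left alone.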
Design Principle: Would a human reading this aloud say the symbol name, or translate it to meaning? Almost always the latter.
Edge TTS has undocumented size limits per WebSocket request. Documents over ~3000 characters (approximately 7 minutes of audio) can cause the connection to stall indefinitely, appearing to hang at "Synthesizing..." with no progress.
Chunking Strategy:
| Setting | Value | Rationale |
|---|---|---|
| MAX_CHUNK_CHARS | 3000 | Safe limit before Edge TTS stalls |
| CHUNK_TIMEOUT_MS | 60000 | 60 seconds per chunk |
| MAX_RETRIES | 3 | Retry failed chunks |
Chunk Splitting Logic:
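- Split on paragraph boundaries (`\n\n`) first
- Split oversized paragraphs on sentence boundaries (`. ` or `! ` or `? `)
- Report progress as `Synthesizing speech [n/N]...`

Retry with Exponential Backoff: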
| Attempt | Delay | Formula |
|---|---|---|
| 1 | ~1s | 1000 + jitter |
| 2 | ~2s | 2000 + jitter |
| 3 | ~4s | 4000 + jitter |
Jitter (0-500ms random) prevents thundering herd on concurrent requests.
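The chunking and retry behavior described above can be sketched together. This is a minimal illustration under stated assumptions, not the code in `ttsService.ts`: `splitIntoChunks`, `backoffDelayMs`, and `withRetry` are hypothetical names, the per-chunk timeout (`CHUNK_TIMEOUT_MS`) is omitted, and a single sentence longer than the limit is passed through unsplit.

```typescript
const MAX_CHUNK_CHARS = 3000;
const MAX_RETRIES = 3;

/** Splits on paragraph breaks first, then on sentence ends for oversized paragraphs. */
function splitIntoChunks(text: string, maxChars: number = MAX_CHUNK_CHARS): string[] {
  const chunks: string[] = [];
  let current = "";
  // Appends a piece to the current chunk, starting a new chunk when it would overflow.
  const push = (piece: string, joiner: string): void => {
    if (current.length > 0 && current.length + joiner.length + piece.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current = current.length > 0 ? current + joiner + piece : piece;
  };
  for (const paragraph of text.split(/\n{2,}/)) {
    if (paragraph.length <= maxChars) {
      push(paragraph, "\n\n");
      continue;
    }
    // Oversized paragraph: fall back to sentence boundaries.
    for (const sentence of paragraph.split(/(?<=[.!?])\s+/)) {
      push(sentence, " ");
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

/** Delay before retry `attempt` (1-based): 1000 * 2^(attempt - 1) ms plus 0-500 ms of jitter. */
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  return 1000 * 2 ** (attempt - 1) + Math.floor(random() * 501);
}

/** Runs `synthesize` once, then retries up to MAX_RETRIES times with backoff between attempts. */
async function withRetry<T>(
  synthesize: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await synthesize();
    } catch (error) {
      if (retry >= MAX_RETRIES) throw error; // retries exhausted
      await sleep(backoffDelayMs(retry + 1));
    }
  }
}
```

Passing `random` and `sleep` as parameters keeps the timing logic testable without real delays; a caller would typically wrap each chunk's synthesis call in `withRetry`.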
For documents over 5 minutes (~750 words), Alex offers to summarize before reading:
This document is approximately 32 minutes long (~4800 words).
Would you like to:
- Read full content (~32 min)
- Summarize for speech (~3 min) ← Recommended
Summarization uses the VS Code Language Model API (GPT-4o preferred) with a target of ~450 words (~3 minutes).
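The figures above imply a speaking rate of about 150 words per minute (750 words in 5 minutes, 450 words in 3 minutes). The sketch below builds the prompt from that inferred rate; the constant and function names are hypothetical, and the call to the Language Model API is deliberately left out.

```typescript
// Inferred from the figures above: 750 words is about 5 minutes of speech.
const WORDS_PER_MINUTE = 150;
const SUMMARY_THRESHOLD_MINUTES = 5;
const SUMMARY_TARGET_WORDS = 450;

/** Counts whitespace-separated words; an empty string has zero words. */
function countWords(text: string): number {
  const words = text.trim().split(/\s+/);
  return words[0] === "" ? 0 : words.length;
}

/** Estimated reading time in minutes at the assumed speaking rate. */
function estimateMinutes(wordCount: number): number {
  return wordCount / WORDS_PER_MINUTE;
}

/** Returns the prompt to show, or null when the document is short enough to read directly. */
function summaryPrompt(text: string): string | null {
  const words = countWords(text);
  if (estimateMinutes(words) <= SUMMARY_THRESHOLD_MINUTES) return null;
  const minutes = Math.round(estimateMinutes(words));
  const summaryMinutes = Math.round(estimateMinutes(SUMMARY_TARGET_WORDS));
  return [
    `This document is approximately ${minutes} minutes long (~${words} words).`,
    "Would you like to:",
    `- Read full content (~${minutes} min)`,
    `- Summarize for speech (~${summaryMinutes} min)`,
  ].join("\n");
}
```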
Bluetooth and USB speakers often need time to "wake up" from power-saving mode. A 2-second delay before playback starts ensures the first words aren't clipped:
const SPEAKER_WARMUP_MS = 2000;
// Status shows "Preparing speakers..." during delay
TTS v2 is built into the Alex VS Code extension. No separate installation required.
The extension automatically includes:
- ws (WebSocket client for Edge TTS connection)
- fs-extra (file operations for audio saving)

After extension update, verify TTS works:
- Press Ctrl+Alt+R (Windows/Linux) or Cmd+Alt+R (macOS)

Press Ctrl+Alt+R to read document aloud
Select text first to read only selection
Command Palette → "Alex: Save as Audio"
Choose output location → MP3 saved
Command Palette → "Alex: Read with Voice Selection"
Choose: Default | Warm | British | Friendly