Use this skill to add a new TTS engine to Voicebox. It walks through dependency research, backend implementation, frontend wiring, PyInstaller bundling, and frozen-build testing. Always start with Phase 0 (dependency audit) before writing any code.
Integrate a new text-to-speech engine into Voicebox end-to-end: dependency research, backend protocol implementation, frontend UI wiring, PyInstaller bundling, and frozen-build verification. The user should only need to test the final build locally.
The full phased guide lives at docs/content/docs/developer/tts-engines.mdx. Read this file in its entirety before starting. It contains:
- Backend implementation (`TTSBackend` protocol)
- Dependencies (`requirements.txt`, `justfile`, CI, Docker)
- PyInstaller bundling (`build_binary.py` + `server.py`)

# Read the full TTS engines doc
cat docs/content/docs/developer/tts-engines.mdx
Internalize all phases, especially Phase 0 and Phase 5. The v0.2.3 release took three patch releases because Phase 0 was skipped.
Clone the model library into a temporary directory and audit it. Do NOT skip this.
mkdir /tmp/engine-research && cd /tmp/engine-research
git clone <model-library-url>
Run the grep searches from Phase 0.2 in the guide against the cloned source and its transitive dependencies. Produce a written dependency audit covering:
- PyInstaller flags (`--collect-all`, `--copy-metadata`, `--hidden-import`)
- Failure-mode patterns (`torch.load`, float64, MPS, HF token)
- Model download path (`from_pretrained` vs `snapshot_download` + `from_local`)

Test model loading and generation on CPU in the throwaway venv before proceeding.
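As a rough illustration of the audit, the sketch below scans a cloned source tree for the risky patterns named in this skill. The pattern set is an assumption drawn from the failure modes listed here, not the guide's actual Phase 0.2 grep list, and `audit_source` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed pattern set, taken from the failure modes this skill mentions.
RISK_PATTERNS = {
    "inspect.getsource / @typechecked": re.compile(r"inspect\.getsource|@typechecked"),
    "importlib.metadata": re.compile(r"importlib\.metadata"),
    "torch.load": re.compile(r"torch\.load\("),
    "float64": re.compile(r"float64"),
    "HF token=True": re.compile(r"token\s*=\s*True"),
}


def audit_source(root):
    """Return (label, file path, line number) for each matching line under root."""
    findings = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in RISK_PATTERNS.items():
                if pattern.search(line):
                    findings.append((label, str(path), lineno))
    return findings
```

Calling `audit_source("/tmp/engine-research")` would list candidate lines to review by hand. A match is only a prompt for inspection, not proof of a frozen-build problem, and the library's transitive dependencies need the same pass.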
Follow the guide's phases in order. Key files to modify:
Backend (Phase 1):
- `backend/backends/<engine>_backend.py`
- `backend/backends/__init__.py` (`ModelConfig` + `TTS_ENGINES` + factory)
- `backend/models.py`

Frontend (Phase 3):
- `app/src/lib/api/types.ts`: engine union type
- `app/src/lib/constants/languages.ts`: `ENGINE_LANGUAGES`
- `app/src/components/Generation/EngineModelSelector.tsx`: `ENGINE_OPTIONS`, `ENGINE_DESCRIPTIONS`
- `app/src/lib/hooks/useGenerationForm.ts`: Zod schema, model-name mapping
- `app/src/components/ServerSettings/ModelManagement.tsx`: `MODEL_DESCRIPTIONS`

Dependencies (Phase 4):
- `backend/requirements.txt`
- `justfile` (`setup-python`, `setup-python-release` targets)
- `.github/workflows/release.yml`
- `Dockerfile` (if applicable)

Register the engine in `backend/build_binary.py`:
- `--hidden-import` for the backend module and model package
- `--collect-all` for packages using `inspect.getsource`, shipping data files, or native libraries
- `--copy-metadata` for packages using `importlib.metadata`

If the engine has native data paths, add `os.environ.setdefault()` in `backend/server.py` inside the `if getattr(sys, 'frozen', False):` block.
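The bundling steps above might look roughly like the following sketch. It does not reproduce the real structure of `build_binary.py` or `server.py`; the helper names, the `example_tts` package, and the module path are hypothetical placeholders. Only the PyInstaller flags and `sys._MEIPASS` are standard PyInstaller features.

```python
import os
import sys


def engine_pyinstaller_args(hidden_imports, collect_all=(), copy_metadata=()):
    """Build the PyInstaller flags for one engine as a flat argument list."""
    args = []
    for module in hidden_imports:
        args += ["--hidden-import", module]
    for package in collect_all:
        args += ["--collect-all", package]
    for package in copy_metadata:
        args += ["--copy-metadata", package]
    return args


def set_frozen_data_path(env_var, relative_dir, environ=os.environ):
    """In a frozen build, point env_var at data bundled with the app."""
    if getattr(sys, "frozen", False):
        # PyInstaller sets sys._MEIPASS to the directory holding bundled files.
        bundle_dir = getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
        # setdefault leaves any value the user already exported untouched.
        environ.setdefault(env_var, os.path.join(bundle_dir, relative_dir))


# Hypothetical engine: module and package names below are placeholders.
example_args = engine_pyinstaller_args(
    hidden_imports=["backend.backends.example_backend", "example_tts"],
    collect_all=["example_tts"],
    copy_metadata=["example_tts"],
)
```

Outside a frozen build `set_frozen_data_path` does nothing, so development runs via `just dev` keep using the system paths.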
just dev
Test the full chain: model download → load → generate → voice cloning.
Walk through the Implementation Checklist at the bottom of tts-engines.mdx. Every item must be checked before handing the build to the user.
These are the most common failure modes. Phase 0 research catches all of them:
| Pattern | Symptom in Frozen Build | Fix |
|---|---|---|
| `@typechecked` / `inspect.getsource()` | "could not get source code" | `--collect-all <package>` |
| Package ships pretrained model files | `FileNotFoundError` for `.pth.tar`, `.yaml` | `--collect-all <package>` |
| C library with hardcoded system paths | `FileNotFoundError` for `/usr/share/...` | `--collect-all` + env var in `server.py` |
| `importlib.metadata.version()` | "No package metadata found" | `--copy-metadata <package>` |
| `torch.load` without `map_location` | CUDA device not available on CPU build | Monkey-patch `torch.load` |
| `torch.from_numpy` on float64 data | dtype mismatch `RuntimeError` | Cast to `.float()` |
| `token=True` in HF download calls | Auth failure without stored HF token | Use `snapshot_download(token=None)` + `from_local()` |
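For the `torch.load` row, one possible shape for the monkey-patch is sketched below. It assumes `"cpu"` is an acceptable default target and that the patch is applied once before the engine's library loads any checkpoint. `with_default_map_location` is a hypothetical helper name, not something the guide defines.

```python
import functools


def with_default_map_location(load_fn, default="cpu"):
    """Wrap a torch.load-style function so map_location defaults to `default`."""

    @functools.wraps(load_fn)
    def patched(*args, **kwargs):
        # map_location is torch.load's second positional parameter, so only
        # fill it in when the caller passed it neither way.
        if len(args) < 2:
            kwargs.setdefault("map_location", default)
        return load_fn(*args, **kwargs)

    return patched


# Intended use (assumption: run once at backend import time):
#   import torch
#   torch.load = with_default_map_location(torch.load)
```

Callers that pass `map_location` explicitly keep their chosen device; the wrapper only changes calls that omitted it.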
- `main.py` requires zero changes.
- `backends/__init__.py` handles all dispatch automatically.
- Use `get_torch_device()` and `model_load_progress()` from `backends/base.py`; don't reimplement device detection or progress tracking.