Use when starting a new feature, writing a spec, or brainstorming architecture and you want to capture richer intent than typing allows. Speak your thoughts into a transcription tool, then feed the raw transcript to Claude Code for structuring.
Use speech-to-text tools to dictate your initial specs, feature ideas, and architectural thoughts instead of typing them. Speaking naturally produces longer, more context-rich input because people self-edit heavily when typing but ramble freely when talking. That rambling is gold — LLMs are excellent at extracting structured meaning from unstructured speech.
Core principle: Speaking freely captures intent that is hard to express in typed text. Don't self-edit — ramble and let the LLM find the structure.
Dependency: A speech-to-text tool (Wispr Flow, macOS Dictation, or similar).
| Mistake | Why it's wrong |
|---|---|
| Editing yourself while speaking | The whole point is to capture raw, unfiltered intent. Self-editing while speaking defeats the purpose: you reintroduce the same filtering that makes typed input thin. Just talk. |
| Skipping the transcription review | Speech-to-text makes errors. Quickly scan the transcript for mangled names, technical terms, or homophones before pasting it in. A 10-second scan prevents confused output. |
| Using voice during implementation | Voice shines during planning and ideation. Once you are writing code, typed instructions are more precise. Don't force voice where typing is better. |
| Pasting the transcript without a framing prompt | Claude Code needs to know what to do with the wall of text. Always prepend a short instruction like "Structure this into a feature spec" or "Extract the requirements from this transcript." |
| Speaking in short, clipped sentences | You are not typing. Speak in full, natural paragraphs. Explain the why, the context, the constraints, the edge cases. Longer is better — the LLM will compress it. |
Pick one and install it:
| Tool | Platform | Notes |
|---|---|---|
| Wispr Flow | macOS, Windows, iOS, Android | AI-powered voice-to-text with auto-editing. Its maker claims roughly 4x typing speed. Recommended. |
| macOS Dictation | macOS | Built-in. Press Fn Fn (or Globe key twice) in any text field. Good enough for most use cases. |
| Superwhisper | macOS | Polished Whisper app with hotkey activation. |
| Windows Voice Typing | Windows | Press Win+H. Built-in, decent quality. |
| Google Docs Voice Typing | Browser | Tools > Voice typing. Works well, requires Chrome. |
Any tool that converts speech to editable text works. The key requirement is that you can paste the output into a terminal or editor.
Open your transcription tool and start talking. Cover:

- What you want to build and why it matters
- The context: where it fits in the existing system
- Constraints and non-negotiables
- Edge cases and open questions
Do not organize your thoughts first. Do not outline. Just talk. Aim for 1-3 minutes of continuous speech. This typically produces 200-500 words of transcript, which is far more context than most people would type.
Speech-to-text tools mangle technical terms. Before pasting, scan for:

- Mangled product, library, or person names
- Technical terms replaced with near-homophones (e.g. "cash" for "cache")
- Numbers, versions, or acronyms transcribed as words
Fix only the obviously wrong terms. Do not rewrite — the raw, spoken style is the point.
Tip: Use tmux with Vim keybindings to quickly jump through the transcript and fix transcription errors before pasting. Vim's word motions (w, b) and change commands (cw) make surgical fixes fast.
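To speed up that 10-second scan, you can grep the transcript against a list of mis-transcriptions you see often. A minimal sketch, assuming the file names and the example phrases (which are hypothetical placeholders, not a fixed mapping):

```shell
# Build a list of phrases speech-to-text commonly produces for your
# project's jargon. These examples are illustrative assumptions;
# substitute the mangled forms of your own technical terms.
cat > mangled-terms.txt <<'EOF'
post dress
web socket
oh auth
EOF

# Hypothetical transcript file; in practice, paste your dictation here.
printf 'Store sessions in post dress and push updates over a web socket.\n' > transcript.txt

# Print any transcript line containing a suspect phrase, so you know
# exactly where to make surgical fixes before pasting.
grep -i -f mangled-terms.txt transcript.txt
```

Keep `mangled-terms.txt` around between sessions; the same terms get mangled the same way every time.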
Prepend a one-line instruction that tells Claude Code what to produce:
Structure this spoken transcript into a feature spec with requirements,
constraints, and open questions:
[paste transcript here]
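If your transcription tool writes to a file, you can prepend the framing line and hand the result to Claude Code without pasting by hand. A sketch, assuming a `transcript.txt` with hypothetical contents (the `claude -p` print mode runs Claude Code non-interactively):

```shell
# Hypothetical transcript file produced by your speech-to-text tool.
printf 'So the rough idea is we let users export their data to CSV...\n' > transcript.txt

# The framing instruction tells Claude Code what to produce.
FRAMING='Structure this spoken transcript into a feature spec with requirements, constraints, and open questions:'

# Prepend the framing line, then the raw transcript.
{ printf '%s\n\n' "$FRAMING"; cat transcript.txt; } > prompt.txt

# Feed it to Claude Code in non-interactive (print) mode:
# claude -p "$(cat prompt.txt)"
cat prompt.txt
```

The same wrapper works for any of the framing prompts below; only the `FRAMING` line changes.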
Other useful framing prompts:

- "Extract the requirements and open questions from this transcript."
- "Turn this rambling into a prioritized task list."
- "Summarize the architecture decisions and constraints described here."
Claude Code will return a well-organized version of your rambling thoughts. Now you can:

- Review the structured spec and correct anything it misread
- Answer the open questions it surfaced
- Hand the spec back to Claude Code as the starting point for implementation
This is where the real leverage appears: you went from a blank prompt to a structured spec in under 5 minutes, and it contains context you never would have typed.
| Item | Details |
|---|---|
| Recommended tool (macOS) | Wispr Flow or macOS Dictation (Fn Fn) |
| Recommended tool (Windows) | Windows Voice Typing (Win+H) |
| Ideal speaking length | 1-3 minutes (~200-500 words of transcript) |
| Best stage to use | Planning, ideation, spec writing |
| Worst stage to use | Active implementation, precise code edits |
| Transcript editing | tmux + Vim mode for quick surgical fixes |
| Key framing prompt | "Structure this spoken transcript into a feature spec with requirements, constraints, and open questions:" |
Based on Josh's technique from the Coding Agents: AI Driven Dev Conference. Josh uses Wispr Flow to speak initial specs, noting that speaking gives more context than typing because people naturally self-edit when they type. He also recommends tmux with Vim mode for quickly fixing transcription errors before feeding the text to an LLM.