Hypothesis-Driven Development — treat ADRs as the source of truth, not code. Form testable hypotheses, stage ADRs, implement to test them, validate against predictions, then accept or reject based on evidence. Prevents repeated mistakes by documenting failures. Enables agent-to-agent collaboration through explicit architectural reasoning. This is the canonical HDD entry point — it replaces the thin router at .claude/commands/hdd/hdd.md.
Run Hypothesis-Driven Development for: $ARGUMENTS
Code is an implementation artifact. ADRs are the source of truth.
Before writing any code, form testable hypotheses about what you expect to happen, document them in a staging ADR, implement the minimum needed to test them, validate whether reality matched predictions, then accept or reject based on evidence.
Parse $ARGUMENTS for:
- adr_number (required): The ADR number, e.g. 008
- title (required): Brief title, e.g. "Task Endpoint Implementation"
- --resume (optional): Resume an existing HDD session instead of starting fresh
- --phases=N-M (optional): Run only phases N through M. Used by /workflow to invoke Phases 1-2 (research + stage docs) as its Stage 2, then take back control for its own planning/implementation/review stages. Default: 1-5 (full lifecycle).

```
Research --> Stage Docs --> Implement --> Validate --> Decide
   |             |              |            |           |
checkpoint  checkpoint          |       checkpoint  checkpoint
   v             |              v            v           v
hypotheses  spec + ADR    code + tests   evidence  accept/reject
```
Each phase has a user checkpoint. You do NOT skip checkpoints.
When /workflow dispatches to /hdd at Stage 2, it passes --phases=1-2. This produces
the spec and staging ADR, then returns control to /workflow for its own planning,
implementation, and review stages. When invoked standalone (the default), /hdd runs
all 5 phases as a self-contained lifecycle.
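For example, the invocation modes look like this (the arguments shown are illustrative):

```
/hdd 008 "Task Endpoint Implementation"               # standalone: full 5-phase lifecycle
/hdd 008 "Task Endpoint Implementation" --phases=1-2  # dispatched by /workflow as its Stage 2
/hdd 008 "Task Endpoint Implementation" --resume      # pick up an interrupted session
```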
Check for existing HDD sessions:
```shell
bd list --label=hdd --type=epic --json
```
If an active session exists for this ADR number, warn and ask whether to resume or start fresh.
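A minimal sketch of that check, with the bd output stubbed in a variable for illustration (the real call is the bd list command above; the JSON shape — an array of objects with id and title — is an assumption to verify against your bd version):

```shell
# Stubbed `bd list --label=hdd --type=epic --json` output (assumed shape).
SESSIONS='[{"id":"bd-12","title":"ADR-008: Task Endpoint Implementation"},
 {"id":"bd-07","title":"ADR-005: Cache Layer"}]'

# Find an epic whose title starts with this ADR number.
EXISTING=$(printf '%s' "$SESSIONS" | jq -r \
  --arg adr "ADR-008" '.[] | select(.title | startswith($adr)) | .id')

if [ -n "$EXISTING" ]; then
  echo "Active HDD session found: $EXISTING (resume or start fresh?)"
fi
```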
Create an epic and phase tasks in beads:
```shell
EPIC_ID=$(bd create --title="ADR-${adr_number}: ${title}" \
  --description="HDD session — hypotheses to be defined during research" \
  --type=epic --priority=2 --json | jq -r '.id')
```
Create phase tasks with dependencies (Phase 2 depends on Phase 1, etc.):
If --phases=1-2, only create tasks for Phases 1 and 2.
Write state to .hdd/state.json:
```json
{
  "workflow": "hdd",
  "version": "2.1",
  "adr_number": "<number>",
  "title": "<title>",
  "phase": "research",
  "phases_requested": [1, 2, 3, 4, 5],
  "epicId": "<bead-id>",
  "phaseIds": {
    "research": "<id>",
    "stage": "<id>",
    "implement": "<id>",
    "validate": "<id>",
    "decide": "<id>"
  },
  "status": "in_progress",
  "artifacts": [],
  "hypotheses": [],
  "staging_adr_path": null,
  "spec_path": null,
  "open_risks": [],
  "reconciliation_flags": [],
  "updated_at": "<ISO timestamp>"
}
```
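Phase transitions can be applied to the state file atomically with jq; a minimal sketch (field names match the schema above; the initial file here is a stub):

```shell
# Create a minimal state file, then advance phase + timestamp atomically.
mkdir -p .hdd
printf '%s' '{"phase":"research","updated_at":null}' > .hdd/state.json

jq --arg phase "stage" --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
   '.phase = $phase | .updated_at = $ts' .hdd/state.json > .hdd/state.json.tmp
mv .hdd/state.json.tmp .hdd/state.json

jq -r '.phase' .hdd/state.json
```

Writing to a temp file and renaming avoids a truncated state file if a crash lands mid-write.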
Show the session dashboard and begin Phase 1.
State file: .hdd/state.json

## Phase 1: Research

Goal: Understand the problem space and form testable hypotheses.
Dispatch one Explore-type sub-agent using the Agent tool:
Agent tool call:
subagent_type: Explore
prompt: |
You are the HDD Research agent for ADR-${adr_number}: ${title}.
## Task
Build context and form testable hypotheses for this architectural decision.
## What to read
- Accepted ADRs: .adr/accepted/ (constraints and patterns)
Also check docs/adr/ — legacy ADRs predate the .adr/ convention
- Rejected ADRs: .adr/rejected/ (prior failure modes)
Also check docs/adr/rejected/ — legacy location
- Active staging: .adr/staging/ (in-flight work)
- Specs: specs/ (implementation contracts)
## ADR Reconciliation
For each accepted ADR related to this domain, apply these 7 signals:
1. Context mismatch — ADR describes codebase state that no longer matches
2. Broken references — files/modules mentioned have moved or been deleted
3. Dependency drift — major version change in a dependency the ADR relied on
4. Contradicting direction — this new work conflicts with the ADR's decision
5. Unrealized consequences — ADR predicted outcomes that never materialized
6. Rejected alternative resurfacing — a rejected approach is being proposed again
7. Orphaned dependency chain — ADR depends on a retired/rejected ADR
If any signal fires, include it in your output with specific evidence.
## Hypotheses
Form SOFT hypotheses — Specific, Observable, Falsifiable, Testable.
Each hypothesis must name an exact behavior, metric, or outcome.
## Return format
```
RESEARCH SUMMARY
================
Domain: <what area of the codebase this touches>
EXISTING CONSTRAINTS
- <constraint from accepted ADR-NNN>: <what it means for this work>
PRIOR FAILURES
- <lesson from rejected ADR-NNN>: <what to avoid>
RECONCILIATION FLAGS
- ADR-NNN: <signal name> — <specific evidence>
Recommended disposition: STILL VALID | NEEDS AMENDMENT | SUPERSEDED | INVALIDATED
(or: No reconciliation flags.)
HYPOTHESES
H1: <specific testable claim>
Prediction: <exact observable outcome>
Validation: <how to test>
H2: ...
UNKNOWNS
- <anything that blocks staging>
OPEN RISKS
- <anything that could derail implementation>
```
Checkpoint: Present the research summary and draft hypotheses to the user. Wait for approval before advancing.
Gate: At least one SOFT hypothesis with evidence-backed context. User approved.
When a reconciliation signal fires, the research agent recommends one of four dispositions. The orchestrator presents these to the user during the checkpoint for a decision:

- **STILL VALID** — no action; the ADR stands as-is.
- **NEEDS AMENDMENT** — update the ADR's content during Phase 5.
- **SUPERSEDED** — move the ADR to .adr/retired/ with a forward reference during Phase 5.
- **INVALIDATED** — move the ADR to .adr/retired/ with an explanation. If the domain still needs a decision, the current HDD session serves as that replacement.

Do NOT act on dispositions during Phase 1. Record them. Execute them in Phase 5 alongside the primary ADR migration.
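A recorded flag could look like this inside the reconciliation_flags array of .hdd/state.json (the field names here are an assumed shape, not a fixed schema):

```json
{
  "adr": "ADR-004",
  "signal": "broken references",
  "evidence": "a module path cited in ADR-004 no longer exists",
  "recommended_disposition": "NEEDS AMENDMENT",
  "disposition_executed": false
}
```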
## Phase 2: Stage Docs

Goal: Document WHAT we're implementing (spec) and WHY (ADR) before writing code.
Dispatch one sub-agent:
Agent tool call:
prompt: |
You are the HDD Staging agent for ADR-${adr_number}: ${title}.
## Context
Research summary (from Phase 1):
<paste the full research output here>
Approved hypotheses:
<paste the user-approved hypotheses>
## Task
Create two documents:
### 1. Spec (WHAT)
Path: specs/${slug}.md
The spec describes the implementation contract. It answers: what does the
system look like after this work is done? Include:
- Data structures / types being added or changed
- API surface (new endpoints, tool operations, parameters)
- Behavior descriptions (what happens when X)
- Acceptance criteria (testable conditions for "done")
Do NOT include reasoning about WHY — that belongs in the ADR.
### 2. Staging ADR (WHY)
Path: .adr/staging/ADR-${adr_number}-${slug}.md
Sections:
- **Status**: Proposed
- **Context**: Problem, current state, constraints from research
- **Decision**: Chosen approach and why (reference alternatives considered)
- **Consequences**: Positive outcomes, tradeoffs, follow-ups
- **Hypotheses**: Each in SOFT format with validation plan (use format below)
- **Spec**: Link to specs/${slug}.md
- **Links**: Related ADRs, files, rejected approaches
### Hypothesis format in ADR
```markdown
### Hypothesis N: [Specific testable claim]
**Prediction**: [Exact observable outcome]
**Validation**: [Commands, observations, or measurements]
**Outcome**: PENDING
```
### Dependency/conflict notes
If any reconciliation flags were raised in Phase 1, note them in the ADR's
Context section with planned dispositions.
## Return format
```
STAGING SUMMARY
===============
Spec path: specs/${slug}.md
ADR path: .adr/staging/ADR-${adr_number}-${slug}.md
Hypotheses staged: N
Dependencies noted: [list or "none"]
Conflicts noted: [list or "none"]
```
After receiving the sub-agent output, update .hdd/state.json with staging_adr_path
and spec_path.
Checkpoint: Present both the spec and the staging ADR to the user. Wait for approval before implementation.
Gate: Spec exists in specs/. Staging ADR exists in .adr/staging/. All sections
complete. Each hypothesis is SOFT. User approved both documents.
If --phases=1-2: Stop here. Return the spec path, ADR path, and hypotheses to the
calling workflow. Update state to phase: "stage-complete".
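Both artifact paths share a ${slug}. Assuming the usual lowercase-hyphenated convention (not specified elsewhere in this document), it can be derived from the title like this:

```shell
title="Task Endpoint Implementation"
# Lowercase, squeeze runs of non-alphanumerics into single hyphens, trim edges.
slug=$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' \
  | tr -cs 'a-z0-9' '-' | sed 's/^-*//; s/-*$//')
echo "$slug"
```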
## Phase 3: Implement

Goal: Build the minimum code needed to test hypotheses.
Dispatch one or more sub-agents depending on implementation scope. For each sub-agent:
Agent tool call:
prompt: |
You are an HDD Implementation agent.
## Scope
ADR: <paste staging ADR path and content>
Spec: <paste spec path and content>
## Task
Implement the minimum changes required to exercise the hypotheses in the ADR.
Rules:
1. Read the spec for WHAT to build. Read the ADR for WHY.
2. Write tests tied to each hypothesis — tests validate the PREDICTION,
not just that code exists.
3. Run build and type checks after implementation.
4. Track any deviation from spec with justification.
5. Do NOT commit. Changes stay uncommitted until after validation.
## Return format
```
IMPLEMENTATION SUMMARY
======================
Files modified: [list with paths]
Files created: [list with paths]
Tests written: N
Tests passing: N/N
Build: pass | fail
Type check: pass | fail
HYPOTHESIS COVERAGE
H1: covered by test <test name>
H2: covered by test <test name>
...
DEVIATIONS FROM SPEC
- <deviation>: <justification>
(or: None.)
TEST TARGETS
Command: <exact test command>
Files: <test file paths>
```
After receiving summaries, update .hdd/state.json with artifacts and test targets.
No user checkpoint — advance directly to validation.
Gate: Implementation and hypothesis-linked tests exist. Build passes. Type checks pass.
## Phase 4: Validate

Goal: Test whether predictions match reality.
Dispatch one sub-agent:
Agent tool call:
prompt: |
You are an HDD Validation agent.
## Context
ADR: <staging ADR path and content>
Test targets: <from implementation summary>
## Task
For each hypothesis in the ADR:
1. Run the specific validation described in the hypothesis
2. Capture concrete evidence (test output, command output, observations)
3. Classify: VALIDATED | INVALIDATED | INCONCLUSIVE
Also run full quality checks:
- Build: npm run build (or equivalent)
- Tests: npm test (or equivalent)
- Linter: if configured
- Type checker: tsc --noEmit (or equivalent)
## Return format
```
VALIDATION REPORT
=================
Quality checks: build [pass|fail], tests [pass|fail], types [pass|fail]
HYPOTHESIS RESULTS
H1: "<claim>"
Prediction: <what we expected>
Evidence: <what actually happened — include command output>
Classification: VALIDATED | INVALIDATED | INCONCLUSIVE
Notes: <analysis of match/mismatch>
H2: ...
MANUAL VERIFICATION NEEDED
- <hypothesis or aspect that requires human testing>
(or: None — all hypotheses verified automatically.)
OVERALL: ALL VALIDATED | SOME INVALIDATED | MIXED
```
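The quality checks listed in the prompt can be collected without aborting on the first failure. A sketch with stand-in commands — true/false here replace the real npm and tsc invocations:

```shell
# Run a named check, report pass/fail, never abort the sweep.
run_check() {
  name=$1; shift
  if "$@" >/dev/null 2>&1; then echo "$name: pass"; else echo "$name: fail"; fi
}

run_check build true    # stand-in for: npm run build
run_check tests true    # stand-in for: npm test
run_check types true    # stand-in for: tsc --noEmit
run_check lint  false   # stand-in for a failing linter
```

Collecting every result lets the report show all failing gates at once instead of just the first.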
Checkpoint: Present the validation report to the user. For each hypothesis, show prediction vs. evidence. Ask the user to confirm each classification and to complete any items listed under MANUAL VERIFICATION NEEDED.
Gate: Every hypothesis has an explicit outcome. User confirmed results.
## Phase 5: Decide

Goal: Accept or reject the ADR based on validation evidence.

If ALL VALIDATED — accept:

1. Move the ADR: mv .adr/staging/ADR-NNN-*.md .adr/accepted/
2. Execute recorded reconciliation dispositions: superseded ADRs move to .adr/retired/ with a forward reference; invalidated ADRs move to .adr/retired/ with an explanation.
3. Keep the spec in specs/ (it now describes shipped behavior).
4. Commit:

   ```
   feat(<scope>): <description>

   ADR-NNN accepted — all hypotheses validated.
   ```

5. Update .hdd/state.json.

If SOME INVALIDATED — reject. Checkpoint: Present the rejection analysis. The user MUST approve the rejection. Then:

1. Move the ADR: mv .adr/staging/ADR-NNN-*.md .adr/rejected/
2. Trash specs/${slug}.md (it describes something we didn't build).
3. Revert uncommitted implementation: git checkout -- <files>
4. Commit:

   ```
   docs(adr): reject ADR-NNN — [hypothesis] invalidated

   [Brief explanation of what was learned]
   ```

5. Update .hdd/state.json.

If MIXED (some hypotheses validated and others didn't): present the split to the user and decide per hypothesis — for example, amend the ADR to narrow its scope to the validated hypotheses, or reject and re-stage with revised hypotheses.
Render after every phase transition:

```
HDD SESSION: ADR-<number> — <title>
Epic: <epicId> | Branch: <branch>
Phases: <1-5 or 1-2>

Phase              Status    Artifact
-----              ------    --------
1. Research        [status]  hypotheses: N, flags: N
2. Stage Docs      [status]  spec: specs/<slug>.md
                             adr: .adr/staging/ADR-NNN-*.md
3. Implementation  [status]  files: N, tests: N
4. Validation      [status]  validated: N, invalidated: N
5. Decision        [status]  accepted | rejected | pending

Open risks: <count>
Reconciliation: <count> flags, <count> pending dispositions
```
Status symbols: [ ] pending, [~] in progress, [x] complete, [!] blocked
Before leaving Phase 1, every hypothesis MUST pass the SOFT check: Specific, Observable, Falsifiable, Testable.
Reject hypotheses that are vague ("performance is good"), implementation-focused ("we will use class X"), or unfalsifiable ("code will be maintainable").
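A hypothesis that passes the check, written in the ADR format from Phase 2 (the endpoint and test names are illustrative, not from a real session):

```markdown
### Hypothesis 1: POST /tasks persists and returns the created task
**Prediction**: POST /tasks with {"title": "x"} returns HTTP 201 and a body
containing an id; a subsequent GET /tasks/<id> returns the same title.
**Validation**: integration test "creates and fetches a task", run via npm test.
**Outcome**: PENDING
```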
Persist .hdd/state.json and beads after each phase completes. Crashes should be recoverable.

The /hdd command router at .claude/commands/hdd/hdd.md is deprecated. The module briefs in .claude/commands/hdd/modules/ remain as reference material for sub-agent context.

Always use for: New features, architectural changes, protocol implementations, refactoring that changes behavior, performance optimizations with measurable goals.
Skip for: Trivial bug fixes, documentation-only changes, test-only additions, formatting/linting fixes.
```
.adr/
  staging/    # Work in progress — under validation
  accepted/   # Validated — source of truth
  rejected/   # Failed — documented lessons
  retired/    # Superseded — historical reference
.hdd/
  state.json  # Current session state (gitignored)
specs/        # Implementation specs (WHAT — paired with ADRs)
```
See also:

- docs/WORKFLOW-MASTER-DESCRIPTION.md
- .claude/commands/hdd/modules/ (sub-agent reference material)
- .claude/commands/hdd/state.md
- docs/adr/000-template.md (if it exists)