Guides implementation of any feature or fix using test-driven development; use when writing new production code, fixing bugs, or changing existing behaviour.
Before starting, ask if not already clear:
Output style: Test names describe the behaviour under test — `rejects empty email`, not `test_submitForm`.
Input: A feature requirement, bug report, or behaviour change, with optional context about the existing codebase and test infrastructure
Output: A test written to the failing state, then minimal production code to pass it, then a refactored result — each step verified before proceeding
Composability: Use alongside code-review (to audit implementation quality after the cycle completes); use swe for structural design decisions that arise during refactoring; use technical-writer to document the behaviour under test
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
Write production code before the test? Delete it. Start over.
No exceptions:
The cycle has five mandatory steps. Skipping any step invalidates the cycle.
Write one test that specifies the next unit of required behaviour.
Requirements for the test:
If you cannot state the expected result before writing the test, the oracle is missing. Resolve the ambiguity first.
Good:
"retries a failed operation exactly three times before propagating the error"
Bad:
"retry works" — vague; "tests the retry mock" — tests mock behaviour, not code behaviour
Run only the new test before writing any production code.
Confirm all three:
Test passes immediately? You are testing existing behaviour or the test is wrong. Fix or discard the test. Do not proceed.
Test errors? Fix the error until the test reaches a genuine failure. A test that errors is not a failing test.
Never proceed to Step 3 without completing Step 2.
Write the simplest code that makes the failing test pass.
Rules:
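For the retry example, the Green step might be the minimal loop below — no backoff, no logging, nothing the test does not demand (`retry` remains a hypothetical name):

```python
# Step 3 (Green): the simplest code that makes the failing test pass.

def retry(operation, attempts):
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # final attempt exhausted: propagate the error

def test_retries_three_times_before_propagating_error():
    calls = []

    def flaky():
        calls.append(1)
        raise ValueError("boom")

    raised = False
    try:
        retry(flaky, attempts=3)
    except ValueError:
        raised = True

    assert raised and len(calls) == 3

test_retries_three_times_before_propagating_error()  # now passes
print("GREEN")
```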
Run the full test suite.
Confirm both:
New test fails? Fix the production code. Do not weaken the test.
Previously passing test now fails? Fix the regression before proceeding. Regressions are not acceptable collateral.
With all tests green, improve the code's internal quality.
Permitted in this step:
Not permitted in this step:
Run the full suite after each refactoring change. If any test fails, the refactoring introduced a defect — revert and try again.
Return to Step 1. Write the next failing test for the next unit of behaviour.
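A Refactor-step pass on the retry example might restructure the loop for clarity while the (here, single-test) suite stays green — names remain hypothetical:

```python
# Step 5 (Refactor): behaviour-preserving cleanup, re-verified by the suite.

def retry(operation, attempts):
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as error:
            last_error = error
    raise last_error  # every attempt exhausted

def test_retries_three_times_before_propagating_error():
    calls = []

    def flaky():
        calls.append(1)
        raise ValueError("boom")

    try:
        retry(flaky, attempts=3)
    except ValueError:
        assert len(calls) == 3
        return
    raise AssertionError("expected the final error to propagate")

test_retries_three_times_before_propagating_error()  # suite re-run: still green
```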
Structure the test suite as a pyramid: many fast, focused unit tests at the base; fewer integration tests in the middle; a small number of system tests at the top. Invert this ratio and the suite becomes slow, fragile, and hard to diagnose.
When a unit test requires extensive setup or mocking, treat this as a design signal: the unit is over-coupled. Simplify the design before writing the test.
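One way the signal resolves is by injecting the awkward collaborator instead of mocking around it. A minimal sketch, with all names illustrative:

```python
# A unit that reaches for its own dependency (here, the system clock)
# forces mocking; injecting the collaborator makes the test trivial.
import datetime

class Report:
    def __init__(self, now=datetime.datetime.now):
        self.created = now()  # collaborator injected; real clock by default

def test_report_records_creation_time():
    fixed = datetime.datetime(2024, 1, 1, 12, 0)
    report = Report(now=lambda: fixed)  # no mocking framework required
    assert report.created == fixed

test_report_records_creation_time()
```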
Many E2E tests, few unit tests — the "ice cream cone" shape. Symptoms:
If the test suite has this shape, add unit and integration tests for any code changed, and do not add new system tests until the lower levels provide adequate coverage.
These principles, grounded in systematic testing research by Bertrand Meyer (ETH Zürich), guide the selection and construction of test cases.
Every test requires an oracle: a rule for determining whether the output is correct. State the expected result explicitly before running the test. An oracle that depends on reading the implementation is circular and provides no verification.
Sources of strong oracles:
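One widely used source is an inverse relation: encoding followed by decoding must reproduce the original value. A minimal sketch using the standard library:

```python
# Round-trip oracle: the expected result is the input itself,
# stated before the test runs — no peeking at the implementation.
import json

def test_json_round_trip_preserves_value():
    value = {"id": 7, "tags": ["a", "b"], "active": True}
    assert json.loads(json.dumps(value)) == value

test_json_round_trip_preserves_value()
```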
Divide the input domain into equivalence classes — sets of inputs expected to trigger the same behaviour. Select at least one test per class. Tests within a class are redundant; tests across classes are not.
Required partitions to cover for any input:
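As a sketch, consider a hypothetical `classify_age` function: the input domain splits into an invalid class, a "minor" class, and an "adult" class, with boundary values at the class edges:

```python
# Equivalence partitioning: one representative per class, plus boundaries.

def classify_age(age):
    if age < 0:
        raise ValueError("age cannot be negative")
    return "minor" if age < 18 else "adult"

# Classes: invalid (< 0), minor (0..17), adult (>= 18).
# Extra tests inside one class add maintenance cost, not verification.
assert classify_age(0) == "minor"    # lower boundary of the minor class
assert classify_age(17) == "minor"   # upper boundary of the minor class
assert classify_age(18) == "adult"   # lower boundary of the adult class

try:
    classify_age(-1)                 # representative of the invalid class
    raise AssertionError("expected ValueError for negative age")
except ValueError:
    pass
```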
Each test must be executable in isolation and in any order. A test that depends on state left by another test is not a test — it is a script fragment. Use setup and teardown to create fresh state for every test.
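A sketch of the rule, using a plain setup function (pytest fixtures or xUnit `setUp` play the same role):

```python
# Isolation: every test builds its own fresh state.

def make_cart():
    return []  # fresh fixture per test — no state leaks between tests

def test_new_cart_is_empty():
    assert make_cart() == []

def test_adding_item_increases_count():
    cart = make_cart()
    cart.append("apple")
    assert len(cart) == 1

# Order-independent: any execution order passes.
test_adding_item_increases_count()
test_new_cart_is_empty()   # unaffected by the test that ran before it
```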
A test suite that cannot detect simple mutations in the production code is not providing verification. After writing a test, mentally apply one mutation at a time to the code under test (change a < to <=, negate a condition, remove a branch) and confirm the test would fail. If a mutation survives all tests, the test suite has a gap — add a test that catches it.
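A sketch of the mental exercise on a hypothetical boundary check:

```python
# Mutation check: would the suite catch a single-operator change?

def is_adult(age):
    return age >= 18          # code under test

def is_adult_mutant(age):
    return age > 18           # mutant: >= changed to >

# A suite that only tests age=30 lets the mutant survive:
assert is_adult(30) == is_adult_mutant(30) == True

# The boundary test kills it — add it to close the gap:
assert is_adult(18) is True
assert is_adult_mutant(18) is False   # mutant detected
```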
Two tests that test the same behaviour under the same conditions are redundant. Remove or combine redundant tests; their maintenance cost is real and their verification value is zero.
| Property | Required | Disqualifying |
|---|---|---|
| Oracle | Explicit expected result stated before running | "It should work" or assertion-free |
| Isolation | No shared mutable state with other tests | Depends on execution order |
| Focus | Tests one behaviour | "and" in the test name |
| Naming | Describes the behaviour | Describes the method or uses a number |
| Realism | Uses real code for the unit under test | Tests mock behaviour instead of real behaviour |
| Minimality | Minimal setup to exercise the behaviour | Requires a full system fixture for a unit concern |
| Rationalization | Response |
|---|---|
| "Too simple to test" | Simple code breaks. The test takes less than a minute. |
| "I'll write tests after" | Tests written after pass immediately. A passing test that was never red proves nothing. |
| "I already manually tested it" | Manual testing is ad-hoc, unrepeatable, and leaves no record. It is not equivalent to an automated test. |
| "Tests after achieve the same goals" | Tests-after answer "what does this do?" Tests-first answer "what should this do?" They are not the same question. |
| "Deleting X hours of work is wasteful" | Sunk cost. Keeping unverified code incurs ongoing debt. Rewrite with TDD. |
| "Keep it as reference, write tests first" | You will adapt it. That is testing after. Delete means delete. |
| "This is too hard to test" | Hard to test means hard to use. Simplify the design. |
| "TDD slows me down" | Debugging production defects is slower. TDD moves the cost forward. |
Stop and restart from Step 1 (Red) if any of the following are true:
Before marking work complete:
Cannot check all boxes? Return to the first unchecked item and resolve it.