Proactively apply when writing or reviewing unit tests. Triggers on: unit tests, TDD, AAA (Arrange-Act-Assert), test doubles, mocks, fakes, test smells, and DSL-driven testing. Use when adding tests, fixing brittle suites, or guiding agents toward maintainable, refactor-safe tests. Prefer a business DSL by default for new code, and stay pragmatic in brownfield code where migration cost is too high.
Write unit tests that catch regressions, stay fast, and survive refactoring. For new code, default to DSL-driven testing with an in-memory or domain-level driver, then apply classic-style unit-test rules inside that structure. Fall back to more direct tests mainly in brownfield areas where introducing a DSL would create too much churn for too little payoff.
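In this skill, classic style means the Chicago (also called Detroit or classicist) school: verify observable outcomes such as return values and resulting state rather than the interactions between collaborators. It contrasts with the London (mockist) school, which isolates each class with mocks and verifies the calls it makes.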
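A minimal Python sketch of the difference, assuming a hypothetical `Cart` that totals prices from a hypothetical `PriceList`. Only `unittest.mock.Mock` is a real library API; everything else is invented for illustration. The classic test asserts the result; the mockist test asserts how the collaborator was called, so it would break if `Cart` started caching prices even though `total()` stayed correct.

```python
from unittest.mock import Mock


class PriceList:
    """Hypothetical pure, in-memory collaborator."""

    def __init__(self, prices: dict[str, int]) -> None:
        self._prices = dict(prices)

    def price_of(self, sku: str) -> int:
        return self._prices[sku]


class Cart:
    """Hypothetical system under test."""

    def __init__(self, price_list) -> None:
        self._price_list = price_list
        self._skus: list[str] = []

    def add(self, sku: str) -> None:
        self._skus.append(sku)

    def total(self) -> int:
        return sum(self._price_list.price_of(sku) for sku in self._skus)


def test_total_classic_style():
    # Classic (Chicago): real collaborator, assert on the observable outcome.
    cart = Cart(PriceList({"apple": 3, "pear": 5}))
    cart.add("apple")
    cart.add("pear")
    assert cart.total() == 8


def test_total_mockist_style():
    # Mockist (London): replace the collaborator, then verify the calls made to it.
    # This couples the test to *how* Cart computes the total.
    prices = Mock()
    prices.price_of.side_effect = [3, 5]  # returned for the 1st and 2nd call
    cart = Cart(prices)
    cart.add("apple")
    cart.add("pear")
    cart.total()
    prices.price_of.assert_any_call("apple")
    assert prices.price_of.call_count == 2
```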
For new code, write unit tests this way:
This is intentionally aligned with dsl-driven-testing, which should be the default testing architecture by a wide margin.
Use a more direct style mainly in brownfield code when all of these are true:
Even in brownfield code, prefer incremental DSL adoption around new work, painful tests, and flaky areas rather than copying the old style forever.
| Use When | Skip When |
|---|---|
| Adding tests for business logic in new or evolving code | Verifying full system wiring across processes |
| Expressing scenarios in business language with a DSL | Needing confidence in DB/network/framework integration |
| Protecting complex branches or edge cases | Writing broad end-to-end scenarios |
| Replacing brittle mock-heavy tests | Testing trivial getters/setters with no real risk |
| Reviewing AI-generated or over-mocked tests | Brownfield areas where DSL migration cost clearly outweighs the benefit right now |
Goal: maximize regression protection without coupling tests to implementation details.
What has the highest payoff?
├─ Important business behavior in new code → Start with a small business DSL
├─ Complex business rules or branching → Test first
├─ Code with frequent regressions → Test first
├─ Public behavior used by many callers → Test first
├─ Simple pass-through glue → Lower priority
└─ Trivial code with no real risk → Maybe skip
What context am I in?
├─ New feature or greenfield area → Use a DSL by default
├─ Long-lived test suite → Use a DSL by default
├─ Need scenarios to read like specifications → Use a DSL by default
├─ Brownfield code, high migration cost → Be pragmatic; direct tests may be fine
└─ Tiny trivial logic with no useful language → Direct test is acceptable
What kind of dependency is it?
├─ Pure, deterministic, in-memory → Use the real dependency
├─ External but easy to model in memory → Use a fake
├─ Need canned answers only → Use a stub
├─ Need to observe calls after execution → Use a spy
└─ Need strict interaction verification → Mock as last resort
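For the "external but easy to model in memory" branch, a fake is usually the sweet spot. The sketch below assumes a hypothetical `UserRepository` contract (expressed with `typing.Protocol`) and a hypothetical `Registration` service; the fake is a real, working implementation backed by a dict, so tests assert on outcomes instead of on calls.

```python
from typing import Optional, Protocol


class User:
    def __init__(self, email: str) -> None:
        self.email = email


class UserRepository(Protocol):
    """Hypothetical contract that a production DB adapter would also implement."""

    def find_by_email(self, email: str) -> Optional[User]: ...
    def save(self, user: User) -> None: ...


class InMemoryUserRepository:
    """Fake: a lightweight but genuinely working implementation of the contract."""

    def __init__(self) -> None:
        self._by_email: dict[str, User] = {}

    def find_by_email(self, email: str) -> Optional[User]:
        return self._by_email.get(email)

    def save(self, user: User) -> None:
        self._by_email[user.email] = user


class Registration:
    """Hypothetical service under test."""

    def __init__(self, users: UserRepository) -> None:
        self._users = users

    def register(self, email: str) -> bool:
        # Returns True if a new user was created, False if the email is taken.
        if self._users.find_by_email(email) is not None:
            return False
        self._users.save(User(email))
        return True


def test_registered_user_can_be_found():
    users = InMemoryUserRepository()
    assert Registration(users).register("ada@example.com")
    assert users.find_by_email("ada@example.com") is not None


def test_duplicate_email_is_rejected():
    registration = Registration(InMemoryUserRepository())
    registration.register("ada@example.com")
    assert registration.register("ada@example.com") is False
```

Because the fake honors the same contract as the real adapter, `Registration` can be refactored freely without touching these tests.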
What do I care about?
├─ Final state / returned value → State assertion
├─ Error condition or boundary case → Exception / error assertion
├─ Side effect recorded in fake/spy → Verify observable effect
└─ Internal call order or private methods → Usually don't test that
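The first two branches of this tree can be sketched with a hypothetical `Account` class: a state assertion on the returned value and resulting balance, a boundary case, and an error assertion that also confirms the failure left state unchanged. None of these tests look at which internal methods ran.

```python
class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical account used only for illustration."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> int:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds(f"balance {self.balance} < {amount}")
        self.balance -= amount
        return self.balance


def test_withdraw_reduces_balance():
    # State assertion: returned value and resulting state.
    account = Account(balance=100)
    assert account.withdraw(30) == 70
    assert account.balance == 70


def test_withdrawing_entire_balance_is_allowed():
    # Boundary case: exactly the available balance succeeds.
    assert Account(balance=100).withdraw(100) == 0


def test_overdraft_is_rejected_and_balance_unchanged():
    # Error assertion, plus a check that the failure had no side effect.
    account = Account(balance=50)
    try:
        account.withdraw(80)
    except InsufficientFunds:
        pass
    else:
        raise AssertionError("expected InsufficientFunds")
    assert account.balance == 50
```

With pytest, `with pytest.raises(InsufficientFunds):` expresses the overdraft check more compactly; the `try`/`except`/`else` form is used here only so the sketch runs without third-party packages.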
Why does it break too often?
├─ Fails after harmless refactor → Coupled to implementation
├─ Mocks every collaborator → Over-mocked, too isolated
├─ Huge setup hides the intent → Obscure Arrange phase
├─ Many unrelated assertions → Assertion Roulette
└─ Depends on clock, DB, file system → Not a true unit test
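A common fix for the last branch is to inject the dependency instead of reading it globally. This sketch assumes a hypothetical 14-day `Trial` and a hypothetical `FixedClock` fake; the test controls "now" directly, so it is deterministic and never sleeps or reads the system clock.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


class Trial:
    """Hypothetical trial subscription; 'now' comes from an injected clock."""

    LENGTH = timedelta(days=14)

    def __init__(self, started_at: datetime, clock: Clock) -> None:
        self._started_at = started_at
        self._clock = clock

    def is_expired(self) -> bool:
        return self._clock() >= self._started_at + self.LENGTH


class FixedClock:
    """Fake clock: returns a controllable 'now' that tests can advance."""

    def __init__(self, now: datetime) -> None:
        self.now = now

    def __call__(self) -> datetime:
        return self.now

    def advance(self, delta: timedelta) -> None:
        self.now += delta


START = datetime(2024, 1, 1, tzinfo=timezone.utc)


def test_trial_is_active_just_before_fourteen_days():
    clock = FixedClock(START)
    trial = Trial(START, clock)
    clock.advance(timedelta(days=13, hours=23))
    assert not trial.is_expired()


def test_trial_expires_at_fourteen_days():
    clock = FixedClock(START)
    trial = Trial(START, clock)
    clock.advance(timedelta(days=14))
    assert trial.is_expired()
```

In production the same class would receive something like `lambda: datetime.now(timezone.utc)`, so only tests ever pin time.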
When the code is new or expected to live a long time, prefer this stack:
Test case / scenario → small business DSL → in-memory driver → system under test
Why this should be the default:
If you need the full architecture and migration guidance, use dsl-driven-testing together with this skill.
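The stack above can be sketched in a few dozen lines. Everything here is hypothetical: `OrderService` applies a made-up "gold customers get 10% off" rule, `InMemoryDriver` is the only layer that knows how to wire and call it, and `Shop` is the business DSL the tests speak. The intent is that replacing the driver (for example with one that goes through an API) would leave the test bodies unchanged.

```python
from dataclasses import dataclass


# --- System under test (hypothetical domain code) ---
@dataclass
class Order:
    customer: str
    total_cents: int


class OrderService:
    def __init__(self, tiers: dict[str, str]) -> None:
        self._tiers = tiers

    def place(self, customer: str, subtotal_cents: int) -> Order:
        # Made-up rule: gold customers get 10% off; integer cents avoid float rounding.
        is_gold = self._tiers.get(customer) == "gold"
        discount = subtotal_cents // 10 if is_gold else 0
        return Order(customer, subtotal_cents - discount)


# --- In-memory driver: wires the SUT and translates DSL calls into SUT calls ---
class InMemoryDriver:
    def __init__(self) -> None:
        self._tiers: dict[str, str] = {}
        self._orders: list[Order] = []
        self._service = OrderService(self._tiers)  # shares the same dict

    def register_customer(self, name: str, tier: str) -> None:
        self._tiers[name] = tier

    def place_order(self, name: str, subtotal_cents: int) -> None:
        self._orders.append(self._service.place(name, subtotal_cents))

    def last_charge_for(self, name: str) -> int:
        return [o for o in self._orders if o.customer == name][-1].total_cents


# --- Business DSL: the only vocabulary the tests use ---
class Shop:
    def __init__(self, driver: InMemoryDriver) -> None:
        self._driver = driver

    def given_customer(self, name: str, tier: str = "standard") -> None:
        self._driver.register_customer(name, tier)

    def when_customer_orders(self, name: str, subtotal_cents: int) -> None:
        self._driver.place_order(name, subtotal_cents)

    def then_customer_is_charged(self, name: str, expected_cents: int) -> None:
        actual = self._driver.last_charge_for(name)
        assert actual == expected_cents, f"{name} charged {actual}, expected {expected_cents}"


def test_gold_customer_gets_ten_percent_off():
    shop = Shop(InMemoryDriver())
    shop.given_customer("alice", tier="gold")
    shop.when_customer_orders("alice", subtotal_cents=10_000)
    shop.then_customer_is_charged("alice", 9_000)


def test_standard_customer_pays_full_price():
    shop = Shop(InMemoryDriver())
    shop.given_customer("bob")
    shop.when_customer_orders("bob", subtotal_cents=10_000)
    shop.then_customer_is_charged("bob", 10_000)
```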
Arrange → Act → Assert → (Optional) Teardown
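A single test laid out in AAA form, using a hypothetical in-memory `Inventory`. The Act phase is one call, and the Assert phase checks only the outcome of that call. Teardown is empty because nothing outlives the test's local variables.

```python
class Inventory:
    """Hypothetical in-memory inventory used to illustrate AAA."""

    def __init__(self) -> None:
        self._stock: dict[str, int] = {}

    def restock(self, sku: str, qty: int) -> None:
        self._stock[sku] = self._stock.get(sku, 0) + qty

    def reserve(self, sku: str, qty: int) -> bool:
        available = self._stock.get(sku, 0)
        if qty > available:
            return False
        self._stock[sku] = available - qty
        return True

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)


def test_reserving_stock_reduces_availability():
    # Arrange: only the state this scenario needs.
    inventory = Inventory()
    inventory.restock("widget", 5)

    # Act: exactly one call to the behavior under test.
    reserved = inventory.reserve("widget", 3)

    # Assert: the observable outcome of that one behavior.
    assert reserved is True
    assert inventory.available("widget") == 2

    # Teardown: nothing to do; all state lives in local, in-memory objects.
```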
| Double | Use It For | Avoid When |
|---|---|---|
| Dummy | Satisfying an unused parameter | You need real behavior |
| Stub | Returning fixed answers | You need to inspect effects |
| Spy | Recording calls or emitted messages | You want strict pre-programmed expectations |
| Mock | Rare verification of unavoidable outbound commands | You are testing queries, internal calls, or domain collaboration |
| Fake | Lightweight in-memory implementation of an external contract | The fake would become more complex than the real thing |
Default order: real dependency → fake → stub/spy → mock.
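A sketch of three hand-written doubles side by side, assuming a hypothetical `Greeter` that builds a welcome email from a template. The template lookup is a query, so a stub supplies canned data; the email is an outbound effect, so a spy records it for inspection; the metrics collaborator is unused on this path, so a dummy fills the slot. No mocking library is needed.

```python
class Greeter:
    """Hypothetical SUT: sends a welcome email built from a stored template."""

    def __init__(self, templates, mailer, metrics) -> None:
        self._templates = templates
        self._mailer = mailer
        self._metrics = metrics  # only used when the template is missing

    def welcome(self, name: str, email: str) -> None:
        template = self._templates.get("welcome")
        if template is None:
            self._metrics.increment("missing_template")
            return
        self._mailer.send(email, template.format(name=name))


class StubTemplates:
    """Stub: canned answers for a query; nothing is verified on it."""

    def __init__(self, templates: dict[str, str]) -> None:
        self._templates = templates

    def get(self, key: str):
        return self._templates.get(key)


class SpyMailer:
    """Spy: records outbound messages so the test can inspect them afterwards."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


class DummyMetrics:
    """Dummy: fills a parameter this path never uses; fails loudly if touched."""

    def increment(self, name: str) -> None:
        raise AssertionError("metrics should not be used on this path")


def test_welcome_email_includes_the_name():
    mailer = SpyMailer()
    greeter = Greeter(StubTemplates({"welcome": "Hi {name}!"}), mailer, DummyMetrics())

    greeter.welcome("Ada", "ada@example.com")

    assert mailer.sent == [("ada@example.com", "Hi Ada!")]
```

The assertion is on what the spy recorded after execution, not on a pre-programmed expectation, which keeps the test closer to the classic style than a strict mock would.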
For worked examples, see:
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Over-mocking | Tests freeze internal design and break on refactors | Replace mocks with real collaborators or fakes |
| Assertion Roulette | Hard to know why the test failed | Use fewer, sharper assertions with clear intent |
| Obscure Test | Setup overwhelms the scenario | Use builders/helpers and remove irrelevant setup |
| Conjoined Twins | Hidden shared state makes tests flaky | Isolate state and keep everything in-memory |
| No real assertion | Coverage rises but confidence does not | Assert outcomes that matter |
| Testing private methods | Locks tests to implementation details | Test through public behavior |
| Multiple behaviors in one test | Failures are ambiguous | Split into one scenario per test |
| Skipping a useful DSL in new code | Tests speak framework details instead of business intent | Add a small scenario/DSL layer first |
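As one example of the "Obscure Test" fix, a test data builder supplies sensible defaults so each test states only the detail that matters. `Customer`, `qualifies_for_free_shipping`, and the 50.00 threshold are hypothetical; `dataclasses.replace` is the real standard-library function used to copy a frozen dataclass with changed fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Customer:
    name: str = "Any Name"
    country: str = "NL"
    email: str = "any@example.com"
    is_member: bool = False


def qualifies_for_free_shipping(customer: Customer, subtotal_cents: int) -> bool:
    """Hypothetical rule: members always qualify; others need 50.00 or more."""
    return customer.is_member or subtotal_cents >= 5_000


class CustomerBuilder:
    """Test data builder: defaults for everything, overrides only what matters."""

    def __init__(self) -> None:
        self._customer = Customer()

    def member(self) -> "CustomerBuilder":
        self._customer = replace(self._customer, is_member=True)
        return self

    def build(self) -> Customer:
        return self._customer


def a_customer() -> CustomerBuilder:
    return CustomerBuilder()


def test_member_gets_free_shipping_on_small_order():
    customer = a_customer().member().build()
    assert qualifies_for_free_shipping(customer, subtotal_cents=100)


def test_non_member_needs_the_minimum_subtotal():
    # One behavior: the threshold, checked just below and exactly at it.
    customer = a_customer().build()
    assert not qualifies_for_free_shipping(customer, subtotal_cents=4_999)
    assert qualifies_for_free_shipping(customer, subtotal_cents=5_000)
```

Name, country, and email never appear in the tests because they do not affect the rule; a reader sees only the fact that drives each outcome.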
Evaluate each test against these four pillars:
| Pillar | Ask |
|---|---|
| Regression protection | Would this catch a real bug? |
| Resistance to refactoring | Will this stay green after harmless internal changes? |
| Fast feedback | Will developers run it constantly? |
| Maintainability | Can another developer understand and update it easily? |
If a test scores poorly on refactoring resistance, redesign it before adding more assertions.
Coding agents often generate too many mocks. When writing tests with AI assistance:
| File | Purpose |
|---|---|
| references/CHEATSHEET.md | Quick checklist for daily use |
| references/TEST-DOUBLES.md | Choosing between real deps, fakes, spies, and mocks |
| examples/SERVICE-EXAMPLE.md | Example of a DSL-first, classic-style unit test |
| ../dsl-driven-testing/SKILL.md | Default test architecture for long-lived code |