Use when investigating bugs, crashes, assertions, or unexpected behavior - requires writing a reproducing test early instead of over-analyzing source code; concrete experiments over mental models
When investigating a bug, write a reproducing test as early as possible. Analysis without experimentation spirals into circular reasoning.
Core principle: A 20-line test that fails tells you more than 2 hours of reading source code.
READ code → BUILD mental model → READ more code → REVISE model → READ more code →
SECOND-GUESS model → READ same code again → ... → eventually write a test
This feels productive but isn't. You're pattern-matching on code without grounding in reality. Each re-read adds uncertainty, not clarity.
1. READ the error message / assertion / crash
2. IDENTIFY the minimal trigger (what operation, on what object, in what state?)
3. WRITE a test that sets up that state and performs that operation
4. RUN it
5. Let the RESULT guide the next step
6. Iterate as needed, but stay grounded in test results interpreted alongside the code, never in code reading alone.
Steps 1-2 should take minutes, not hours. You don't need to understand the full call chain to write a test. You need to know what the entry point is and what went wrong.
Before writing the test you need to know three things: what object/API to exercise, what test file to put it in, and how to build/run it. Spend up to 20-30 tool calls finding these. This is bounded research in service of the test -- not open-ended code analysis. If you don't know the exact API, pick the closest thing you can find and write the test anyway. A test that exercises the wrong API and passes still tells you something.
Early in the investigation, check the recent commit history for changes that touched the relevant code. A bug that appeared recently was likely introduced recently. git log --oneline -20 -- path/to/relevant/files takes seconds and can immediately narrow your search from "something somewhere is wrong" to "this specific change might be the cause."
This is not a substitute for writing a test -- it's a way to form a better hypothesis faster. If you can identify a suspect commit, your first test should try to confirm whether that change is responsible.
When you find a suspect commit: Don't stop there. Always check at least 10 commits in either direction around it. Bugs are often introduced by the interaction between multiple changes, not a single commit. A commit that looks innocent in isolation may have broken an assumption that a nearby commit relied on. Looking at the surrounding commits also guards against confirmation bias -- you might find a better explanation in an adjacent change.
Don't mistake correlation for causation. A commit that lines up timewise with when the bug appeared is a lead, not a conclusion. It might be a coincidence -- the real cause could be an environmental change, a dependency update, a race condition that only became likely under new load patterns, or a latent bug exposed by an unrelated change. Treat a suspect commit as a hypothesis to test, not evidence of guilt. If you can't demonstrate the mechanism by which the commit causes the bug, you haven't found the cause.
What to look for:
Keep it bounded: This is around 5 tool calls of git log and git show, not an archaeology expedition. If nothing jumps out, move on and write the test with what you have. The commit history is one input to hypothesis formation, not a prerequisite for it.
Ask yourself:
Before writing a test from scratch, find existing tests for the same feature or subsystem. Search for tests that exercise the API, protocol, or code path you're investigating — not just tests in the same file as the crash site.
Existing tests encode implicit knowledge you won't get from reading source code: required setup, framework-specific verification patterns, config quirks, shutdown handling. A test that's structurally wrong (missing verification, wrong config) will "pass" silently without exercising anything.
Adapt an existing working test rather than inventing one. If an existing test works for the feature and the bug is gated by a flag, autogate, or config change, the fastest reproduction is often to clone the working test and change the single variable.
A test that passes is not evidence the feature worked. It may mean the feature never ran.
After every test run, check the output for evidence the specific code path was exercised — log lines mentioning the feature, subrequests being made, expected error messages, metrics being recorded. If you can't find evidence in the test output, the test is not valid regardless of its exit status.
Concrete checks:
You don't need to reproduce the exact production scenario. You need to reproduce the mechanism.
Production crash: KJ_REQUIRE(!writeInProgress, "concurrent write()s not allowed")
You DON'T need: every detail of the production call stack that leads to the assertion.
You DO need: the shortest path that triggers the same check, even if that path is only a hypothesis at first.
Good test scope:
Create the pipe/adapter/object directly
→ Put it in the suspect state
→ Perform the operation that should fail
→ Assert what happens
Bad test scope:
Reproduce the entire production call stack
with all middleware and wrappers
A hypothesis for a bug investigation is:
"If I do X after Y fails, Z will happen because the cleanup in Y's error path doesn't do W."
It does NOT need to be:
"I have traced every code path and am certain that line 847 is the root cause because of the interaction between..."
The first version is testable in minutes. The second takes hours to construct and might still be wrong.
In this codebase, a C++ compile-and-test cycle can take minutes. This does not justify delaying the test in favor of more code reading. It changes what "quick feedback" looks like:
The temptation with slow builds is "I should be really sure before I compile." This is the analysis spiral in disguise. A test that doesn't reproduce the bug on the first try but compiles and runs is not wasted -- it's a known-good harness you can adjust in the next cycle, often with a much faster incremental rebuild.
workerd and its dependencies (KJ, Cap'n Proto) have extensive test infrastructure:
- KJ_TEST("description") { ... } in *-test.c++ files
- .wd-test format for JS/TS integration tests
- bazel test //path:target --test_arg='-f...' --test_output=all

Most KJ/capnp bugs can be reproduced with a self-contained KJ_TEST using public API (pipes, streams, promises, HTTP). You rarely need internal access.
Write the test. Run the test. Analyze the results. Think. Iterate.
Not the other way around.