Guide for writing reliable, deterministic automated tests. Use when writing new tests for a feature or bug fix, reviewing or refactoring existing test code, creating setup and teardown logic, choosing between unit/integration/e2e testing, designing test fixtures or factories, troubleshooting flaky tests, or writing mocks and stubs. Apply this skill whenever test code is being authored or evaluated, even if the user does not explicitly ask for test quality guidance.
Use this skill when writing, reviewing, or refactoring automated tests at any level (unit, integration, end-to-end).
Produce tests that:
Pick the narrowest level that proves the behavior. Cheaper tests run faster, break less often, and pinpoint failures more precisely.
State the chosen level and a brief reason when writing a test. This forces a conscious decision rather than defaulting to whatever is easiest to scaffold.
A test that sometimes passes is worse than no test -- it erodes trust in the entire suite. Control every source of non-determinism:
Date.now() or time.time() directly. Tests that depend on wall-clock time will fail at midnight, on daylight-saving boundaries, or in different time zones.Shared mutable state is the leading cause of "works on my machine" test failures.
Run the same test ten times in a row. If run 2 fails because run 1 left something behind, the test is broken.
Resources that leak (temp files, database rows, open connections, spawned processes) degrade the test environment over time and cause cascading failures.
defer, t.Cleanup(), addCleanup(), afterEach, or the equivalent in your framework. This guarantees cleanup runs even when the test fails or throws early.Assert what the system does, not how it does it internally.
register(email, password) returns a session token and persists a user row with a hashed password."register invokes hashPassword exactly once, then calls db.insert with a specific SQL string."The first test survives an internal refactor. The second breaks the moment you rename a private method or switch ORMs.
Specific guidance:
Follow this sequence when writing a new test.
State clearly:
Pick unit, integration, or e2e using the criteria above. State the choice and the reason.
List every dependency: database, filesystem, network, clock, randomness, environment variables, external services, queues, async schedulers. For each one, decide whether to stub, fake, inject, or use the real thing. Control or isolate every dependency before writing the test body.
Structure each test as Arrange / Act / Assert:
Naming: use behavior-based names that read as sentences. test_expired_token_returns_401 tells you what broke; test_auth_3 does not.
Use parameterized or table-driven tests when testing the same logic across multiple input/output pairs. This reduces duplication and makes it easy to add edge cases.
Before considering the test done, check for:
@slow or @integration tag) and keep them out of the default fast test run.sleep() calls with bounded waits on a real condition (polling with a timeout, or an event/promise). A sleep that "usually works" is a flake waiting to happen.Keep test data small and intentional.
Recognizing these patterns helps catch problems before they ship:
setupEverything() makes debugging impossible. Fix: keep setup visible in the test or use clearly-named, focused helpers.sleep(2) before asserting. Fix: wait on a condition with a timeout.Before finalizing any test, confirm:
When writing a new test:
When reviewing an existing test: