Guide for discovering, analyzing, and pruning the Butlers test suite. Use when working on test condensation beads (epic bu-rhztl), assessing test bloat, identifying pruning targets, or rewriting tests to be contract-driven. Triggers on test reduction, test pruning, test consolidation, or condensation tasks for this project. Also use when a fresh session needs to assess test health, create new condensation beads, or resume in-progress condensation work.
Systematic reduction of the Butlers test suite to ~2,000 contract-driven tests. Each surviving test must trace to an architectural invariant, an RFC wire contract, or an OpenSpec capability.
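# Count test functions across the suite. $NF takes the last colon-separated field,
# so the tally stays correct when grep omits the filename (single-file batch).
CURRENT=$(find tests/ -name '*.py' -exec grep -c 'def test_' {} + 2>/dev/null | awk -F: '{sum+=$NF} END {print sum+0}')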
echo "Current test count: $CURRENT (skill baseline was 13,675 on 2026-04-05)"
Run bd list --parent bu-rhztl to see which beads are done or in progress.
Run bd show <bead-id> for targets and acceptance criteria.
Read about/heart-and-soul/ for invariants, and the relevant RFCs in about/legends-and-lore/.
If your measured counts differ by more than 10% from this skill's numbers, update references/domains.md before starting work.
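The 10% drift rule above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the function names and the threshold parameter are assumptions chosen for illustration, and only the 13,675 baseline comes from this document.

```python
BASELINE_TOTAL = 13_675  # skill baseline recorded on 2026-04-05


def drift_ratio(measured: int, baseline: int = BASELINE_TOTAL) -> float:
    """Relative difference between a measured test count and the baseline."""
    return abs(measured - baseline) / baseline


def needs_domains_update(
    measured: int,
    baseline: int = BASELINE_TOTAL,
    threshold: float = 0.10,  # the ">10%" rule from this skill
) -> bool:
    """True when the measured count has drifted past the threshold."""
    return drift_ratio(measured, baseline) > threshold


if __name__ == "__main__":
    # 12,000 is a made-up measurement used only to exercise the check.
    print(needs_domains_update(12_000))
```

The same functions can be applied per domain by passing that domain's recorded count as the baseline.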
If beads are already in-progress or completed:
# 1. What's done, what's available?
bd list --parent bu-rhztl
bd ready
# 2. Check predecessor's progress on your domain
git log --oneline -- tests/YOUR_DOMAIN/ | head -10
# 3. Measure current state of your domain
find tests/YOUR_DOMAIN -name '*.py' -exec grep -c 'def test_' {} + 2>/dev/null | awk -F: '{sum+=$NF} END {print sum+0}'
If bu-zkrix (Phase 1) is NOT completed, stop — Phase 1 must finish first.
All tests must map to exactly one tier. If a test doesn't fit, it's a pruning target.
tests/contracts/: heart-and-soul non-negotiables, tagged @pytest.mark.contract. Each test docstring cites its RFC or principle. 15 invariants:
RFC-defined schemas and state machines:
OpenSpec-driven. Map each spec's WHEN/THEN Scenarios to test functions. Tests exercise behavior through the MCP tool interface or public API, not internal helpers. Assertions are structural (non-None, correct type, non-empty), not behavioral (exact strings, specific counts, ordering). See references/classification.md for the decision matrix.
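To make the structural-versus-behavioral distinction concrete, here is a small sketch of a test written in the structural style. The function hypothetical_list_tool is an invented stand-in for a call through the MCP tool interface or public API; it is not a real Butlers function, and the scenario wording in the docstring is illustrative only.

```python
from typing import Any


def hypothetical_list_tool() -> list[dict[str, Any]]:
    """Invented stand-in for a call made through the public tool interface."""
    return [{"name": "example", "status": "ok"}]


def test_list_tool_returns_records() -> None:
    """WHEN the list tool is called THEN a non-empty list of records is returned."""
    result = hypothetical_list_tool()

    # Structural assertions: presence, type, and non-emptiness.
    assert result is not None
    assert isinstance(result, list)
    assert len(result) > 0
    assert all(isinstance(entry, dict) for entry in result)

    # Behavioral assertions this tier avoids (shown only for contrast):
    #   assert result[0]["name"] == "example"   # exact string
    #   assert len(result) == 1                 # specific count
    #   assert result == sorted(result, ...)    # ordering
```

A test shaped like this keeps passing when incidental details such as wording or record order change, which is the property the tier description asks for.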
uv run pytest tests/YOUR_DOMAIN -q --tb=short (zero failures)
Before marking a bead complete:
uv run pytest tests/YOUR_DOMAIN -q --tb=short (0 failures)
uv run pytest tests/contracts/ -q -m contract (if Phase 1 is done)
Commit with a message of the form "Condense X tests: N → M (details of what was removed)"
During condensation, you may find tests that validate behavior NOT in any spec:
Document your decision in a commit message or bead comment.
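The commit message format used for condensation work ("Condense X tests: N → M (details of what was removed)") can be produced by a small helper so that counts are always recorded consistently. This is a sketch only: the function name and its validation are assumptions, and the domain name and numbers in the usage line are made up.

```python
def condensation_subject(domain: str, before: int, after: int, details: str) -> str:
    """Build a commit subject in the 'Condense X tests: N → M (details)' form."""
    if before < 0 or after < 0:
        raise ValueError("test counts must be non-negative")
    if after > before:
        # Condensation should not increase the count; flag likely swapped arguments.
        raise ValueError("after must not exceed before")
    return f"Condense {domain} tests: {before} → {after} ({details})"


if __name__ == "__main__":
    # Illustrative values only, not real Butlers numbers.
    print(condensation_subject("memory", 420, 85, "removed duplicate helper tests"))
    # prints: Condense memory tests: 420 → 85 (removed duplicate helper tests)
```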
references/domains.md — per-domain targets, file inventories, strategies.
references/classification.md — how to decide keep/delete/rewrite for any test.
references/beads.md — dependency graph, bead IDs, lifecycle.