Write pytest tests that exercise real public interfaces with actual components, no mocking, and precise assertions. MIRA-specific patterns. Use when creating or reviewing tests.
Tests that verify implementation are worse than no tests - they provide false confidence while catching nothing.
Your job is not to confirm the code works. Your job is to verify the contract, try to break the code, and surface design problems.
Tests that always pass are actively harmful. They waste time and provide false security.
ABSOLUTE RULE: Do NOT use @pytest.mark.skip, @pytest.mark.skipif, or pytest.skip()
Tests either pass or fail. There is no third state. A skipped test is worse than a deleted one: it looks like coverage while verifying nothing. If a test can't run, fix whatever prevents it from running, or delete the test.
NEVER commit a skipped test. Either make it pass or delete it.
NEVER write tests by reading implementation. That's how you write tests that mirror what code does instead of what it should do.
Step 1: Read ONLY the module's public interface
```python
# Read THIS (public interface)
class ReminderTool:
    def run(self, operation: str, **kwargs) -> Dict[str, Any]:
        """Execute reminder operations."""
        pass

# DO NOT read implementation details
# DO NOT look at internal methods
# DO NOT read how it's implemented
```
Step 2: Document the contract
Before writing any test, answer these questions in writing:
```
MODULE CONTRACT ANALYSIS
========================

1. What is this module's PURPOSE?
   - What problem does it solve?
   - Why does it exist?

2. What GUARANTEES does it provide?
   - What promises does the API make?
   - What invariants must hold?
   - What post-conditions are guaranteed?

3. What should SUCCEED?
   - Valid inputs
   - Happy path scenarios
   - Boundary cases that should work

4. What should FAIL?
   - Invalid inputs
   - Boundary conditions that should error
   - Security violations
   - Resource constraints

5. What are the DEPENDENCIES?
   - What does this module depend on?
   - Are there too many dependencies?
   - Could this be simpler?

6. ARCHITECTURAL CONCERNS:
   - Is this module doing too much?
   - Is it papering over design failures elsewhere?
   - Does the contract make sense or is it convoluted?
   - Should this module even exist?
```
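A condensed, hypothetical example of a completed analysis (the tool name and details are illustrative, drawn from the search examples used later in this document):

```
MODULE CONTRACT ANALYSIS: SearchTool (hypothetical)
PURPOSE:        Retrieve stored messages relevant to a query.
GUARANTEES:     Returns at most max_results items; confidence in [0.0, 1.0].
SHOULD SUCCEED: Non-empty query with max_results >= 1.
SHOULD FAIL:    Empty query (ValueError); negative max_results (ValueError).
DEPENDENCIES:   Database repository, embedding service.
CONCERNS:       None, provided the dependency count stays small.
```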
Step 3: Design test cases from contract
Based on contract analysis (NOT implementation):
See "CANONICAL EXAMPLE" section below for complete contract analysis walkthrough.
CRITICAL: Do NOT read the implementation file yourself. Use the contract-extractor agent as an abstraction barrier.
You've formed expectations about the contract from the interface. Now verify those expectations against actual implementation WITHOUT seeing the implementation yourself. The agent reads the code and reports ONLY contract facts (not implementation details).
Step 1: Invoke the contract-extractor agent
```python
# Use Task tool to invoke the agent
Task(
    subagent_type="contract-extractor",
    description="Extract contract from module",
    prompt="""Extract the contract from: path/to/module.py

Return:
- Public interface (methods, signatures, types)
- Actual return structures (dict keys, types)
- Exception contracts (what raises what, when)
- Edge cases handled
- Dependencies and architectural concerns"""
)
```
Step 2: Compare your expectations against agent report
Create a comparison:
```
EXPECTATION vs REALITY
======================

Expected return structure:
{
    "status": str,
    "results": list
}

Actual return structure (from agent):
{
    "status": str,
    "confidence": float,  # I MISSED THIS
    "results": list,
    "result_count": int   # I MISSED THIS
}

Expected exceptions:
- ValueError for empty query

Actual exceptions (from agent):
- ValueError for empty query ✓
- ValueError for negative max_results  # I MISSED THIS

Expected edge cases:
- Empty results returns []

Actual edge cases (from agent):
- Empty results returns status="low_confidence", confidence=0.0, results=[]
  # More nuanced than I expected
```
Step 3: Identify discrepancies and their implications
For each discrepancy, ask: Is this part of the contract I must test? Did I misread the interface? Or is it an architectural concern to report?
Example Analysis:
```
DISCREPANCY: Agent reports confidence field in return, I didn't expect it
IMPLICATION: This is part of the contract - add test to verify confidence in [0.0, 1.0]

DISCREPANCY: Agent reports ValueError for negative max_results, I didn't expect it
IMPLICATION: Good edge case handling - add negative test

DISCREPANCY: Agent reports 8 dependencies, I expected 3-4
IMPLICATION: ARCHITECTURAL CONCERN - too many deps, report to human
```
Step 4: Update test plan based on verified contract
Now you know the exact return structures, the complete exception contract, and the real edge-case behavior. Update your test plan to cover every verified guarantee.
Step 5: Design comprehensive test cases
```
# Based on VERIFIED contract (not assumptions):

# Positive tests
- test_search_returns_exact_structure       # Verify all keys agent reported
- test_search_confidence_in_valid_range     # Agent said 0.0-1.0
- test_search_respects_max_results          # Agent confirmed this guarantee

# Negative tests
- test_search_rejects_empty_query           # Agent confirmed ValueError
- test_search_rejects_negative_max_results  # Agent revealed this

# Edge cases
- test_search_empty_results_structure       # Agent showed exact structure
- test_search_with_no_user_data             # Based on RLS info from agent

# Architectural concerns
- Report to human: "Module has 8 dependencies - possible SRP violation"
```
See "CANONICAL EXAMPLE" section below for complete agent invocation, comparison, and gap analysis walkthrough.
Read the implementation only AFTER writing tests based on the verified contract. At that point you can read it for context, debugging, or refactoring - the tests are already protecting the contract.
A test that always passes proves nothing. You must see it fail.
Step 1: Write test based on contract expectations
Don't look at implementation. Write assertions based on what the contract says SHOULD happen.
```python
def test_search_returns_confidence_score(search_tool, authenticated_user):
    """Contract: search must return confidence score between 0.0 and 1.0"""
    user_id = authenticated_user["user_id"]
    set_current_user_id(user_id)

    # Based on contract, not implementation
    result = search_tool.run(
        operation="search",
        query="Python async patterns",
        max_results=5
    )

    # Contract expectations
    assert "confidence" in result
    assert 0.0 <= result["confidence"] <= 1.0
    assert "results" in result
    assert len(result["results"]) <= 5
```
Step 2: Run the test - expect failure or question success
```bash
pytest tests/test_search_tool.py::test_search_returns_confidence_score -v
```
If the test FAILS: good - either the code has a bug or your contract understanding is wrong. Investigate which before changing anything.
If the test PASSES immediately: be suspicious. Proceed to Step 3 and prove the test can actually fail.
Step 3: Verify the test can actually catch bugs
Temporarily break the code and verify the test fails:
```python
# In the actual implementation, temporarily break it:
def run(self, operation, **kwargs):
    return {"confidence": 2.5}  # INTENTIONAL BUG: exceeds 1.0
```
Run test - it should fail. If it doesn't, your assertions are too weak.
Step 4: Remove the intentional bug, test should pass
Now you have confidence the test actually works.
When writing tests, surface design problems - don't paper over them.
| Anti-Pattern | Why It's Wrong | What To Do Instead |
|---|---|---|
| Mocking | Tests mocks, not code. Hides integration issues. | Use real services (`sqlite_test_db`, `test_db`). If hard to test, fix design. |
| Reading implementation first | Tests mirror HOW instead of WHAT. Confirms current behavior, doesn't catch regressions. | Analyze contract WITHOUT reading code. Use contract-extractor agent. |
| Tests that mirror implementation | Testing that method calls BM25 then embeddings (HOW) vs testing returns relevant results (WHAT). | Test observable contract behavior, not internal paths. |
| Weak assertions | `assert result is not None` says nothing. | Precise: `assert 0.0 <= result["confidence"] <= 1.0` |
| Only happy paths | Missing adversarial cases means bugs slip through. | Test failure cases: empty inputs, invalid values, boundary conditions. |
| Missing negative tests | Only testing what should succeed. | Test what should FAIL with `pytest.raises` and `match=` |
| Testing private methods | `tool._internal()` means public interface insufficient. | Report: "Public interface doesn't expose needed contract." |
| Papering over design problems | Mocking 8 dependencies instead of reporting. | Report: "Module has 8 dependencies - violates SRP." |
| Complex test setup | Need 5 fixtures for one test = tight coupling. | Report: "Module too coupled - consider interface segregation." |
| Unclear contract | Can't answer "what SHOULD this return?" | Report: "Contract doesn't specify behavior for None values." |
| Module papering over upstream failures | Tool validates/fixes data from another module. | Report: "Fix upstream module, don't compensate downstream." |
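The weak-assertion and missing-negative-test rows can be made concrete. A minimal, self-contained sketch of precise negative tests with pytest.raises and match= - the SearchTool here is a hypothetical stand-in, not the real MIRA tool:

```python
import pytest

class SearchTool:
    """Hypothetical stand-in that honors the contract described above."""
    def run(self, operation: str, query: str, max_results: int = 10) -> dict:
        if not query:
            raise ValueError("query must be non-empty")
        if max_results < 0:
            raise ValueError("max_results must be non-negative")
        return {"status": "success", "confidence": 0.0, "results": []}

def test_rejects_empty_query():
    # match= pins the failure mode, not just the exception type
    with pytest.raises(ValueError, match="query"):
        SearchTool().run("search", query="")

def test_rejects_negative_max_results():
    with pytest.raises(ValueError, match="max_results"):
        SearchTool().run("search", query="x", max_results=-1)
```

Without `match=`, a ValueError raised for the wrong reason would still pass the test.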
When you find architectural red flags, report them:
```
ARCHITECTURAL CONCERN: ToolName
PROBLEM: [Specific issue]
EVIDENCE: [What you observed]
IMPACT: [Why this matters]
RECOMMENDATION: [Specific fix]
```
MIRA's testing philosophy: NEVER MOCK. Use actual services, real databases, real APIs.
This is a hard rule because mocking is where I slip. The test will seem "hard" without mocks and I'll think "just this once..." DON'T.
Mocks test mocks, not code:
```python
# This tests nothing about real behavior:
@mock.patch('tool.database')
def test_reminder_tool(mock_db):
    mock_db.query.return_value = [{"id": 1, "title": "Test"}]
    result = tool.get_reminders()
    # You have NO IDEA if real code works
```
Real tests catch real bugs:
```python
# This will fail if database schema changes:
def test_reminder_tool(sqlite_test_db):
    tool = ReminderTool()
    tool.run("add_reminder", title="Test", date="2025-01-01")

    # Real database query
    rows = sqlite_test_db.execute("SELECT * FROM reminders WHERE title = ?", ("Test",))
    assert len(rows) == 1
```
If it's hard to test without mocks, the design is wrong. Fix the design, don't mock it away.
The authenticated_user fixture gives you EVERYTHING you need:
```python
def test_anything(authenticated_user):
    # These are ALL set up and ready:
    user_id = authenticated_user["user_id"]            # Test user ID
    continuum_id = authenticated_user["continuum_id"]  # Test user's continuum (ALREADY EXISTS)
    email = authenticated_user["email"]                # [email protected]
    token = authenticated_user["access_token"]         # Valid session token

    # User context is ALREADY SET - just use user_id
    # Continuum ALREADY EXISTS - just add messages to it
    # Cleanup happens AUTOMATICALLY - no manual teardown needed
```
What authenticated_user does for you:
- Creates the test user and a continuum for them
- Sets the user context
- Issues a valid session token
- Cleans up all test data automatically after the test
How to use it (SIMPLE):
```python
# ✅ CORRECT - Just use the continuum_id that's already there
def test_add_messages(authenticated_user, test_db):
    user_id = authenticated_user["user_id"]
    continuum_id = authenticated_user["continuum_id"]  # Use this!
    set_current_user_id(user_id)  # Set context

    repo = get_continuum_repository()
    msg = Message(role="user", content="Test message")
    repo.save_message(msg, continuum_id, user_id)  # Just save directly
    # Message is in the database, ready to test

# ❌ WRONG - Don't create new continuums (test user already has one)
def test_add_messages_wrong(authenticated_user):
    user_id = authenticated_user["user_id"]
    repo = get_continuum_repository()
    continuum = repo.create_continuum(user_id)  # DON'T DO THIS!
    # This creates a SECOND continuum - user should only have ONE
```
Common Patterns:
```python
# Most common: Just add messages to test user's continuum
def test_tool(authenticated_user):
    user_id = authenticated_user["user_id"]
    continuum_id = authenticated_user["continuum_id"]
    set_current_user_id(user_id)

    # Add test data
    repo = get_continuum_repository()
    msg = Message(role="user", content="Test")
    repo.save_message(msg, continuum_id, user_id)

    # Test your code
    result = tool.run("search", query="Test")
    assert result["status"] == "success"

# API testing: authenticated_client has headers pre-set
def test_api(authenticated_client):
    response = authenticated_client.get("/v0/api/endpoint")
    assert response.status_code == 200
```
Test User Constants (for reference):
```python
TEST_USER_EMAIL = "[email protected]"
SECOND_TEST_USER_EMAIL = "[email protected]"
# User IDs vary, always use authenticated_user["user_id"]
```
The second_authenticated_user fixture provides a SECOND fully-configured test user. When you need to verify Row-Level Security (RLS) or multi-user scenarios, use both fixtures together:
```python
def test_user_isolation(authenticated_user, second_authenticated_user):
    """Verify RLS prevents cross-user data access."""
    user1_id = authenticated_user["user_id"]
    user1_continuum_id = authenticated_user["continuum_id"]
    user2_id = second_authenticated_user["user_id"]
    user2_continuum_id = second_authenticated_user["continuum_id"]

    # User 1 creates private data
    set_current_user_id(user1_id)
    repo = get_continuum_repository()
    msg1 = Message(role="user", content="User 1 secret data")
    repo.save_message(msg1, user1_continuum_id, user1_id)

    # User 2 tries to access User 1's data
    set_current_user_id(user2_id)
    result = search_tool.run("search", query="secret", max_results=10)

    # Verify User 2 cannot see User 1's data
    assert len(result["results"]) == 0, "RLS violation: User 2 can see User 1's data"
```
Both fixtures provide identical structure:
```python
authenticated_user = {
    "user_id": str,       # First test user ID
    "continuum_id": str,  # First test user's continuum
    "email": str,         # [email protected]
    "access_token": str   # Valid session token
}

second_authenticated_user = {
    "user_id": str,       # Second test user ID (different UUID)
    "continuum_id": str,  # Second test user's continuum (different UUID)
    "email": str,         # [email protected]
    "access_token": str   # Valid session token
}
```
When to use second_authenticated_user:
- Verifying Row-Level Security (RLS) enforcement
- Any scenario involving more than one user's data
Cleanup is automatic for both users: no manual teardown is needed for either one.
SQLite for Tool Testing:
```python
@pytest.mark.schema_files(['tools/implementations/reminder_tool_schema.sql'])
def test_reminder_tool(sqlite_test_db):
    tool = ReminderTool()
    tool.run("add_reminder", title="Test", date="2025-01-01")
    rows = sqlite_test_db.execute("SELECT * FROM reminders WHERE title = ?", ("Test",))
    assert len(rows) == 1
```
Automatic Cleanup: cleanup_test_user_data() runs after each test (autouse).

Test files MUST mirror the codebase directory structure:
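For example (paths are illustrative, not the actual MIRA layout):

```
tools/implementations/reminder_tool.py
  → tests/tools/implementations/test_reminder_tool.py
memory/continuum_repository.py
  → tests/memory/test_continuum_repository.py
```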