Analyze the rms-cloud-tasks test suite for consistency, completeness, redundancy, parallel safety, assertion quality, and mocking. Produces a report only (no test modifications). Use when the user asks to critique tests, review the test suite, or generate a report for fixing tests.
Analyze all tests in the rms-cloud-tasks project and produce a report only—do not modify any test files. The report is intended to be used as a prompt for an AI agent (or developer) to fix the tests later.
**Scope:**
- Test files: `tests/` matching `test_*.py` (pytest).
- Conftest files: `tests/conftest.py` and `tests/cloud_tasks/instance_manager/gcp/conftest.py`.
- Modules under test: `cloud_tasks.common`, `cloud_tasks.queue_manager`, `cloud_tasks.instance_manager`, `cloud_tasks.worker`, and the CLI (`cloud_tasks.cli`).
- Manual tests: `tests/manual/` contains shell scripts and configs for manual runs; exclude from the automated-test critique or note as manual-only.

Apply these criteria when reviewing each test file and each test case.
**Review criteria:**
- Exact-value assertions: prefer exact checks (e.g. `assert counts["pending"] == 2`, not just `assert "pending" in counts` or `assert counts.get("pending")`).
- Exact lengths: assert the exact length (e.g. `assert len(tasks) == 1`) when the expected count is known; avoid only `assert len(items) >= 1` unless the count truly varies.
- Shape checks: verify expected keys and fields on returned dicts/objects (e.g. `task_id`, `data`, `ack_id`).
- Naming: use descriptive test names (e.g. `test_<behavior>_<condition>_<expected>` or `test_<function>_<when>_<result>`).
- Failure context: include context in test names and in asserts (this improves test failure clarity and aligns with project testing standards).
- Shared state: verify that `_thread_local` and `_pricing_cache_lock` actually exist in the repo before referencing them. Treat those symbols only as potential signposts for areas that might leak state (e.g., thread-local storage or shared locks), not as facts about the codebase. When reviewing files such as conftest and instance_manager, (1) search for those symbols and document findings if present; (2) if not found, report that they were not observed rather than implying they must exist. Also check tests that mutate fixture-derived instances for possible state leaks.
- Parallel safety: check compatibility with `pytest -n auto` (if used).
- Time determinism: tests involving `time.sleep`, `datetime.now()`, or expiration logic should mock or freeze time for determinism; note time-sensitive tests without freezing.
- Environment: tests that depend on `os.environ` or `sys.argv` should patch them; note tests that would fail with a different env or argv.
- Async mocking: async provider calls (e.g. `send_message`, `receive_tasks`) should be mocked with `AsyncMock` where the test is async; note misuse.
- `@pytest.mark.parametrize`: similar test cases (e.g. multiple invalid configs, multiple providers) should be parameterized instead of copy-pasted; note repeated test bodies that differ only in input.
- Parameterized fixtures: consider `@pytest.fixture(params=["aws", "gcp", "azure"])` for provider; note other places where parameterized fixtures would reduce duplication.
- Async fixtures: async fixtures should use `@pytest_asyncio.fixture`; note sync fixtures in async test files or misuse of `event_loop` scope.
- Worker isolation: review mocks of `_wait_for_shutdown`, `create_task`, or process spawning for isolation and whether they are sufficient.
- Exception messages: use `pytest.raises(SomeError) as exc_info` and assert on `str(exc_info.value)` or the exception's message attribute.
  Note tests that only check the exception type.
- Fixture scope: prefer the narrowest scope that works (function > class > module > session). The GCP conftest uses module/package scope for speed; note whether any fixture scope causes isolation issues or cross-test pollution.
- Shared GCP manager: the `copy_gcp_instance_manager_for_test` helper exists to avoid thread-local serialization; note tests that mutate shared manager instances without copying.
- Randomness: tests that use `uuid4()` or `random` for assertions without seeding are non-deterministic; note and suggest seeding or fixed values where assertion stability matters.
- Assertion messages: add messages to non-obvious asserts (e.g. `assert x == y, f"Expected {x} to equal {y}"`); note assertions that would be hard to debug on failure.
- Logic in tests: tests should avoid conditionals (`if`/`elif`/`else`), ternary expressions (`x if condition else y`), other branching such as `match`/`case` (pattern matching added in Python 3.10; Python historically lacked a traditional switch/case), loops, or complex logic; note tests that do and suggest splitting or parameterizing.
- Coverage: measure coverage by running the entire suite (`pytest tests --cov=src/cloud_tasks --cov-report=term-missing`), not a subset. `.coveragerc` omits `tests/*`, `azure.py`, and `aws.py`; note whether the omit list is appropriate and whether any critical code is excluded from coverage expectations.

Produce a single markdown report with the following structure. Do not edit any test files; only write the report.
# Test Suite Critique Report
**Generated:** [date]
**Scope:** tests/ (pytest); modules: common, queue_manager, instance_manager, worker, cli
## Executive summary
- Overall assessment (strengths, main gaps).
- **Coverage:** Target is at least 80%, measured by running the entire test suite. Note whether 80% is met and whether the measurement was full-suite.
- **Exception messages:** When testing exceptions with defined messages, tests must assert on message contents (e.g. `pytest.raises(...) as exc_info`, then `str(exc_info.value)`). Note violations.
- High-priority fixes vs. nice-to-have.
## 1. Return values and assertions
- Tests that only check existence or non-None; suggest exact value or type/format checks.
- Lists asserted with `>= N` where exact length is knowable; suggest exact length.
- Missing shape or key checks for dicts/objects.
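A minimal sketch of the contrast this section looks for, using a hypothetical `get_task_counts` helper (not from the project):

```python
def get_task_counts():
    # Hypothetical queue snapshot used only for illustration.
    return {"pending": 2, "running": 1, "completed": 0}

def test_task_counts_exact():
    counts = get_task_counts()
    # Weak: assert "pending" in counts   (only checks existence)
    # Exact value when the expected count is known:
    assert counts["pending"] == 2
    # Shape check: the full key set, not just one key.
    assert set(counts) == {"pending", "running", "completed"}
```

The same pattern applies to lists: prefer `assert len(tasks) == 1` over `>= 1` whenever the fixture makes the exact count knowable.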
## 2. Success and failure conditions
- Per area (common, queue_manager, instance_manager, worker, cli): table of "behavior | tested? | notes".
- Missing: validation errors, missing config, provider/not-found errors, edge cases.
## 3. Consistency
- Naming/structure inconsistencies with examples.
- Fixture usage and duplication across files.
## 4. Completeness
- Coverage map (what's tested, what's missing per module).
- Doc/spec gaps.
## 5. Redundancy
- Duplicate or overlapping tests with file:test references.
## 6. Parallel execution and isolation
- Global state, order dependence, GCP fixture scope and deepcopy usage, shared resources.
## 7. Mocking and dependency isolation
- Real external calls, time-sensitive tests without freezing, env/argv dependencies, AsyncMock vs Mock.
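The `AsyncMock` vs `Mock` distinction can be sketched as follows; `drain_queue` and `queue.ack` are hypothetical stand-ins, though `receive_tasks` and `ack_id` mirror names used elsewhere in this spec:

```python
import asyncio
from unittest.mock import AsyncMock, Mock

async def drain_queue(queue):
    # Hypothetical consumer: pulls one batch and acks each task.
    tasks = await queue.receive_tasks(max_count=10)
    for task in tasks:
        await queue.ack(task["ack_id"])
    return len(tasks)

def test_drain_queue_with_asyncmock():
    queue = Mock()
    # Async methods must be AsyncMock so "await" works; a plain Mock
    # attribute would return a non-awaitable and fail confusingly.
    queue.receive_tasks = AsyncMock(
        return_value=[{"ack_id": "a1"}, {"ack_id": "a2"}]
    )
    queue.ack = AsyncMock()

    drained = asyncio.run(drain_queue(queue))

    assert drained == 2
    queue.receive_tasks.assert_awaited_once_with(max_count=10)
    assert queue.ack.await_count == 2
```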
## 8. Config and validation testing
- Pydantic/config validation coverage, exception message assertions, load_config edge cases, secrets in tests.
## 9. Parameterization
- Tests that could be parameterized, missing boundary value tests.
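A hedged sketch of boundary-value parameterization; `clamp_simultaneous_tasks` is a hypothetical helper, not project code:

```python
import pytest

def clamp_simultaneous_tasks(n: int) -> int:
    # Hypothetical helper used only to illustrate boundary testing.
    return max(1, min(n, 64))

@pytest.mark.parametrize(
    "requested, expected",
    [(0, 1), (1, 1), (32, 32), (64, 64), (65, 64)],  # below-min, min, mid, max, above-max
)
def test_clamp_boundaries(requested, expected):
    assert clamp_simultaneous_tasks(requested) == expected
```

One parameterized test replaces five near-identical copy-pasted bodies that differ only in input.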
## 10. Async and concurrency
- Async fixture usage, timeouts, worker multiprocessing/async isolation.
## 11. Error handling
- Missing error body or message verification.
- Exception tests that only assert type; require asserting message contents where defined.
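The required pattern can be illustrated with a hypothetical `load_config` stand-in (not the project's real loader):

```python
import pytest

def load_config(provider: str) -> dict:
    # Hypothetical stand-in for a config loader with a defined error message.
    if provider not in ("aws", "gcp", "azure"):
        raise ValueError(f"Unknown provider: {provider}")
    return {"provider": provider}

def test_unknown_provider_raises_with_message():
    # Insufficient: pytest.raises(ValueError) alone only checks the type.
    # Required: capture exc_info and assert on the message content.
    with pytest.raises(ValueError) as exc_info:
        load_config("dne")
    assert "Unknown provider: dne" in str(exc_info.value)
```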
## 12. State and workflow
- Task DB transitions, worker lifecycle, CLI subcommands, idempotency.
## 13. Test data and fixtures
- Unrealistic data, cleanup issues, fixture scope, GCP deepcopy usage.
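A sketch of the copy-before-mutate pattern; `FakeManager` and `copy_manager_for_test` are illustrative stand-ins for the project's reported `copy_gcp_instance_manager_for_test` helper:

```python
import copy

class FakeManager:
    # Stand-in for a broadly-scoped fixture instance shared across tests.
    def __init__(self):
        self.config = {"zone": "us-central1-a"}

SHARED = FakeManager()

def copy_manager_for_test(mgr):
    # Per-test deepcopy so mutations cannot leak into other tests.
    return copy.deepcopy(mgr)

def test_mutation_does_not_leak():
    mgr = copy_manager_for_test(SHARED)
    mgr.config["zone"] = "europe-west1-b"
    # The shared instance is untouched.
    assert SHARED.config["zone"] == "us-central1-a"
```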
## 14. Flakiness indicators
- Time-based assertions, order dependence, external dependencies, unseeded randomness.
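Two hedged sketches of determinism fixes, using hypothetical helpers (`make_task_id`, `is_expired`) that accept injected randomness and time rather than reaching for globals:

```python
import random
from datetime import datetime, timezone

def make_task_id(rng: random.Random) -> str:
    # Hypothetical ID factory; injecting the RNG keeps assertions stable.
    return f"task-{rng.randrange(10**6):06d}"

def is_expired(created_at: datetime, ttl_seconds: int, now: datetime) -> bool:
    # Passing "now" explicitly avoids flaky datetime.now() comparisons.
    return (now - created_at).total_seconds() > ttl_seconds

def test_task_id_is_deterministic_with_seed():
    # Same seed -> same sequence -> stable assertion.
    assert make_task_id(random.Random(42)) == make_task_id(random.Random(42))

def test_expiration_with_frozen_now():
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    frozen_now = datetime(2024, 1, 1, 0, 1, 1, tzinfo=timezone.utc)  # 61s later
    assert is_expired(created, ttl_seconds=60, now=frozen_now)
    assert not is_expired(created, ttl_seconds=120, now=frozen_now)
```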
## 15. Regression and documentation
- Missing bug references, spec/test alignment, manual-only coverage.
## 16. Other
- Unclear tests, slow tests, missing assertion messages, multi-responsibility tests, AAA violations, logic in tests.
## 17. Code coverage
- Target 80%; full-suite measurement. List modules below 80% or with significant uncovered lines. Note .coveragerc omit list.
## Prompt for an AI agent to fix tests
This section is a **reusable prompt template** to be filled with the report output. Use the placeholders below; include either the full report or a summarized version as specified.
**Template (fill placeholders):**
```text
Apply the following test-suite fixes. Use the critique report as context.
<REPORT_SUMMARY>
Paste the full report or a concise summary (sections 1–17) here.
</REPORT_SUMMARY>
<FAILURES>
List specific failures, file names, test names, and line references from the report.
</FAILURES>
<FILES_TO_EDIT>
List the test/conftest files to modify (paths under tests/).
</FILES_TO_EDIT>
Constraints:
- **Coverage:** Run the full test suite for coverage; require ≥80% for code under test; cover almost all non-exception lines.
- **Exception messages:** For tests that expect exceptions with defined messages, use `pytest.raises(...) as exc_info` and assert message content: `assert "expected substring" in str(exc_info.value)` (or equivalent). Do not only assert that an exception was raised.
- **Production code:** Do not modify production code. Fix only tests and conftest files.
- **Behavior:** Preserve existing passing behavior; only add or change assertions and test structure as indicated by the report.
```

**Example filled prompt:**
Apply the following test-suite fixes.
<REPORT_SUMMARY>
[Section 2] test_config.py: missing failure-path tests for invalid provider.
[Section 4] test_worker_init.py: no test for num_simultaneous_tasks boundary.
[Section 8] test_config.py: exception tests do not assert message content.
</REPORT_SUMMARY>
<FAILURES>
- tests/cloud_tasks/common/test_config.py: add tests for invalid provider; in test_runconfig_raises_*, use exc_info and assert str(exc_info.value).
- tests/cloud_tasks/worker/test_worker_init.py: add parameterized test for num_simultaneous_tasks at min/max.
</FAILURES>
<FILES_TO_EDIT>
tests/cloud_tasks/common/test_config.py
tests/cloud_tasks/worker/test_worker_init.py
</FILES_TO_EDIT>
Constraints: Run full test suite for coverage (≥80%). Use pytest.raises(...) as exc_info and assert str(exc_info.value) for exception message checks. Do not modify production code. Preserve existing passing behavior.
Review `tests/` (`test_*.py`) and conftest files, focusing on assertion quality (`assert`, `pytest.raises`, response/return checks, and fixtures).