Adversarial QA framework for high-confidence verification of a running application after changes. Produces independent QA reports, a synthesis pass, a ranked `issues.md`, and a `test-plan.md` for missing coverage. Use this skill after `adversarial-proposal` or `adversarial-implementation`, after merging a feature branch, before deploy, when unit tests pass but UI or integration issues are still suspected, or when the user asks for adversarial QA, regression testing, a release check, or a high-confidence QA pass.
This Codex copy tracks the generalized adversarial QA workflow from the authoritative Claude skill. Two isolated QA testers independently test the application from different angles, then a QA Synthesizer cross-verifies their findings, deduplicates issues, and produces a ranked issues report plus a test plan for missing coverage.
The critical design principle is information isolation: QA Tester A and QA Tester B each have their own context window and test the application with zero knowledge of each other's findings. Only after both finish does the QA Synthesizer see both reports. Then A and B review the synthesizer's assessment, and the synthesizer reconciles everything into a final QA report.
Unlike unit tests that validate isolated functions, this skill catches issues that only manifest in the running application: UI regressions, broken page loads, cross-stage interactions, API contract violations, and integration failures.
Treat fixture tests, mocked service tests, route-only unit tests, and direct helper assertions as supporting evidence only. They cannot prove the feature works.
Every QA pass must identify the real user workflow or live API/service path that the changed feature is supposed to support, then attempt to execute that path against the running application or a real backend service instance. A QA report may not conclude "SHIP IT", "accept", "CLEAR", or "no confirmed product defects" unless at least one tester or the synthesizer has exercised that live path and recorded the exact evidence.
If the live path is blocked by missing seed data, auth, browser launch failure, external service credentials, or environment instability, classify QA as BLOCKED or NEEDS REAL-WORKFLOW TESTING. Do not downgrade the missing live-path check into a harmless coverage note. File it as a top issue and make it the first item in issues.md.
For backend-heavy features, the live path can be an authenticated API request or service command that uses real database rows and the same endpoint/background job the product uses. It must not be only a mocked pytest fixture that creates idealized rows. For UI features, the live path must include browser interaction or a documented reason why browser execution is impossible plus an equivalent authenticated API reproduction.
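The gate can be enforced mechanically before any verdict is accepted. A minimal sketch, assuming the lead records live-path evidence to a file and tags it with a `live-path` marker (the function name, file layout, and marker are illustrative, not part of the skill contract):

```shell
# Sketch: refuse a positive verdict unless live-path evidence was recorded.
gate_check() {
  local verdict="$1" evidence_file="$2"
  case "$verdict" in
    "SHIP IT"|CLEAR|accept)
      # A positive verdict requires non-empty, live-path-tagged evidence.
      if [ -s "$evidence_file" ] && grep -q "live-path" "$evidence_file"; then
        echo "verdict accepted: $verdict"
      else
        echo "DOWNGRADED: NEEDS REAL-WORKFLOW TESTING"
      fi
      ;;
    *)
      # Negative/blocked verdicts need no gate evidence to stand.
      echo "verdict accepted: $verdict"
      ;;
  esac
}
```

Note the asymmetry: only positive verdicts are gated, which matches the rule that missing live-path coverage must never silently soften into a coverage note.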
This skill follows the generalized multi-agent protocol from the authoritative Claude version.
In Codex, orchestrate teammates with `spawn_agent`, `send_input`, and `wait_agent`. If persistent teammate messaging is not available, fall back to `references/subagent-fallback.md`. `references/agent-teams-workflow.md` is retained as the upstream round-by-round protocol reference. The QA testers need a running application. Service commands come from the project config (`forge-project.yml`):
- Backend: `{{backend_command}}` in `{{backend_working_dir}}/` (or will be started by Playwright config)
- Frontend: `{{frontend_command}}` in `{{frontend_working_dir}}/` (or will be started by Playwright config)
- Playwright browsers: `cd {{frontend_working_dir}} && {{playwright_install}}`

The Playwright config (`{{frontend_working_dir}}/{{playwright_config}}`) can auto-start servers (mode: `{{playwright_autostart}}`), so testers can also just run `{{e2e_command}}` directly.
| Mode | Flag | Behavior |
|---|---|---|
| Default | (none) | Fully automated. Lead orchestrates all rounds without pausing. |
| Interactive | --interactive | Pauses after Round 1 (user reviews QA findings) and after Round 2 (user reviews synthesis). |
The skill adapts based on what artifacts are available:
| Input | Behavior |
|---|---|
| Adversarial artifacts exist | Reads final-plan.md + implementation.md (if present) to understand what changed and what exact diffs were planned, focuses testing on affected areas |
| Git diff available | Uses git diff main...HEAD to identify changed files and focus testing |
| Standalone | Runs full regression suite — all smoke pages from config, API health |
| Scoped re-QA | Receives a list of still-open / regressed item IDs from a verification cycle, tests ONLY those areas |
In Step 0, check for these files in .dev/proposals/{slug}/:
- `final-plan.md` (from adversarial-proposal) — high-level plan, affected areas, design decisions
- `implementation.md` (from adversarial-implementation) — exact code diffs, file changes, test specs
- `proposal-C.md` (from adversarial-proposal) — synthesized investigation findings

If `implementation.md` exists, it provides the most precise testing scope: the exact files modified, the specific diffs applied, and the test specifications written. Embed a summary of these in the tester prompts so they know exactly what changed and can target their testing.
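The input-mode detection above can be sketched as a small helper. This is illustrative only: the function name and check ordering are assumptions, and the "no diff against main means standalone" heuristic is one plausible reading of the table:

```shell
# Sketch: decide the input mode for a QA pass.
# scoped_ids: space-separated still-open item IDs (empty when not re-QA).
detect_input_mode() {
  local proposals_dir="$1" scoped_ids="$2"
  if [ -n "$scoped_ids" ]; then
    echo "scoped-re-qa"                 # verification cycle handed us item IDs
  elif [ -f "$proposals_dir/final-plan.md" ]; then
    echo "adversarial-artifacts"        # prior skill output exists
  elif git diff --quiet main...HEAD 2>/dev/null; then
    echo "standalone"                   # no branch diff: full regression
  else
    echo "git-diff"                     # focus testing on changed files
  fi
}
```

Usage would be `mode=$(detect_input_mode ".dev/proposals/$slug" "$still_open")` in Step 0, before building tester prompts.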
ROUND 0: Setup
-----------------------------------------------------
Lead detects input mode (artifacts / git diff / standalone)
Lead identifies test scope and affected areas
Lead selects strategy pair:
+----------------------------------------------+
| QA Tester A: UI Regression Tester |
| - Page loads, component rendering |
| - Console errors, visual regressions |
| - Smoke tests for all pages from config |
| - Screenshot evidence |
| |
| QA Tester B: Functional Integration Tester |
| - API endpoint contracts |
| - Cross-stage data flow |
| - Feature-specific E2E flows |
| - New/changed functionality verification |
+----------------------------------------------+
ROUND 1: Independent Testing (parallel)
-----------------------------------------------------
QA Tester A (Strategy A) QA Tester B (Strategy B)
+--------------------+ +--------------------+
| Own context | | Own context |
| | | |
| Sees ONLY: | | Sees ONLY: |
| - Test scope | | - Test scope |
| - Source files | | - Source files |
| - Strategy A | | - Strategy B |
| | | |
| RUNS: | | RUNS: |
| - Playwright tests | | - API curl tests |
| - Screenshot checks | | - Playwright E2E |
| - Console error | | - pytest integration |
| monitoring | | tests |
| | | |
| CANNOT see B -------+-- X ----+-- CANNOT see A |
| | | |
| -> qa-report-A.md | | -> qa-report-B.md |
+---------------------+ +---------------------+
<-- Quality check: real test evidence, not speculation -->
<-- [Interactive: Checkpoint 1] -->
ROUND 2: Synthesis + Cross-Verification
-----------------------------------------------------
QA Synthesizer C (teammate)
+--------------------------------------+
| Phase 0: Runs own spot-checks |
| Phase 1-2: Reads qa-report-A + B |
| Phase 3: Cross-verifies key findings |
| - Re-runs failing tests to confirm |
| - Checks if A's issues reproduce |
| - Checks if B's issues reproduce |
| Phase 4: Synthesizes + ranks |
| Phase 5: Writes isolated review files|
| |
| -> qa-synthesis.md (lead-only) |
| -> review-for-A.md (A's eyes only) |
| -> review-for-B.md (B's eyes only) |
+--------------------------------------+
<-- [Interactive: Checkpoint 2] -->
ROUND 3: Feedback (parallel, ISOLATED)
-----------------------------------------------------
QA Tester A (still alive) QA Tester B (still alive)
+--------------------+ +--------------------+
| Reads ONLY: | | Reads ONLY: |
| - review-for-A.md | | - review-for-B.md |
| + own qa-report | | + own qa-report |
| | | |
| CANNOT see: | | CANNOT see: |
| - qa-synthesis.md | | - qa-synthesis.md |
| - review-for-B.md | | - review-for-A.md |
| - qa-report-B.md | | - qa-report-A.md |
| | | |
| May re-run tests | | May re-run tests |
| to defend findings | | to defend findings |
| | | |
| -> feedback to lead | | -> feedback to lead |
+---------------------+ +---------------------+
ROUND 4: Reconciliation + Final Report
-----------------------------------------------------
QA Synthesizer C (still alive)
+--------------------------------------+
| Receives feedback from A and B |
| Reconciles into final QA report |
| -> issues.md (ranked issues) |
| -> test-plan.md (new test cases) |
+--------------------------------------+
Each tester has its own context window, and A and B share no findings during Round 1. This isolation prevents one tester's conclusions from anchoring or narrowing the other's testing. The visibility matrix:
| Round | Who | Can See | Cannot See |
|---|---|---|---|
| 1 | A | Test scope + source files + Strategy A | B's report |
| 1 | B | Test scope + source files + Strategy B | A's report |
| 2 | C | Test scope + qa-report-A + qa-report-B + source files | -- |
| 3 | A | review-for-A.md + its own qa-report-A.md | qa-synthesis.md, review-for-B.md, qa-report-B.md |
| 3 | B | review-for-B.md + its own qa-report-B.md | qa-synthesis.md, review-for-A.md, qa-report-A.md |
| 4 | C | A's feedback + B's feedback + its own prior synthesis | -- |
All three teammates persist across rounds: A and B stay alive from Round 1 through Round 3, and C from Round 2 through Round 4.
When the user triggers this skill, you (the lead) orchestrate the entire workflow. Read references/agent-teams-workflow.md for the complete tool call sequence.
In Codex, treat the teammate-specific calls in that reference as conceptual. Preserve the same isolation and sequencing with Codex subagents, or fall back to the file-based workflow when persistent subagent messaging is not practical.
1. Read the project config: `.claude/forge-project.yml`
2. If artifacts exist in `.dev/proposals/{slug}/`, read `final-plan.md` AND `implementation.md` (if present) to understand changes
3. If `implementation.md` exists, extract the list of modified files, code diffs summary, and test specifications — this gives testers precise targeting
4. Otherwise, run `git diff main...HEAD --name-only` to identify changed files
5. For scoped re-QA, collect the still-open / regressed item IDs
6. Record the live path in `test-scope.md` under `## Real Workflow Gate`
7. Verify services are reachable (backend on `{{backend_url}}`, frontend on `{{frontend_url}}`)
8. Create the output directories:

{project}/.dev/qa/{issue-slug}/
{project}/.dev/qa/{issue-slug}/evidence/screenshots/
{project}/.dev/qa/{issue-slug}/evidence/api-responses/
{project}/.dev/qa/{issue-slug}/evidence/test-results/
Write `test-scope.md`. Read the role files (`agents/qa-tester.md`, `agents/qa-synthesizer.md`, `agents/qa-critic.md`, `agents/qa-reconciler.md`) and `references/qa-report-format.md` so they can be embedded into prompts. Then create the team:

TeamCreate({ team_name: "adversarial-qa" })
Create 5 tasks with dependencies:
| Task | Subject | Blocked By |
|---|---|---|
| #1 | QA Report A: UI Regression Testing | -- |
| #2 | QA Report B: Functional Integration Testing | -- |
| #3 | QA Synthesis: Cross-verification + ranking | #1, #2 |
| #4 | Feedback from A and B | #3 |
| #5 | Final QA Report: issues.md + test-plan.md | #4 |
Spawn both testers in the same turn so they run in parallel:
Agent({
team_name: "adversarial-qa",
name: "qa-tester-a",
subagent_type: "general-purpose",
prompt: "{test scope + source files + qa-tester role + qa-report format
+ Strategy A (UI Regression) assignment
+ available test commands + Playwright helpers reference
+ save as qa-report-A.md + ISOLATION RULE + wait for instructions}",
run_in_background: true
})
Agent({
team_name: "adversarial-qa",
name: "qa-tester-b",
subagent_type: "general-purpose",
prompt: "{same prompt but Strategy B (Functional Integration) assignment
+ save as qa-report-B.md + ISOLATION RULE}",
run_in_background: true
})
CRITICAL -- Isolation rule in both prompts:
"You are one of two independent QA testers. You must NOT read any other QA report files in the output directory. Do NOT look at any files named qa-report-B.md, qa-synthesis.md, review-for-A.md, review-for-B.md, issues.md, or test-plan.md. Only read the source files and test scope listed above."
CRITICAL -- Both prompts must instruct the teammate to WAIT after completing their report, not exit. They will be needed again in Round 3.
Monitor inbox for "done" messages from both A and B.
After both arrive: quality check (real test evidence, actual pass/fail results, screenshots where applicable).
Reject the Round 1 reports as incomplete if neither report executed the real workflow gate. Send one or both testers back with explicit instructions to run the gate before synthesis. Do not let passing unit tests, mocked pytest tests, or route-only frontend tests substitute for the gate.
For wizard flows or multi-step forms, verify every step exposes its primary action in the expected navigation container, including the final step. Do not treat "button exists somewhere on the page" as sufficient.
If both testers find zero issues, all tests pass, and the real workflow gate was executed successfully, the synthesis adds limited value. In this case:
- The lead writes `issues.md` (clean bill of health or minor findings) + `test-plan.md` directly

If both testers find zero issues but the real workflow gate was not executed, do not use convergence. Treat the QA pass as blocked and require a gate run.
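The convergence decision reduces to three outcomes. A minimal sketch, assuming the lead has issue counts from both Round 1 reports and a flag for whether the gate ran (function and flag names are illustrative):

```shell
# Sketch: decide whether the synthesis rounds can be skipped.
convergence_decision() {
  local issues_a="$1" issues_b="$2" gate_run="$3"
  if [ "$issues_a" -eq 0 ] && [ "$issues_b" -eq 0 ]; then
    if [ "$gate_run" = "yes" ]; then
      echo "skip-synthesis"   # lead writes issues.md + test-plan.md directly
    else
      echo "blocked"          # zero issues but no gate run: not a clean pass
    fi
  else
    echo "full-synthesis"     # findings exist: spawn the synthesizer
  fi
}
```

The key branch is the middle one: "no issues found" without a gate run is treated as a blocked pass, never as convergence.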
When both "done" messages arrive and findings exist:
Agent({
team_name: "adversarial-qa",
name: "qa-synthesizer-c",
subagent_type: "general-purpose",
prompt: "{instruction to run own spot-checks FIRST (Phase 0)
+ then read qa-report-A.md and qa-report-B.md
+ qa-synthesizer role
+ cross-verify: re-run key failing tests to confirm
+ verify whether the real workflow gate was actually executed
+ produce THREE files: qa-synthesis.md, review-for-A.md, review-for-B.md
+ isolation rules for review files
+ wait for instructions}",
run_in_background: true
})
Monitor inbox for C's "done" message.
When C's "done" arrives, message both testers (they're still alive):
SendMessage({
type: "message",
recipient: "qa-tester-a",
content: "Read review-for-A.md -- a QA reviewer's assessment of your findings.
ISOLATION RULE: Read ONLY review-for-A.md. Do NOT read qa-synthesis.md,
review-for-B.md, qa-report-B.md, or any other files in the output
directory. You may re-run tests to defend your findings. Give detailed feedback.",
summary: "Review feedback for QA Tester A"
})
SendMessage({
type: "message",
recipient: "qa-tester-b",
content: "Read review-for-B.md -- a QA reviewer's assessment of your findings.
ISOLATION RULE: Read ONLY review-for-B.md. Do NOT read qa-synthesis.md,
review-for-A.md, qa-report-A.md, or any other files in the output
directory. You may re-run tests to defend your findings. Give detailed feedback.",
summary: "Review feedback for QA Tester B"
})
Monitor inbox for feedback from both A and B.
When both feedback messages arrive, forward them to C:
SendMessage({
type: "message",
recipient: "qa-synthesizer-c",
content: "Both QA testers reviewed your assessment and responded. Reconcile
their feedback and produce the final QA report.
Feedback from QA Tester A:
{A's feedback from inbox}
Feedback from QA Tester B:
{B's feedback from inbox}
{qa-reconciler role embedded}
Save the final report as:
- {output_dir}/issues.md (ranked issues)
- {output_dir}/test-plan.md (new test cases to add)",
summary: "Reconcile feedback into final QA report"
})
Monitor inbox for C's "done" message.
When E2E covers wizard navigation, scope action assertions to the expected container (for example a data-testid="wizard-nav" locator) instead of page-wide button selectors.
SendMessage({ type: "shutdown_request", recipient: "qa-tester-a", content: "QA complete" })
SendMessage({ type: "shutdown_request", recipient: "qa-tester-b", content: "QA complete" })
SendMessage({ type: "shutdown_request", recipient: "qa-synthesizer-c", content: "QA complete" })
// Wait for shutdown approvals
TeamDelete()
Present `issues.md` and `test-plan.md` to the user, and offer appropriate next steps (for example, fixing the top-ranked issues or adding the tests from the plan).
Each teammate prompt must be self-contained -- embed everything inline since teammates don't inherit the lead's conversation.
Before embedding agent role files and reference docs into sub-agent prompts, the orchestrator MUST substitute all {{placeholder_tokens}} with values from forge-project.yml. Sub-agents don't inherit the orchestrator's context, so they cannot read the config themselves.
Substitution map (built in Step 0):
| Token | Config source |
|---|---|
{{project_name}} | project.name |
{{backend_url}} | http://localhost:{services.backend.port} |
{{frontend_url}} | http://localhost:{services.frontend.port} |
{{backend_command}} | services.backend.command |
{{frontend_command}} | services.frontend.command |
{{frontend_working_dir}} | services.frontend.working_dir |
{{backend_working_dir}} | services.backend.working_dir |
{{activate_venv}} | services.backend.activate_venv |
{{test_command}} | testing.backend.command |
{{e2e_command}} | testing.frontend.e2e_command |
{{playwright_install}} | testing.frontend.playwright_install |
{{playwright_config}} | testing.frontend.playwright_config |
{{playwright_autostart}} | testing.frontend.playwright_autostart |
{{e2e_dir}} | testing.frontend.e2e_dir |
{{screenshot_full_page}} | testing.screenshot.full_page |
{{screenshot_viewport}} | testing.screenshot.viewport |
{{test_email}} | auth.test_email |
{{test_password}} | auth.test_password |
{{login_endpoint}} | auth.login_endpoint |
{{smoke_pages}} | qa.smoke_pages (render as markdown list) |
{{core_workflows}} | qa.core_workflows (render as markdown list) |
For {{smoke_pages}}, render each entry as a markdown bullet with name, path, and auth requirement. For {{core_workflows}}, render each entry as a markdown bullet with name and description.
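The substitution pass itself could be a simple text pipeline. A sketch using `sed` (the helper name and pair-based calling convention are assumptions; a real implementation would read the values from `forge-project.yml` and would need to escape values containing sed metacharacters):

```shell
# Sketch: substitute {{token}} placeholders in a prompt template.
# Call as: substitute_tokens "$template" token1 value1 token2 value2 ...
substitute_tokens() {
  local template="$1"; shift
  local out="$template"
  while [ "$#" -ge 2 ]; do
    # '|' as the sed delimiter avoids clashing with '/' in URLs and paths
    out=$(printf '%s' "$out" | sed "s|{{$1}}|$2|g")
    shift 2
  done
  printf '%s\n' "$out"
}
```

The lead would run this over each role file and reference doc before embedding them, since sub-agents cannot resolve the tokens themselves.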
Every tester prompt must embed:

- the `## Real Workflow Gate` section verbatim, with a statement that it is mandatory
- the relevant `agents/*.md` file
- `references/qa-report-format.md`

## Available Test Commands
### Playwright (UI tests)
cd {{frontend_working_dir}}
# Run all existing Playwright tests
{{e2e_command}}
# Run a specific test file
{{e2e_command}} {{e2e_dir}}/some-test.spec.ts
# Take a screenshot of any page
{{screenshot_full_page}}
# Run with headed browser for debugging
{{e2e_command}} --headed
### Backend (API tests)
cd {{backend_working_dir}}
{{activate_venv}}
# Run all pytest tests
{{test_command}}
# Run specific test file
{{test_command}} tests/unit/test_file.py -v
# Run e2e tests
{{test_command}} tests/e2e/ -v
### API Smoke Tests (curl)
# Health check
curl -s {{backend_url}}/docs | head -20
# Test authenticated endpoints (get token first)
curl -s -X POST {{backend_url}}{{login_endpoint}} \
-H "Content-Type: application/json" \
-d '{"email":"{{test_email}}","password":"{{test_password}}"}' | jq .token
### Screenshots (always use Playwright CLI, never MCP)
{{screenshot_full_page}}
| Strategy | Focus | Test Types |
|---|---|---|
| A: UI Regression | Visual integrity, page loads, component rendering, console errors | Playwright screenshots, existing spec files, smoke test all pages from config |
| B: Functional Integration | Data flow, API contracts, feature behavior, cross-component interactions | API curl tests, new Playwright E2E flows, pytest integration tests |
Both strategies must account for the real workflow gate. Strategy A may satisfy it through browser interaction. Strategy B may satisfy it through authenticated API calls, a real background job, or a service command that uses live database rows. If A cannot run browser automation, B's live API/service evidence becomes mandatory.
| Input Mode | Tester A Emphasis | Tester B Emphasis |
|---|---|---|
| Adversarial artifacts | Regression test areas mentioned in final-plan.md | Test the specific changes proposed |
| Git diff | Smoke test all pages, focus screenshots on changed components | Test changed API endpoints and service logic |
| Standalone | Full smoke page regression (all pages from config) | Full API contract + core workflow E2E |
A key output of this skill is deciding whether to use existing test infrastructure or create new tests:
- whether `{{frontend_working_dir}}/{{e2e_dir}}/` already contains relevant spec files
- whether the backend `tests/` directory has relevant pytest tests

The `test-plan.md` output explicitly recommends which approach for each area.
| Round | Failure | Recovery |
|---|---|---|
| Round 1 | One tester fails | Wait for the other. C reviews single report for gaps. |
| Round 1 | Both fail | Abort workflow. Report to user. |
| Round 2 | Synthesizer fails | Re-spawn C. If second attempt fails, present raw reports. |
| Round 3 | One tester fails | Proceed to Round 4 with available feedback. |
| Round 3 | Both fail | C produces final report from synthesis alone. |
| Round 4 | Synthesizer fails | Re-spawn reconciler. If fails again, present qa-synthesis.md. |
| Failure | Recovery |
|---|---|
| Backend not running | Tester starts it: cd {{backend_working_dir}} && {{activate_venv}} && {{backend_command}} & |
| Frontend not running | Tester starts it: cd {{frontend_working_dir}} && {{frontend_command}} & |
| Playwright not installed | Tester runs: cd {{frontend_working_dir}} && {{playwright_install}} |
| Database connection fails | Report as a critical issue in the QA report |
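The first three recovery rows share one shape: probe, and start the service only if the probe fails. A generic sketch with the probe and start commands injected (the helper name is illustrative; the commented usage line shows how the config tokens would plug in):

```shell
# Sketch: ensure a service is up, starting it only when the probe fails.
ensure_service() {
  local probe_cmd="$1" start_cmd="$2"
  if eval "$probe_cmd" >/dev/null 2>&1; then
    echo "already running"
  else
    eval "$start_cmd"
    echo "started"
  fi
}
# Assumed real usage:
#   ensure_service "curl -sf {{backend_url}}/docs" \
#     "cd {{backend_working_dir}} && {{activate_venv}} && {{backend_command}} &"
```

Injecting the commands keeps one helper reusable for backend, frontend, and Playwright install, and makes it trivial to exercise without live services.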
Synthesizer bias toward dismissing findings: C might dismiss legitimate issues as "environment-specific" or "flaky" rather than real bugs. The qa-reconciler role includes guidance to steelman findings and re-run tests before dismissing.
Test execution time: Running Playwright tests takes real time. Each tester may take 3-5 minutes for UI tests. The parallel execution of A and B mitigates this.
Non-deterministic tests: Some tests may be flaky. The QA report format requires testers to run failing tests multiple times and note consistency.
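The "run failing tests multiple times and note consistency" rule can be sketched as a small wrapper (the helper name, run count, and output wording are assumptions, not part of the report format):

```shell
# Sketch: re-run a test command N times and classify the result.
rerun_consistency() {
  local cmd="$1" runs="${2:-3}" fails=0
  for _ in $(seq "$runs"); do
    eval "$cmd" >/dev/null 2>&1 || fails=$((fails + 1))
  done
  if [ "$fails" -eq "$runs" ]; then
    echo "consistent failure ($fails/$runs)"
  elif [ "$fails" -eq 0 ]; then
    echo "consistent pass (0/$runs failed)"
  else
    echo "FLAKY ($fails/$runs failed)"
  fi
}
```

A tester would attach the classification line to the finding, so the synthesizer can distinguish a reproducible bug from a flaky test before ranking it.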
Use the adversarial QA framework to verify the changes in the current branch.
The relevant proposal is at .dev/proposals/premium-lip-sync-regen/final-plan.md
Invoke with:
/adversarial-qa
With specific scope:
/adversarial-qa Verify the BGM feature changes. Focus on Stage 9 assembly and the dashboard.
For interactive mode:
/adversarial-qa --interactive Full regression check after the billing feature merge.
{project}/.dev/qa/{issue-slug}/
|-- test-scope.md <- Round 0: what we're testing and why
|-- qa-report-A.md <- Round 1: QA Tester A (UI Regression)
|-- qa-report-B.md <- Round 1: QA Tester B (Functional Integration)
|-- qa-synthesis.md <- Round 2: Synthesizer C (lead-only)
|-- review-for-A.md <- Round 2: C's review for A (no mention of B)
|-- review-for-B.md <- Round 2: C's review for B (no mention of A)
|-- manifest.yaml <- Round 4: Merged evidence manifest (machine-readable)
|-- issues.md <- Round 4: Final ranked issues (deliverable)
|-- test-plan.md <- Round 4: Test cases to add (deliverable)
|-- verification-report.yaml <- Round 5: Verification results (from adversarial-verify)
+-- evidence/
|-- screenshots/ <- All screenshots from QA testers
|-- api-responses/ <- Saved API response bodies
|-- test-results/ <- Test command stdout/stderr
|-- verify-cycle-1/ <- Re-taken screenshots + pixel diffs (from verify)
| |-- screenshots/
| +-- diffs/
+-- verify-cycle-2/ <- If second verification cycle runs
|-- screenshots/
+-- diffs/
After Step 6 (cleanup), the lead runs the verification loop. This is automatic -- no user intervention unless escalation is needed.
1. Invoke adversarial-verify against {output_dir}/manifest.yaml
2. adversarial-verify re-runs all commands, re-takes all screenshots,
runs pixel-diff comparisons against baselines
3. Produces verification-report.yaml with verdict: CLEAR or ISSUES_REMAIN
If verdict = CLEAR: Done. Present clean bill of health to user. Ship it.
If verdict = ISSUES_REMAIN: Proceed to scoped re-QA.
1. Read verification-report.yaml for still-open findings and regressed checks
2. Re-run adversarial-qa in SCOPED mode:
- Only test the specific items that failed verification
- Input mode = "scoped re-QA" with the list of item IDs
- Same 4-round adversarial process, but narrower scope
3. Produces updated manifest.yaml
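Extracting the scoped item IDs could be a one-liner over the report. A sketch that assumes `verification-report.yaml` lists `- id:` entries under a top-level `still_open:` key (the field names are assumptions; see the manifest schema reference for the canonical shape):

```shell
# Sketch: pull still-open item IDs out of verification-report.yaml
# for the scoped re-QA input. Assumes lines like "  - id: QA-003"
# indented under a top-level "still_open:" section.
still_open_ids() {
  awk '/^still_open:/ {f=1; next}   # enter the section
       /^[^ ]/        {f=0}         # any new top-level key ends it
       f && /- id:/   {print $NF}' "$1"
}
```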
1. Invoke adversarial-verify again (cycle 2)
2. Same process: re-run commands, re-take screenshots, pixel-diff
3. Produces updated verification-report.yaml
If verdict = CLEAR: Done. Ship it.
If verdict = ISSUES_REMAIN: Escalate to user. Present:
The verification loop runs at most 2 cycles. If issues persist after 2 rounds of QA + verification, human intervention is required. This prevents infinite loops on genuinely hard-to-fix or environment-specific issues.
QA Round 1-4 → manifest.yaml
→ Verify (cycle 1) → CLEAR? → done
→ ISSUES_REMAIN → Scoped Re-QA → updated manifest.yaml
→ Verify (cycle 2) → CLEAR? → done
→ ISSUES_REMAIN → escalate to user
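The bounded loop above can be sketched in a few lines. The verify and re-QA commands are stand-ins for the real skill invocations, injected so the control flow is visible on its own:

```shell
# Sketch: QA -> verify loop, hard-capped at 2 verification cycles.
run_verification_loop() {
  local verify_cmd="$1" reqa_cmd="$2" cycle verdict
  for cycle in 1 2; do
    verdict=$(eval "$verify_cmd")       # adversarial-verify, cycle $cycle
    if [ "$verdict" = "CLEAR" ]; then
      echo "done after cycle $cycle"
      return 0
    fi
    if [ "$cycle" -eq 1 ]; then
      eval "$reqa_cmd"                  # scoped re-QA between the two cycles
    fi
  done
  echo "escalate to user"
  return 1
}
```

The cap is structural (`for cycle in 1 2`), so an environment-specific failure can never spin the loop indefinitely.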
| File | When to Read | Purpose |
|---|---|---|
references/agent-teams-workflow.md | Before starting orchestration | Complete tool call sequence for all rounds |
references/qa-report-format.md | When building tester prompts | QA report template with manifest requirements |
references/subagent-fallback.md | If persistent teammate messaging is not available | Alternative workflow using Codex subagents or file coordination |
agents/qa-tester.md | When building A and B prompts | QA testing role, evidence collection protocol, manifest format |
agents/qa-synthesizer.md | When building C's Round 2 prompt | Cross-verification, synthesis, manifest merge |
agents/qa-critic.md | When messaging A and B in Round 3 | Feedback guidance for defending findings |
agents/qa-reconciler.md | When messaging C in Round 4 | Reconciliation + final report + manifest update |
| File | Skill | Purpose |
|---|---|---|
adversarial-verify/SKILL.md | adversarial-verify | Verification skill invoked in Step 7 |
adversarial-verify/references/manifest-schema.md | adversarial-verify | Canonical manifest.yaml schema (shared by both skills) |
adversarial-verify/references/verification-report-format.md | adversarial-verify | Verification report output format |
| Agent | File | Used In | Purpose |
|---|---|---|---|
| QA Tester | agents/qa-tester.md | Round 1 (A and B) | Independent testing with assigned strategy + structured evidence |
| QA Synthesizer | agents/qa-synthesizer.md | Round 2 (C) | Cross-verification, synthesis, manifest merge, isolated review files |
| QA Critic | agents/qa-critic.md | Round 3 (A and B) | Review assessment, defend or accept findings |
| QA Reconciler | agents/qa-reconciler.md | Round 4 (C) | Bias-aware reconciliation into final report + manifest update |