Explore the ERP web app with Playwright CLI and produce a UX-focused black-box test plan, detailed test cases, and actionable issue files with evidence.
Use this skill when you want Copilot to autonomously explore the ERP web app through the UI and generate a complete UX-focused black-box test plan for later execution by AI agents.
Explore the running application primarily through Playwright CLI and produce:
The emphasis is UX-first black-box testing while still validating basic functional correctness.
The output should be strong enough that:
Before starting, gather or infer these inputs:
docs/
If some inputs are missing, proceed with the accessible parts of the app and document assumptions clearly.
Use UI-first exploration through Playwright CLI.
Rules:
Source code may be read to:
Source code must not be used to:
Use the playwright-cli skill or browser tools to interact with the app.
Key operations:
- open_browser_page — navigate to URLs
- click_element / type_in_page — interact with UI elements
- screenshot_page — capture evidence
- read_page — inspect DOM state and visible text

Device viewports for responsive testing:
| Device | Width | Height |
|---|---|---|
| Desktop | 1920 | 1080 |
| Tablet | 768 | 1024 |
| Mobile | 375 | 812 |
Set the viewport before beginning each device pass.
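The device passes can be driven from a small lookup table. This is a minimal pure-Python sketch; the `viewport_for` helper name is my own, and the returned dict merely matches the `{"width": ..., "height": ...}` shape that Playwright-style viewport APIs expect.

```python
# Viewport presets mirroring the device table above.
VIEWPORTS = {
    "desktop": {"width": 1920, "height": 1080},
    "tablet": {"width": 768, "height": 1024},
    "mobile": {"width": 375, "height": 812},
}

def viewport_for(device: str) -> dict:
    """Look up a device preset, failing loudly on unknown names."""
    try:
        return VIEWPORTS[device.lower()]
    except KeyError:
        raise ValueError(
            f"unknown device {device!r}; expected one of {sorted(VIEWPORTS)}"
        )
```

A device pass then iterates over `VIEWPORTS`, sets the size once at the start of the pass, and records which preset each screenshot was taken under.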
Do not stop at “it works.”
Flag issues when UX is suboptimal, confusing, inefficient, inaccessible, or likely to cause user mistakes.
Examples that must be flagged:
At minimum evaluate:
Use representative responsive viewports and capture device-specific issues.
Assess at least:
Autonomously identify and explore, where available:
If the app is broad:
Exploration is considered complete when:
Generate the following artifacts.
Path:
docs/testing/sitemap.md
Produce this first, before any other deliverable. It grounds all subsequent work.
Contents:
Path:
docs/testing/test-plan.md
Contents:
Path pattern:
docs/testing/test-cases/<area>/<test-case-id>.md
Each test case file must include:
The test case must be specific enough that another AI agent can execute it without ambiguity.
Use this template:
# {TEST-CASE-ID}: {Title}
| Field | Value |
|------------------------|--------------------------------|
| Area / Module | |
| User Goal | |
| Priority | |
| Preconditions | |
| Test Type | |
| Coverage Tags | |
## Related Requirements / Assumptions
-
## Description
...
## Steps
1. ...
## Expected Results / Pass Criteria
- ...
## Fail Criteria
- ...
## Data
### Valid Data
- ...
### Invalid Data
- ...
## Responsive / Device Notes
- ...
## Accessibility Checks
- ...
## UX Heuristics / Design Expectations
- ...
## Notes for Future Execution Agents
- ...
## Related Issues
- ...
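Test case files can be generated mechanically from the template above. The following is a hypothetical renderer sketch (the function name and argument shapes are my own); it fills the metadata table and section headings from the template, leaving subsection structure such as Valid/Invalid Data to the section body text.

```python
def render_test_case(tc_id: str, title: str, fields: dict, sections: dict) -> str:
    """Render a test case file matching the template above.

    `fields` fills the metadata table; `sections` maps section headings
    to Markdown bodies. Missing entries fall back to '...' placeholders.
    """
    lines = [f"# {tc_id}: {title}", "", "| Field | Value |", "|---|---|"]
    for name in ("Area / Module", "User Goal", "Priority", "Preconditions",
                 "Test Type", "Coverage Tags"):
        lines.append(f"| {name} | {fields.get(name, '')} |")
    for heading in ("Related Requirements / Assumptions", "Description", "Steps",
                    "Expected Results / Pass Criteria", "Fail Criteria", "Data",
                    "Responsive / Device Notes", "Accessibility Checks",
                    "UX Heuristics / Design Expectations",
                    "Notes for Future Execution Agents", "Related Issues"):
        lines += ["", f"## {heading}", sections.get(heading, "...")]
    return "\n".join(lines) + "\n"
```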
Path pattern:
docs/test-results/issues/<issue-id>.md
Each issue file must include:
- Type (bug, ux, accessibility, responsive, design, or combined)

Make issues actionable for a future AI investigation/fix agent.
Use this template:
# {ISSUE-ID}: {Title}
| Field | Value |
|--------------------------|--------------------------------|
| Severity | |
| Priority | |
| Type | |
| Area / Module | |
| Environment / Viewport | |
## Summary
...
## Observed Behavior
...
## Expected Behavior
...
## Why This Is a Problem for Users
...
## Reproduction Steps
1. ...
## Evidence
- ...
## Suspected Scope / Affected Flows
- ...
## Hypothesis / Notes for Investigation
- ...
## Recommended Direction for Fix
- ...
## Screenshots / Artifact Paths
- ...
## Related Test Case IDs
- ...
Suggested paths:
docs/test-results/evidence/
docs/test-results/screenshots/

Use stable, descriptive names.
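One way to keep evidence names stable and descriptive is to derive them from the area, screen, and viewport. This slugging convention is a local assumption, not something mandated by the skill:

```python
import re

def evidence_name(area: str, screen: str, viewport: str, ext: str = "png") -> str:
    """Build a stable kebab-case filename such as 'sales-order-list-mobile.png'."""
    def slug(text: str) -> str:
        # Collapse anything non-alphanumeric into single hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{slug(area)}-{slug(screen)}-{slug(viewport)}.{ext}"
```

Re-running the same capture then overwrites the same file instead of accumulating timestamped duplicates.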
Path:
docs/testing/summary.md
A one-page overview produced after exploration is complete:
Use kebab-case filenames.
Suggested stable IDs:
- Test cases: TC-AUTH-001, TC-SALES-003, TC-INVENTORY-002
- Issues: ISSUE-UX-001, ISSUE-A11Y-004, ISSUE-RESP-002

Prefer grouping test cases by module.
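The suggested ID schemes can be validated with a pair of regexes so agents never emit drifting formats. The module/type codes below are examples from this document, not an exhaustive list:

```python
import re

# TC-<MODULE>-NNN and ISSUE-<TYPE>-NNN, per the suggested stable IDs.
TEST_CASE_ID = re.compile(r"^TC-[A-Z][A-Z0-9]*-\d{3}$")
ISSUE_ID = re.compile(r"^ISSUE-[A-Z][A-Z0-9]*-\d{3}$")

def is_valid_id(candidate: str) -> bool:
    """True if the candidate matches either stable-ID scheme."""
    return bool(TEST_CASE_ID.match(candidate) or ISSUE_ID.match(candidate))
```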
Use these scales consistently across all issues.
Severity (impact on the user):
| Level | Meaning |
|---|---|
| Critical | Blocks a core workflow, causes data loss, or makes a feature unusable |
| Major | Significant UX degradation; workaround may exist |
| Minor | Cosmetic issue or minor inconvenience |
| Low | Enhancement, polish, or nice-to-have improvement |
Priority (order of addressing):
| Level | Meaning |
|---|---|
| P0 | Must fix before release |
| P1 | Should fix soon; high user impact |
| P2 | Fix when convenient; moderate impact |
| P3 | Backlog; low urgency |
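Encoding the two scales as data keeps labels consistent across all generated issue files. The diagonal severity-to-priority mapping below is purely an illustrative triage default of my own; severity and priority are independent axes and agents should override it per issue:

```python
SEVERITIES = ("Critical", "Major", "Minor", "Low")
PRIORITIES = ("P0", "P1", "P2", "P3")

# Illustrative first-pass default only: Critical→P0, Major→P1, etc.
DEFAULT_PRIORITY = dict(zip(SEVERITIES, PRIORITIES))
```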
Follow this sequence. Write checkpoint notes to session memory after completing each phase so progress is preserved if interrupted.
docs/testing/sitemap.md).

For each major module (top 3–5 by centrality):
Parallel sub-agents: After the sitemap is built, launch one sub-agent per module to explore in parallel. Each sub-agent opens its own browser, explores its assigned module, writes issue files, and returns a structured summary of findings. See the Parallel sub-agent strategy section.
Parallel sub-agents: Launch one sub-agent per device viewport. Each sub-agent navigates through the same set of key screens at its assigned viewport size, captures screenshots, and files responsive issues.
docs/testing/summary.md).
docs/testing/test-plan.md).

Parallel sub-agents: Launch one sub-agent per module to write that module’s test case files in parallel. Each sub-agent receives the exploration findings for its module and produces the test case Markdown files. A final sequential pass assembles the master test plan and executive summary from all module outputs.
When exploration encounters problems:
| Situation | Action |
|---|---|
| Page returns 500 or blank screen | Screenshot, file an issue, skip the page, continue with next area |
| Login fails | Continue with all unauthenticated areas; document the limitation |
| Form submission crashes | Screenshot, file an issue, attempt the same action once more, move on |
| Element not found / timeout | Refresh page, retry once; if still failing, file issue and move on |
| App becomes unresponsive | Reload the page; if persistent, restart the browser and resume |
| Unexpected modal or dialog blocks | Dismiss or screenshot, file issue, continue |
Never retry the same failing action more than twice. File an issue and move on.
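The "retry once, then file an issue and move on" rule can be sketched as a small wrapper. This is a hedged pure-Python illustration (function and callback names are my own, not part of any tool API):

```python
def attempt_with_retry(action, on_failure, max_attempts=2):
    """Run `action` up to `max_attempts` times (the retry-once rule above).

    On final failure, call `on_failure(error)` — e.g. to file an issue —
    and return None so exploration continues with the next area.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return action()
        except Exception as exc:  # black-box: treat any failure the same way
            last_error = exc
    on_failure(last_error)
    return None
```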
Copilot can launch parallel sub-agents (runSubagent) to speed up exploration and deliverable generation. Each sub-agent operates independently with its own browser session.
| Phase | Parallel unit | What each sub-agent does |
|---|---|---|
| Module exploration | 1 agent per module | Opens own browser, authenticates, explores assigned module, writes issues, returns structured findings |
| Device testing | 1 agent per viewport | Opens browser at assigned viewport, navigates key screens, captures screenshots, files responsive issues |
| Test case generation | 1 agent per module | Receives exploration findings, writes test case files for its module |
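The fan-out/fan-in pattern behind these phases can be sketched with standard-library concurrency. `ThreadPoolExecutor` here stands in for the runSubagent mechanism mentioned above; the `explore_module` callable is a hypothetical placeholder for whatever a sub-agent actually does:

```python
from concurrent.futures import ThreadPoolExecutor

def run_module_agents(modules, explore_module):
    """Fan out one worker per module and collect structured findings.

    `explore_module` stands in for a runSubagent call that opens its own
    browser, explores its assigned module, and returns a findings summary.
    """
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        futures = {m: pool.submit(explore_module, m) for m in modules}
        # Fan-in: block until every sub-agent returns, keyed by module.
        return {m: f.result() for m, f in futures.items()}
```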
When launching a sub-agent, include in the prompt:
After all parallel sub-agents return:
The final output must be:
Provide run-specific inputs at execution time rather than hardcoding them in the skill.
Pass these inputs explicitly when invoking the skill:
- APP_URL: base URL of the target environment
- TEST_USER_EMAIL: login email for the test user
- TEST_USER_PASSWORD: login password for the test user
- TEST_NOTES (optional): any environment-specific notes, seed data notes, MFA caveats, or scope limits
- MAIN_MODEL (optional): model to use for the main orchestrator agent. Default: claude-opus-4. Override if a different model is preferred or available.
- SUBAGENT_MODEL (optional): model to use for parallel sub-agents. Default: claude-sonnet-4. Override to balance speed, cost, and capability.

Example invocation payload:
APP_URL=https://your-app-url.example.com
[email protected]
TEST_USER_PASSWORD=<secret>
TEST_NOTES=Use seeded tenant Acme Demo. Focus on sales, customers, inventory, settings. Skip billing if unavailable.
MAIN_MODEL=claude-opus-4
SUBAGENT_MODEL=claude-sonnet-4
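A payload like the example above is plain KEY=VALUE lines, so parsing it is straightforward. This sketch (names of my own choosing) applies the documented model defaults and surfaces missing required credentials so the limitation can be documented rather than silently ignored:

```python
DEFAULTS = {"MAIN_MODEL": "claude-opus-4", "SUBAGENT_MODEL": "claude-sonnet-4"}
REQUIRED = ("APP_URL", "TEST_USER_EMAIL", "TEST_USER_PASSWORD")

def parse_payload(text: str) -> dict:
    """Parse KEY=VALUE lines into a config dict with model defaults applied."""
    config = dict(DEFAULTS)
    for line in text.strip().splitlines():
        if "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            config[key.strip()] = value.strip()
    # Record gaps instead of failing: exploration continues unauthenticated.
    config["missing"] = [k for k in REQUIRED if k not in config]
    return config
```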
The skill works best when the main orchestrator and parallel sub-agents use different model tiers.
| Role | Default model | Rationale |
|---|---|---|
| Main orchestrator | claude-opus-4 | Strongest reasoning and synthesis for UX judgment, module prioritization, and aggregating findings across sub-agents |
| Parallel sub-agents | claude-sonnet-4 | Fast and capable at following structured instructions, Playwright interaction, and template-driven output; reduces wall-clock time and cost when spawning multiple agents |
Why not use the top-tier model for everything?
Sub-agents perform scoped, template-driven work (explore one module, test one viewport, write test cases from provided findings). They don't need deep strategic reasoning — they need speed and reliable instruction-following. Since 3–5 sub-agents run per parallel phase, using a faster/cheaper model multiplies savings without meaningfully reducing quality.
Overriding: Pass MAIN_MODEL and/or SUBAGENT_MODEL in the invocation payload to use different models. Acceptable alternatives include gpt-5.4, claude-sonnet-4, or any model available in the current environment. If the environment does not support per-agent model selection, all agents will use the session's active model — in that case, prefer the orchestrator-tier model.
Use the provided values only for test execution. Do not persist secrets into generated documentation unless the repo already has an approved secure pattern for test credentials.
If credentials are missing or login fails, continue with all accessible unauthenticated areas and document the limitation in the test plan.
docs/
testing/
sitemap.md
summary.md
test-plan.md
test-cases/
auth/
sales/
inventory/
customers/
settings/
test-results/
issues/
evidence/
screenshots/
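Creating this tree up front avoids path errors when parallel sub-agents write their first files. A minimal idempotent sketch (the module folders are the examples listed above, not a fixed set):

```python
from pathlib import Path

# Mirrors the deliverable layout above.
LAYOUT = [
    "docs/testing/test-cases/auth",
    "docs/testing/test-cases/sales",
    "docs/testing/test-cases/inventory",
    "docs/testing/test-cases/customers",
    "docs/testing/test-cases/settings",
    "docs/test-results/issues",
    "docs/test-results/evidence",
    "docs/test-results/screenshots",
]

def scaffold(base: Path) -> None:
    """Create the deliverable directory tree under `base` (safe to rerun)."""
    for rel in LAYOUT:
        (base / rel).mkdir(parents=True, exist_ok=True)
```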
Explore the application autonomously through Playwright first, then write the docs from observed behavior. Keep the work black-box and UX-centered.