Spec-driven E2E test creation: plan what to test through structured discovery phases, then scaffold a local Shiplight test project and write YAML tests by walking through the app in a browser.
A spec-driven workflow that front-loads testing expertise through structured planning before any tests are written. Tests run with npx shiplight test --headed — no cloud infrastructure required.
Use /create_e2e_tests when the user wants to:
Always produce artifacts. Every phase writes a markdown file. Artifacts clarify your own thinking, give the user something to review, and guide later phases. When the user provides detailed requirements, use them as source material — skip questions already answered, but still produce the artifact.
Confirm before implementing. Present the spec (Phase 2 checkpoint) for user confirmation before spending time on browser-walking and test writing. Echo back your understanding as structured scenarios to catch mismatches early.
Each phase reads the previous phase's artifact. Discover feeds Specify, Specify feeds Plan, Plan feeds Implement, Implement feeds Verify. If an artifact exists from a prior run, offer to reuse it.
Escalate, don't loop. When something fails or is ambiguous, report it and ask the user rather than retrying silently.
Phase 1: Discover → test-strategy.md (understand the app & user goals)
Phase 2: Specify → test-spec.md (define what to test in Given/When/Then)
Phase 3: Plan → test-plan.md (prioritize, structure, per-test guidance)
Phase 4: Implement → *.test.yaml files (setup project, write tests, run them)
Phase 5: Verify → updated spec files (coverage check, reconcile spec ↔ tests)
Check for existing artifacts before starting. The only way to skip artifact generation is if the user explicitly says so.
| Situation | Behavior |
|---|---|
| User explicitly says "skip to implement" or "just write the tests" | Phase 4 only |
| Existing test-specs/test-strategy.md | Offer to reuse, skip Phase 1 |
| Existing test-specs/test-spec.md | Offer to reuse, skip Phases 1-2 |
| Existing test-specs/test-plan.md | Offer to reuse, skip to Phase 4 |
Goal: Understand the application, the user's role, and what matters most to test.
Output: <project>/test-specs/test-strategy.md
Get project path — ask where to create the test project (e.g., ./my-tests). All artifacts and tests will live here. Create the test-specs/ directory.
If cloud MCP tools are available (SHIPLIGHT_API_TOKEN is set), use the /cloud skill to fetch environments and test accounts — this can pre-fill the target URL and credentials.
Silent scan — before asking questions, gather context from what's available:
package.json, framework
Understand what to test — ask the user what they'd like to test, then ask targeted follow-up questions (one at a time, with recommendations based on your scan) to fill gaps: risk areas, user roles, authentication, data strategy, critical journeys. Skip questions the user has already answered.
Write test-strategy.md containing:
Goal: Define concrete test scenarios in structured Given/When/Then format, prioritized by risk. Surface ambiguities that would cause flaky or incomplete tests.
Input: reads test-specs/test-strategy.md
Output: <project>/test-specs/test-spec.md
Read test-strategy.md to understand scope and priorities.
Generate user journey specs — for each critical journey, write:
Review for testing risks — scan each journey for issues that would cause flaky or incomplete tests: data dependencies, timing/async behavior, dynamic content, auth boundaries, third-party services, state isolation, environment differences. Add a Testing Notes section to each journey with identified risks and mitigations. If anything is ambiguous, ask the user (one at a time, with a recommended answer and impact statement).
Write test-spec.md with all journey specs.
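To make the expected shape concrete, one journey entry in test-spec.md might look like this (the journey and its details are hypothetical; the fixed elements are the Given/When/Then steps, the edge cases, and a Testing Notes section):

```markdown
Journey 1: User signup (P0)

Given a visitor on the landing page who is not signed in
When they open the signup form, enter a unique email and a valid password, and submit
Then they reach the dashboard and see a welcome message

Edge cases:
- Duplicate email shows an inline error and keeps the form state
- Weak password is rejected with guidance text

Testing Notes:
- Risk (data dependency): signup needs a unique email; generate a timestamped address per run
- Risk (timing): the post-signup redirect is async; wait for the dashboard URL before asserting
```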
Checkpoint — present a summary table for user review:
| # | Journey | Priority | Steps | Edge Cases | Risks |
|---|---|---|---|---|---|
| 1 | User signup | P0 | 5 | 3 | Timing |
| 2 | ... | ... | ... | ... | ... |
Ask: "Does this look right? Any journeys to add, remove, or reprioritize?"
Wait for user confirmation before proceeding.
Goal: Create an actionable implementation plan with per-test guidance.
Input: reads test-specs/test-spec.md
Output: <project>/test-specs/test-plan.md
Read test-spec.md.
Define test file structure — map journeys to test files:
```
tests/
├── auth.setup.ts       (if auth needed)
├── signup.test.yaml    (Journey 1)
├── checkout.test.yaml  (Journey 2)
└── ...
```
Set implementation order — ordered by:
Per-test guidance — for each test file, specify:
Write test-plan.md.
Checkpoint — present summary:
Ready to implement N test files. Shall I proceed?
Goal: Set up the project and write all YAML tests guided by the plan.
Input: reads test-specs/test-plan.md
Skip any steps already done (project exists, deps installed, auth configured).
Configure AI provider — check if the test project already has a .env with an AI API key. If not, ask the user to choose a provider:
To run YAML tests, I need an AI provider for resolving test steps. Which provider would you like to use?
A) Google AI —
A) Google AI — GOOGLE_API_KEY (Get key) — default model: gemini-3.1-flash-lite-preview
B) Anthropic — ANTHROPIC_API_KEY (Get key) — default model: claude-haiku-4-5
C) OpenAI — OPENAI_API_KEY (Get key) — default model: gpt-5.4-mini
D) Azure OpenAI — requires AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT — set WEB_AGENT_MODEL=azure:<deployment>
E) AWS Bedrock — uses AWS credential chain — set WEB_AGENT_MODEL=bedrock:<model_id>
F) Google Vertex AI — uses GCP Application Default Credentials — set WEB_AGENT_MODEL=vertex:<model>
G) I already have it configured
After the user chooses, ask for their API key and save it to the test project's .env file. For A/B/C, the model is auto-detected from the key. For D/E/F, also save WEB_AGENT_MODEL with the appropriate provider:model prefix. Optionally, the user can set WEB_AGENT_MODEL to override the default model (e.g., WEB_AGENT_MODEL=claude-sonnet-4-6).
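For illustration, the resulting .env for option B might look like this (the key value is a placeholder):

```
ANTHROPIC_API_KEY=sk-ant-placeholder
# Optional model override:
# WEB_AGENT_MODEL=claude-sonnet-4-6
```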
Scaffold the project — call scaffold_project with the absolute project path. This creates package.json, playwright.config.ts, .env.example, .gitignore, and tests/. Save the API key to .env.
Install dependencies:
```bash
npm install
npx playwright install chromium
```
Set up authentication (if needed) — follow the standard Playwright authentication pattern.
Add credentials as variables in playwright.config.ts:
```ts
{
  name: 'my-app',
  testDir: './tests/my-app',
  dependencies: ['my-app-setup'],
  use: {
    baseURL: 'https://app.example.com',
    storageState: 'tests/my-app/.auth/storage-state.json',
    variables: {
      username: process.env.MY_APP_EMAIL,
      password: { value: process.env.MY_APP_PASSWORD, sensitive: true },
      // otp_secret_key: { value: process.env.MY_APP_TOTP_SECRET, sensitive: true },
    },
  },
},
```
Standard variable names: username, password, otp_secret_key. Use { value, sensitive: true } for secrets. Add values to .env.
Write auth.setup.ts with standard Playwright login code. For TOTP, implement RFC 6238 using node:crypto (HMAC-SHA1 + base32 decode) — no third-party dependency needed.
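Since the auth setup may need to generate one-time codes during login, here is a minimal sketch of that node:crypto approach (function names and the 6-digit/30-second defaults are illustrative assumptions, not Shiplight APIs):

```typescript
import { createHmac } from "node:crypto";

// Decode an RFC 4648 base32 secret (the format authenticator apps use).
function base32Decode(secret: string): Buffer {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  let bits = 0;
  let value = 0;
  const bytes: number[] = [];
  for (const ch of secret.replace(/=+$/, "").toUpperCase()) {
    const idx = alphabet.indexOf(ch);
    if (idx === -1) throw new Error(`invalid base32 character: ${ch}`);
    value = (value << 5) | idx;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  return Buffer.from(bytes);
}

// RFC 6238 TOTP: HMAC-SHA1 over the time-step counter, then dynamic truncation.
function totp(secret: string, nowMs = Date.now(), stepSeconds = 30, digits = 6): string {
  const counter = Math.floor(nowMs / 1000 / stepSeconds);
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter)); // 8-byte big-endian counter
  const hmac = createHmac("sha1", base32Decode(secret)).update(msg).digest();
  const offset = hmac[hmac.length - 1] & 0x0f;
  const code = (hmac.readUInt32BE(offset) & 0x7fffffff) % 10 ** digits;
  return String(code).padStart(digits, "0");
}
```

In auth.setup.ts this could be used as, for example, `await page.getByLabel('Code').fill(totp(process.env.MY_APP_TOTP_SECRET!))` (the locator and env var name are placeholders). The sketch matches the RFC 6238 SHA-1 test vectors.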
Verify auth before proceeding. Run npx shiplight test --headed to execute the auth setup and confirm it saves storage-state.json. If it fails, escalate to the user — auth is a prerequisite for everything else.
If the test plan involves special auth requirements (e.g., one account per test, multiple roles), confirm the auth strategy with the user before proceeding.
For each test in the plan (or each test the user wants):
1. Call new_session with the app's starting_url.
2. Call inspect_page to see the page, then act to perform each action. This captures locators from the response.
3. Call get_locators for additional element info when needed.
4. Draft the .test.yaml content following the best practices below.
5. Write the .test.yaml file, then call validate_yaml_test with the file path to check locator coverage (minimum 50% required).
6. Call close_session when done.

Important: Do NOT write YAML tests from imagination. Always walk through the app in a browser session first to capture real locators. Tests without locators are rejected by validate_yaml_test.
When guided by test-plan.md:
After writing all tests, run them:
```bash
npx shiplight test --headed
```
When a test fails:
Goal: Validate test coverage against the spec and reconcile any drift.
Input: reads test-specs/test-spec.md, test-specs/test-plan.md, and all .test.yaml files
This phase only runs when spec artifacts exist.
For each spec journey, confirm the test covers the happy path and all listed edge cases.
Present a coverage summary:
| Spec Journey | Priority | Scenarios Specified | Tests Written | Coverage |
|---|---|---|---|---|
| User signup | P0 | 4 | 4 | ✓ |
| Checkout | P0 | 3 | 2 | ✗ — edge case "empty cart" not covered |
Flag gaps and extras (test steps not in the spec).
Update spec artifacts to match what was actually implemented:
test-spec.md — mark skipped scenarios with reason, add scenarios that emerged during implementation, update edge cases to reflect what was tested
test-plan.md — correct file structure, note deviations from the original plan

This keeps artifacts accurate for future test maintenance and expansion.
Read the MCP resource shiplight://yaml-test-spec-v1.3.0 for the full language spec (statement types, templates, variables, suites, hooks, parameterized tests).
Read the MCP resource shiplight://schemas/action-entity for the full list of available actions and their parameters.
These best practices bridge the YAML language spec and the action catalog to help you write fast, reliable tests.
Capture real locators with browser tools (act, get_locators) during browser sessions, then write ACTION statements. ACTIONs replay deterministically (~1s).
Validate every test file with validate_yaml_test.
Use VERIFY: for all assertions. Do not write assertion DRAFTs like "Check that the button is visible".
Use URL: /path for navigation instead of action: go_to_url.
Use CODE: for network mocking, localStorage manipulation, page-level scripting. Not for clicks, assertions, or navigation.

The intent field
intent states what the step should accomplish. The action/locator or js fields are caches of how to do it. When a cache fails (stale locator, changed DOM), the AI agent uses intent to re-inspect the page and regenerate the action from scratch.
Because intent drives self-healing, it must be specific enough for an agent to act on without any other context. Describe the user goal, not the DOM element — avoid element indices, CSS selectors, or positional references that break when the UI changes:
```yaml
# BAD: vague, agent can't re-derive the action
- intent: Click button

# BAD: tied to DOM structure that can change
- intent: Click the 3rd button in the form
- intent: Click element at index 42

# GOOD: describes the user goal, stable across UI changes
- intent: Click the Submit button to save the new project
  action: click
  locator: "getByRole('button', { name: 'Submit' })"
```
js: shorthand
Use structured format by default for all supported actions. Read the MCP resource shiplight://schemas/action-entity for the full list of available actions and their parameters.
Use js: only when the action doesn't map to a supported action — e.g., complex multi-step interactions, custom Playwright API calls, or chained operations:
```yaml
- intent: Drag slider to 50% position
  js: "await page.getByRole('slider').first().fill('50')"

- intent: Wait for network idle after form submit
  js: "await page.waitForLoadState('networkidle')"
```
js: coding rules
Use locator disambiguation (.first(), .nth(1)) to avoid Playwright strict-mode errors.
Set { timeout: 5000 } on actions for predictable timing.
intent is critical — it's the input for self-healing when js fails.
page, agent, and expect are available in scope.
Set short timeouts ({ timeout: 2000 }) on js: assertions that have an AI fallback, so stale locators fall back to AI quickly instead of waiting the default 5s.

Use the VERIFY: shorthand — do not use action: verify directly. Be careful with js: assertions: the AI fallback only triggers when js throws (element not found, timeout). If js passes against the wrong element (stale selector matching a different element), the assertion silently succeeds — no fallback occurs. Keep js: assertions simple and specific to minimize this risk.

js: condition best practices
js: conditions are brittle and cannot auto-heal.
Use js: conditions only for counter/state logic — e.g., js: counter++ < 10, js: retryCount < 3. Never use js: for DOM inspection like js: document.querySelector('.modal') !== null.
If a condition needs DOM state, use CODE: to evaluate it and store the result, or use VERIFY: with js: (which at least has AI fallback on failure).

WAIT_UNTIL: — AI checks the condition repeatedly until met or timeout. Default timeout is 60 seconds. Each AI check takes 5–10s, so set timeout_seconds to at least 15.
WAIT: — fixed-duration pause. Use seconds: to set duration.
See Smart waiting in E2E Test Design for when to use each.
Put intent first in ACTION statements for readability.
xpath is only needed when an ACTION has neither locator nor js.

These principles govern what to test and how to structure tests — independent of the YAML format. Apply them during Phase 2 (Specify) and Phase 4 (Implement).
Each test must run independently — never depend on another test's side effects, execution order, or leftover state. If a test needs data, it creates that data itself.
# BAD: depends on a previous test having created "My Project"