CI health check, failure analysis, root cause classification — morning pre-flight through deep diagnosis
Two modes depending on context:
Quick check: is the environment healthy? What does the CI dashboard look like? Any flaky trends?
Before checking CI results, verify the test environment is healthy. This prevents wasting time investigating "failures" caused by environment issues.
Dev environment:
- mcp__playwright__browser_navigate to https://dev.publicgrid.energy — does it load? Check for 5xx errors.
- mcp__playwright__browser_navigate to https://dev.publicgrid.energy/sign-in — does the sign-in page render?
- mcp__supabase__execute_sql with `SELECT 1` against project `wzlacfmshqvjhjczytan` — is the database reachable?

Quick verdict:
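When the Playwright MCP is unavailable, the dev-app part of this pre-flight can be approximated with `curl`. This is a sketch, not an existing project script: the URL comes from the steps above, and `classify_status` is a hypothetical helper that maps status codes onto the verdict labels used in the dashboard.

```shell
# Map an HTTP status code to the health verdicts used in the dashboard.
classify_status() {
  local code="$1"
  if [ "$code" = "000" ]; then
    echo "Down"        # curl could not connect at all
  elif [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo "Healthy"
  elif [ "$code" -ge 500 ]; then
    echo "Down"        # 5xx from the app
  else
    echo "Degraded"    # reachable, but 4xx responses
  fi
}

# Network call, shown for illustration:
# code=$(curl -s -o /dev/null -w '%{http_code}' https://dev.publicgrid.energy)
# classify_status "$code"
```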
Bash with `gh run list --workflow=main-workflow.yml --limit 10` to get recent runs from Cottage-Energy/cottage-tests.

Present a dashboard-style summary:
## CI Health — [date]
### Environment Status
- Dev app: [Healthy / Down / Degraded]
- Supabase: [Reachable / Unreachable]
### Overview
- Total scopes: [X] passed, [Y] failed
- Last full regression: [date/time]
- Next scheduled: daily at 5 AM UTC
### Dashboard
| Scope | Last Run | Status | Duration | Failed Tests |
|-------|----------|--------|----------|-------------|
| Smoke | [time] | Pass/Fail | [duration] | [count] |
| Regression1 (Chromium) | [time] | Pass/Fail | [duration] | [count] |
| Regression2 (Firefox) | [time] | Pass/Fail | [duration] | [count] |
| Regression3 (Safari) | [time] | Pass/Fail | [duration] | [count] |
| Regression4 (Mobile Chrome) | [time] | Pass/Fail | [duration] | [count] |
| Regression5 (Mobile Safari) | [time] | Pass/Fail | [duration] | [count] |
| Regression6 (Mobile Chrome) | [time] | Pass/Fail | [duration] | [count] |
| Regression7 (Mobile Safari) | [time] | Pass/Fail | [duration] | [count] |
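The Overview counts can be reduced mechanically from `gh run list` output. A sketch, assuming the input is one conclusion per line (as produced by `--json conclusion --jq '.[].conclusion'`); `summarize_runs` is illustrative, not an existing script:

```shell
# Count pass/fail among recent CI runs.
# stdin: one conclusion per line, e.g. from:
#   gh run list --workflow=main-workflow.yml --limit 10 --json conclusion --jq '.[].conclusion'
summarize_runs() {
  local input pass fail
  input=$(cat)
  pass=$(printf '%s\n' "$input" | grep -c '^success$' || true)
  fail=$(printf '%s\n' "$input" | grep -c '^failure$' || true)
  echo "passed: $pass, failed: $fail"
}
```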
For any failed runs:
Bash with `gh run view <run_id> --log-failed` to get failure details.

For each failed test, check if there's already a known bug or in-progress fix:
mcp__linear__search_issues with keywords from the failing test's feature area or error message, before filing anything with /log-bug.

This prevents duplicate bug reports and gives context on whether failures are being addressed.
Compare the last 5-10 CI runs to identify flaky tests (tests that intermittently pass and fail):
- `gh run list --workflow=main-workflow.yml --limit 10 --json databaseId,conclusion,startedAt` to get run IDs
- `gh run view <id> --log-failed` to collect failed test names

### Flaky Test Trend (last [N] runs)
| Test | Fail Rate | Last 5 Runs | Pattern |
|------|-----------|-------------|---------|
| `path/test.spec.ts` > "test name" | 3/5 (60%) | FPFPF | Intermittent — likely timing/data issue |
| `path/test2.spec.ts` > "test name" | 5/5 (100%) | FFFFF | Persistent — broken, not flaky |
| `path/test3.spec.ts` > "test name" | 2/5 (40%) | PPPFF | Recent regression — started failing 2 runs ago |
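The pattern column above can be derived from a pass/fail history string (oldest to newest). A sketch of the classification logic; `classify_pattern` is a hypothetical helper, and the thresholds mirror the example table rather than any project standard:

```shell
# Classify a run history like "FPFPF" (P = pass, F = fail, oldest -> newest).
classify_pattern() {
  local pattern="$1"
  local fails="${pattern//[!F]/}"        # keep only the F characters
  local rate=$(( ${#fails} * 100 / ${#pattern} ))
  if [ "$rate" -eq 100 ]; then
    echo "persistent ($rate%)"           # broken, not flaky
  elif [ "$rate" -eq 0 ]; then
    echo "stable ($rate%)"
  elif [[ "$pattern" != *F*P* ]]; then
    echo "recent regression ($rate%)"    # no pass after the first fail
  else
    echo "intermittent ($rate%)"         # likely timing/data issue
  fi
}
```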
Actions by pattern:
`gh pr list --state merged --limit 5`

After the dashboard is complete:
Don't wait for the user to paste logs — pull them directly:
- `gh run list --workflow=main-workflow.yml --limit 5` via Bash to find the failed run
- `gh run view <run_id> --log-failed` via Bash to get the failure output
- mcp__github__get_pull_request + mcp__github__get_pull_request_files if the failure is on a PR check
- Check that long-running scheduled jobs (*/5 min cron waits, ~30 min each) are included.
- `gh run view --job=<jobId> --log` and grep for test progress markers `[N/M]` to see which test was running when cancelled.

From the failure output, extract:
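The grep for `[N/M]` progress markers can be sketched as a small helper; the exact log format is an assumption based on the marker style above, and `last_progress_marker` is illustrative:

```shell
# Print the last [N/M] progress marker in a job log, i.e. the test that was
# running when the job was cancelled or timed out.
last_progress_marker() {
  grep -oE '\[[0-9]+/[0-9]+\]' "$1" | tail -n 1
}

# Usage (<jobId> is a placeholder):
# gh run view --job=<jobId> --log > job.log && last_progress_marker job.log
```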
- mcp__github__list_commits on cottage-nextjs to find what changed recently
- mcp__github__list_pull_requests for recently merged PRs
- mcp__github__get_pull_request_files on suspicious PRs
- mcp__supabase__execute_sql to verify test data, feature flags, account states
- mcp__linear__list_issues or mcp__linear__search_issues to search for bugs mentioning the failing test or feature area

If the logs and code aren't enough to diagnose, reproduce the failure live:
- mcp__playwright__browser_navigate to the URL the test visits
- mcp__playwright__browser_snapshot to see current UI state
- mcp__playwright__browser_click, browser_fill_form, browser_select_option
- mcp__playwright__browser_take_screenshot at the point where the test fails
- mcp__playwright__browser_network_requests to check for API failures
- mcp__playwright__browser_console_messages for JS errors

This confirms whether the issue is the app (product bug) or the test (stale locator/logic).
- /log-bug to file in Linear, link the failed test as evidence
- /fix-test to fix the test
- /fix-test to update page objects and flow logic (this is expected maintenance)
- /fix-test to stabilize, or /exploratory-test to investigate the root timing issue

## Failure Analysis — [test name or scope]
### Environment Status
- Dev app: [Healthy / Down / Degraded]
- Supabase: [Reachable / Unreachable]
### CI Dashboard
[scope table from Phase 0c — include if running full analysis]
### Failed Test(s)
| File | Test Name | Scope/Browser | Retries | Error |
|------|-----------|---------------|---------|-------|
| `tests/e2e_tests/path/file.spec.ts` | "test name" | Regression1/Chromium | 0/2 passed | [brief error] |
### Error Details
[error message and relevant stack trace]
### What Changed
- **Recent commits**: [relevant commits with dates]
- **Recent PRs**: [PR #X merged Y ago — changed Z]
- **Database state**: [relevant findings from Supabase]
- **Existing bugs**: [Linear BUG-XXX if already filed]
### Flaky Trend
[flaky table from Phase 0f — include if relevant]
### Root Cause Classification
**Category**: [Product Bug / Test Code Issue / Environment / Data Dependency / UI Change / Flaky]
**User Impact**: [If product bug: what the user experiences. If test/env issue: "No user impact — test infrastructure only"]
**Analysis**:
[Explanation of why this failed, supported by evidence from CI logs, GitHub commits, Supabase queries, and/or Playwright MCP investigation]
### Recommended Action
- [ ] [Specific action — e.g., `/fix-test` to update POM locator for changed button]
- [ ] [Secondary action if needed]
### Related
- [Linear ticket if applicable]
- [PR that caused the change]
- [Similar past failures if known]
When multiple tests fail in the same run:
Based on the classification, chain to the right skill:
- /log-bug to file in Linear
- /fix-test to fix the test
- /fix-test
- /test-plan to understand the change (PR analysis), then /fix-test to update tests
- /fix-test to stabilize, or /exploratory-test to investigate timing

| Tool | Purpose |
|---|---|
| Playwright MCP | browser_navigate, browser_snapshot, browser_click, browser_take_screenshot, browser_network_requests, browser_console_messages — env pre-flight + reproduce failures interactively |
| Supabase MCP | execute_sql, list_tables — env pre-flight + check data state, feature flags, schema changes |
| GitHub MCP or Bash (gh CLI) | Fetch workflow runs, failure logs, commits, PRs, and run history for flaky trending |
| Linear MCP | search_issues, list_issues — cross-reference failures with existing bugs |
| Read | Read test code, page objects, fixtures |
| Grep, Glob | Cross-reference failures with local test files |
After completing this skill, check: did any step not match reality? Did a tool not work as expected? Did you discover a better approach? If so, update this SKILL.md with what you learned.