Systematically browser-test features using Playwright Docker, observe live behavior patiently, trace function-level state changes, assess UX quality, and log all findings. Use when the user says "test feature", "test the UI", "browser test", "QA the app", "test landing", "test auth", "test discovery", "test ideation", "test dashboard", "test blog", "test templates", "test settings", "test admin", "test lp", or any request to visually test specific app features via the browser. Also trigger when the user says "run QA", "check the UI", "smoke test", "observe behavior", "watch what happens when", or names any feature area to test. Takes feature names as arguments (e.g., `/test-feature landing blog auth`).
Systematically test app features via Playwright Docker. Go beyond screenshots: observe live behavior, trace what updates when users interact, assess UX quality, and log broken logic or ugly UX experiences to the observations file.
Space-separated feature names. If none provided, ask which features to test.
Valid features: landing, blog, auth, dashboard, ideation, discovery, templates, settings, admin, lp
Example: /test-feature discovery ideation or /test-feature landing auth
Use the test idea ID from CLAUDE.md (925c54e8-b105-48f5-8270-e99e6db51927) for all {ideaId} routes. For {slug} routes (blog, templates, lp) pick any published slug from the DB.
Route group note: Next.js route groups (admin), (app), (auth), (marketing) are layout wrappers only — they do not appear in URLs.
landing — Marketing Home| Route | Notes |
|---|---|
/ | Landing page — hero, pricing, social proof |
Auth required: No
blog — Public Blog| Route | Notes |
|---|---|
/blog | Blog index (list of posts) |
/blog/{slug} | Individual published post — pick a real slug |
Auth required: No
auth — Authentication Flows| Route | Notes |
|---|---|
/sign-in | Login form |
/sign-up | Registration form |
/forgot-password | Password reset request |
/reset-password | Password reset form (requires token in URL) |
/auth/callback | OAuth callback — do not navigate directly; triggered by Google auth |
Auth required: No
dashboard — Idea Dashboard| Route | Notes |
|---|---|
/dashboard | Lists all ideas; create new idea CTA |
/idea | Redirect hub (no content, redirects to dashboard) |
Auth required: Yes
ideation — Ideation WorkspaceAll routes are under /idea/{ideaId}/. Replace {ideaId} with the test idea ID.
| Route | Notes |
|---|---|
/idea/{ideaId} | Idea overview: readiness scores, framework progress, timeline, kill signals |
/idea/{ideaId}/problem-definition | Problem definition framework (AI chat + structured output) |
/idea/{ideaId}/bmc | Business Model Canvas framework |
/idea/{ideaId}/competitive | Competitive landscape framework |
/idea/{ideaId}/value-proposition | Value proposition framework |
/idea/{ideaId}/assumptions | Assumptions tracker framework |
/idea/{ideaId}/review | Ideation readiness review (summary, scores, recommendations) |
/idea/{ideaId}/fishbone | Fishbone diagram (template framework) |
/idea/{ideaId}/root-cause-5whys | 5 Whys root-cause analysis (template framework) |
/idea/{ideaId}/mashup-ideation | Mashup ideation framework |
/idea/{ideaId}/mind-map | Mind map brainstorm framework |
/idea/{ideaId}/scamper-ideation | SCAMPER ideation framework |
Auth required: Yes
Note: Unknown slugs pass through as-is (custom deployed templates). [framework] is a dynamic segment resolved via SLUG_TO_FRAMEWORK map in the page component.
discovery — Discovery PhaseAll routes are under /idea/{ideaId}/discovery/.
| Route | Notes |
|---|---|
/idea/{ideaId}/discovery | Discovery dashboard: persona count, LP status, interview stats, JTBD count, assumption coverage, widget conversion rate |
/idea/{ideaId}/discovery/personas | Proto-personas list |
/idea/{ideaId}/discovery/personas/{personaId} | Persona canvas: full profile, linked assumptions, customer segments, framework summaries |
/idea/{ideaId}/discovery/interviews | Interviews list |
/idea/{ideaId}/discovery/interviews/{interviewId} | Interview detail: transcript, AI-extracted insights, assumption links |
/idea/{ideaId}/discovery/jtbd | Jobs-To-Be-Done list and analysis |
/idea/{ideaId}/discovery/analysis | Cross-interview synthesis and insight patterns |
/idea/{ideaId}/discovery/landing-page | Landing page builder (generate/edit/publish) |
/idea/{ideaId}/discovery/review | Discovery review: decision gate, synthesis, go/no-go |
Auth required: Yes
Note: {personaId} and {interviewId} must be real IDs from the test idea. Query the DB or check network responses after visiting the list pages.
templates — Templates Marketplace| Route | Notes |
|---|---|
/templates | Templates index (list of available frameworks) |
/templates/{slug} | Template detail page — pick a real slug from the index |
Auth required: Yes
settings — User Settings| Route | Notes |
|---|---|
/settings | Account info (email, display name), billing section, plan limits |
Auth required: Yes
Sections to test: Account card, BillingSection component (plan badge, upgrade CTA), Plan Limits card
admin — Admin Panel| Route | Notes |
|---|---|
/admin | Admin home / overview |
/admin/users | User management |
/admin/blog | Blog post management |
/admin/pipeline | Pipeline admin |
/admin/invite-codes | Invite code management |
/admin/feedback | User feedback inbox |
/admin/errors | Error log viewer |
/admin/ai-usage | AI usage / cost analytics |
/admin/activity | Activity feed |
/admin/feature-flags | Feature flag toggles |
Auth required: Yes (admin role)
lp — Published Landing Pages| Route | Notes |
|---|---|
/p/{slug} | Public landing page for a specific idea; built via the landing-page builder |
/rss.xml | RSS feed (no auth, GET only) |
Auth required: No
Before testing, verify infrastructure:
Dev server: Check if port 3001 is responding. If not, start it:
cd apps/web && npm run dev
Wait for "Ready" output before proceeding.
Auth session (if any feature requires login): Navigate to http://localhost:3001/sign-in, fill credentials ([email protected] / Test123!), submit, and confirm redirect to dashboard. Reuse this session for all authenticated routes — don't re-login between features.
Observations file: Create fixes_observations/{YYYY-MM-DD}-{feature}-observations.md for each feature being tested. Use today's date.
For EACH route in the feature, execute this sequence in full. Be patient — wait for async operations to settle before capturing state.
browser_navigate → http://localhost:3001{route}
browser_wait_for → wait for page to settle (specific text or 2-3s delay)
browser_take_screenshot → save to playwright-screenshots/{feature}-{route-slug}-initial.png
browser_snapshot → capture accessibility tree (DOM state)
browser_console_messages → check for JS errors/warnings
Patience rule: If the page shows a loading spinner or streaming text, wait for it to fully settle. Take a second screenshot after content loads. Do not capture observations mid-load.
Mandatory for every route. Pages that look fine in a screenshot can have broken scroll that locks users out of content.
Run via browser_evaluate on every page after it loads:
() => {
const html = document.documentElement, body = document.body;
const main = document.querySelector('main') || body;
const scrollables = [...document.querySelectorAll('*')].filter(el => {
const s = getComputedStyle(el);
return (s.overflowY === 'auto' || s.overflowY === 'scroll') && el.scrollHeight > el.clientHeight;
});
let maxBottom = 0;
main.querySelectorAll('*').forEach(el => { const b = el.getBoundingClientRect().bottom; if (b > maxBottom) maxBottom = b; });
return {
viewport: { w: innerWidth, h: innerHeight },
html: { scrollH: html.scrollHeight, clientH: html.clientHeight, canScroll: html.scrollHeight > html.clientHeight },
body: { overflow: getComputedStyle(body).overflow },
scrollables: scrollables.slice(0, 5).map(el => ({
el: el.tagName + (el.id ? '#'+el.id : '') + (el.className ? '.'+el.className.split(' ')[0] : ''),
scrollH: el.scrollHeight, clientH: el.clientHeight, clipped: el.scrollHeight - el.clientHeight
})),
content: { furthest: Math.round(maxBottom), cutOff: maxBottom > innerHeight + 50 }
};
}
Red flags:
| Signal | Meaning | Severity |
|---|---|---|
html.canScroll === false AND content.cutOff === true | Content below viewport but page can't scroll — users locked out | major |
scrollables empty AND page has below-fold content | No scrollable area — content trapped | major |
body.overflow === 'hidden' with no open modal | Stale scroll lock from a modal/drawer that didn't clean up | major |
When scroll is broken:
browser_press_key → End then Home — re-screenshot to see if content movedfullPage: true screenshot to reveal all trapped contentoverflow: hidden, h-screen without overflow-y-auto, or a flex parent missing min-h-0Read the screenshot, accessibility snapshot, scroll diagnostics, and console output. Look for:
overflow: hidden on a parent, h-screen without inner scroll, or hydration mismatch.This is the most important phase. Don't just click — observe and trace what happens at each step.
For each meaningful interaction:
browser_snapshot) before acting. Note visible UI elements, text, loading states.browser_click, browser_type, browser_key_press, etc.browser_wait_for for network responses, state changes, animations to settle (at least 1-2s for async ops).Forms:
Chat / Agent Panels (critical — see Phase 4):
Buttons and CTAs:
Modals / Drawers / Panels:
Tabs / Accordions / Steppers:
Dark Mode (if theme toggle exists):
When testing routes with an AI chat agent (ideation, discovery, or any panel with a chat interface):
When you send a message to the agent, observe and document each of the following:
| Component | What to observe |
|---|---|
| Input field | Does it clear after send? Does it disable during response? Does it restore cursor focus after? |
| Send button | Does it disable during generation? Does it change icon/label? Does it re-enable after response completes? |
| Message list | Does user message appear immediately? Does assistant bubble appear as a placeholder before content streams? Does scroll auto-follow to bottom during streaming? |
| Streaming behavior | Is text streamed token-by-token or chunked? Is there a visible cursor/indicator during streaming? Does it feel smooth or choppy? |
| Sidebar / panel state | Does any sidebar section update after the agent responds (e.g., saved notes, generated content, idea fields)? |
| Page content outside chat | Does the main content area update based on what the agent says? (e.g., does a "problem definition" section update when the agent confirms one?) |
| Error states | If the agent fails or times out, is there a clear error message? Is the UI recoverable (can you try again)? |
| Multi-turn behavior | Send a follow-up question. Does the agent reference previous context? Does the message list grow correctly? |
For each chat interaction, add a behavioral trace to the observations file:
### Behavioral Trace: [Action Description]
- **User action**: "sent message: 'What should my startup focus on?'"
- **Input field**: ✅ Cleared after send | ⚠️ Did not disable during response
- **Send button**: ✅ Disabled during generation, re-enabled after
- **Message appearance**: ✅ User message appeared instantly
- **Streaming**: ✅ Token-by-token, smooth | ⚠️ Visible lag at start
- **Scroll**: ✅ Auto-scrolled to bottom during streaming
- **Panel/sidebar updates**: [Describe any state changes outside the chat]
- **After response**: [What changed on the page? Did any sections update?]
- **UX quality**: [Rate: Excellent / Good / Needs work / Broken] — [Reason]
After completing interaction testing for a route, give it an overall UX quality score. This goes into the observations file.
UX Assessment Checklist:
UX Score:
S — Excellent. Polished, smooth, no issues.A — Good. Minor rough edges only.B — Acceptable. Some noticeable UX issues, functional.C — Poor. Friction points hurt usability.F — Broken. Cannot complete the primary flow.Every issue gets a severity:
| Severity | Definition | Action |
|---|---|---|
| critical | Feature broken, data loss risk, security issue | Log with full context. Present to user — do NOT fix without approval. |
| major | Feature partially broken, bad UX, >3 files to fix | Log with full context and proposed fix. Present to user. |
| minor | Wrong styling, missing polish, easy 1-2 file fix | Fix immediately in code. Re-test to confirm. Note the fix in observations. |
| cosmetic | Nitpick, slight misalignment, could-be-better | Log it. Don't fix unless the user asked for a thorough polish pass. |
| ux-smell | Works technically but feels clunky or confusing | Log under UX Observations. Propose improvement. Only fix if minor. |
The threshold for "fix immediately" is: you're confident in the fix, it touches 1-2 files, and it won't break anything else. When in doubt, log and ask.
When fixing issues, follow these defaults:
mcp__shadcn__*, mcp__radix-ui__*, mcp__magicui__* tools to find ready-made solutions..claude/docs/ui-components-ref.md for existing project components before creating new ones. Compose existing primitives instead of building from scratch.Each feature gets its own file: fixes_observations/{YYYY-MM-DD}-{feature}-observations.md
# {Feature} — QA Observations ({YYYY-MM-DD})
## Summary
- Routes tested: X
- Issues found: X (Y critical, Z major, W minor, V cosmetic, U ux-smell)
- Issues fixed: X
- Overall UX score: [S/A/B/C/F]
---
## Route: {route path}
### UX Score: [S/A/B/C/F]
{One-line justification}
### Issue: {short description}
- **Severity**: critical | major | minor | cosmetic | ux-smell
- **Screenshot**: `playwright-screenshots/{filename}.png`
- **Console errors**: {if any, paste the error}
- **Description**: {what's wrong and why it matters}
- **Root cause**: {trace to source file if possible}
- **Fix applied**: {Yes — describe what you changed} | {No — needs user approval}
- **Suggested fix**: {if not fixed, describe what to do}
---
### Behavioral Trace: {action description}
- **User action**: {what was done}
- **Input field**: ✅/⚠️/❌ {observation}
- **Send button**: ✅/⚠️/❌ {observation}
- **Message appearance**: ✅/⚠️/❌ {observation}
- **Streaming**: ✅/⚠️/❌ {observation}
- **Scroll**: ✅/⚠️/❌ {observation}
- **Panel/sidebar updates**: {any changes outside chat}
- **After response**: {what changed on the page}
- **UX quality**: {Excellent / Good / Needs work / Broken} — {reason}
---
### UX Observations
- {Friction point or UX smell with suggested improvement}
- {Another UX observation}
---
## No Issues Found
{If a route is clean, note it briefly}
- `/route` — Clean. No errors, layout correct, interactions working. UX: S.
After testing all routes for all requested features:
Print a summary to the conversation:
If fixes were applied: List each fix briefly so the user knows what changed in their codebase.
If major/critical issues exist: Present them clearly with enough context for the user to make a decision.
Highlight top UX friction points: Even if they're not bugs, surface the top 2-3 UX improvements that would have the biggest impact.
mcp__playwright-docker__* tools ONLY — never the built-in Playwright plugin or claude-in-chrome.playwright-screenshots/ at project root — not inside fixes_observations/.