Designing task-based usability studies tied to a specific product decision
Testing live flows, prototypes, and “faked” implementations (fake door, Wizard of Oz)
Running moderated sessions (remote or in-person) and capturing high-quality evidence
Turning findings into a prioritized fix list (including high-ROI microcopy/CTA improvements)
When to use
“Create a usability test plan and script for <flow>.”
“We need to test a prototype with 5–8 users next week.”
“Validate a value proposition before building (fake door / Wizard of Oz).”
“Help me synthesize usability findings into a prioritized backlog.”
When NOT to use
You need statistically reliable estimates or evidence of causal impact (use analytics/experimentation)
You need open-ended discovery (“what problems do users have?”) without a specific flow to evaluate (use conducting-user-interviews)
You need a design critique or heuristic review without live user sessions (use running-design-reviews)
Related skills
You need to write specs or design docs for a feature, not test an existing flow (use writing-specs-designs)
You need to apply behavioral/persuasion design patterns to a flow (use behavioral-product-design); this skill evaluates usability and does not design behavioral nudges
You’re working with high-risk populations or sensitive topics (medical, legal, minors) without appropriate approvals/training
You don’t have a concrete scenario/flow to evaluate (clarify the decision first)
Inputs
Minimum required
Product + target user segment (who, context of use)
The decision this test should inform (what will change) + timeline
What you’re testing (flow/feature) + prototype/build link (or “recommend stimulus”)
Always include: Risks, Open questions, Next steps.
Anti-patterns (common failure modes)
Task-label leakage — Writing tasks like “Click the Settings gear icon” instead of “Change your notification preferences.” Tasks should reflect user intent, not reveal UI labels or locations.
Happy-path-only testing — Only testing the golden path and missing error states, edge cases, and recovery flows. Include at least one task that tests what happens when things go wrong.
Moderator bias / leading — Helping participants when they struggle (“Try clicking there”) instead of letting them work through confusion. The struggle IS the data; document it, don’t fix it.
Over-indexing on opinions — Asking “Did you like it?” after each task instead of observing behavior. Post-task ratings are supplementary; observed friction, errors, and workarounds are the primary signal.
Severity-blind issue list — Listing all issues as equal without severity/frequency classification. A cosmetic label issue and a flow-blocking error require different urgency; classify every finding.
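One way to avoid a severity-blind issue list is to score each finding by severity and by how many participants hit it. The sketch below is illustrative only: the severity labels, their weights, the `Finding` fields, and the sample findings are all assumptions, not a scale this skill prescribes.

```python
from dataclasses import dataclass

# Hypothetical severity scale; labels and weights are illustrative assumptions.
SEVERITY_WEIGHT = {"blocker": 4, "major": 3, "minor": 2, "cosmetic": 1}


@dataclass
class Finding:
    title: str
    severity: str          # one of the keys in SEVERITY_WEIGHT
    participants_hit: int  # number of participants who encountered the issue


def prioritize(findings, total_participants):
    """Return findings sorted by severity weight times frequency, highest first."""
    if total_participants <= 0:
        raise ValueError("total_participants must be positive")

    def score(finding):
        frequency = finding.participants_hit / total_participants
        return SEVERITY_WEIGHT[finding.severity] * frequency

    return sorted(findings, key=score, reverse=True)


# Made-up findings from a hypothetical 6-session study.
findings = [
    Finding("Ambiguous CTA label on plan page", "cosmetic", 5),
    Finding("Payment error gives no recovery path", "blocker", 3),
    Finding("Skip link hard to find", "minor", 2),
]

for finding in prioritize(findings, total_participants=6):
    print(finding.severity, "-", finding.title)
```

In this made-up data the flow-blocking error ranks first even though fewer participants hit it than the cosmetic label issue, which is the distinction a flat list would hide. Treat the score as a starting order for discussion, not a verdict.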
Examples
Example 1 (Prototype test): “Create a usability test plan + moderator guide to evaluate our new onboarding flow (web) with 6 first-time users next week.”
Expected: full Usability Test Pack with neutral tasks, recruiting criteria, session logistics, and a synthesis structure.
Example 2 (Wizard of Oz): “We want to test an ‘AI auto-triage’ feature before building it. Design a Wizard of Oz usability test plan and script for 5 sessions.”
Expected: stimulus plan defining what’s simulated, tasks focused on value, and an issue log + readout.
Boundary example (redirect to conducting-user-interviews): “We don’t have a prototype yet, but we want to understand what problems users face during onboarding.”
Response: redirect to conducting-user-interviews for open-ended discovery; return here once you have a concrete flow or prototype to evaluate.
Boundary example (redirect to running-design-reviews): “Review our new checkout designs for usability issues without running user sessions.”
Response: redirect to running-design-reviews for expert heuristic evaluation; this skill requires live user sessions with task-based observation.
Boundary example (causality): “Run a usability test to prove the redesign will increase retention by 10%.”
Response: explain the limits of small-n usability testing; recommend pairing it with instrumentation/experimentation to establish causality, and use usability sessions to diagnose friction.
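To make the small-n limit concrete, the sketch below computes a Wilson score interval for a task success rate. The function name `wilson_interval` and the sample numbers (4 of 6 participants completing a task) are assumptions chosen for illustration, not results from any study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical study: 4 of 6 participants completed the task unaided.
low, high = wilson_interval(successes=4, n=6)
print(f"Observed 4/6; plausible success rate between {low:.0%} and {high:.0%}")
```

With six sessions the interval spans roughly 30% to 90%, far too wide to support a claim like "retention will rise by 10%". That is why the sessions are better used to find and explain friction than to estimate rates.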