Experiment design expert using pretotyping and lean validation for both new product concepts and existing product features.
Design fast, low-cost experiments to validate product hypotheses before committing to full development. This skill applies Alberto Savoia's pretotyping philosophy ("Make sure you are building The Right It before you build It right") alongside lean experimentation methods for both new and existing products.
Every experiment starts with a falsifiable hypothesis:
"At least X% of Y will do Z."
| Component | Description | Example |
|---|---|---|
| X% | The success threshold | 15% |
| Y | The target population | trial users who reach the dashboard |
| Z | The specific measurable action | click "Upgrade to Pro" within 7 days |
A good XYZ hypothesis is specific, measurable, and has a clear pass/fail threshold set before the experiment runs.
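The XYZ structure above can be captured directly in code. A minimal sketch (class and field names are illustrative, not part of the tool):

```python
from dataclasses import dataclass

@dataclass
class XYZHypothesis:
    """An 'At least X% of Y will do Z' hypothesis."""
    threshold_pct: float   # X: success threshold, set before the experiment runs
    population: str        # Y: target population
    action: str            # Z: specific measurable action

    def passed(self, successes: int, sample_size: int) -> bool:
        """Naive pass/fail: observed rate meets the pre-set threshold."""
        return 100.0 * successes / sample_size >= self.threshold_pct

h = XYZHypothesis(15.0, "trial users who reach the dashboard",
                  'click "Upgrade to Pro" within 7 days')
print(h.passed(successes=24, sample_size=120))  # 20% >= 15% -> True
```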
Stated interest is unreliable. Valid experiments measure actions that require commitment, i.e., skin in the game (SITG). Always prefer SITG signals over surveys, likes, or verbal feedback.
Do not rely on market reports, competitor benchmarks, or industry averages. Run your own experiment with your own audience to get Your Own Data. Others' data reflects their context, not yours.
| Method | Description | Best For | Effort | Duration |
|---|---|---|---|---|
| Landing Page | Single-page site describing the product with a CTA (sign up, pre-order) | Testing value proposition and demand | Low | 1-2 weeks |
| Explainer Video | Short video demonstrating the concept with a CTA | Testing comprehension and interest | Low-Medium | 1-2 weeks |
| Pre-Order / Waitlist | Accept payment or email for a product that does not exist yet | Testing willingness to pay | Low | 2-4 weeks |
| Concierge MVP | Deliver the service manually to a small group, as if automated | Testing whether the solution actually solves the problem | Medium | 2-4 weeks |
| Method | Description | Best For | Effort | Duration |
|---|---|---|---|---|
| Fake Door Test | Add a button/link for a feature that does not exist; measure clicks | Testing demand for a specific feature | Low | 1-2 weeks |
| Feature Stub | Build minimal version (e.g., static mockup) behind a flag | Testing engagement with a feature concept | Low-Medium | 1-2 weeks |
| A/B Test | Show variant to a percentage of users; measure conversion | Testing incremental changes to existing flows | Medium | 2-4 weeks |
| Wizard of Oz | Feature appears automated to user but is manually operated behind the scenes | Testing complex features before building automation | Medium-High | 2-4 weeks |
| Survey (In-App) | Targeted survey shown to users who match specific behavioral criteria | Testing preferences when SITG methods are impractical | Low | 1 week |
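The A/B Test row above assumes users can be split into stable groups. A minimal deterministic bucketing sketch (function and experiment names are illustrative): hashing the (experiment, user) pair keeps assignment stable across sessions and independent across experiments.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, rollout_pct: float = 50.0) -> str:
    """Deterministically bucket a user into 'variant' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000 / 100.0  # uniform value in 0.00-99.99
    return "variant" if bucket < rollout_pct else "control"

# Same user, same experiment -> same bucket every time
print(assign_variant("user-42", "fake-door-export", rollout_pct=10.0))
```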
Start with the assumption you need to test. Convert it into XYZ format.
Weak: "Users will like the new dashboard."

Strong: "At least 30% of active users who see the new dashboard will set it as their default view within 5 days."
Choose based on:
| Element | Description |
|---|---|
| Primary metric | The single number that determines pass/fail |
| Success threshold | The minimum value to consider the hypothesis validated |
| Secondary metrics | Additional signals to watch (but not used for pass/fail) |
| Guardrail metrics | Metrics that must NOT degrade (e.g., existing conversion rate) |
| Outcome | Meaning | Next Action |
|---|---|---|
| Clear pass | Metric exceeds threshold | Proceed to build or next validation stage |
| Clear fail | Metric well below threshold | Pivot, modify hypothesis, or abandon |
| Inconclusive | Metric near threshold or insufficient sample | Extend duration, increase sample, or refine experiment |
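The pass/fail/inconclusive logic above can be sketched with a normal-approximation confidence interval around the observed rate. This is a simplification (a real analysis may use a proper significance test); the function name is illustrative:

```python
import math

def classify_outcome(successes: int, n: int, threshold: float, z: float = 1.96) -> str:
    """Classify a result as clear pass, clear fail, or inconclusive.

    If the ~95% confidence interval around the observed rate straddles
    the threshold, the result is treated as inconclusive.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    if p - half_width > threshold:
        return "clear pass"
    if p + half_width < threshold:
        return "clear fail"
    return "inconclusive"

print(classify_outcome(45, 100, 0.30))  # prints "clear pass"
print(classify_outcome(32, 100, 0.30))  # prints "inconclusive"
```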
Design experiments from hypotheses using the CLI tool:
```shell
# Run with demo data
python3 scripts/experiment_designer.py --demo

# Run with custom input
python3 scripts/experiment_designer.py input.json

# Output as JSON
python3 scripts/experiment_designer.py input.json --format json
```
```json
{
  "hypotheses": [
    {
      "hypothesis_text": "At least 20% of trial users will click Upgrade within 7 days",
      "target_segment": "trial users on free plan",
      "product_type": "existing"
    }
  ]
}
```
For each hypothesis, the tool suggests 2-3 experiment designs with method, metric, success threshold, effort level, and duration estimate.
See scripts/experiment_designer.py for full documentation.
Use assets/experiment_plan_template.md to document each experiment:
- brainstorm-ideas/ to generate ideas that become hypotheses.
- identify-assumptions/ to find the riskiest assumptions to test.
- pre-mortem/ before committing to full build.

| Symptom | Likely Cause | Resolution |
|---|---|---|
| Tool suggests only low-SITG experiments | Hypothesis text lacks action-oriented keywords (pay, purchase, upgrade) | Rewrite hypothesis using explicit behavioral verbs; check KEYWORD_SIGNALS mapping in script |
| All experiments recommended are the same method | Hypothesis signals are too narrow or product_type is wrong | Verify product_type is set correctly (new vs. existing); broaden hypothesis to cover more intent signals |
| Demo mode works but custom input fails | Input JSON schema does not match expected format (missing hypotheses key) | Validate JSON has top-level hypotheses array with hypothesis_text, target_segment, product_type per entry |
| Experiment results are always inconclusive | Sample size too small or experiment duration too short for the metric | Extend timebox, increase traffic allocation, or choose a metric with higher signal-to-noise ratio |
| Fake door test shows high clicks but feature never builds | No decision framework tied to experiment outcome | Define clear pass/fail thresholds before running; document the "if pass, then build" commitment upfront |
| Team runs experiments but never acts on results | Results not connected to roadmap or prioritization process | Feed experiment outcomes into identify-assumptions/ for re-scoring; link to execution/outcome-roadmap/ |
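For the "custom input fails" row above, a small pre-flight validator for the input file can surface schema problems before running the tool. A sketch checking the documented shape (a top-level "hypotheses" array with hypothesis_text, target_segment, and product_type per entry); the helper itself is not part of the script:

```python
import json

REQUIRED_KEYS = {"hypothesis_text", "target_segment", "product_type"}

def validate_input(path: str) -> list[str]:
    """Return a list of schema problems in an input file (empty = valid)."""
    with open(path) as f:
        data = json.load(f)
    hypotheses = data.get("hypotheses")
    if not isinstance(hypotheses, list):
        return ['missing or non-array top-level "hypotheses" key']
    errors = []
    for i, entry in enumerate(hypotheses):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            errors.append(f"entry {i}: missing {sorted(missing)}")
        elif entry["product_type"] not in ("new", "existing"):
            errors.append(f'entry {i}: product_type must be "new" or "existing"')
    return errors
```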
In Scope:
Out of Scope:
execution/outcome-roadmap/)

Important Caveats:
| Integration | Direction | Description |
|---|---|---|
| brainstorm-ideas/ | Receives from | Ideas generated become hypotheses for experiment design |
| identify-assumptions/ | Receives from | "Test Now" assumptions become hypotheses for this skill |
| pre-mortem/ | Feeds into | Experiment results inform pre-mortem risk assessment before full build |
| execution/create-prd/ | Feeds into | Validated hypotheses become PRD assumptions with evidence |
| execution/brainstorm-okrs/ | Feeds into | Experiment metrics may become OKR key results |
| execution/outcome-roadmap/ | Feeds into | Experiment outcomes inform Now/Next/Later roadmap placement |
Suggests 2-3 experiment designs for each product hypothesis based on keyword signal analysis.
| Flag | Type | Default | Description |
|---|---|---|---|
| input_file | positional | (optional) | Path to JSON file with hypotheses array |
| --demo | flag | off | Run with built-in sample data (3 hypotheses) |
| --format | choice | text | Output format: text or json |
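A CLI surface matching the flag table above could be declared with argparse. This is a plausible skeleton for the documented interface, not the script's actual implementation:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser matching the documented flags (internals are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Suggest experiment designs for product hypotheses.")
    parser.add_argument("input_file", nargs="?",
                        help="Path to JSON file with hypotheses array")
    parser.add_argument("--demo", action="store_true",
                        help="Run with built-in sample data")
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="Output format")
    return parser

args = build_parser().parse_args(["input.json", "--format", "json"])
print(args.input_file, args.format)  # prints "input.json json"
```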