When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.
Before designing a test, understand:
- Test Context
- Current State
- Constraints
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
Weak hypothesis: "Changing the button color might increase clicks."
Strong hypothesis: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
Approximate visitors needed per variant (two-sided test at roughly 95% confidence and 80% power; lifts are relative):

| Baseline Conversion Rate | 10% Lift | 20% Lift | 50% Lift |
|---|---|---|---|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |
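To recompute or adjust these estimates, here is a minimal pure-Python power calculation (no testing-tool API assumed). Different tools use slightly different approximations, so expect numbers within 10-15% of the table:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variant(0.03, 0.10))  # ~53,000; the table rounds a similar estimate to 47k
```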
Duration (days) = (Sample size per variant × Number of variants) ÷ Daily traffic to test page

The table's sample sizes are counted in visitors, so divide by raw daily traffic rather than daily conversions. A worked example follows below.

- Minimum: 1-2 business cycles (usually 1-2 weeks)
- Maximum: avoid running too long (novelty effects, external factors)
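A quick sanity check of the formula, using the table row for a 10% baseline with a 10% lift and a hypothetical 2,000 daily visitors:

```python
from math import ceil

sample_per_variant = 12_000   # table row: 10% baseline, 10% relative lift
num_variants = 2
daily_visitors = 2_000        # hypothetical traffic to the test page

days = ceil(sample_per_variant * num_variants / daily_visitors)
print(days)  # 12 -> round up to two full weeks to cover business cycles
```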
Homepage CTA test:
Pricing page test:
Signup flow test:
Best practices:
What to vary:
- Headlines/Copy
- Visual Design
- CTA
- Content
Control (A):
- Screenshot
- Description of current state
Variant (B):
- Screenshot or mockup
- Specific changes made
- Hypothesis for why this will win
**Client-side testing**

Tools: PostHog, Optimizely, VWO, custom

How it works: a JavaScript snippet swaps content in the visitor's browser after the page loads, so no backend changes are needed (watch for flicker during the swap).

Best for: copy, styling, and layout changes on content and marketing pages.
**Server-side testing**

Tools: PostHog, LaunchDarkly, Split, custom

How it works: the server assigns each user to a variant (typically via a feature flag) and renders it before the response is sent, so there is no flicker; see the sketch below.

Best for: pricing, signup and checkout flows, algorithms, and anything beyond presentation changes.
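A minimal sketch of how server-side assignment usually works, assuming hash-based bucketing (the experiment name, user ID, and 50/50 split here are hypothetical). Hashing makes assignment deterministic, so a user sees the same variant on every request without storing state:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically bucket a user; the same inputs always give the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # roughly uniform in [0, 1]
    return "control" if bucket < split else "variant"

print(assign_variant("user-123", "homepage-cta"))  # stable across sessions
```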
DO:
DON'T:
Looking at results before reaching the target sample size and stopping as soon as you see significance ("peeking") leads to inflated false-positive rates: every extra look is another chance for random noise to cross the significance threshold.

Solutions: commit to the sample size up front and analyze once, or use a sequential-testing method built for continuous monitoring.
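A minimal simulation of the effect (pure Python; the rates, traffic, and peek interval are illustrative). Both arms share the same true conversion rate, an A/A test, yet peeking every 250 visitors declares a winner far more often than the nominal 5%:

```python
import random
from statistics import NormalDist

def z_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    if p_pool in (0, 1):
        return 1.0
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(42)
TRUE_RATE = 0.05      # identical in both arms: this is an A/A test
VISITORS = 5_000      # per variant
CHECK_EVERY = 250     # "peek" at results 20 times over the test
RUNS = 1_000

false_positives = 0
for _ in range(RUNS):
    conv_a = conv_b = 0
    for i in range(1, VISITORS + 1):
        conv_a += random.random() < TRUE_RATE
        conv_b += random.random() < TRUE_RATE
        if i % CHECK_EVERY == 0 and z_test_pvalue(conv_a, i, conv_b, i) < 0.05:
            false_positives += 1   # declared a "winner" that cannot be real
            break

print(f"False-positive rate with peeking: {false_positives / RUNS:.0%}")
# Prints well above the nominal 5% (typically several times higher)
```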
Statistical significance ≠ practical significance: a tiny lift can be statistically real yet not worth the cost of shipping. Before calling a result, run through this checklist (a code sketch for the significance and effect-size checks follows):

- Did you reach the target sample size?
- Is the result statistically significant?
- Is the effect size practically meaningful?
- Are secondary metrics consistent with the primary?
- Any guardrail concerns?
- Segment differences?
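For the significance and effect-size checks, a minimal pure-Python sketch (the counts are hypothetical) that reports a p-value and a 95% confidence interval on the absolute lift:

```python
from statistics import NormalDist

def compare_rates(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test plus a Wald CI on the absolute lift."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = ((p_b - p_a) - zcrit * se, (p_b - p_a) + zcrit * se)
    return p_value, ci

# Hypothetical results: control 480/12,000 (4.0%), variant 552/12,000 (4.6%)
p_value, (lo, hi) = compare_rates(480, 12_000, 552, 12_000)
print(f"p = {p_value:.3f}, 95% CI on lift: {lo:+.3%} to {hi:+.3%}")
```

If the interval's lower bound sits below the lift you would need to justify shipping, the result can be statistically significant without being practically meaningful.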
| Result | Conclusion |
|---|---|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or bolder test |
| Mixed signals | Dig deeper, maybe segment |
Test Name: [Name]
Test ID: [ID in testing tool]
Dates: [Start] - [End]
Owner: [Name]
Hypothesis:
[Full hypothesis statement]
Variants:
- Control: [Description + screenshot]
- Variant: [Description + screenshot]
Results:
- Sample size: [achieved vs. target]
- Primary metric: [control] vs. [variant] ([% change], [confidence])
- Secondary metrics: [summary]
- Segment insights: [notable differences]
Decision: [Winner/Loser/Inconclusive]
Action: [What we're doing]
Learnings:
[What we learned, what to test next]
# A/B Test: [Name]
## Hypothesis
[Full hypothesis using framework]
## Test Design
- Type: A/B / A/B/n / MVT
- Duration: X weeks
- Sample size: X per variant
- Traffic allocation: 50/50
## Variants
[Control and variant descriptions with visuals]
## Metrics
- Primary: [metric and definition]
- Secondary: [list]
- Guardrails: [list]
## Implementation
- Method: Client-side / Server-side
- Tool: [Tool name]
- Dev requirements: [If any]
## Analysis Plan
- Success criteria: [What constitutes a win]
- Segment analysis: [Planned segments]
When the test is complete, document the results using the template above and choose next steps based on the outcome. If you need more context, ask the user before finalizing the design.