Use this skill when the user asks to "design an experiment", "set up an A/B test", "calculate sample size", "plan a test", "figure out how long to run a test", "prioritize my tests", "decide how to split users", "launch an experiment", "monitor test data", "call a test", "segment test results", "analyze experiment results", "iterate on a test", or needs help with the mechanics of running a rigorous experiment from test design through result analysis.
Design, run, and analyze rigorous experiments. Covers the full test lifecycle: defining test parameters, calculating statistical requirements, minimizing build complexity, launching cleanly, monitoring data, calling tests complete, evaluating outcomes across segments, and iterating based on learnings.
Before using this skill, the PM should have:
Before starting, ask one question about output format: what format should the final document be (docx, md, or pdf)? Default: docx.
For each test, establish the statistical foundation:
Baseline metric: The current performance of your primary metric for the control group. This is your starting point. Get this from historical data — the more data, the more accurate.
Example: If your primary metric is signup rate and historically 10% of homepage visitors sign up, your baseline metric is 10%.
Minimum Detectable Effect (MDE): The smallest relative improvement you'd consider worth implementing. This is a business decision, not a statistical one.
Considerations:
Example: With a 10% baseline signup rate and a 10% MDE, you're testing whether the solution can move the signup rate from 10% to at least 11%.
P-value threshold: The probability of a false positive (calling a test positive when there's actually no difference). Standard is 0.05 (5%). Rarely need to change this.
Statistical power: The probability of correctly detecting a real difference. Standard is 0.80 (80%), meaning a 20% chance of missing a real effect. Rarely need to change this.
Sample size calculation for percentage metrics (signup rate, conversion rate, retention rate):
Sample size per variation = (Z_α/2 + Z_β)² × [p₁(1-p₁) + p₂(1-p₂)] / (p₂ - p₁)²
Where:
- Z_α/2 = 1.96 (for p-value of 0.05)
- Z_β = 0.84 (for 80% power)
- p₁ = baseline metric (e.g., 0.10)
- p₂ = baseline × (1 + MDE) (e.g., 0.11)
Sample size calculation for continuous metrics (average order value, time on page):
Sample size per variation = (Z_α/2 + Z_β)² × 2σ² / (MDE × baseline)²
Where:
- σ = standard deviation of the metric (from historical data)
Help the PM calculate this and understand the implications: total samples needed, daily traffic available, estimated test duration.
Output: Baseline metric, MDE, p-value threshold, power, required sample size per variation, estimated test duration.
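The two sample-size formulas above can be sketched as a small helper. This is an illustrative implementation (function names are mine), using Python's standard-library `statistics.NormalDist` to derive the z-values from any alpha and power rather than hardcoding 1.96 and 0.84:

```python
import math
from statistics import NormalDist


def z_values(alpha=0.05, power=0.80):
    """Two-sided critical value and power quantile of the standard normal."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2), nd.inv_cdf(power)


def sample_size_proportion(baseline, mde, alpha=0.05, power=0.80):
    """Per-variation sample size for a percentage metric (e.g., signup rate)."""
    z_a, z_b = z_values(alpha, power)
    p1 = baseline
    p2 = baseline * (1 + mde)  # MDE is relative to the baseline
    n = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
    return math.ceil(n)


def sample_size_continuous(baseline, mde, sigma, alpha=0.05, power=0.80):
    """Per-variation sample size for a continuous metric (e.g., average order value)."""
    z_a, z_b = z_values(alpha, power)
    delta = mde * baseline  # absolute effect to detect
    return math.ceil((z_a + z_b) ** 2 * 2 * sigma ** 2 / delta ** 2)


# 10% baseline signup rate, 10% relative MDE -> roughly 14,700 users per variation
print(sample_size_proportion(0.10, 0.10))
```

The running example (10% baseline, 10% MDE, so 10% → 11%) lands near 14,700 users per variation, which is why including irrelevant users in the split is so costly.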
When multiple tests are ready, prioritize using Return on Time Invested (ROTI):
ROTI = (Estimated opportunity size) / (Time to build + Time to get results)
Time to get results = Sample size per variation × number of variations / daily eligible traffic
Estimated opportunity size = (Current metric value) × (Expected improvement %) × (Revenue or user impact per unit of metric) × (Time horizon)
Time to build = Engineering + design + QA time in days
Compare ROTI across tests; the higher the ROTI, the earlier it should run. Also weigh how the tests will be scheduled:
Sequential vs. simultaneous testing: Running tests simultaneously against the same control requires fewer total samples (you reuse the control group). But it introduces interaction risk — if the tests affect similar parts of the experience, they can interfere with each other. Default to simultaneous when tests are independent, sequential when they overlap.
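The ROTI comparison above can be sketched as follows. All numbers are hypothetical, and the function names are illustrative:

```python
def time_to_results(sample_size_per_variation, num_variations, daily_eligible_traffic):
    """Days needed to collect the full sample across all variations."""
    return sample_size_per_variation * num_variations / daily_eligible_traffic


def roti(opportunity_size, time_to_build_days, time_to_results_days):
    """Return on Time Invested: opportunity per day of total effort."""
    return opportunity_size / (time_to_build_days + time_to_results_days)


# Hypothetical comparison of two candidate tests
days_a = time_to_results(15000, 2, 3000)  # 10 days to reach sample size
days_b = time_to_results(8000, 2, 1000)   # 16 days

print(roti(50000, 5, days_a))   # test A: smaller opportunity, fast to build and read
print(roti(80000, 20, days_b))  # test B: bigger opportunity, slow on both counts
```

In this made-up comparison, test A wins on ROTI despite the smaller opportunity, because it is much faster to build and to read.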
The purpose of a test is to learn fast. Over-building test solutions is one of the most common mistakes.
Four questions to reduce scope:
What platforms do I need to test on? If 80% of traffic is mobile web, maybe skip the desktop build for the test. Test where the data is.
What integrations are needed? Can you mock or manually handle integrations during the test period instead of building full integrations?
What resources are needed from other teams? Can you reduce dependencies by using simpler implementations? (e.g., static content instead of dynamic, manual process instead of automated)
How can we ship the test? Can you use a feature flag, an experimentation platform, or a simple code change? Don't build production-grade infrastructure for a test.
The test solution should be good enough to accurately represent the user experience being tested, but nothing more. Polish can come after the test validates the hypothesis.
How you split users between control and solution groups is critical to data quality.
Core principle: Split users as close to the test experience as possible. Only include users who will actually encounter the change. Users who are eligible for the test but never see it are "dead weight" — they dilute your sample and extend your timeline.
Example: If you're testing a new checkout page, don't include all site visitors in your test — only users who reach checkout. This dramatically reduces the sample size needed and speeds up results.
Randomization: Users must be randomly assigned. Any systematic pattern (first 50% get control, next 50% get solution) can introduce bias. Use proper randomization tools.
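A common way to get stable, unbiased assignment is to hash the user ID with an experiment-specific salt: the same user always lands in the same group, and the hash spreads users evenly. A minimal sketch (the salt string and variant names are illustrative):

```python
import hashlib


def assign_variant(user_id: str, experiment_salt: str,
                   variants=("control", "solution")):
    """Deterministically map a user to a variant. The salted SHA-256 digest
    is effectively uniform, so users split evenly across variants, and the
    same (user, salt) pair always yields the same assignment."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Same user, same experiment -> always the same group.
# A new salt for the next experiment re-randomizes everyone.
print(assign_variant("user-123", "checkout-test-v1"))
```

Using a fresh salt per experiment also prevents carryover bias, where the same users repeatedly land in the control group across experiments.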
Common assignment challenges:
B2B products with tiers: Should you split by individual user, team, or account?
Social/viral products: When users in the solution group can influence control group users (e.g., sharing features, messaging improvements), consider:
Marketplaces with fixed supply: When one side of the marketplace seeing the solution can affect the other side's results, you may need geographic or time-based splits rather than individual randomization.
Client-side vs. server-side testing:
Three categories of launch errors to prevent:
1. Data interference — external factors that bias your results
Four types to check:
Mitigation: Map interference periods on a calendar. Launch during clean windows. If a test spans a cycle, run it for full cycle intervals (whole weeks, whole months).
2. Launch authority — who can greenlight a test launch
Default toward "beg forgiveness" over "ask permission." Centralized launch authority from senior leaders slows everything down and kills the iteration speed that makes experimentation valuable.
Better approach: peer review before launch (fellow PMs or data analysts review the test plan), decentralized launch authority (the PM or experiment owner launches), and no reverting to "ask permission" culture after a mistake — instead, fix the root cause.
3. Infrastructure errors — bugs in the test implementation
Use ramping to catch these: launch to a small % of users first (e.g., 10%), verify data is tracking correctly, then ramp to full traffic over 1-2 days.
When to ramp:
When to skip ramping:
Ramping mechanics: Create three groups — solution variation, control variation, and a "not in test" group. The "not in test" group shrinks as you ramp. Don't change the split ratio between control and solution (always 50/50 between them).
A/A tests: Use only as a diagnostic tool when you suspect infrastructure problems — not as a routine practice. Running an A/A before every test is wasteful. If you need to validate infrastructure while also running a real test, use an A/A/B design (two control groups + one solution group).
The primary rule: Run the test until you reach your calculated sample size, then evaluate. Do not call a test early based on gut feeling.
Why peeking is dangerous: Every time you check the p-value before reaching full sample size, the measured p-value understates the actual probability of a Type 1 error. Continuously monitoring and stopping as soon as p < 0.05 gives you a 70-80% chance of a false positive — not 5%.
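The inflation is easy to demonstrate with an A/A simulation: both groups have the identical true rate, yet stopping at the first peek where the z-statistic crosses 1.96 rejects far more often than 5%. The exact rates depend on the peek schedule (the sketch below peeks 10 times; continuous monitoring over a long test inflates the rate much further, toward the 70-80% figure):

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # 1.96 for a two-sided 0.05 test


def z_stat(wins_a, wins_b, n):
    """Two-proportion z statistic for equal group sizes n."""
    p_a, p_b = wins_a / n, wins_b / n
    pooled = (wins_a + wins_b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    return 0.0 if se == 0 else (p_a - p_b) / se


def aa_simulation(true_rate=0.10, n_per_arm=1000, peeks=10, sims=1000, seed=7):
    """Fraction of A/A tests 'won' when peeking vs. waiting for full sample."""
    rng = random.Random(seed)
    step = n_per_arm // peeks
    peek_fp = final_fp = 0
    for _ in range(sims):
        a = b = 0
        called = False
        for i in range(1, n_per_arm + 1):
            a += rng.random() < true_rate
            b += rng.random() < true_rate
            if i % step == 0 and abs(z_stat(a, b, i)) > Z_CRIT:
                called = True  # a peeker would call the test here
                break
        peek_fp += called
        # No-peeking comparison: fresh A/A draw, evaluated once at full sample
        a2 = sum(rng.random() < true_rate for _ in range(n_per_arm))
        b2 = sum(rng.random() < true_rate for _ in range(n_per_arm))
        final_fp += abs(z_stat(a2, b2, n_per_arm)) > Z_CRIT
    return peek_fp / sims, final_fp / sims


peek_rate, final_rate = aa_simulation()
print(f"false positives with peeking: {peek_rate:.1%}, without: {final_rate:.1%}")
```

Even with only 10 peeks, the peeking false positive rate lands several times above the nominal 5%, despite there being no real difference to find.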
What to monitor during the test (without calling it):
When you can statistically justify calling early — Sequential Sampling:
After every sample, calculate the absolute difference in win rate between solution and control and compare it to the threshold:
Threshold = ± 2 × √(N) / N (equivalently, ± 2 / √N)
Where N = calculated sample size per variation
If the observed difference exceeds this threshold at any point, the test can be called early with statistical validity. This must be established as the calling method before the test starts. You cannot retroactively decide to use sequential sampling.
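The sequential check from the formula above can be sketched in a few lines (function names are illustrative; the metric is assumed to be a win rate tracked after every sample):

```python
import math


def sequential_threshold(n_planned):
    """Early-calling threshold on the absolute win-rate difference:
    2 * sqrt(N) / N, which simplifies to 2 / sqrt(N)."""
    return 2 * math.sqrt(n_planned) / n_planned


def can_call_early(control_rate, solution_rate, n_planned):
    """True if the observed absolute difference exceeds the threshold."""
    return abs(solution_rate - control_rate) > sequential_threshold(n_planned)


# With N = 10,000 planned per variation, the threshold is 2/100 = 0.02,
# i.e., a 2-point absolute gap in win rate at any point calls the test.
print(sequential_threshold(10_000))
print(can_call_early(0.100, 0.125, 10_000))
```

Note how the threshold shrinks as the planned sample grows: a bigger planned test demands a smaller observed gap before an early call is justified.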
Three reasons to extend a test beyond the calculated sample size:
Calling the test: After reaching the calculated sample size, evaluate:
Turning outcomes into action — use the outcome × tradeoff matrix:
| Primary metric result | Tradeoff positive/neutral | Tradeoff negative |
|---|---|---|
| Positive impact | Implement | Implement only if primary gain outweighs tradeoff loss. Consider iterating to preserve gain while mitigating tradeoff. |
| Null impact | Check for non-outcome benefits (performance, UX quality, behavior changes). Implement only if benefit justifies cost. | Don't implement. |
| Negative impact | Don't implement. Learn why. | Don't implement. Learn why. |
For null results with a positive secondary metric: investigate before claiming a win. Can you link the improvement to the test? Is there a better way to move that secondary metric directly?
Aggregate results hide important patterns. Segment your data to find them.
Pre-test vs. post-test segmentation:
Key segments to check: traffic source, geography, visit frequency, actions taken, engagement level, demographics, platform, browser.
What to do with conflicting segments:
Watch for Simpson's Paradox: When every subgroup shows one trend, but the aggregate shows the opposite. This happens when subgroups have very different sizes across variations. Always segment — if you only look at aggregates, you can make exactly the wrong decision.
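A concrete illustration with made-up counts: the solution wins on every platform, but because its traffic skews heavily toward the lower-converting platform, the aggregate favors control:

```python
# Hypothetical (users, conversions) per platform and variation
data = {
    "control":  {"desktop": (900, 90), "mobile": (100, 2)},
    "solution": {"desktop": (100, 11), "mobile": (900, 27)},
}


def rate(users, conversions):
    return conversions / users


for platform in ("desktop", "mobile"):
    c = rate(*data["control"][platform])
    s = rate(*data["solution"][platform])
    print(f"{platform}: control {c:.1%} vs solution {s:.1%}")  # solution wins both

# Aggregate across platforms: control wins, because the solution group
# is dominated by mobile users, who convert far less on either variant
agg = {v: rate(sum(u for u, _ in d.values()), sum(c for _, c in d.values()))
       for v, d in data.items()}
print(f"aggregate: control {agg['control']:.1%} vs solution {agg['solution']:.1%}")
```

Here the solution leads 11% vs 10% on desktop and 3% vs 2% on mobile, yet trails 3.8% vs 9.2% in aggregate. Looking only at the aggregate would reject a solution that helps every segment.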
Don't stop at "it worked" or "it didn't." Run through this learning checklist for every test:
Test outcomes (did the solution work as expected?):
Customer problem (was the problem correctly defined?):
5. How should the "what" of the problem be updated?
6. How should the "who" be updated?
7. How should the "where" be updated?
8. How should the "why" be updated?
Test parameters (was the test well-designed?):
9. Did sample size and time to results match expectations? If not, why?
10. Did time to build match expectations? If not, why?
11. Does the estimated opportunity size match what we observed? If not, why?
12. Whose predictions were most/least accurate? How were their assumptions different?
Three areas to iterate on — evaluate bottom-up (problem first, not test design first):
1. Customer problem (check first): Do the learnings suggest the problem was mis-defined? Is the "who," "what," "where," or "why" wrong or incomplete? If yes → redefine the problem, generate new solutions, redesign the test.
2. Solution (check second): Is the problem still valid, but the solution approach is wrong? If yes → generate alternative solutions for the same problem, redesign the test.
3. Test design (check last): Is the solution approach sound, but the test implementation had issues? Design flaws, infrastructure bugs, UX problems in the test version? If yes → fix the implementation and retest.
Most teams default to top-down iteration (fix the test design first) because it's fastest. This is a trap — it leads to repeatedly tweaking implementations while the underlying problem definition remains wrong.
Iterating on wins (don't skip this):
Produce the document in the user's chosen format (default: docx). If docx, use the docx skill. If pdf, use the pdf skill. If md, write as markdown. Use the following structure:
# Experiment: [Name]
Date: [date]
Author: [PM name]
Strategic opportunity: [link to experiment-strategy doc]
Status: [Design / Live / Complete / Iterating]
## Hypothesis
If we [change], then [who] will [behavior], resulting in [metric improvement] because [rationale].
## Test Parameters
- Baseline metric: [value] ([metric name])
- MDE: [%]
- P-value threshold: [0.05]
- Statistical power: [0.80]
- Sample size per variation: [calculated]
- Number of variations: [2 or more]
- Estimated daily eligible traffic: [number]
- Estimated test duration: [days]
- ROTI: [calculated]
## Metrics
- **Primary**: [metric — this determines win/loss/null]
- **Secondary**: [metrics — additional positive signals]
- **Tradeoff**: [metrics — watch for negative impact]
- **Leading indicators**: [metrics — early signals]
## User Assignment
- Assignment level: [individual / team / account / geographic]
- Split: [50/50 / other]
- Eligibility criteria: [which users are included/excluded]
- Client-side or server-side: [choice + rationale]
## Launch Plan
- Ramp schedule: [immediate 100% / 10% → 50% → 100% over X days]
- Interference check: [seasonal, promotional, competitive factors]
- Anti-metrics to monitor during ramp: [metrics + thresholds for pausing]
- Calling method: [primary rule / sequential sampling]
## Build Plan
- Engineering effort: [days]
- Design effort: [days]
- Dependencies: [teams, tools, approvals]
- Scope cuts for test: [what's excluded from the test version]
## Pre-Mortem
| Risk | Preparation | Mitigation if triggered |
|------|------------|----------------------|
| [risk 1] | [prep] | [mitigation] |
| [risk 2] | [prep] | [mitigation] |
## Results (fill after test)
- Total samples: [control / solution]
- Duration: [days]
- Primary metric: [control value] → [solution value] (p = [value])
- Secondary metrics: [results]
- Tradeoff metrics: [results]
- Outcome: [Positive / Negative / Null]
## Segment Analysis
| Segment | Control | Solution | Difference | Significant? |
|---------|---------|----------|------------|-------------|
| [segment] | [value] | [value] | [%] | [yes/no] |
## Learning Checklist
[Answers to the 12 questions from Step 9]
## Decision
- **Action**: [Implement / Don't implement / Iterate]
- **Rationale**: [why]
- **Iteration plan**: [if iterating — what changes and why]
- **Next experiment**: [what follows from this learning]
No baseline metric: Testing without knowing the current performance. You can't calculate sample size, MDE, or interpret results without a baseline.
MDE too small: Setting a 1% MDE "to detect any improvement" requires enormous sample sizes. Be realistic about what improvement justifies implementation.
Peeking and stopping early: The #1 statistical sin. If you continuously monitor p-value and stop at 0.05, your actual false positive rate is 70-80%. Use the primary rule or establish sequential sampling upfront.
Including irrelevant users: Every user in the test who never sees the change is noise that extends your timeline. Split as close to the experience as possible.
No tradeoff metrics: A positive primary result that destroys a tradeoff metric is worse than a null result. Always define what could go wrong.
Top-down iteration: After a loss, defaulting to "let's tweak the design" instead of questioning whether the customer problem was wrong. Always check problem → solution → test design, in that order.
Ignoring segment differences: Aggregate results that hide opposite effects in different user groups (Simpson's Paradox). Always segment.
Not learning from wins: Declaring victory and moving on without extracting learnings, refining the solution, or identifying the next bottleneck.
Not documenting: If the test isn't documented, the learning doesn't persist. Future teams will repeat the same experiments.
The value of an experiment is not in its outcome — it's in the learning. A well-designed experiment that produces a null result teaches you something real about your users. A poorly designed experiment that produces a "win" teaches you nothing you can trust. Invest in the rigor of the design, and the learning will follow.