When to use: Before launching any experiment, when metrics feel unreliable, or when experiment results are confusing
Framework source: Aakash Gupta's "How to Choose the Right Metrics to Evaluate Experiments"
The STEDII Framework
Choose experiment metrics that are:
Sensitive
Timely
Efficient
Debuggable
Interpretable
Isolated
1. Sensitive (Detects Small But Meaningful Changes)
What it means: The metric moves when your feature actually improves the experience
Bad example:
Metric: Monthly Active Users (MAU)
Problem: Too coarse. A good onboarding improvement might not move MAU for months.
Good example:
Metric: Day 7 activation rate
Why: Sensitive enough to detect onboarding improvements within a week
How to check:
Ask: "If this experiment succeeds, will this metric move within the experiment window?"
Common mistake: Using metrics that are too aggregated (MAU, total revenue) when you need something more granular (daily activation, conversion rate by cohort).
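As a rough illustration of the granular alternative, here is a minimal sketch of computing a day 7 activation rate per signup cohort. The event-log shape and the numbers are hypothetical, not from the source:

```python
from datetime import date, timedelta

# Hypothetical event log: user_id -> (signup_date, set of dates with a key action)
events = {
    "u1": (date(2024, 1, 1), {date(2024, 1, 3)}),
    "u2": (date(2024, 1, 1), set()),
    "u3": (date(2024, 1, 2), {date(2024, 1, 8)}),
}

def day7_activation_rate(events):
    """Share of users who performed a key action within 7 days of signup."""
    activated = sum(
        1 for signup, actions in events.values()
        if any(d <= signup + timedelta(days=7) for d in actions)
    )
    return activated / len(events)

print(day7_activation_rate(events))  # 2 of 3 users in this toy cohort activated
```

Because the metric is computed per cohort over a 7-day window, an onboarding change shows up within the experiment window instead of being averaged away in MAU.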
2. Timely (Results Available Quickly)
What it means: You get signal fast enough to make decisions
Bad example:
Metric: 90-day retention
Problem: Takes 90 days to know if your experiment worked
Good example:
Metric: Day 7 retention + leading indicators
Why: Faster feedback, correlates with long-term retention
Tradeoff alert: Sometimes you NEED slow metrics (LTV, annual retention). In those cases:
Use leading indicators to get fast signal
Run smaller experiments to validate
Accept longer experiment duration for critical decisions
How to check:
Ask: "Can I get actionable results within [1 week / 2 weeks / 1 month]?"
3. Efficient (High Statistical Power)
What it means: You can detect the effect with reasonable sample size and time
Bad example:
Metric: Revenue per user
Problem: High variance, need massive sample sizes
Good example:
Metric: Conversion rate
Why: Lower variance, reaches significance faster
Statistical power explained:
Power = ability to detect a real effect
Higher variance metrics = lower power = longer experiments
What I look for: Variance of potential metrics, time-to-signal data
How I use it: Validate metrics are Sensitive and Timely with real data
Example: "Metric X has 12% variance historically, so needs N=5000 sample size"
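The variance-to-sample-size link above can be made concrete with a back-of-envelope per-arm estimate for a two-sample test under a normal approximation. The standard deviation and minimum detectable effect below are placeholders, not values from the source:

```python
from statistics import NormalDist

def sample_size_per_arm(sd, mde, alpha=0.05, power=0.8):
    """Normal-approximation estimate: n = 2 * (z_{1-a/2} + z_{power})^2 * sd^2 / mde^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mde ** 2

# Placeholder: a ~30% conversion rate (sd ~0.46) and a 2-percentage-point lift
print(round(sample_size_per_arm(sd=0.46, mde=0.02)))
```

The sd term is squared, which is why a high-variance metric like revenue per user needs far larger samples than a bounded conversion rate to reach the same power.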
3. Check for Metric Conflicts with Guardrails
Source: context-library/metrics/, company guardrails
What I look for: Metrics that must not decline, company KPIs
How I use it: Ensure secondary metrics include guardrails
Example: "NPS is a company guardrail, must include in secondary metrics"
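Verifying guardrail coverage is a simple set check. A sketch, with hypothetical guardrail and metric names:

```python
# Placeholder names for illustration only
company_guardrails = {"NPS", "page_load_time", "support_ticket_rate"}
secondary_metrics = {"NPS", "day7_retention"}

missing = company_guardrails - secondary_metrics
if missing:
    print(f"Guardrails missing from secondary metrics: {sorted(missing)}")
```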
4. Reference Past Experiments for Benchmarks
Source: context-library/metrics/, A/B test results
What I look for: What worked in past experiments, surprising metric learnings
How I use it: Suggest metrics that detected real impacts before
Example: "In past experiments, page load time scored poorly on Sensitivity; don't use it"
5. Route to Experiment Decision Framework
Source: Connection to /experiment-decision skill
What I look for: Is testing even the right call?
How I use it: If the change should ship without testing, auto-flag this before selecting metrics
Example: "CSS changes are reversible and don't need the full STEDII analysis"
Output Quality Self-Check
Before presenting output to the PM, verify:
Context was checked: Reviewed context-library/metrics/ for existing experiments and baselines, and context-library/prds/ for pre-defined success metrics
Each metric evaluated against all 6 STEDII dimensions: Every candidate metric has a score (0-3) for Sensitive, Timely, Efficient, Debuggable, Interpretable, and Isolated, with reasoning for each score
Sample size requirements calculated: The output includes a minimum sample size estimate for the primary metric based on expected effect size and variance
Metric sensitivity analysis included: The output states whether the expected change is detectable given current traffic, variance, and experiment duration
Guardrail metrics identified: At least 3 guardrail metrics are defined with acceptable ranges to prevent unintended harm
No vanity metrics without justification: If any metric could be considered a vanity metric (e.g., page views, total signups), the output explains why it is valid for this specific experiment