Outcome KPI definition methodology - synthesizes Who Does What By How Much (Gothelf/Seiden), Running Lean (Maurya), and Measure What Matters (Doerr) into a practical framework for measurable outcome KPIs
"Doing stuff isn't the point. Achieving stuff is." -- Jeff Gothelf
Defines measurable outcome KPIs for user stories and features. Loaded during Phase 4 (Requirements Crafting) to produce outcome-kpis.md. Synthesizes three frameworks: customer-centric OKRs, lean metrics, and OKR methodology.
Primary template from Gothelf/Seiden. Every KPI answers five questions:
| Component | Question | Example |
|---|---|---|
| Who | Which user segment? | Returning customers with 2+ orders |
| Does What | What observable behavior changes? | Complete checkout without contacting support |
| By How Much | What is the measurable target? | 40% reduction in support tickets |
| Measured By | How do we collect the data? | Support ticket system + checkout analytics |
| Timeframe | When do we measure? | 30 days post-release, then weekly |
Formula: [Who] [Does what] [By how much]
Apply this as a litmus test: if a KPI cannot answer all five questions, it measures an output (feature delivery), not an outcome (behavior change).
| Bad (Output) | Good (Outcome) |
|---|---|
| Launch mobile app v2 | Mobile users complete purchases 40% more often |
| Build recommendation engine | Users purchase from recommendations, increasing from 10% to 25% |
| Deploy onboarding redesign | New users complete onboarding within 24 hours 30% more often |
| Ship CSV export | Analysts resolve data questions without engineering support 60% of the time |
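To make the litmus test concrete, here is a minimal sketch in Python. The `OutcomeKPI` record and `is_outcome` helper are illustrative names, not part of the Gothelf/Seiden framework; the point is that a KPI unable to fill all five fields is an output in disguise.

```python
from dataclasses import dataclass, fields

@dataclass
class OutcomeKPI:
    """One KPI expressed as [Who] [Does what] [By how much]."""
    who: str          # user segment, e.g. "Returning customers with 2+ orders"
    does_what: str    # observable behavior change
    by_how_much: str  # measurable target, e.g. "40% fewer support tickets"
    measured_by: str  # data collection method
    timeframe: str    # when measurement happens

def is_outcome(kpi: OutcomeKPI) -> bool:
    """Litmus test: a KPI missing any of the five components is an output."""
    return all(getattr(kpi, f.name).strip() for f in fields(kpi))

checkout_kpi = OutcomeKPI(
    who="Returning customers with 2+ orders",
    does_what="complete checkout without contacting support",
    by_how_much="40% fewer support tickets",
    measured_by="support ticket system + checkout analytics",
    timeframe="30 days post-release, then weekly",
)
assert is_outcome(checkout_kpi)
```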
From Gothelf/Seiden: business results are lagging -- teams cannot directly influence them. Target leading indicators instead.
| Type | Definition | Examples | Actionable? |
|---|---|---|---|
| Lagging (Impact) | Business results already happened | Revenue, NPS, market share, churn rate | No -- too slow, too many variables |
| Leading (Outcome) | Behavior changes predicting business results | Purchase completion rate, feature adoption, retention | Yes -- teams can run experiments |
| Leading (Secondary) | Behaviors predicting primary leading indicators | Page visits, trial starts, onboarding steps completed | Yes -- most granular, fastest signal |
Map every KPI through this chain to ensure traceability:
```
Business KPI (Lagging/Impact)
  Example: "Increase quarterly revenue by 15%"
        |
        v
Customer Behavior (Leading/Outcome)
  +-- Users complete purchases from recommendations (+25%)
  +-- Users return within 7 days (+20%)
        |
        v
Secondary Behavior (Leading/Secondary)
  +-- Users browse recommendation pages (+30%)
  +-- Users enable push notifications (+15%)
```
Each layer decomposes into more granular behavioral metrics. Teams target the highest-leverage behavior.
From Maurya (Running Lean): actionable metrics "tie specific and repeatable actions to observed results."
| Dimension | Vanity | Actionable |
|---|---|---|
| Measures | Business size (totals) | Individual behavior (rates) |
| Data type | Gross aggregates | Ratios and unit economics |
| Cause/effect | No insight into why | Ties actions to observed results |
| Examples | Total users, page views, downloads | Activation rate, retention cohort, churn rate |
| Decision value | Cannot inform action | Drives specific experiments |
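A small sketch of the vanity-to-actionable conversion: gross totals become rates that are comparable across cohorts. All counts below are made up for illustration.

```python
# Vanity totals grow even when the product stalls; rates do not.
signups = 4_000          # vanity: gross total
activated = 1_100        # completed the core action in their first session
retained_week_1 = 620    # returned within 7 days of activating

activation_rate = activated / signups            # 27.5% -> comparable across cohorts
week_1_retention = retained_week_1 / activated   # 56.4% -> unit economics, not a total

print(f"Activation rate: {activation_rate:.1%}")
print(f"Week-1 retention: {week_1_retention:.1%}")
```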
One Metric That Matters (OMTM): pick ONE metric per product stage. Optimizing one metric reveals the next.
| Stage | Focus | Example OMTM |
|---|---|---|
| Empathy | Problem validation | Interview pain intensity (qualitative) |
| Stickiness | Retention | Churn rate, DAU/MAU ratio |
| Virality | Organic growth | Viral coefficient, referral rate |
| Revenue | Monetization | Customer Lifetime Value, MRR |
| Scale | Growth efficiency | CAC/LTV ratio, payback period |
Good metric characteristics: a rate or ratio (not an absolute number), comparable across time, simple enough to remember, predictive, and behavior-changing.
From Maurya: model the business as a production line. Identify the bottleneck, then focus KPIs there.
| Stage | Key Question | Example Metric |
|---|---|---|
| Acquisition | Are we reaching the right people? | Visitor-to-signup conversion rate |
| Activation | Do users get the "aha moment"? | % completing core action in first session |
| Retention | Do users come back? | Week-1 return rate, DAU/MAU |
| Revenue | Do users pay? | Trial-to-paid conversion rate |
| Referral | Do users tell others? | Referral rate, viral coefficient |
Activation is causal -- it drives retention, revenue, and referral. Prioritize activation KPIs when uncertain.
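A sketch of bottleneck analysis over the funnel, with hypothetical cohort counts. The stage names mirror the table above; everything else is illustrative.

```python
# Identify the production-line bottleneck: the stage-to-stage
# conversion with the lowest rate is where KPIs should focus.
funnel = {
    "Acquisition": 10_000,  # visitors
    "Activation": 1_400,    # completed core action in first session
    "Retention": 700,       # returned in week 1
    "Revenue": 210,         # converted trial to paid
    "Referral": 63,         # referred at least one new user
}

stages = list(funnel)
conversions = {
    f"{a} -> {b}": funnel[b] / funnel[a]
    for a, b in zip(stages, stages[1:])
}
bottleneck = min(conversions, key=conversions.get)

for step, rate in conversions.items():
    print(f"{step}: {rate:.1%}")
print(f"Bottleneck (focus KPIs here): {bottleneck}")
```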
From Doerr (Measure What Matters): connect KPIs to strategic objectives.
Every Key Result uses the outcome formula. Quality criteria:
| Type | Expected Score | Resource Allocation | Failure Response |
|---|---|---|---|
| Committed | 1.0 (must deliver) | Consume most available resources | Requires explanation, replanning |
| Aspirational | 0.7 (stretch goal) | Overcommit slightly beyond capacity | Expected -- carry forward |
Sweet spot: a blended aggregate score of 0.6-0.7 across all KRs. Consistently hitting 1.0 means the targets are not ambitious enough.
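A worked example of blended scoring. The KR names and the >0.9 / <0.4 thresholds are illustrative assumptions, not Doerr's; only the 0.6-0.7 sweet spot comes from the source.

```python
# Hypothetical end-of-quarter KR scores on the 0.0-1.0 scale.
kr_scores = {
    "Mobile purchase completion +40%": 0.7,   # aspirational
    "Support tickets -40%": 1.0,              # committed -- must deliver
    "Recommendation purchases 10% -> 25%": 0.5,
}

blended = sum(kr_scores.values()) / len(kr_scores)  # ~0.73
print(f"Blended aggregate: {blended:.2f}")

if blended > 0.9:          # assumed threshold for sandbagging
    print("Likely sandbagging: increase ambition next cycle")
elif blended < 0.4:        # assumed threshold for replanning
    print("Replan: targets may be unrealistic")
else:
    print("Near the 0.6-0.7 sweet spot: calibration looks healthy")
```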
| Anti-Pattern | Signal | Fix |
|---|---|---|
| Output-based KRs | "Launch X", "Build Y", "Ship Z" | Rewrite as behavior: "Users [do what] [by how much]" |
| Too many KRs | >5 KRs per Objective | Cut to 2-4 per Objective, max 3-5 Objectives |
| Vague KRs | No numeric target | Add baseline + target + deadline |
| Sandbagging | Consistently scoring 1.0 | Increase ambition level |
| Backlog retrofitting | OKRs match existing backlog 1:1 | OKRs filter backlog, not justify it |
```
Objective (qualitative, inspirational, timeboxed)
  |
Key Results (2-4 per Objective, [Who] [Does what] [By how much])
  |
Epics (weeks of work, aligned to Key Results)
  |
User Stories (days of work, with measurable acceptance criteria)
```
Every story traces back to a Key Result. Orphan stories (no KR link) are potential waste.
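A minimal sketch of orphan detection, assuming stories export with an optional KR link. The data shapes are hypothetical; adapt to your tracker's export format.

```python
# Stories with no valid Key Result link are potential waste.
key_results = {"KR-1", "KR-2", "KR-3"}

stories = [
    {"id": "STORY-101", "kr": "KR-1"},
    {"id": "STORY-102", "kr": "KR-2"},
    {"id": "STORY-103", "kr": None},     # orphan: no KR link
    {"id": "STORY-104", "kr": "KR-9"},   # orphan: dangling reference
]

orphans = [s["id"] for s in stories if s["kr"] not in key_results]
print(f"Review as potential waste: {orphans}")
```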
Use this exact structure in outcome-kpis.md:
```markdown
## Feature: {feature-name}

### Objective
{What success looks like in one sentence -- qualitative, inspirational, timeboxed}

### Outcome KPIs
| # | Who | Does What | By How Much | Baseline | Measured By | Type |
|---|-----|-----------|-------------|----------|-------------|------|
| 1 | {segment} | {behavior} | {target} | {current} | {method} | Leading/Lagging |

### Metric Hierarchy
- **North Star**: {the ONE metric that matters most for this feature}
- **Leading Indicators**: {behaviors that predict the north star}
- **Guardrail Metrics**: {metrics that must NOT degrade}

### Measurement Plan
| KPI | Data Source | Collection Method | Frequency | Owner |
|-----|------------|-------------------|-----------|-------|

### Hypothesis
We believe that {proposed solution} for {user segment} will achieve {key result}.
We will know this is true when {who} {does what} {by how much}.
```
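For teams generating these documents programmatically, a small sketch that renders the hypothesis statement from KPI fields. The `hypothesis` function and its arguments are illustrative, not part of the template.

```python
# Fill the two-sentence hypothesis template from KPI fields.
def hypothesis(solution: str, segment: str, key_result: str,
               who: str, does_what: str, by_how_much: str) -> str:
    return (
        f"We believe that {solution} for {segment} will achieve {key_result}.\n"
        f"We will know this is true when {who} {does_what} {by_how_much}."
    )

print(hypothesis(
    solution="one-click reorder",
    segment="returning customers",
    key_result="a 40% reduction in checkout support tickets",
    who="returning customers with 2+ orders",
    does_what="complete checkout without contacting support",
    by_how_much="40% more often",
))
```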
Before finalizing KPIs, verify that each one passes these checks:
| Check | Question | If No |
|---|---|---|
| Measurable today? | Can you measure it with current instrumentation? | Add instrumentation to requirements |
| Rate not total? | Is it a ratio/rate, not a gross count? | Convert to rate (vanity -> actionable) |
| Outcome not output? | Does it describe user behavior, not feature delivery? | Rewrite as "[Who] [Does what] [By how much]" |
| Has baseline? | Do you know the current value? | Establish baseline before setting target |
| Team can influence? | Can the team directly affect this metric? | Decompose into more granular leading indicator |
| Has guardrails? | Are there metrics that must not degrade? | Add guardrail metrics (e.g., error rate, load time) |
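A rough pre-flight over this checklist. The heuristics (e.g., spotting a rate by "%" or "per") are assumptions and do not replace human review.

```python
# Return the "If No" remediation for each failed check.
def quality_gaps(kpi: dict) -> list[str]:
    gaps = []
    if not kpi.get("measured_by"):
        gaps.append("Add instrumentation to requirements")
    if not any(tok in kpi.get("by_how_much", "") for tok in ("%", "per", "rate")):
        gaps.append("Convert to rate (vanity -> actionable)")
    if kpi.get("baseline") is None:
        gaps.append("Establish baseline before setting target")
    if not kpi.get("guardrails"):
        gaps.append("Add guardrail metrics (e.g., error rate, load time)")
    return gaps

kpi = {
    "by_how_much": "40% reduction in support tickets",
    "measured_by": "support ticket system + checkout analytics",
    "baseline": None,   # unknown -> fails the baseline check
    "guardrails": ["checkout error rate", "page load time"],
}
print(quality_gaps(kpi))  # ['Establish baseline before setting target']
```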
The platform-architect consumes outcome-kpis.md, particularly the Outcome KPIs and Measurement Plan tables, to plan instrumentation.
For deeper reading on source frameworks:
- docs/research/running-lean-research.md
- docs/research/measure-what-matters-research.md
- docs/research/who-does-what-research.md