Use when evaluating whether evidence supports a causal claim or designing a study to test one. Triggers on "does X cause Y", "should we change X to improve Y", "we noticed X and Y move together", "the data shows...", "our A/B test found...", auditing causal reasoning in a document or analysis, drawing conclusions from observational data, confounders, selection bias.
Correlation does not establish causation. Whether you're designing a study or auditing someone else's causal reasoning, you need three things: (1) where the claim sits on Pearl's ladder, (2) what the causal structure is, and (3) whether the target quantity is identifiable from the available data.
Follow the steps in order. Do not skip to study design before completing the DAG and identifiability check.
Not every causal question needs the full pipeline. If the causal structure is simple (one obvious cause, no plausible confounders) and the stakes are low, state the estimand and the threat you're dismissing in one sentence each. The discipline is locating the question on the ladder (Pearl), not running nine steps for a two-variable system. Scale the rigor to the cost of a wrong causal conclusion.
If you're auditing causal reasoning rather than designing a study, start with the threat checklist below, then use the full pipeline for any claim that needs deeper analysis.
Locate the question on the ladder first — this determines what kind of evidence can even address it:
| Rung | Question type | Example |
|---|---|---|
| 1 — Association | What does X tell us about Y? | "Users with notifications have lower retention" |
| 2 — Intervention | What happens if we do X? | "If we disable notifications, what happens to retention?" |
| 3 — Counterfactual | What would have happened? | "Would this user have churned if they hadn't gotten a notification?" |
Observational data, by itself, lives on Rung 1; causal claims live on Rung 2 or 3. Identify which rung the question asks about before choosing a study design.
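The three rungs correspond to three different mathematical objects. A compact statement in Pearl's do-notation (standard notation, added here for reference):

```latex
\begin{aligned}
\text{Rung 1 (association):} \quad & P(Y \mid X = x) \\
\text{Rung 2 (intervention):} \quad & P(Y \mid \mathrm{do}(X = x)) \\
\text{Rung 3 (counterfactual):} \quad & P(Y_{x'} \mid X = x, Y = y)
\end{aligned}
```

Observational data alone estimates only the first quantity; the gap between $P(Y \mid X)$ and $P(Y \mid \mathrm{do}(X))$ is exactly the confounding the steps below hunt for.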
Before defining anything, build a model of the system. What's been tried? What constraints exist? What data is available? State what you're confident about and what you're uncertain about — uncertainty tells you where the DAG edges are weakest.
If entering from representing-and-intervening, the Represent phase already covers this — use that model directly.
State precisely what you want to measure before touching data.
Example: "The ATT (average treatment effect on the treated) of receiving any push notification in days 1–7 on 30-day retention, compared to receiving no notification in days 1–7, for users who would be targeted by the current notification system."
If you cannot write this sentence, stop. The question is not yet well-formed.
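For the example above, the estimand in potential-outcomes notation (standard $Y(1)/Y(0)$ notation; a sketch, not part of the original template):

```latex
\mathrm{ATT} = \mathbb{E}\left[\, Y(1) - Y(0) \mid T = 1 \,\right]
```

where $Y(1)$ and $Y(0)$ are 30-day retention with and without a day 1–7 notification, and conditioning on $T = 1$ restricts the average to users the current system actually targets.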
Draw a directed acyclic graph (DAG) of the causal structure. This is not optional.
Node types:
Draw edges for every plausible direct causal relationship. Ask for each pair: "Could X directly cause Y, independent of all other variables?" For each edge, note your confidence — low-confidence edges are where sensitivity analysis should focus.
Common confounders to check:
Reverse causation check: could Y (or an early measurement of Y) cause T? If yes, add that edge. When early outcomes influence who gets treated, you have selection into treatment — adjustment alone cannot fix it; it needs a time-ordering check.
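Writing the DAG down in code makes the later path checks mechanical. A minimal sketch in plain Python, assuming the graph is stored as a set of `(cause, effect)` tuples; the variable names and edge list are illustrative, drawn from the notification example, not a claim about any real system:

```python
# Illustrative DAG for the notification example:
# "notified" = T, "retention" = Y, "engagement" = plausible confounder,
# "opened" = candidate collider. Edge list is an assumption for illustration.
EDGES = {
    ("engagement", "notified"),   # engaged users are more likely to be targeted
    ("engagement", "retention"),  # engaged users retain regardless of notifications
    ("notified", "retention"),    # the causal effect under study
    ("notified", "opened"),       # opening requires receiving
    ("engagement", "opened"),     # engaged users open more
}

def parents(node, edges):
    """Direct causes of `node` in the graph."""
    return {a for (a, b) in edges if b == node}

def children(node, edges):
    """Direct effects of `node` in the graph."""
    return {b for (a, b) in edges if a == node}

# "opened" has arrows in from both T and a cause of Y -> collider suspect.
print(parents("opened", EDGES))
```

Even this dictionary-free representation is enough to run the backdoor and collider checks in the next steps by hand.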
A backdoor path is any non-causal path from T to Y — one that starts with an arrow pointing into T, i.e. it leaves T "backward" through T's causes.
List every backdoor path (for example, T ← C → Y for a confounder C).
For each path, check:
Backdoor criterion: the effect of T on Y is identifiable from observational data if you can find an adjustment set Z such that: (1) Z blocks every backdoor path from T to Y, and (2) no variable in Z is a descendant of T.
If the backdoor criterion cannot be satisfied (because confounders are unmeasured), check the frontdoor criterion or instrumental variable conditions.
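Enumerating backdoor paths is a graph search: walk the undirected skeleton from T to Y, keeping only paths whose first step traverses an edge pointing into T. A self-contained sketch in plain Python (the DAG and variable names repeat the illustrative notification example; this enumerates paths only — blocking status is assessed separately):

```python
def backdoor_paths(t, y, edges):
    """All simple paths from t to y (ignoring edge direction) whose first
    step traverses an arrow INTO t — i.e. the backdoor paths."""
    nbrs = {}  # undirected adjacency built from the directed edge set
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    paths = []
    def walk(path):
        node = path[-1]
        if node == y:
            paths.append(list(path))
            return
        for nxt in nbrs.get(node, ()):
            if nxt not in path:  # simple paths only
                walk(path + [nxt])

    for p in (a for (a, b) in edges if b == t):  # first hop: a parent of t
        walk([t, p])
    return paths

# Notification example: one backdoor path, T <- engagement -> Y.
EDGES = {("engagement", "notified"), ("engagement", "retention"),
         ("notified", "retention"), ("notified", "opened"),
         ("engagement", "opened")}
print(backdoor_paths("notified", "retention", EDGES))
```

Note that the collider path T → opened ← engagement → Y is *not* returned: it starts with an arrow out of T, so it is not a backdoor path (and it is blocked unless you condition on "opened").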
Before specifying your adjustment set, check whether any candidate control variable is a collider — a variable caused by both T and Y (or their causes).
Conditioning on a collider opens a spurious path. This is a common error.
Example: If "notification opened" is caused by both receiving a notification (T) and by the user already being engaged (a cause of Y), then conditioning on "opened" opens a spurious path between T and Y.
Second example: "published results" is a collider for system quality and having a verifier — good teams publish AND teams with verifiers publish. Conditioning on "published" induces a spurious association between verifiers and quality even if the two are independent in the full population, making it look like verifiers cause quality.
If a proposed control variable has arrows pointing into it from multiple other variables, it may be a collider. Do not include it in the adjustment set without checking the graph.
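The collider check itself is local: on a given path, a node is a collider exactly when both neighboring path edges point into it. A sketch in plain Python (reusing the illustrative notification DAG):

```python
def is_collider_on_path(node, path, edges):
    """On `path`, `node` is a collider iff both adjacent path edges
    point INTO it (... -> node <- ...)."""
    i = path.index(node)
    left, right = path[i - 1], path[i + 1]
    return (left, node) in edges and (right, node) in edges

# Notification example: on T -> opened <- engagement -> Y,
# "opened" is a collider, so conditioning on it OPENS the path.
EDGES = {("engagement", "notified"), ("engagement", "retention"),
         ("notified", "retention"), ("notified", "opened"),
         ("engagement", "opened")}
path = ["notified", "opened", "engagement", "retention"]
print(is_collider_on_path("opened", path, EDGES))  # True
```

The same function returns `False` for "engagement" on that path, which is why conditioning on engagement blocks the backdoor path while conditioning on "opened" opens a new one.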
From the DAG analysis, list the variables that must be conditioned on to block all backdoor paths without opening collider paths.
If the adjustment set is feasible (all variables are measured): the effect may be identifiable from observational data. Proceed to Step 8a.
If the adjustment set requires unmeasured variables: the effect is not identified from observational data. Proceed to Step 8b.
If the effect is identified from observational data via the adjustment set:
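The simplest mental model of backdoor adjustment is stratify-then-reweight: compute the treated-minus-control difference within each stratum of the adjustment set, then average over strata. A toy sketch in plain Python — the data and the single "engagement" stratum are invented for illustration:

```python
# Toy records: (engagement_stratum, treated, outcome). Values are made up
# so that engaged users are both more often treated and more often retained.
rows = [
    ("high", 1, 1), ("high", 1, 1), ("high", 1, 1), ("high", 0, 1),
    ("low",  1, 1), ("low",  0, 0), ("low",  0, 0), ("low",  0, 1),
]

def adjusted_effect(rows):
    """Backdoor-adjusted effect: stratum-weighted mean outcome differences."""
    total = len(rows)
    effect = 0.0
    for s in sorted({c for (c, _, _) in rows}):
        treated = [y for (c, t, y) in rows if c == s and t == 1]
        control = [y for (c, t, y) in rows if c == s and t == 0]
        weight = sum(1 for (c, _, _) in rows if c == s) / total
        effect += weight * (sum(treated) / len(treated)
                            - sum(control) / len(control))
    return effect

# Naive pooled difference, for contrast: inflated by the confounder.
treated_all = [y for (_, t, y) in rows if t == 1]
control_all = [y for (_, t, y) in rows if t == 0]
naive = sum(treated_all) / len(treated_all) - sum(control_all) / len(control_all)
print(naive, adjusted_effect(rows))
```

On this toy data the naive difference (0.5) exceeds the adjusted one (1/3), because engagement confounds both treatment and outcome — exactly the gap the adjustment set is meant to close.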
If unmeasured confounders block identification, random assignment is required.
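Random assignment should be deterministic per unit, so repeated exposures land in the same arm. A common pattern is hash-based bucketing; the salt name and split fraction below are illustrative assumptions:

```python
import hashlib

def assign(user_id: str, salt: str = "notif-holdout-v1",
           treat_frac: float = 0.5) -> str:
    """Deterministic assignment: hashing the unit id keeps a user in the
    same arm on every request, without storing assignment state."""
    h = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(h[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treat_frac else "control"

print(assign("user-123"))
```

Changing the salt re-randomizes the whole population, which is why each experiment needs its own salt; reusing one across experiments correlates their assignments.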
Experiment design checklist:
Before finalizing, run through the threat list:
| Threat | Check |
|---|---|
| Unmeasured confounders | Are there common causes of T and Y not in the adjustment set? |
| Reverse causation | Could early Y cause T? Is there a time ordering check? |
| Collider bias | Are you conditioning on any variable caused by T or Y? |
| Mediator adjustment | Are you controlling for any variable on the causal path T→Y? |
| SUTVA violation (stable unit treatment value assumption) | Can one unit's treatment affect another unit's outcome? (spillover) |
| Measurement error in T | Is treatment receipt measured accurately? |
| Attrition / selection into outcome | Does missingness in Y depend on T or confounders? |
| External validity | Does the study population match the target population for the intervention? |
| Time-varying confounding | In systems that evolve (e.g., software across releases), are before/after comparisons confounded by everything else that changed between versions? |
Summarize findings as:
Estimand: [precise statement]
DAG: [sketch or list of key edges]
Open backdoor paths: [list]
Adjustment set: [variables] or [not identifiable — unmeasured U]
Recommended design: [observational with adjustment / experiment with holdout]
Key threats: [top 2-3]
What this study can and cannot answer: [1-2 sentences]
| Mistake | Fix |
|---|---|
| Adjusting for mediators | Check: does the variable sit on the T→Y path? If yes, exclude from adjustment set |
| Conditioning on colliders | Check: does the variable have arrows from both T-side and Y-side? If yes, exclude |
| Skipping the estimand | You cannot design a study without knowing what quantity you're estimating |
| Treating selection as confounding | Reverse causation requires time-ordering check, not just adjustment |
| Running an experiment on the wrong population | Eligibility criteria must match the estimand's target population |
| Choosing outcomes post-hoc | Pre-specify primary outcome before data collection or analysis |