Reason about causal structure using directed acyclic graphs for epidemiological studies. Use when identifying confounders, mediators, or colliders; when deciding which variables to adjust for; or when a reviewer questions the adjustment strategy.
Before adjusting for any variable, classify its role:
| Role | Definition | Adjust? | Consequence of error |
|---|---|---|---|
| Confounder | Common cause of exposure and outcome | Yes | Omitting causes bias |
| Mediator | On the causal pathway from exposure to outcome | No (unless decomposing effects) | Adjusting blocks the part of the effect that passes through it |
| Collider | Common effect of exposure and outcome | No | Adjusting opens a spurious path |
| Instrument | Affects the exposure, and the outcome only through the exposure | No (use for IV methods) | Adjusting reduces precision and can amplify bias from unmeasured confounding |
| Precision variable | Predicts the outcome, unrelated to the exposure | Optional | Improves precision if included |
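The table's definitions can be checked mechanically once the DAG is written down. Below is a minimal sketch, assuming the DAG is stored as a hypothetical `{parent: set of children}` dictionary; `descendants` and `classify_role` are illustrative helpers, not a library API, and they assign one role per variable using the simplified definitions in the table.

```python
from collections import deque


def descendants(dag, node, removed=frozenset()):
    """Nodes reachable from `node` along arrows, never passing through `removed`."""
    seen, queue = set(), deque([node])
    while queue:
        for child in dag.get(queue.popleft(), ()):
            if child not in seen and child not in removed:
                seen.add(child)
                queue.append(child)
    return seen


def classify_role(dag, z, exposure, outcome):
    """Label `z` with one row of the table (simplified: one role per variable)."""
    from_z = descendants(dag, z)
    from_z_not_via_exposure = descendants(dag, z, removed={exposure})
    if z in descendants(dag, exposure) and outcome in from_z:
        return "mediator"            # exposure -> ... -> z -> ... -> outcome
    if z in dag.get(exposure, ()) and z in dag.get(outcome, ()):
        return "collider"            # exposure -> z <- outcome
    if exposure in from_z and outcome in from_z_not_via_exposure:
        return "confounder"          # z causes both, not only via the exposure
    if exposure in from_z:
        return "instrument"          # z reaches the outcome only via the exposure
    if outcome in from_z:
        return "precision variable"  # z causes the outcome but not the exposure
    return "other"


# Hypothetical DAG: X = exposure, Y = outcome.
dag = {
    "C": {"X", "Y"},
    "X": {"M", "Y", "K"},
    "M": {"Y"},
    "Y": {"K"},
    "I": {"X"},
    "P": {"Y"},
}
for z in ["C", "M", "K", "I", "P"]:
    print(z, classify_role(dag, z, "X", "Y"))
```

Real variables can play more than one role at once (for example, a confounder of the mediator-outcome relation), so a label like this is a first pass; the adjustment decision still follows from which paths a set blocks.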
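To estimate the causal effect of X on Y:
1. Draw the DAG, including relevant unmeasured variables.
2. List every backdoor path from X to Y, that is, every path that starts with an arrow into X.
3. Choose an adjustment set that blocks every backdoor path and contains no descendant of X (the backdoor criterion).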
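When a set Z of measured variables satisfies the backdoor criterion (it blocks every backdoor path from X to Y and contains no descendant of X), the causal effect is identified by the adjustment (standardization) formula, shown here for discrete Z:

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
```

The weights are the marginal P(Z = z), not P(Z = z | X = x); weighting by the latter simply reproduces the unadjusted association P(Y = y | X = x).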
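A path is blocked if it passes through:
- a non-collider (a chain `A -> B -> C` or a fork `A <- B -> C`) that is in the adjustment set, or
- a collider (`A -> B <- C`) that is not in the adjustment set and has no descendant in it.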
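The blocking rules (a conditioned non-collider blocks a path; a collider blocks it unless the collider or one of its descendants is conditioned on) can be checked mechanically. A minimal sketch, assuming the DAG is a set of `(parent, child)` edges and a path is a list of adjacent nodes; the function names and node labels are illustrative, and the code does not verify that consecutive nodes are actually adjacent:

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_blocked(path, edges, conditioned):
    """True if the set `conditioned` blocks `path`."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edges and (nxt, node) in edges
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            if node not in conditioned and not (descendants(edges, node) & conditioned):
                return True
        elif node in conditioned:
            # A conditioned chain or fork node blocks the path.
            return True
    return False


# Hypothetical DAG: C confounds X and Y; K is a collider; S is a descendant of K.
edges = {("C", "X"), ("C", "Y"), ("X", "Y"), ("X", "K"), ("Y", "K"), ("K", "S")}
backdoor = ["X", "C", "Y"]        # X <- C -> Y
via_collider = ["X", "K", "Y"]    # X -> K <- Y

print(is_blocked(backdoor, edges, set()))         # False: open backdoor path
print(is_blocked(backdoor, edges, {"C"}))         # True: adjusting for C blocks it
print(is_blocked(via_collider, edges, set()))     # True: collider left alone
print(is_blocked(via_collider, edges, {"K"}))     # False: conditioning on K opens it
print(is_blocked(via_collider, edges, {"S"}))     # False: so does conditioning on S
```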
Use ASCII art for simple DAGs:
```
     Confounder
     /        \
    v          v
Exposure --> Outcome

Exposure --> Mediator --> Outcome

Exposure --> Collider <-- Outcome
(DO NOT condition on Collider)
```
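Why conditioning on a collider misleads can be seen in a quick simulation. This is a sketch under an assumed linear-Gaussian model in which Exposure and Outcome are generated independently and Collider is their sum plus noise; conditioning is done by restricting to units with Collider > 1 (an arbitrary threshold chosen for illustration):

```python
import random


def pearson(xs, ys):
    """Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 20_000
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]   # no causal link to exposure
collider = [x + y + rng.gauss(0, 1) for x, y in zip(exposure, outcome)]

# Conditioning by restriction: keep only units with a high collider value.
kept = [(x, y) for x, y, c in zip(exposure, outcome, collider) if c > 1]

r_all = pearson(exposure, outcome)
r_kept = pearson([x for x, _ in kept], [y for _, y in kept])
print(f"all units:    r = {r_all:+.2f}")    # close to 0
print(f"collider > 1: r = {r_kept:+.2f}")   # clearly negative, purely spurious
```

Under these assumptions the restricted correlation settles at roughly -0.3 in large samples, even though Exposure and Outcome share no causal connection.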
Guidelines:
Interpreting the coefficients of adjustment variables as causal effects of those variables (the "Table 2 fallacy"). Each variable in a model has its own confounding structure, so an adjustment set that is valid for the primary exposure need not be valid for the other variables.
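A simulation sketch of this fallacy, assuming a hypothetical linear model with Z -> X -> Y and a direct arrow Z -> Y. Adjusting for Z gives an unbiased coefficient for X, but the coefficient on Z is only Z's direct effect, not its total effect. The `ols2` helper is written out here for self-containment rather than taken from a library:

```python
import random


def ols2(y, x1, x2):
    """Slopes (b1, b2) from least-squares regression of y on x1 and x2, with intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the centered data.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(1)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]                                # Z -> X
y = [1.0 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]  # X -> Y, Z -> Y

b_x, b_z = ols2(y, x, z)
print(f"coefficient on X: {b_x:.2f}  (true effect of X: 1.0)")
print(f"coefficient on Z: {b_z:.2f}  (direct effect of Z only: 0.5)")
print("total effect of Z: 0.5 + 1.0 * 1.0 = 1.5")
```

Reading the Z coefficient as "the effect of Z" would understate Z's total effect by two thirds in this model; estimating that total effect would require a model without X, because X mediates it.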
Conditioning on a descendant of the outcome (a variable caused by the outcome) can induce bias, because it partly conditions on the outcome itself.
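Conditioning on a variable that is a common effect of two unobserved causes, one a cause of the exposure and the other a cause of the outcome (M-bias). This opens a path that was previously blocked, even when the variable precedes the exposure and therefore looks like a confounder.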
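The M-shaped structure can be simulated under an assumed linear-Gaussian model with unobserved U1 -> X, U1 -> M, U2 -> M, U2 -> Y and no effect of X on Y (all names hypothetical). The unadjusted estimate is correct; adjusting for M manufactures an "effect":

```python
import random


def slope(y, x):
    """Slope from least-squares regression of y on x alone, with intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols2(y, x1, x2):
    """Slopes (b1, b2) from least-squares regression of y on x1 and x2, with intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1, d2, dy = [v - m1 for v in x1], [v - m2 for v in x2], [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(7)
n = 20_000
u1 = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved cause of X and M
u2 = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved cause of M and Y
x = [a + rng.gauss(0, 1) for a in u1]                     # U1 -> X
m = [a + b + rng.gauss(0, 1) for a, b in zip(u1, u2)]     # U1 -> M <- U2
y = [b + rng.gauss(0, 1) for b in u2]                     # U2 -> Y; X has no effect

crude = slope(y, x)
adjusted, _ = ols2(y, x, m)
print(f"unadjusted effect of X: {crude:+.2f}")     # near 0, the true value
print(f"adjusted for M:         {adjusted:+.2f}")  # biased away from 0
```

With unit variances throughout, the adjusted coefficient converges to -0.2 under this model.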
Adjusting for a proxy of a confounder may not fully block the backdoor path. Residual confounding remains, and it grows the less accurately the proxy measures the true confounder.
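A sketch of residual confounding with a proxy, assuming a hypothetical linear model in which confounder C affects both X and Y, the true effect of X on Y is 1.0, and the analyst adjusts for `proxy = C + sigma * noise` instead of C. Under this model the large-sample estimate is (1 + 3*sigma^2) / (1 + 2*sigma^2), rising from 1.0 with a perfect proxy toward the unadjusted value of 1.5 as the proxy gets noisier:

```python
import random


def ols2(y, x1, x2):
    """Slopes (b1, b2) from least-squares regression of y on x1 and x2, with intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1, d2, dy = [v - m1 for v in x1], [v - m2 for v in x2], [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(3)
n = 20_000
c = [rng.gauss(0, 1) for _ in range(n)]                         # true confounder
x = [ci + rng.gauss(0, 1) for ci in c]                          # C -> X
y = [1.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]   # true effect of X is 1.0
noise = [rng.gauss(0, 1) for _ in range(n)]

estimates = {}
for sigma in [0.0, 0.5, 1.0, 2.0]:
    proxy = [ci + sigma * e for ci, e in zip(c, noise)]         # noisier as sigma grows
    estimates[sigma], _ = ols2(y, x, proxy)
    print(f"proxy noise sd {sigma}: estimated effect {estimates[sigma]:.2f}")
```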
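Time-varying confounding affected by prior exposure: a confounder of later exposure is itself affected by earlier exposure. Standard regression cannot handle this, because adjusting for the confounder blocks part of the effect of earlier exposure, while not adjusting leaves later exposure confounded. Use g-methods such as marginal structural models or g-estimation.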
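For the marginal-structural-model route, a standard choice is stabilized inverse-probability-of-treatment weights. In the notation below (assumed here, not from the text), A_t is exposure at time t, L_t the time-varying confounders, overbars denote history up to that time, and i indexes units:

```latex
SW_i = \prod_{t=0}^{T}
  \frac{P\bigl(A_t = a_{i,t} \mid \bar{A}_{t-1} = \bar{a}_{i,t-1}\bigr)}
       {P\bigl(A_t = a_{i,t} \mid \bar{A}_{t-1} = \bar{a}_{i,t-1},\ \bar{L}_t = \bar{l}_{i,t}\bigr)}
```

The outcome is then regressed on exposure history with these weights and without L_t as a covariate, so earlier exposure's effect through L_t is not blocked. This relies on positivity and on correctly specified exposure models.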
When units are nested (e.g., models within method classes, patients within hospitals):
Example for a study of method effects on forecast accuracy:
```
               Trend
                 |
                 v
Method -----> Accuracy <----- Horizon
  ^              ^               |
  |              |               |
  +-- Location --+               |
  |              |               |
  +---- Time ----+---------------+
```
Key question: Is "number of forecasts submitted" a confounder, mediator, or collider?