Name: Causal Inference
Author: kantundpeterpan


Study Design	Method	Key Assumption
Randomised experiment	OLS / t-test on outcomes	Randomisation
Observational (selection on observables)	Propensity score matching / IPW	Unconfoundedness + overlap
Panel data, policy change	Difference-in-Differences	Parallel trends
Instrumental variable available	IV / 2SLS	Exclusion restriction + relevance
Threshold-based treatment	Regression Discontinuity	Local continuity
Matched time series	Synthetic Control	Donor pool similarity
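
The table lists IPW next to propensity score matching for observational data. As a rough illustration, inverse probability weighting can be sketched as below. The helper name `ipw_ate`, the clipping bounds, and the synthetic data (with a true effect of 2.0) are assumptions made for this example, not part of the original skill.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, treatment, outcome, clip=(0.01, 0.99)):
    """Normalised (Hajek) IPW estimate of the average treatment effect."""
    # Propensity scores from a logistic regression on the confounders
    ps = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]
    ps = np.clip(ps, *clip)  # hypothetical bounds to avoid extreme weights
    w1 = treatment / ps              # weights for treated units
    w0 = (1 - treatment) / (1 - ps)  # weights for control units
    mean_treated = np.sum(w1 * outcome) / np.sum(w1)
    mean_control = np.sum(w0 * outcome) / np.sum(w0)
    return float(mean_treated - mean_control)

# Synthetic data: x confounds both treatment assignment and the outcome
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 2.0 * t + x + rng.normal(size=n)

ate_hat = ipw_ate(x.reshape(-1, 1), t, y)
naive = y[t == 1].mean() - y[t == 0].mean()  # biased upward by confounding
print(f"IPW ATE = {ate_hat:.2f}, naive difference = {naive:.2f}")
```

The naive difference in means absorbs the effect of the confounder, while the weighted estimate should land near the simulated effect, provided the propensity model is correctly specified.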

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

# df: one row per unit with a binary "treatment" column
# confounders: list of pre-treatment covariate column names

# Estimate propensity scores P(treatment = 1 | confounders)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[confounders])
ps_model = LogisticRegression(max_iter=1000)
ps_model.fit(X_scaled, df["treatment"])
df["ps"] = ps_model.predict_proba(X_scaled)[:, 1]

# Check overlap (common support): both groups should span a similar score range
import os
import matplotlib.pyplot as plt
os.makedirs("figures", exist_ok=True)  # savefig fails if the directory is missing
fig, ax = plt.subplots()
df[df.treatment==1]["ps"].hist(ax=ax, bins=30, alpha=0.5, label="Treated")
df[df.treatment==0]["ps"].hist(ax=ax, bins=30, alpha=0.5, label="Control")
ax.set_xlabel("Propensity score")
ax.legend()
fig.savefig("figures/propensity_overlap.png", dpi=150)
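
The reporting text further down refers to nearest-neighbour matching with a caliper of 0.02, but the matching step itself is not shown. One possible sketch follows, assuming 1:1 matching with replacement on the propensity score and a hypothetical `outcome` column. The function name `caliper_match_att` and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd

def caliper_match_att(data, ps_col="ps", treat_col="treatment",
                      outcome_col="outcome", caliper=0.02):
    """ATT from 1:1 nearest-neighbour matching (with replacement) on the score."""
    treated = data[data[treat_col] == 1]
    control = data[data[treat_col] == 0]
    control_ps = control[ps_col].to_numpy()
    control_y = control[outcome_col].to_numpy()

    diffs = []
    for ps_t, y_t in zip(treated[ps_col].to_numpy(), treated[outcome_col].to_numpy()):
        dist = np.abs(control_ps - ps_t)
        j = dist.argmin()             # closest control unit
        if dist[j] <= caliper:        # drop treated units with no close match
            diffs.append(y_t - control_y[j])

    att = float(np.mean(diffs)) if diffs else float("nan")
    return att, len(diffs)

# Toy data (hypothetical): the third treated unit has no control within the caliper
toy = pd.DataFrame({
    "treatment": [1, 1, 1, 0, 0, 0],
    "ps":        [0.30, 0.50, 0.90, 0.31, 0.49, 0.60],
    "outcome":   [5.0, 7.0, 9.0, 3.0, 4.0, 6.0],
})
att, n_matched = caliper_match_att(toy)
print(f"ATT = {att:.2f} from {n_matched} matched treated units")
```

Treated units without a match inside the caliper are dropped, so the estimate applies only to the matched subsample. It is worth reporting how many treated units were retained.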

import numpy as np
import pandas as pd

# Standardised mean difference (SMD) per confounder, using the pooled SD
rows = []
for col in confounders:
    treated = df.loc[df.treatment == 1, col]
    control = df.loc[df.treatment == 0, col]
    pooled_sd = np.sqrt((treated.var() + control.var()) / 2)
    smd = (treated.mean() - control.mean()) / pooled_sd
    rows.append({"Variable": col, "Treated mean": round(treated.mean(), 3),
                 "Control mean": round(control.mean(), 3), "SMD": round(smd, 3)})

balance = pd.DataFrame(rows)
print(balance)
# |SMD| < 0.1 is conventionally treated as well balanced

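import statsmodels.formula.api as smf

# DiD regression: "post * treated" expands to post + treated + post:treated,
# and the post:treated coefficient is the DiD estimate.
# Standard errors are clustered by unit; this assumes no rows are dropped for
# missing values, so the groups line up with the fitted sample.
did_model = smf.ols(
    "outcome ~ post * treated + age + income",
    data=df_panel
).fit(cov_type="cluster", cov_kwds={"groups": df_panel["unit_id"]})

print(did_model.summary())
att = did_model.params["post:treated"]  # ATT under the parallel-trends assumption
ci = did_model.conf_int().loc["post:treated"]
print(f"ATT = {att:.4f}, 95% CI [{ci[0]:.4f}, {ci[1]:.4f}]")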
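
As a sanity check on the regression, the 2x2 DiD estimate can also be computed directly from the four group means. Without additional controls it should equal the interaction coefficient. The sketch below assumes binary 0/1 `post` and `treated` columns as in the regression. The helper name `did_from_means` and the toy panel are made up for illustration.

```python
import pandas as pd

def did_from_means(panel, outcome="outcome", post="post", treated="treated"):
    """(treated post - treated pre) - (control post - control pre)."""
    means = panel.groupby([treated, post])[outcome].mean()
    change_treated = means.loc[(1, 1)] - means.loc[(1, 0)]
    change_control = means.loc[(0, 1)] - means.loc[(0, 0)]
    return float(change_treated - change_control)

# Toy panel (hypothetical): control rises by 3, treated rises by 8
toy_panel = pd.DataFrame({
    "treated": [0, 0, 0, 0, 1, 1, 1, 1],
    "post":    [0, 0, 1, 1, 0, 0, 1, 1],
    "outcome": [10.0, 12.0, 13.0, 15.0, 20.0, 22.0, 28.0, 30.0],
})
did_estimate = did_from_means(toy_panel)
print(f"DiD from group means = {did_estimate:.1f}")
```

If this figure differs sharply from the regression estimate, the covariates are doing a lot of work and the specification deserves a closer look.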

import os
import matplotlib.pyplot as plt

# Plot pre-period mean outcomes for treated and control groups
pre = df_panel[df_panel.post == 0]
trends = pre.groupby(["time", "treated"])["outcome"].mean().unstack()
os.makedirs("figures", exist_ok=True)  # savefig fails if the directory is missing
fig, ax = plt.subplots()
trends.plot(ax=ax)
ax.set_title("Pre-period trends (should be roughly parallel)")
fig.savefig("figures/parallel_trends.png", dpi=150)
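
The visual check can be complemented with a simple numeric summary: the treated-minus-control gap in each pre-period, and the slope of that gap over time. This is a descriptive sketch rather than a formal test. It assumes a numeric `time` column and a 0/1 `treated` column, and the name `pre_trend_gap_slope` is hypothetical.

```python
import numpy as np
import pandas as pd

def pre_trend_gap_slope(pre, time="time", treated="treated", outcome="outcome"):
    """Return the per-period treated-minus-control gap and its linear slope."""
    means = pre.groupby([time, treated])[outcome].mean().unstack()
    gap = means[1] - means[0]  # columns are the treated indicator values 0 and 1
    slope = np.polyfit(gap.index.to_numpy(dtype=float), gap.to_numpy(), deg=1)[0]
    return gap, float(slope)

# Toy pre-period data (hypothetical): both groups rise by 1 per period
toy_pre = pd.DataFrame({
    "time":    [0, 1, 2, 0, 1, 2],
    "treated": [0, 0, 0, 1, 1, 1],
    "outcome": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
gap, slope = pre_trend_gap_slope(toy_pre)
print(gap)
print(f"Slope of the gap over time: {slope:.3f}")
```

A slope close to zero is consistent with parallel pre-trends. It does not prove that the assumption holds in the post-period.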

Using propensity score matching (nearest-neighbour, caliper = 0.02),
we estimated the average treatment effect on the treated (ATT).
Covariate balance was assessed using standardised mean differences (SMD);
all SMDs fell below 0.10 after matching (see Table 2).

The ATT was estimated at 3.2 percentage points (95% CI [1.4, 5.0],
p = .001), indicating that participation in the programme increased
the outcome by approximately 3.2 pp among treated units.
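
To keep reported numbers consistent with the fitted model, the results sentence can be generated from the estimate and its standard error. The helper below is a sketch under a normal approximation. The function name `format_att_report` and the inputs in the usage line are hypothetical and are not taken from the analysis above.

```python
import math

def format_att_report(att, se, unit="pp"):
    """Format an estimate with a 95% CI and two-sided p-value (normal approximation)."""
    z_crit = 1.959964  # 97.5th percentile of the standard normal
    lo, hi = att - z_crit * se, att + z_crit * se
    z = abs(att / se)
    p = math.erfc(z / math.sqrt(2))  # two-sided p-value for a z statistic
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = f"p = {p:.3f}".replace("0.", ".", 1)  # drop the leading zero
    return f"ATT = {att:.1f} {unit} (95% CI [{lo:.1f}, {hi:.1f}], {p_text})"

# Hypothetical inputs, for illustration only
print(format_att_report(att=1.0, se=0.5))
```

With few clusters, a t-based interval from the fitted model is preferable to this normal approximation.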

Causal Inference

When to Use

Causal Framework

Method Selection

Workflows

Propensity Score Matching

Balance Table

Difference-in-Differences

Parallel Trends Check

Reporting

Review Checklist
