Synthetic Control Method (SCM) Skill | Skills Pool


Econometrics skill for Synthetic Control Method (SCM). Activates when the user asks about: "synthetic control", "SCM", "synthetic counterfactual", "donor pool", "placebo test", "in-space placebo", "in-time placebo", "MSPE ratio", "Abadie Diamond Hainmueller", "augmented synthetic control", "penalized SCM", "synthetic DID", "合成控制", "合成控制法", "捐助池", "安慰剂检验", "合成反事实", "合成DID", "政策评估"

sheehe · 0 stars · 16 Apr 2026

Categories: Laboratory Tools

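If the current session was triggered by the Coase research workflow (/idea-discovery, /experiment-bridge, or /paper-writing), this skill's output must be written to the stage files according to the rules below. Do not create new directories or step outside the workflow context:

/idea-discovery Phase 2 Step 3 (Baseline Design Lock): return three sections, "model specification", "identification assumptions", and "main identification risks", which the planner fills into planner/stage_7_baseline_design.md. No code is executed at this stage.
/experiment-bridge Phase 4 (Run Baseline): generate and execute the main regression code, standardize the main regression table through the table skill, and write the results to executor/stage_1_run_baseline.md.
/experiment-bridge Phase 5 (Robustness): provide the alternative estimators, identification diagnostics, or sensitivity checks specific to this method, and write them to the corresponding entries in executor/stage_2_explanation_robustness.md.
/paper-writing Phase 6: not directly involved. The writer excerpts the method description from the executor/ directory and must not rerun regressions.

If the user has not specified a workflow (that is, the user asks about this method directly), ignore this section and proceed freely according to the body below.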

Synthetic Control Method (SCM) Skill

This skill guides complete synthetic control analyses: donor pool construction, weight optimization, gap estimation, placebo-based inference, and extensions including augmented SCM and synthetic DID. It is designed for policy evaluation with few treated units.

Situation	Method
Single treated unit, many potential controls	Classic SCM
Few treated units	Multi-unit SCM or Synthetic DID
Treatment at aggregate level (state, country)	Classic SCM
Want DID-like inference with SCM weighting	Synthetic DID (Arkhangelsky et al. 2021)

# R — Abadie-Diamond-Hainmueller SCM
library(Synth)

dataprep_out <- dataprep(
  foo = df,
  predictors = c("gdp_pc", "trade_share", "inflation"),
  predictors.op = "mean",
  time.predictors.prior = 1980:1999,
  special.predictors = list(
    list("outcome", 1995:1999, "mean"),   # pre-treatment outcome lags
    list("outcome", 1990, "mean"),
    list("outcome", 1985, "mean")
  ),
  dependent = "outcome",
  unit.variable = "unit_id",
  unit.names.variable = "unit_name",
  time.variable = "year",
  treatment.identifier = 1,              # treated unit ID
  controls.identifier = 2:20,            # donor pool IDs
  time.optimize.ssr = 1980:1999,         # pre-treatment period
  time.plot = 1980:2010                  # full plot range
)

synth_out <- synth(dataprep_out)

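# View donor weights (non-zero weights = selected donors)
# Store the result under a name that does not shadow the synth.tab() function
synth_tables <- synth.tab(dataprep.res = dataprep_out, synth.res = synth_out)
print(synth_tables$tab.w)   # unit weights
print(synth_tables$tab.v)   # predictor weights

# Path plot: treated vs synthetic control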
path.plot(synth_out, dataprep_out,
          Ylab = "Outcome", Xlab = "Year",
          Main = "Treated vs Synthetic Control")
abline(v = 2000, lty = 2, col = "red")

# Gap (treatment effect) plot
gaps.plot(synth_out, dataprep_out,
          Ylab = "Gap (Treated − Synthetic)", Xlab = "Year",
          Main = "Treatment Effect Over Time")
abline(v = 2000, lty = 2, col = "red")
abline(h = 0, lty = 3)
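
For intuition about the weight optimization step, the following NumPy sketch solves the inner problem in a simplified form: find non-negative donor weights that sum to one and bring the donors' predictors as close as possible to the treated unit's. It is an illustration under stated assumptions, not the Synth implementation: predictors are weighted equally (there is no search over a predictor-weight matrix), projected gradient descent stands in for the package's solver, the data are simulated, and the function names `project_to_simplex` and `scm_weights` are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def scm_weights(x1, X0, n_iter=5000):
    """Minimise ||x1 - X0 @ w||^2 over the simplex by projected gradient descent.

    x1: (K,) treated-unit predictors; X0: (K, J) donor predictors, one column per donor.
    """
    n_donors = X0.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)            # start from equal weights
    step = 1.0 / (2.0 * np.linalg.norm(X0, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * X0.T @ (X0 @ w - x1)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated check: the treated unit is an exact mix of donors 0 and 1.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(12, 4))
w_true = np.array([0.6, 0.4, 0.0, 0.0])
x1 = X0 @ w_true

w_hat = scm_weights(x1, X0)
print(np.round(w_hat, 3))
```

On real data the treated unit rarely lies exactly inside the donors' convex hull, so the recovered weights leave a residual; that residual is what the pre-treatment fit diagnostics measure.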

# R — Augmented SCM (Ridge-augmented, Ben-Michael et al. 2021)
library(augsynth)

asyn <- augsynth(outcome ~ treatment,
                 unit = unit_id, time = year,
                 data = df,
                 progfunc = "Ridge",    # augmentation method
                 scm = TRUE)            # include SCM weights
summary(asyn)
plot(asyn)
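
The idea behind ridge augmentation can be sketched in a few lines of NumPy: start from the SCM prediction and add a correction equal to the remaining pre-treatment imbalance times the coefficients of a ridge regression fitted on the donors. This is a simplified reading of the approach, not the augsynth code: the weights `w` and the penalty `lam` are taken as given rather than estimated or tuned, only one post-treatment period is handled, and the function name `ridge_augmented_prediction` and the simulated inputs are hypothetical.

```python
import numpy as np

def ridge_augmented_prediction(w, X0, x1, y0_post, lam):
    """Ridge-augmented counterfactual for one post-treatment period.

    w: (J,) SCM weights summing to one; X0: (J, T0) donor pre-treatment outcomes;
    x1: (T0,) treated pre-treatment outcomes; y0_post: (J,) donor outcomes in the
    post-treatment period; lam: ridge penalty (> 0).
    """
    # Centre on donor means so the ridge model needs no intercept.
    Xc = X0 - X0.mean(axis=0)
    yc = y0_post - y0_post.mean()
    eta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X0.shape[1]), Xc.T @ yc)

    scm_pred = w @ y0_post        # plain SCM prediction
    imbalance = x1 - w @ X0       # pre-treatment fit that SCM left uncorrected
    return scm_pred + imbalance @ eta

# Simulated inputs (hypothetical).
rng = np.random.default_rng(1)
J, T0 = 6, 10
X0 = rng.normal(size=(J, T0))
y0_post = rng.normal(size=J)
w = np.full(J, 1.0 / J)
scm_only = w @ y0_post

# Perfect pre-treatment balance: the correction vanishes.
x1_balanced = w @ X0
aug_balanced = ridge_augmented_prediction(w, X0, x1_balanced, y0_post, lam=1.0)

# Imperfect balance: the ridge term shifts the prediction.
x1_off = x1_balanced + 0.5
aug_off = ridge_augmented_prediction(w, X0, x1_off, y0_post, lam=1.0)
print(scm_only, aug_balanced, aug_off)
```

With perfect pre-treatment balance the augmented and plain SCM predictions coincide, and a very large penalty shrinks the correction toward zero, so the augmentation only matters when SCM alone fits poorly.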

# Python — SparseSC (penalized synthetic control)
import SparseSC
import numpy as np

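# df: long-format pandas DataFrame with columns unit_id, year, outcome.
# Reshape to an (N_units × T_periods) matrix. pivot sorts by unit_id, so the
# treated unit lands in row 0 only if it has the lowest ID (assumed here).
Y = df.pivot(index='unit_id', columns='year', values='outcome').values
T0 = 20  # number of pre-treatment periods

# Fit sparse synthetic control
sc = SparseSC.fit(
    features=Y[:, :T0],       # pre-treatment outcomes
    targets=Y[:, T0:],        # post-treatment outcomes
    treated_units=[0]          # row index of treated unit
)

# Treatment effect estimate
treated_actual = Y[0, T0:]
# predict() applies the donor weights to a full targets matrix,
# so pass every unit's post-treatment outcomes and take the treated row
synthetic_control = sc.predict(Y[:, T0:])[0]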
effect = treated_actual - synthetic_control
print(f"Average post-treatment effect: {np.mean(effect):.4f}")

* Stata — Classic SCM
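ssc install synth, replace all
* synth_runner is distributed from its GitHub repository rather than SSC
net install synth_runner, from(https://raw.github.com/bquistorff/synth_runner/master/) replace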

tsset unit_id year

synth outcome gdp_pc trade_share inflation ///
    outcome(1995) outcome(1990) outcome(1985), ///
    trunit(1) trperiod(2000) ///
    fig keep(synth_results) replace

* Plot results from the file saved by keep(), which holds _time, _Y_treated, _Y_synthetic
preserve
use synth_results, clear
twoway (line _Y_treated _time, lcolor(black) lwidth(medium)) ///
       (line _Y_synthetic _time, lcolor(red) lpattern(dash)), ///
       xline(2000, lpattern(dash)) ///
       legend(label(1 "Treated") label(2 "Synthetic Control")) ///
       title("Synthetic Control Estimate")
restore

# R — in-space placebo (Synth)
library(Synth)
placebo_gaps <- list()
all_units <- unique(df$unit_id)

for (u in all_units) {
  controls <- setdiff(all_units, u)
  dp <- dataprep(foo = df, predictors = c("gdp_pc", "trade_share"),
                 predictors.op = "mean", time.predictors.prior = 1980:1999,
                 special.predictors = list(list("outcome", 1995:1999, "mean")),
                 dependent = "outcome", unit.variable = "unit_id",
                 time.variable = "year", treatment.identifier = u,
                 controls.identifier = controls,
                 time.optimize.ssr = 1980:1999, time.plot = 1980:2010)
  so <- synth(dp, Sigf.ipop = 3)
  placebo_gaps[[as.character(u)]] <- dp$Y1plot - (dp$Y0plot %*% so$solution.w)
}

# Plot all gaps; the treated unit's gap should stand out after 2000
plot(1980:2010, placebo_gaps[["1"]], type = "n",
     ylim = range(unlist(placebo_gaps)), ylab = "Gap", xlab = "Year")
# Select placebos by name rather than by position in the list
for (u in setdiff(names(placebo_gaps), "1")) {
  lines(1980:2010, placebo_gaps[[u]], col = "grey70")
}
lines(1980:2010, placebo_gaps[["1"]], lwd = 2, col = "black")  # treated last, on top
abline(v = 2000, lty = 2); abline(h = 0, lty = 3)
legend("topleft", c("Treated", "Placebos"), col = c("black","grey70"), lwd = c(2,1))

* Stata — in-space placebo with synth_runner
synth_runner outcome gdp_pc trade_share inflation ///
    outcome(1995) outcome(1990) outcome(1985), ///
    trunit(1) trperiod(2000) gen_vars
effect_graphs
single_treatment_graphs

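# Rank by post/pre MSPE ratio
mspe_ratios <- sapply(names(placebo_gaps), function(u) {
  gap <- placebo_gaps[[u]]
  pre  <- gap[1:20]   # pre-treatment periods (1980-1999)
  post <- gap[21:31]  # post-treatment periods (2000-2010)
  mean(post^2) / mean(pre^2)   # means, not sums: the two windows differ in length
})

# p-value: fraction of units (treated included) with ratio ≥ treated
p_value <- mean(mspe_ratios >= mspe_ratios["1"])
cat("MSPE ratio rank p-value:", p_value, "\n")
# The smallest attainable p-value is 1 / (number of units). With 20 units that is 0.05,
# so the treated unit must rank first to reach the 5% level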
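
The same rank-based test can be written compactly in Python. The sketch below is illustrative only: the gap series are made-up numbers rather than estimates, and the helper names `mspe_ratio` and `placebo_p_value` are hypothetical. It shows how the p-value is the share of units, the treated one included, whose post/pre MSPE ratio is at least as large as the treated unit's.

```python
import numpy as np

def mspe_ratio(gap, n_pre):
    """Post-treatment MSPE divided by pre-treatment MSPE for one gap series."""
    gap = np.asarray(gap, dtype=float)
    return np.mean(gap[n_pre:] ** 2) / np.mean(gap[:n_pre] ** 2)

def placebo_p_value(gaps, treated, n_pre):
    """Share of units whose MSPE ratio is at least as large as the treated unit's."""
    ratios = {unit: mspe_ratio(g, n_pre) for unit, g in gaps.items()}
    p = float(np.mean([r >= ratios[treated] for r in ratios.values()]))
    return p, ratios

# Made-up gap series: 4 pre-treatment periods followed by 2 post-treatment periods.
gaps = {
    "treated": [0.1, -0.1, 0.1, -0.1, 2.0, 2.0],
    "a":       [0.5, -0.5, 0.5, -0.5, 0.5, -0.5],
    "b":       [0.2, -0.2, 0.2, -0.2, 0.4, 0.4],
    "c":       [1.0, 1.0, -1.0, -1.0, 0.5, 0.5],
    "d":       [0.3, 0.3, -0.3, -0.3, 0.3, -0.3],
}

p, ratios = placebo_p_value(gaps, treated="treated", n_pre=4)
print(f"p-value: {p:.2f} (floor with {len(gaps)} units: {1 / len(gaps):.2f})")
```

Because the treated unit always counts toward its own rank, the p-value can never fall below one over the number of units, which is why small donor pools limit the significance levels that can be reported.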

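# Use an earlier fake treatment date (here 1990, ten years before the actual one)
dp_placebo <- dataprep(foo = df, ...,
                       time.optimize.ssr = 1980:1989,  # fit only on years before the fake date
                       time.plot = 1980:1999)           # only pre-treatment
so_placebo <- synth(dp_placebo)
# Predictor windows in the elided arguments must also end before 1990.
# The 1990-1999 gap should be ≈ 0 if the model is well-specified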

Guideline	Rationale
Exclude units affected by similar treatment	Avoids contamination
Include only structurally similar units	Improves fit quality
Use pre-treatment outcome lags as predictors	Most powerful predictors
Drop donors with zero weight and large pre-MSPE	Focus on contributing donors
Leave-one-out: iteratively drop each donor	Check weight sensitivity
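
The leave-one-out guideline in the last row can be sketched as follows: refit the weights after removing each donor that received positive weight, and compare the resulting effect estimates with the full-pool estimate. This is a minimal illustration on simulated data, assuming equal predictor weights and a projected-gradient solver in place of a package routine; `simplex_weights` and `leave_one_out` are hypothetical helper names.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, v.size + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def simplex_weights(x1, X0, n_iter=3000):
    """Least-squares donor weights on the simplex via projected gradient descent."""
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(X0, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * X0.T @ (X0 @ w - x1)
        w = project_to_simplex(w - step * grad)
    return w

def leave_one_out(x1, X0, y1_post, Y0_post, tol=1e-6):
    """Average post-treatment gap with all donors, then with each weighted donor dropped."""
    w_full = simplex_weights(x1, X0)
    effects = {"all donors": float(np.mean(y1_post - Y0_post @ w_full))}
    # Donors with zero weight are skipped: removing them leaves the fit unchanged.
    for j in np.flatnonzero(w_full > tol):
        keep = np.arange(X0.shape[1]) != j
        w = simplex_weights(x1, X0[:, keep])
        effects[f"drop donor {j}"] = float(np.mean(y1_post - Y0_post[:, keep] @ w))
    return effects

# Simulated panel: 12 pre-treatment periods, 4 post-treatment periods, 5 donors, true effect 1.0.
rng = np.random.default_rng(2)
X0 = rng.normal(size=(12, 5))        # donor pre-treatment outcomes (periods × donors)
Y0_post = rng.normal(size=(4, 5))    # donor post-treatment outcomes
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
x1 = X0 @ w_true
y1_post = Y0_post @ w_true + 1.0

loo = leave_one_out(x1, X0, y1_post, Y0_post)
for label, effect in loo.items():
    print(f"{label}: {effect:.3f}")
```

If the estimate moves substantially when a single donor is removed, the result depends heavily on that donor and deserves a closer look.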

# R — predictor balance check after synth()
synth_tab <- synth.tab(synth_out, dataprep_out)

# tab.pred: treated, synthetic control, and sample average for each predictor
print(synth_tab$tab.pred)
# Look for rows where treated and synthetic values are close (small gap)
# Large gaps in key predictors signal poor donor pool or predictor choice

# Raw predictor gaps (treated minus synthetic), one per predictor:
pred_balance <- synth_tab$tab.pred[, 1:2]  # columns: treated, synthetic
pred_gaps <- pred_balance[, 1] - pred_balance[, 2]
cat("Predictor balance (treated - synthetic):\n")
print(round(pred_gaps, 4))

# Python — predictor balance after SparseSC or manual SCM
import numpy as np
import pandas as pd

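# Assumed inputs (not defined here): X_treated is a (T0 × K) array of the treated
# unit's pre-treatment predictor values, X_donors is a (J × K) array of donor
# pre-treatment means, weights is a (J,) array of donor weights, and
# predictor_names is a list of K labels.
treated_pred  = X_treated.mean(axis=0)
synthetic_pred = (weights @ X_donors).flatten()  # weights × donor covariates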

balance_df = pd.DataFrame({
    'Predictor':  predictor_names,
    'Treated':    treated_pred,
    'Synthetic':  synthetic_pred,
    'Gap':        treated_pred - synthetic_pred,
    'Pct_Gap':    100 * (treated_pred - synthetic_pred) / np.abs(treated_pred)
})
print(balance_df.to_string(index=False))
# Pct_Gap is undefined when a treated value is 0.
# Flag predictors where |Pct_Gap| > 5% and consider adjusting the donor pool

* Stata — predictor balance displayed automatically after synth
synth outcome gdp_pc trade_share inflation ///
    outcome(1995) outcome(1990) outcome(1985), ///
    trunit(1) trperiod(2000)
* The output table "Predictor Balance" compares treated vs synthetic vs sample avg
* Verify that treated and synthetic columns are close for key predictors

# R — Synthetic DID
library(synthdid)

# Data must be a balanced panel in matrix form
setup <- panel.matrices(df, unit = "unit_id", time = "year",
                        outcome = "outcome", treatment = "treated")
sdid <- synthdid_estimate(setup$Y, setup$N0, setup$T0)
se <- sqrt(vcov(sdid, method = "placebo"))
cat("SDID estimate:", sdid, "SE:", se, "\n")
plot(sdid)

# Python — synthdid
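# pip install synthdid
# Class and method names differ across Python SDID packages. The calls below
# follow the PyPI synthdid package and should be checked against the installed version.
from synthdid.synthdid import Synthdid
model = Synthdid(df, unit='unit_id', time='year',
                 treatment='treated', outcome='outcome').fit().vcov(method='placebo')
print(model.summary().summary2)   # ATT estimate with placebo standard error
model.plot_outcomes()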
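
To make the link between synthetic DID and ordinary DID concrete, the sketch below computes the SDID point estimate as a weighted double difference, taking the unit weights `omega` and time weights `lam` as given. It is a simplified illustration: the packages above estimate those weights (with regularization), whereas here they are hypothetical inputs, as are the simulated data and the function name `sdid_point_estimate`. Rows are ordered controls first and treated last, matching the layout of the matrix form used above.

```python
import numpy as np

def sdid_point_estimate(Y, n_control, n_pre, omega, lam):
    """Weighted double difference.

    Y: (N, T) outcomes, control units first, pre-treatment periods first.
    omega: (n_control,) unit weights summing to one.
    lam: (n_pre,) time weights summing to one.
    """
    Y = np.asarray(Y, dtype=float)
    post_mean = Y[:, n_pre:].mean(axis=1)   # each unit's average post-treatment outcome
    pre_weighted = Y[:, :n_pre] @ lam       # each unit's lam-weighted pre-treatment outcome
    change = post_mean - pre_weighted
    return change[n_control:].mean() - omega @ change[:n_control]

# Simulated two-way fixed-effects panel with a true effect of 2.0 on one treated unit.
rng = np.random.default_rng(3)
n_control, n_treated, n_pre, n_post = 4, 1, 5, 3
unit_fe = rng.normal(size=(n_control + n_treated, 1))
time_fe = rng.normal(size=(1, n_pre + n_post))
Y = unit_fe + time_fe
Y[n_control:, n_pre:] += 2.0

omega = np.array([0.4, 0.3, 0.2, 0.1])        # hypothetical unit weights
lam = np.array([0.0, 0.1, 0.2, 0.3, 0.4])     # hypothetical time weights
tau_sdid = sdid_point_estimate(Y, n_control, n_pre, omega, lam)

# Uniform weights reduce the formula to the ordinary DID estimate.
tau_did = sdid_point_estimate(Y, n_control, n_pre,
                              np.full(n_control, 1 / n_control),
                              np.full(n_pre, 1 / n_pre))
print(tau_sdid, tau_did)
```

In this additive fixed-effects example any weights that sum to one recover the true effect; the weights start to matter when control units or pre-treatment periods differ in how well they track the treated unit.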

# R — compute and assess pre-treatment RMSPE
# After dataprep() and synth():
treatment_year <- 2000   # first treated period, as in the examples above
gaps <- dataprep_out$Y1plot - (dataprep_out$Y0plot %*% synth_out$solution.w)

pre_periods  <- which(as.numeric(rownames(gaps)) < treatment_year)
post_periods <- which(as.numeric(rownames(gaps)) >= treatment_year)

pre_rmspe  <- sqrt(mean(gaps[pre_periods]^2))
post_rmspe <- sqrt(mean(gaps[post_periods]^2))
outcome_mean <- mean(dataprep_out$Y1plot[pre_periods])

cat(sprintf("Pre-treatment RMSPE: %.4f (%.1f%% of outcome mean)\n",
            pre_rmspe, 100 * pre_rmspe / outcome_mean))
cat(sprintf("Post-treatment RMSPE: %.4f\n", post_rmspe))
cat(sprintf("Post/Pre MSPE ratio: %.2f\n", (post_rmspe/pre_rmspe)^2))

# Informal rule of thumb (a heuristic, not a formal test; assumes a positive outcome mean):
# if pre-RMSPE > 5% of outcome mean, reconsider the design
if (pre_rmspe / outcome_mean > 0.05) {
  warning("Pre-treatment RMSPE exceeds 5% of outcome mean — fit may be poor.")
}

# Python — pre-treatment fit visualization
import matplotlib.pyplot as plt
import numpy as np

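# Assumed inputs (not defined here): all_years, Y_actual, Y_synthetic are
# equal-length sequences covering the full plot range.
years = np.asarray(all_years)
treated  = np.asarray(Y_actual)       # actual treated unit outcomes
synthetic = np.asarray(Y_synthetic)   # synthetic control outcomes
treatment_year = 2000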

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Panel A: level plot
axes[0].plot(years, treated,   'k-',  linewidth=2,   label='Treated')
axes[0].plot(years, synthetic, 'r--', linewidth=1.5, label='Synthetic Control')
axes[0].axvline(treatment_year, color='gray', linestyle=':', linewidth=1)
axes[0].set_title('Treated vs Synthetic Control'); axes[0].legend()

# Panel B: gap plot with pre-period RMSPE annotation
gap = treated - synthetic
axes[1].plot(years, gap, 'b-', linewidth=2, label='Gap (τ̂)')
axes[1].axvline(treatment_year, color='gray', linestyle=':', linewidth=1)
axes[1].axhline(0, color='black', linestyle='-', linewidth=0.5)

pre_mask = years < treatment_year
pre_rmspe = np.sqrt(np.mean(gap[pre_mask]**2))
axes[1].set_title(f'Treatment Effect (Pre-RMSPE = {pre_rmspe:.3f})')
axes[1].legend()

plt.tight_layout()
plt.savefig('synthetic_control_fit.pdf', bbox_inches='tight')

* Stata — pre-treatment fit assessment
* After synth estimation, results are stored in e():
* e(RMSPE) is the pre-treatment RMSPE; e(Y_treated) and e(Y_synthetic) are the outcome paths
matrix list e(RMSPE)
matrix gaps = e(Y_treated) - e(Y_synthetic)

* Placebo-based fit and inference:
* (Stata code depends on synth version; synth_runner automates this)
synth_runner outcome gdp_pc trade_share outcome(1995) outcome(1990), ///
    trunit(1) trperiod(2000) gen_vars
* gen_vars adds pre_rmspe and post_rmspe to the dataset; p-values are returned in e()
ereturn list
pval_graphs                 // per-period p-value plots from synth_runner