Comparative Politics — Cross-National Panel Analysis

When to Use

Use this skill when you need to:

Download and merge the Quality of Government (QoG) standard dataset, Polity 5 scores, and Freedom House annual ratings into a unified country-year panel
Classify countries into regime types (autocracy, hybrid/anocracy, democracy) using Polity 5 thresholds, and track transitions over time
Run panel OLS with country and year fixed effects using the linearmodels package (handles the Frisch-Waugh-Lovell within transformation efficiently)
Perform a Hausman test to choose between fixed-effects and random-effects models
Detect democratic backsliding as a rolling 3-year change in Polity score
Produce Lipset-style scatter plots (GDP per capita vs. democracy score) with regional coloring
Estimate multilevel models where countries are nested within regions

This skill covers exploratory analysis, static regression, and dynamic models. For time-series-specific methods (cointegration, error correction), use a dedicated time-series skill.

Background

import os
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from scipy import stats
import statsmodels.api as sm

warnings.filterwarnings("ignore")

# ---------------------------------------------------------------------------
# 1. Load and Harmonize QoG Data
# ---------------------------------------------------------------------------

QOG_KEY_VARS = [
    "cname", "ccodealp", "year",
    "wbgi_cce", "wbgi_rle", "wbgi_gee",
    "wdi_gdpcapcon2015", "wdi_pop", "undp_hdi",
]


def load_qog(path: str, variables: list[str] | None = None,
             years: tuple[int, int] | None = None) -> pd.DataFrame:
    """
    Load the QoG time-series dataset.

    Parameters
    ----------
    path : str
        Path to QoG standard time-series CSV.
    variables : list of str, optional
        Variables to retain; always includes cname, ccodealp, year.
    years : tuple, optional
        Inclusive (first_year, last_year) filter.

    Returns
    -------
    pd.DataFrame standardized with columns: country, iso3, year, and requested variables.
    """
    # Read only the header first so missing variables are skipped, not fatal
    header = pd.read_csv(path, nrows=0)
    keep = list(set(["cname", "ccodealp", "year"] + (variables or QOG_KEY_VARS)))
    use = [c for c in keep if c in header.columns]
    df = pd.read_csv(path, usecols=use, low_memory=False)
    if years:
        df = df[(df["year"] >= years[0]) & (df["year"] <= years[1])]
    df = df.rename(columns={"cname": "country", "ccodealp": "iso3"})
    return df.sort_values(["country", "year"]).reset_index(drop=True)


# ---------------------------------------------------------------------------
# 2. Load Polity 5 and Classify Regime Types
# ---------------------------------------------------------------------------

POLITY_SPECIAL_CODES = {-66, -77, -88}


def load_polity5(path: str) -> pd.DataFrame:
    """
    Load Polity 5 dataset from Excel file.

    Returns
    -------
    pd.DataFrame with: scode, country, year, polity2, regime_type
    (plus democ and autoc when present). Note that scode is Polity's own
    alpha code, not ISO3.
    """
    df = pd.read_excel(path, sheet_name=0)

    # Standardize column names (different versions use different names)
    df.columns = [str(c).lower().strip() for c in df.columns]

    # Fall back to the raw "polity" score only when "polity2" is absent,
    # so the rename can never produce two "polity2" columns
    if "polity2" not in df.columns and "polity" in df.columns:
        df = df.rename(columns={"polity": "polity2"})

    # Keep key columns
    keep = [c for c in ["scode", "country", "year", "polity2", "democ", "autoc"]
            if c in df.columns]
    df = df[keep].copy()

    # Replace special Polity codes with NaN
    if "polity2" in df.columns:
        df["polity2"] = pd.to_numeric(df["polity2"], errors="coerce")
        df.loc[df["polity2"].isin(POLITY_SPECIAL_CODES), "polity2"] = np.nan
        # Right-closed bins: [-10, -6], [-5, 5], [6, 10] on integer scores
        df["regime_type"] = pd.cut(
            df["polity2"],
            bins=[-11, -6, 5, 10],
            labels=["Autocracy", "Hybrid/Anocracy", "Democracy"],
        )
    return df


def classify_regime(polity_score: float) -> str:
    """Classify a single Polity 2 score into regime type."""
    if pd.isna(polity_score):
        return "Unknown"
    if polity_score <= -6:
        return "Autocracy"
    if polity_score >= 6:
        return "Democracy"
    return "Hybrid/Anocracy"


# ---------------------------------------------------------------------------
# 3. Load Freedom House
# ---------------------------------------------------------------------------

def load_freedom_house(path: str) -> pd.DataFrame:
    """
    Load Freedom House FIW data from Excel.

    Returns
    -------
    pd.DataFrame with: country, year, fh_pr, fh_cl, fh_total, fh_status.
    """
    # FH Excel typically has multiple sheets; try the main country scores sheet
    try:
        df = pd.read_excel(path, sheet_name="FIW06-23", header=1)
    except Exception:
        # Sheet name differs between editions; fall back to the first sheet
        df = pd.read_excel(path, header=0)

    df.columns = [str(c).strip() for c in df.columns]

    # Typical columns: Country/Territory, C/T, Edition, Status, PR, CL, Total
    rename = {
        "Country/Territory": "country",
        "Edition": "year",
        "Status": "fh_status",
        "PR": "fh_pr",
        "CL": "fh_cl",
        "Total": "fh_total",
    }
    df = df.rename(columns={k: v for k, v in rename.items() if k in df.columns})
    keep = [c for c in ["country", "year", "fh_pr", "fh_cl", "fh_total", "fh_status"]
            if c in df.columns]
    return df[keep].dropna(subset=["country", "year"]).copy()


# ---------------------------------------------------------------------------
# 4. Merge Panel Datasets
# ---------------------------------------------------------------------------

def build_comparative_panel(
    qog_df: pd.DataFrame,
    polity_df: pd.DataFrame,
    fh_df: pd.DataFrame | None = None,
) -> pd.DataFrame:
    """
    Merge QoG, Polity 5, and (optionally) Freedom House into a country-year panel.

    Merge key: country name + year. Names are matched exactly, so harmonize
    country names across sources before merging.

    Returns
    -------
    pd.DataFrame: merged country-year panel.
    """
    panel = qog_df.copy()

    # Merge Polity 5
    if "country" in polity_df.columns and "year" in polity_df.columns:
        panel = panel.merge(
            polity_df[["country", "year", "polity2", "regime_type"]],
            on=["country", "year"],
            how="left",
        )

    # Merge Freedom House
    if fh_df is not None and "country" in fh_df.columns:
        fh_keep = [c for c in ["country", "year", "fh_pr", "fh_cl", "fh_total", "fh_status"]
                   if c in fh_df.columns]
        panel = panel.merge(fh_df[fh_keep], on=["country", "year"], how="left")

    # Log transforms (clip at 1 so the log is never taken of zero or a negative)
    if "wdi_gdpcapcon2015" in panel.columns:
        panel["log_gdppc"] = np.log(panel["wdi_gdpcapcon2015"].clip(lower=1))
    if "wdi_pop" in panel.columns:
        panel["log_pop"] = np.log(panel["wdi_pop"].clip(lower=1))

    return panel.sort_values(["country", "year"]).reset_index(drop=True)


# ---------------------------------------------------------------------------
# 5. Panel Regression with Country + Year FE
# ---------------------------------------------------------------------------

def panel_fe_regression(
    panel: pd.DataFrame,
    outcome: str,
    predictors: list[str],
    country_col: str = "country",
    year_col: str = "year",
    entity_effects: bool = True,
    time_effects: bool = True,
) -> object:
    """
    Run panel OLS with optional entity and time fixed effects (linearmodels).

    Parameters
    ----------
    panel : pd.DataFrame
    outcome : str
        Dependent variable.
    predictors : list of str
    entity_effects : bool
        Country fixed effects (within-estimator).
    time_effects : bool
        Year fixed effects.

    Returns
    -------
    linearmodels PanelResults object.
    """
    from linearmodels.panel import PanelOLS

    # linearmodels expects a (entity, time) MultiIndex
    sub = panel[[country_col, year_col, outcome] + predictors].dropna().copy()
    sub = sub.set_index([country_col, year_col])
    endog = sub[outcome]
    exog = sm.add_constant(sub[predictors])

    model = PanelOLS(
        endog, exog,
        entity_effects=entity_effects,
        time_effects=time_effects,
    )
    # Standard errors clustered by country
    return model.fit(cov_type="clustered", cluster_entity=True)


# ---------------------------------------------------------------------------
# 6. Democratic Backsliding Detection
# ---------------------------------------------------------------------------

def detect_backsliding(
    panel: pd.DataFrame,
    polity_col: str = "polity2",
    window: int = 3,
    threshold: float = 3.0,
    country_col: str = "country",
    year_col: str = "year",
) -> pd.DataFrame:
    """
    Flag country-years where the Polity 2 score declined by at least
    threshold points over the preceding window years, starting from a
    democratic baseline (≥+6).

    Parameters
    ----------
    panel : pd.DataFrame
    polity_col : str
    window : int
        Lookback in years.
    threshold : float
        Minimum absolute decline to flag.
    country_col : str
    year_col : str

    Returns
    -------
    pd.DataFrame of backsliding episodes, sorted by severity.
    """
    df = panel.sort_values([country_col, year_col]).copy()
    # shift() counts rows, so this assumes one row per consecutive year
    # within each country (no gaps in the panel)
    df["polity_lag"] = df.groupby(country_col)[polity_col].shift(window)
    df["polity_change"] = df[polity_col] - df["polity_lag"]

    episodes = df[
        (df["polity_change"] <= -threshold) & (df["polity_lag"] >= 6)
    ].copy()
    episodes["severity"] = episodes["polity_change"].abs()
    return (
        episodes[[country_col, year_col, polity_col, "polity_lag", "polity_change", "severity"]]
        .sort_values("severity", ascending=False)
        .reset_index(drop=True)
    )
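The plotting example that follows operates on a `panel` DataFrame and calls it simulated, but no such data is constructed anywhere in the text. A minimal sketch of one way to build a synthetic balanced panel is given here. The function name `simulate_panel`, the random-walk design, and all numeric settings are assumptions for illustration, not part of the original workflow, and the resulting data carry no empirical meaning.

```python
import numpy as np
import pandas as pd


def simulate_panel(n_countries: int = 30, start: int = 2000, end: int = 2020,
                   seed: int = 0) -> pd.DataFrame:
    """Build a synthetic balanced country-year panel (illustrative data only)."""
    rng = np.random.default_rng(seed)
    years = np.arange(start, end + 1)
    rows = []
    for i in range(n_countries):
        # Random-walk Polity score, clipped to the valid [-10, 10] range
        base = rng.integers(-8, 10)
        steps = rng.choice([-2, -1, 0, 0, 0, 1], size=len(years))
        polity = np.clip(base + np.cumsum(steps), -10, 10)
        # Log GDP per capita: country-specific level, mild trend, small noise
        level = rng.normal(8.5, 1.2)
        log_gdp = level + 0.02 * np.arange(len(years)) + rng.normal(0, 0.05, len(years))
        for y, p, g in zip(years, polity, log_gdp):
            rows.append({
                "country": f"C{i:02d}",
                "year": int(y),
                "polity2": float(p),
                "log_gdppc": float(g),
            })
    panel = pd.DataFrame(rows)
    # Same thresholds as the Polity loader: <= -6, -5..5, >= 6
    panel["regime_type"] = pd.cut(
        panel["polity2"],
        bins=[-11, -6, 5, 10],
        labels=["Autocracy", "Hybrid/Anocracy", "Democracy"],
    )
    return panel


panel = simulate_panel()
print(panel.head())
```

Because the panel has one row per country and consecutive year, the row-based lag in `detect_backsliding` corresponds to a true lag in years.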

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Detect backsliding in the panel (assumes `panel` is an existing country-year
# DataFrame with country, year, polity2, regime_type, and log_gdppc columns,
# and that detect_backsliding is already defined)
episodes = detect_backsliding(
    panel, polity_col="polity2", window=3, threshold=3.0,
)
print(f"=== Democratic Backsliding Episodes (n={len(episodes)}) ===")
print(episodes.head(10).to_string(index=False))

# Rolling 3-year polity change (global average)
panel_sorted = panel.sort_values(["country", "year"])
panel_sorted["polity_lag3"] = panel_sorted.groupby("country")["polity2"].shift(3)
panel_sorted["polity_delta3"] = panel_sorted["polity2"] - panel_sorted["polity_lag3"]
global_delta = panel_sorted.groupby("year")["polity_delta3"].mean()

fig, ax = plt.subplots(figsize=(11, 5))
ax.bar(global_delta.index, global_delta.values,
       color=["#d73027" if v < 0 else "#4575b4" for v in global_delta.values],
       alpha=0.8)
ax.axhline(0, color="black", linewidth=0.8)
ax.set_title("Mean 3-Year Change in Polity Score (Global Average)")
ax.set_xlabel("Year")
ax.set_ylabel("Mean Polity Change (3-year)")
ax.grid(True, axis="y", alpha=0.3)
plt.tight_layout()
plt.savefig("polity_backsliding_trend.png", dpi=150)
plt.show()

# Lipset scatter: GDP vs Polity, using the most recent year in the panel
latest = panel[panel["year"] == panel["year"].max()].dropna(subset=["log_gdppc", "polity2"])
fig2, ax2 = plt.subplots(figsize=(9, 6))
scatter = ax2.scatter(latest["log_gdppc"], latest["polity2"],
                      alpha=0.7, c=pd.Categorical(latest["regime_type"]).codes,
                      cmap="RdYlGn", edgecolors="white", s=60, linewidths=0.4)

# Bivariate OLS fit line
slope, intercept, r, p, se = stats.linregress(latest["log_gdppc"], latest["polity2"])
x_line = np.linspace(latest["log_gdppc"].min(), latest["log_gdppc"].max(), 100)
ax2.plot(x_line, intercept + slope * x_line, "r--", linewidth=1.5,
         label=f"OLS (r={r:.2f})")
ax2.set_xlabel("log(GDP per capita, constant 2015 USD)")
ax2.set_ylabel("Polity 2 Score")
ax2.set_title("Lipset's Modernization Hypothesis: GDP vs. Democracy")
ax2.axhline(6, color="gray", linestyle=":", linewidth=1, label="Democracy threshold (+6)")
ax2.axhline(-6, color="gray", linestyle=":", linewidth=1, label="Autocracy threshold (-6)")
ax2.legend(fontsize=8)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("lipset_scatter.png", dpi=150)
plt.show()
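The fixed-effects regression in `panel_fe_regression` delegates the within transformation to linearmodels. To make that step concrete, the following sketch demeans a synthetic balanced panel by country and year by hand and recovers a known slope. The helper `two_way_within` and the simulated data are assumptions for illustration. The simple subtract-both-means formula is exact only for balanced panels, and it is not a description of how linearmodels implements the transformation internally.

```python
import numpy as np
import pandas as pd


def two_way_within(df: pd.DataFrame, cols: list[str], entity: str, time: str) -> pd.DataFrame:
    """Two-way demeaning: x_it - mean_i - mean_t + grand mean (balanced panels only)."""
    entity_mean = df.groupby(entity)[cols].transform("mean")
    time_mean = df.groupby(time)[cols].transform("mean")
    grand_mean = df[cols].mean()
    return df[cols] - entity_mean - time_mean + grand_mean


# Synthetic balanced panel with a true slope of 0.5 (illustrative data only)
rng = np.random.default_rng(42)
n, t = 40, 15
country = np.repeat(np.arange(n), t)
year = np.tile(np.arange(2000, 2000 + t), n)
alpha = rng.normal(0, 2, n)[country]        # country effects
gamma = rng.normal(0, 1, t)[year - 2000]    # year effects
x = rng.normal(0, 1, n * t) + 0.5 * alpha   # regressor correlated with country effects
y = 0.5 * x + alpha + gamma + rng.normal(0, 0.1, n * t)
demo = pd.DataFrame({"country": country, "year": year, "x": x, "y": y})

# Within estimator: OLS slope on the demeaned data (no intercept needed)
within = two_way_within(demo, ["x", "y"], "country", "year")
beta_fe = float((within["x"] * within["y"]).sum() / (within["x"] ** 2).sum())

# Pooled OLS ignores the country effects and is biased upward here
beta_pooled = float(np.polyfit(demo["x"], demo["y"], 1)[0])
print(f"within estimate: {beta_fe:.3f}, pooled OLS: {beta_pooled:.3f}")
```

The gap between the two estimates is the omitted-variable bias that country fixed effects are meant to remove: the regressor is correlated with the unobserved country effect by construction.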

QoG Variable	Description	Source
`wbgi_cce`	Control of Corruption Estimate	World Bank
`wbgi_rle`	Rule of Law Estimate	World Bank
`wbgi_gee`	Government Effectiveness Estimate	World Bank
`undp_hdi`	Human Development Index	UNDP
`wdi_gdpcapcon2015`	GDP per capita (constant 2015 USD)	World Bank

Problem	Cause	Solution
Large number of NaN after merge	Country name inconsistencies (e.g., "Czech Republic" vs "Czechia")	Harmonize using ISO3 codes; use `pycountry` for name normalization
Polity 5 shows -66/-77/-88	Special codes for foreign interruption (-66), interregnum (-77), and transition (-88)	Replace with `NaN` using the `POLITY_SPECIAL_CODES` set
PanelOLS absorbed all variation	Entity fixed effects absorb any variable that is constant within a country	Check that the outcome and every predictor vary over time within countries; drop time-invariant predictors
Freedom House file structure changed	Annual report format varies	Inspect the Excel sheet names and header rows before loading
`linearmodels` Hausman test	Package does not provide built-in Hausman	Compute the statistic manually from the FE and RE coefficient vectors and covariance matrices
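For the Hausman row in the table above, a minimal sketch of a manual computation follows. It assumes you have already fitted a fixed-effects and a random-effects model on the same predictors and extracted the slope coefficients and their covariance matrices (with linearmodels result objects these are available as `.params` and `.cov`). Pass only the coefficients common to both models, excluding the constant. The classical test also assumes non-robust covariance estimates, so treat results based on clustered covariances with caution. The numeric inputs below are made up for illustration.

```python
import numpy as np
from scipy import stats


def hausman_test(b_fe, b_re, cov_fe, cov_re):
    """
    Hausman statistic H = d' [V_fe - V_re]^(-1) d with d = b_fe - b_re.

    Under the null (random effects is consistent), H is asymptotically
    chi-squared with len(d) degrees of freedom.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    # pinv guards against a singular covariance difference in small samples
    stat = float(diff @ np.linalg.pinv(v_diff) @ diff)
    dof = int(diff.size)
    p_value = float(stats.chi2.sf(stat, dof))
    return stat, dof, p_value


# Made-up coefficients and covariance matrices, for illustration only
b_fe = [0.50, -0.20]
b_re = [0.40, -0.10]
cov_fe = [[0.010, 0.0], [0.0, 0.020]]
cov_re = [[0.006, 0.0], [0.0, 0.010]]

stat, dof, p_value = hausman_test(b_fe, b_re, cov_fe, cov_re)
print(f"H = {stat:.3f}, df = {dof}, p = {p_value:.4f}")
```

A small p-value rejects the random-effects assumption and favors fixed effects. A negative statistic can occur in finite samples when the covariance difference is not positive definite, in which case the test is uninformative.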
