Name: Statistical Inference
Author: kantundpeterpan

搜索技能.../

Statistical Inference | Skills Pool

import statsmodels.formula.api as smf
import pandas as pd

# OLS
model = smf.ols("outcome ~ predictor1 + C(group)", data=df).fit()

# Logistic
model = smf.logit("binary_outcome ~ x1 + x2", data=df).fit()

# Poisson
model = smf.poisson("count ~ x1 + x2", data=df).fit()

# Mixed-effects (LME4-style)
import statsmodels.formula.api as smf
model = smf.mixedlm("y ~ x1 + x2", data=df, groups=df["subject_id"]).fit()

print(model.summary())

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

residuals = model.resid
fitted = model.fittedvalues

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Residuals vs Fitted
axes[0,0].scatter(fitted, residuals, alpha=0.4)
axes[0,0].axhline(0, color='red', lw=1)
axes[0,0].set_xlabel("Fitted"); axes[0,0].set_ylabel("Residuals")
axes[0,0].set_title("Residuals vs Fitted")

# Q-Q plot
stats.probplot(residuals, plot=axes[0,1])
axes[0,1].set_title("Normal Q-Q")

# Scale-Location
axes[1,0].scatter(fitted, np.sqrt(np.abs(residuals)), alpha=0.4)
axes[1,0].set_xlabel("Fitted"); axes[1,0].set_ylabel("√|Residuals|")
axes[1,0].set_title("Scale-Location")

# Cook's distance
influence = model.get_influence()
(c, p) = influence.cooks_distance
axes[1,1].stem(range(len(c)), c, markerfmt=",", basefmt="b-")
axes[1,1].axhline(4/len(df), color='red', lw=1, label=f"4/n={4/len(df):.4f}")
axes[1,1].set_title("Cook's Distance"); axes[1,1].legend()

fig.tight_layout()
fig.savefig("figures/regression_diagnostics.png", dpi=150)

from statsmodels.stats.outliers_influence import variance_inflation_factor

X = model.model.exog
vif = pd.DataFrame({
    "feature": model.model.exog_names,
    "VIF": [variance_inflation_factor(X, i) for i in range(X.shape[1])]
})
print(vif[vif["VIF"] > 5])  # flag if VIF > 5

A multiple linear regression was conducted to predict {outcome} from
{predictors}. The model explained a significant proportion of variance,
R² = .34, F(3, 196) = 33.5, p < .001.

Predictor1 was a significant positive predictor, β = 0.42, 95% CI [0.28, 0.56],
t(196) = 5.91, p < .001.

Predictor2 was not significant, β = 0.08, 95% CI [−0.04, 0.21],
t(196) = 1.30, p = .196.

ci = model.conf_int(alpha=0.05)
ci.columns = ["Lower 95%", "Upper 95%"]
results = pd.concat([model.params, model.pvalues, ci], axis=1)
results.columns = ["Estimate", "p-value", "Lower 95%", "Upper 95%"]
print(results.round(4))

Statistical Inference

When to Use

Modelling Workflow

1. Specify the Model

Statistical Inference

When to Use

Modelling Workflow

1. Specify the Model

2. Fit the Model

3. Check Assumptions

4. Check Multicollinearity

5. Report Results (APA Format)

6. Confidence Intervals

Review Checklist

Continuous Learning V2

Continuous Learning V2

Continuous Learning V2

Continuous Learning

Continuous Learning

Pytorch Patterns