statsmodels Skill

statsmodels is a general-purpose statistical modeling library for Python. It covers OLS/WLS/GLS, GLM (logit, probit, Poisson, negative binomial), discrete choice models, time series (ARIMA, SARIMAX, VAR), mixed effects (MixedLM), robust regression, hypothesis tests, and comprehensive diagnostics, and it supports an R-style formula API. Use it when fitting regressions without fixed effects, running GLMs or logit/probit, analyzing time series, or using formula syntax. For fixed effects or DiD, use pyfixest; for panel/IV/system models, use linearmodels.

Comprehensive skill for statistical modeling with statsmodels. Use decision trees below to find the right guidance, then load detailed references.

What is statsmodels?

statsmodels is the general-purpose statistical modeling library for Python:

Two APIs: Formula API (smf.ols("y ~ x1 + x2", data=df)) for R-style modeling, and array API (sm.OLS(y, X)) for programmatic control
Broad model coverage: OLS, WLS, GLS, GLM (all families), logit, probit, multinomial, count models, zero-inflated models, quantile regression, robust regression

Topic	Reference File
Installation	./references/quickstart.md
Formula vs array API	./references/quickstart.md
Reading summary output	./references/quickstart.md
Comparison to pyfixest	./references/quickstart.md
OLS regression	./references/linear-models.md
Weighted least squares	./references/linear-models.md
GLS	./references/linear-models.md
Robust regression (RLM)	./references/linear-models.md
Quantile regression	./references/linear-models.md
Interactions and polynomials	./references/linear-models.md
GLM framework	./references/glm-discrete.md
Logit / probit	./references/glm-discrete.md
Multinomial logit	./references/glm-discrete.md
Poisson / negative binomial	./references/glm-discrete.md
Zero-inflated models	./references/glm-discrete.md
Marginal effects	./references/glm-discrete.md
Exposure / offset	./references/glm-discrete.md
ARIMA / SARIMAX	./references/time-series.md
VAR / VECM	./references/time-series.md
Exponential smoothing	./references/time-series.md
Unit root tests	./references/time-series.md
ACF / PACF	./references/time-series.md
Forecasting	./references/time-series.md
State space models	./references/time-series.md
Heteroskedasticity tests	./references/diagnostics.md
Normality tests	./references/diagnostics.md
Specification tests (RESET)	./references/diagnostics.md
VIF / multicollinearity	./references/diagnostics.md
Influence measures	./references/diagnostics.md
Residual analysis	./references/diagnostics.md
Durbin-Watson	./references/diagnostics.md
t-tests and F-tests	./references/hypothesis-testing.md
Wald tests	./references/hypothesis-testing.md
Likelihood ratio tests	./references/hypothesis-testing.md
Multiple comparison corrections	./references/hypothesis-testing.md
Comparing nested models	./references/hypothesis-testing.md
Serial correlation tests	./references/diagnostics.md
Diagnostic checklist	./references/diagnostics.md
Chi-squared tests	./references/hypothesis-testing.md
Joint significance tests	./references/hypothesis-testing.md
Ordered logit / probit	./references/glm-discrete.md
Mixed effects (MixedLM)	./references/linear-models.md
Constant term pitfall	./references/gotchas.md
Convergence warnings	./references/gotchas.md
predict() issues	./references/gotchas.md
Formula parsing (patsy)	./references/gotchas.md
summary() vs summary2()	./references/gotchas.md
NaN / missing data	./references/gotchas.md
DataFrame index issues	./references/gotchas.md
statsmodels vs pyfixest	./references/gotchas.md

File	Purpose	When to Read
`quickstart.md`	Installation, formula vs array API, first model	Starting with statsmodels
`linear-models.md`	OLS, WLS, GLS, robust regression, quantile regression	Fitting linear models
`glm-discrete.md`	GLM families, logit/probit, count models, zero-inflated	Non-linear models, binary/count outcomes
`time-series.md`	ARIMA, SARIMAX, VAR, exponential smoothing, unit root tests	Analyzing temporal data
`diagnostics.md`	Heteroskedasticity, normality, VIF, influence, residuals	Checking model assumptions
`hypothesis-testing.md`	t-tests, F-tests, Wald tests, multiple comparisons	Testing coefficients and comparing models
`gotchas.md`	Constant term, convergence, predict pitfalls, pyfixest boundary	Debugging issues

Operation	Code
OLS (formula)	`smf.ols("y ~ x1 + x2", data=df).fit()`
OLS (array)	`sm.OLS(y, sm.add_constant(X)).fit()`
Logit	`smf.logit("y ~ x1 + x2", data=df).fit()`
Probit	`smf.probit("y ~ x1 + x2", data=df).fit()`
Poisson	`smf.poisson("y ~ x1 + x2", data=df).fit()`
GLM (custom)	`smf.glm("y ~ x1", data=df, family=sm.families.Binomial()).fit()`
WLS	`smf.wls("y ~ x1", data=df, weights=w).fit()`
Robust SEs (HC1)	`smf.ols("y ~ x1 + x2", data=df).fit(cov_type="HC1")`
ARIMA	`sm.tsa.ARIMA(y, order=(p,d,q)).fit()`
Summary	`results.summary()`
Predict	`results.predict(new_data)`
Confidence intervals	`results.conf_int(alpha=0.05)`
Marginal effects	`results.get_margeff(at='overall')`
VIF	`from statsmodels.stats.outliers_influence import variance_inflation_factor`, then `variance_inflation_factor(exog, exog_idx)`
Breusch-Pagan	`sm.stats.diagnostic.het_breuschpagan(resid, exog)`
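
Several of the operations above can be combined into one short diagnostic pass. The sketch below is one possible arrangement, not a prescribed workflow. The simulated data and the names `ols_fit`, `robust_fit`, and `vifs` are assumptions for illustration. It fits OLS, runs a Breusch-Pagan test, computes VIFs for the non-constant regressors, and refits with HC1 standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated data (illustrative only): the error scale grows with x1,
# so the errors are heteroskedastic by construction.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
noise = rng.normal(size=n) * np.exp(0.5 * df["x1"])
df["y"] = 1.0 + 2.0 * df["x1"] + 0.5 * df["x2"] + noise

ols_fit = smf.ols("y ~ x1 + x2", data=df).fit()

# Breusch-Pagan: pass the residuals and the design matrix (including the constant).
exog = ols_fit.model.exog
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols_fit.resid, exog)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4g}")

# VIF for each regressor; column 0 is the intercept, so start at 1.
names = ols_fit.model.exog_names
vifs = {names[i]: variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])}
print(vifs)

# HC1 robust standard errors: same coefficients, different standard errors.
robust_fit = smf.ols("y ~ x1 + x2", data=df).fit(cov_type="HC1")
print(pd.DataFrame({"bse_ols": ols_fit.bse, "bse_hc1": robust_fit.bse}))
```

Setting `cov_type` changes only the covariance estimate, so point estimates are unchanged while standard errors, t-statistics, and confidence intervals are recomputed.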
