Use when performing statistical modeling, hypothesis testing, or model diagnostics in R. Provides expert guidance on linear models, GLMs, mixed models, survival analysis, Bayesian methods, time series, model comparison, assumption checking, and effect-size reporting. Triggers: statistical model, hypothesis test, regression, ANOVA, t-test, chi-squared, lm, glm, mixed model, survival analysis, p-value, confidence interval, diagnostics, significantly different, statistical test, odds ratio, effect size, model assumptions, Cox model. Do NOT use for machine learning or predictive modeling — use r-tidymodels instead. Do NOT use for clinical trial-specific analysis — use r-clinical instead.
Statistical modeling, hypothesis testing, and diagnostics in R.
All code uses base pipe |>, <- for assignment, and tidyverse style.
Lazy references:
- references/model-selection-guide.md for outcome-type decision trees and tidymodels workflow
- references/assumption-checklist.md for per-family assumption verification with R code

Agent dispatch: For methodology questions, assumption audits, or interpreting complex outputs, dispatch to the r-statistician agent.
MCP integration (when R session available):
- btw_tool_sessioninfo_is_package_installed to verify availability
- btw_tool_docs_help_page to read the actual function signatures and arguments
- btw_tool_env_describe_data_frame to check outcome variable type and sample size

Always use broom::tidy(fit, conf.int = TRUE) for tidy output, broom::glance() for model-level stats, and broom::augment() for per-row fitted values.

Assumption workflow (required before interpreting results):
- par(mfrow = c(2, 2)); plot(fit): residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage
- car::vif(fit): VIF > 5 is a concern; VIF > 10 indicates severe multicollinearity
- car::influencePlot(fit): flag outliers, high-leverage points, and influential observations (Cook's distance)

Read references/assumption-checklist.md for the full per-family checklist.
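A minimal base-R sketch of this workflow, assuming a linear model on the built-in mtcars data as a stand-in for your own fit. It computes VIF by hand (regress each predictor on the others; VIF = 1 / (1 - R^2)), which should agree with car::vif() when every predictor is a single numeric column; with factor terms, car::vif() reports generalized VIFs instead. The 4 / n Cook's distance and 2k / n leverage cutoffs (k = number of coefficients) are common rules of thumb, not fixed standards.

```r
# Illustrative model on built-in data; substitute your own fit
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Four base diagnostic panels: residuals vs fitted, Q-Q,
# scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Hand-rolled VIF for numeric predictors: VIF_j = 1 / (1 - R^2_j),
# where R^2_j comes from regressing predictor j on the other predictors
predictors <- c("wt", "hp", "disp")
vif_manual <- sapply(predictors, \(pred) {
  r2 <- lm(reformulate(setdiff(predictors, pred), response = pred), data = mtcars) |>
    summary() |>
    getElement("r.squared")
  1 / (1 - r2)
})
vif_manual

# Influence: flag rows exceeding rule-of-thumb cutoffs
n <- nobs(fit)
k <- length(coef(fit))
influence_df <- data.frame(
  cooks = cooks.distance(fit),
  leverage = hatvalues(fit)
)
flagged <- subset(influence_df, cooks > 4 / n | leverage > 2 * k / n)
flagged
```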
GLM family selection (glm()):

| Outcome | Family | Key detail |
|---|---|---|
| Binary | binomial(link = "logit") | Always specify family =; glm() defaults to gaussian |
| Count | poisson | Check overdispersion: if deviance / df.residual is well above 1, switch to quasipoisson or MASS::glm.nb() |
| Count (overdispersed) | MASS::glm.nb() or quasipoisson | Negative binomial estimates an extra dispersion parameter (theta); quasipoisson keeps the Poisson estimates and rescales standard errors |
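A sketch of the overdispersion check, using the built-in warpbreaks data (warp breaks per loom) as an example count outcome. The deviance ratio matches the table; the Pearson-based ratio is what quasipoisson() uses as its dispersion estimate. Choosing between the two remedies remains a judgment call, and since quasipoisson has no true likelihood, AIC() returns NA for it.

```r
# Poisson fit on built-in count data
pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Dispersion ratios: values well above 1 indicate overdispersion
deviance_ratio <- deviance(pois) / df.residual(pois)
pearson_ratio <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)
c(deviance = deviance_ratio, pearson = pearson_ratio)

# Option 1: quasipoisson keeps the Poisson coefficients and scales
# standard errors by the square root of the estimated dispersion
qpois <- glm(breaks ~ wool + tension, family = quasipoisson, data = warpbreaks)
summary(qpois)$dispersion

# Option 2: negative binomial estimates an extra parameter (theta)
nb <- MASS::glm.nb(breaks ~ wool + tension, data = warpbreaks)
nb$theta

# Likelihood-based comparison (quasipoisson has no AIC)
AIC(pois, nb)
```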
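Use exponentiate = TRUE in broom::tidy() to report odds ratios (logistic) or rate ratios (Poisson, negative binomial). For rate models, add offset(log(exposure)) to the formula so counts are modeled per unit of exposure (for example, person-time).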
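A sketch of a rate model with an exposure offset, on simulated data: the columns group, exposure, and events are hypothetical, and the true rate ratio is 0.5 by construction. Exponentiating the coefficients and their Wald intervals by hand shows what exponentiate = TRUE reports; broom::tidy() obtains glm intervals from confint(), which profiles the likelihood, so its limits may differ slightly from the Wald ones. The same exp() step turns logistic coefficients into odds ratios.

```r
set.seed(42)

# Simulated rate data (hypothetical): events counted over varying exposure
n <- 200
sim <- data.frame(
  group = factor(sample(c("control", "treated"), n, replace = TRUE)),
  exposure = runif(n, min = 0.5, max = 5)  # e.g. person-years at risk
)
rate <- ifelse(sim$group == "treated", 0.6, 1.2)  # true events per unit exposure
sim$events <- rpois(n, lambda = rate * sim$exposure)

# offset(log(exposure)) makes the model describe events per unit exposure
fit <- glm(events ~ group + offset(log(exposure)), family = poisson, data = sim)

# Rate ratios with Wald 95% CIs by exponentiating the log-scale estimates
rate_ratios <- cbind(estimate = coef(fit), confint.default(fit)) |> exp()
rate_ratios

# Tidy equivalent (profile-likelihood CIs), only if broom is installed
if (requireNamespace("broom", quietly = TRUE)) {
  broom::tidy(fit, conf.int = TRUE, exponentiate = TRUE)
}
```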
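Mixed models: use performance::icc() to check whether random effects are warranted; the ICC is the proportion of variance explained by the grouping structure. For convergence failures, try lme4::allFit() to compare optimizers, then simplify the random-effects structure if needed.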
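A sketch of the ICC check for a random-intercept model, assuming lme4 and its sleepstudy data are available (the block is skipped otherwise). It computes the adjusted ICC by hand as between-subject variance over between-subject plus residual variance, which is what performance::icc() should report for this simple case. lme4::isSingular() flags boundary fits, where a variance component is estimated at or near zero; that is a common reason to simplify the random-effects structure.

```r
icc_value <- NA_real_

if (requireNamespace("lme4", quietly = TRUE)) {
  data("sleepstudy", package = "lme4")

  # Random intercept per subject; Days is a fixed effect
  fit <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

  # Variance components as a data frame: one row per grouping term plus Residual
  vc <- as.data.frame(lme4::VarCorr(fit))
  tau00 <- vc$vcov[vc$grp == "Subject"]
  sigma2 <- vc$vcov[vc$grp == "Residual"]

  # Adjusted ICC: share of random + residual variance due to subjects
  icc_value <- tau00 / (tau00 + sigma2)
  print(icc_value)

  # TRUE would indicate a boundary (singular) fit worth simplifying
  print(lme4::isSingular(fit))
}
```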
Survival boundary: this skill covers general survival methodology (Cox models, Kaplan-Meier estimation, time-varying covariates). For clinical trial endpoints (OS, PFS, DFS), use r-clinical instead.
```r
library(survival)

# Kaplan-Meier estimates by group; df needs columns time,
# event (1 = event, 0 = censored), and group
km <- survfit(Surv(time, event) ~ group, data = df)
summary(km)
```
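The Kaplan-Meier snippet covers estimation only; a sketch of the Cox counterpart, assuming the lung data shipped with survival (status is coded 1 = censored, 2 = dead, which Surv() accepts). cox.zph() tests proportional hazards using scaled Schoenfeld residuals; a small p-value for a term suggests its effect changes over time, which is where stratification (strata()) or time-varying coefficients come in.

```r
library(survival)

# Cox proportional hazards model; sex is numeric (1 = male, 2 = female)
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox)

# Hazard ratios with Wald 95% CIs (exp of the log-hazard coefficients)
hr <- cbind(HR = coef(cox), confint(cox)) |> exp()
hr

# Proportional-hazards check: per-term and global tests
zph <- cox.zph(cox)
zph
plot(zph)  # smoothed scaled Schoenfeld residuals over time, one panel per term
```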