Epidemiological analysis conventions for R. Apply when writing or reviewing statistical analysis code involving: regression models for count or rate data, forecast evaluation, scoring rules, GAMs/GAMMs, or time-series of infectious disease data. Covers model specification, diagnostics, and interpretation.
- Log-transform scores with a small offset: `log(wis + 1e-7)`
- Dates with `lubridate`; align weekly aggregation with ISO epiweeks

Choose model family based on the outcome distribution:

| Outcome | First choice | Check | Alternative |
|---|---|---|---|
| Counts | Poisson GLM | Overdispersion (`dispersiontest()`) | Negative binomial, quasi-Poisson |
| Rates | Poisson with offset | Overdispersion | Negative binomial with offset |
| Continuous positive | Gamma GLM or log-transformed Gaussian | Residual normality | Tweedie |
| Scores (WIS, CRPS) | Log-transformed Gaussian | Residual patterns | Gamma GLM |
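
As a rough illustration of the "Rates" row, the sketch below fits a Poisson GLM with a population offset, checks for overdispersion, and then falls back to a negative binomial model. The data are simulated, and the column names (`population`, `cases`) and the true rate are assumptions made for the example. The Pearson dispersion statistic is used here as a dependency-free stand-in for `dispersiontest()` (from the AER package), not as a replacement for it.

```r
library(MASS)  # glm.nb() for the negative binomial alternative

set.seed(42)
n <- 200
dat <- data.frame(
  week = seq_len(n),
  population = round(runif(n, 5e4, 2e5))  # hypothetical catchment sizes
)
# Simulated overdispersed counts: true rate of 20 per 100,000, NB size = 2
dat$cases <- rnbinom(n, mu = dat$population * 20 / 1e5, size = 2)

# First choice for rates: Poisson GLM with a log(population) offset
fit_pois <- glm(cases ~ 1 + offset(log(population)),
                family = poisson, data = dat)

# Check: Pearson dispersion statistic; values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Alternative: negative binomial with the same offset
fit_nb <- glm.nb(cases ~ 1 + offset(log(population)), data = dat)

# Compare the two fits and report the estimated rate per 100,000
aic_tab <- AIC(fit_pois, fit_nb)
rate_per_100k <- exp(coef(fit_nb)[["(Intercept)"]]) * 1e5

print(dispersion)
print(aic_tab)
print(rate_per_100k)
```

With real surveillance data the intercept-only formula would be replaced by the covariates of interest; the offset and the dispersion check stay the same.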
For GAMs/GAMMs (`mgcv`):

- Smooth terms (`s()`) for non-linear relationships; parametric terms for categorical variables
- `bs = "re"` for random intercepts; `bs = "fs"` for factor-smooth interactions
- `bs = "tp"` (default thin plate) for spatial/temporal smooths
- `bam()` with `discrete = TRUE` for large datasets; fREML for fast estimation
- Check basis dimension with `mgcv::k.check()`

Run and report these diagnostics:

- `gratia::appraise()` for GAM diagnostic plots
- `mgcv::k.check()` for basis dimension adequacy
- `gammit::extract_ranef()` to inspect estimated effects and CIs

Report what diagnostics show, including imperfections. Do not hide diagnostic issues.
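
The GAM conventions and the basis-dimension check can be sketched together as follows. This is a minimal example on simulated weekly counts, assuming a Poisson outcome, eight hypothetical locations, and `k = 15` for the temporal smooth; none of these choices come from a real analysis.

```r
library(mgcv)

set.seed(1)
n_loc <- 8
n_week <- 60
dat <- expand.grid(
  week = seq_len(n_week),
  location = factor(paste0("loc", seq_len(n_loc)))  # hypothetical locations
)
# Simulated truth: seasonal curve plus a location-level random intercept
loc_effect <- rnorm(n_loc, sd = 0.4)
log_mu <- 3 + sin(2 * pi * dat$week / 52) + loc_effect[as.integer(dat$location)]
dat$cases <- rpois(nrow(dat), lambda = exp(log_mu))

# Thin plate smooth over time, random intercept per location,
# fitted with bam() using discretised covariates and fREML
fit <- bam(
  cases ~ s(week, bs = "tp", k = 15) + s(location, bs = "re"),
  family = poisson,
  data = dat,
  method = "fREML",
  discrete = TRUE
)

# Basis dimension adequacy: compare edf with k' and inspect the k-index;
# the random-effect term is a factor, so its k-index is reported as NA
k_tab <- k.check(fit)
print(k_tab)
```

If `gratia` and `gammit` are installed, `gratia::appraise(fit)` and `gammit::extract_ranef(fit)` would be run on the same fitted object, and their output reported alongside the `k.check()` table.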
- `scoringutils` pipeline: `as_forecast_quantile()` then `score()`
- Native pipe (`|>`)
- `here()` for all file paths
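
A minimal sketch of the scoring pipeline, assuming scoringutils 2.x (which expects columns named `observed`, `predicted`, and `quantile_level`) and R 4.1 or later for the native pipe. The two models, their biases, and the observed values are invented for illustration, and the output path in the final comment is hypothetical.

```r
library(scoringutils)

levels <- c(0.05, 0.25, 0.5, 0.75, 0.95)
truth <- data.frame(
  target_date = as.Date("2024-01-06") + 7 * 0:3,
  observed = c(120, 135, 150, 140)  # made-up weekly counts
)

# One row per target date, model, and quantile level
forecasts <- expand.grid(
  target_date = truth$target_date,
  model = c("model_a", "model_b"),  # hypothetical model names
  quantile_level = levels,
  stringsAsFactors = FALSE
)
forecasts$observed <- truth$observed[match(forecasts$target_date, truth$target_date)]

# Predictive quantiles from a normal distribution; model_b is more biased
centre <- forecasts$observed + ifelse(forecasts$model == "model_a", 5, 25)
forecasts$predicted <- qnorm(forecasts$quantile_level, mean = centre, sd = 15)

# Validate the forecast object, then score it
scores <- forecasts |>
  as_forecast_quantile() |>
  score()

by_model <- summarise_scores(scores, by = "model")
print(by_model)

# Log-transform WIS with a small offset before modelling it as Gaussian
scores$log_wis <- log(scores$wis + 1e-7)

# Saving would go through here(), e.g. (hypothetical path):
# write.csv(by_model, here::here("output", "wis_by_model.csv"), row.names = FALSE)
```

Because both models share the same spread and differ only in bias, `model_a` should come out with the lower mean WIS here.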