Statistical hypothesis testing framework including null and alternative hypotheses, error types, p-value interpretation, confidence intervals, parametric and non-parametric test selection, and one-sided vs two-sided testing.
The null hypothesis states there is no effect, no difference, or no association. It is the default assumption that the treatment or exposure has no impact.
The null hypothesis is never "proven" — it is either rejected or not rejected. "Failure to reject" is not the same as "accepting" H0.
The alternative hypothesis states that an effect exists. It is what the researcher seeks evidence for.
The alternative hypothesis should be specified before data collection.
A directional error (sometimes called a Type III or sign error) is correctly rejecting H0 but concluding the effect is in the wrong direction. Relevant primarily in one-sided testing contexts.
The p-value is the probability of observing data at least as extreme as the actual data, assuming the null hypothesis is true.
P(data this extreme or more | H0 is true)
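
As a minimal sketch of this definition, the snippet below computes an exact upper-tail p-value for a coin-flip experiment. The scenario (60 heads in 100 flips, H0: p = 0.5) and the function names `binom_pmf` and `one_sided_p_value` are illustrative assumptions, not part of the framework above.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def one_sided_p_value(observed: int, n: int, p_null: float = 0.5) -> float:
    """P(X >= observed | H0): sum the null probabilities of every outcome
    at least as extreme as the one actually observed."""
    return sum(binom_pmf(k, n, p_null) for k in range(observed, n + 1))

# Hypothetical data: 60 heads in 100 flips, tested against a fair coin.
p_upper = one_sided_p_value(60, 100)
print(f"P(X >= 60 | H0: p = 0.5) = {p_upper:.4f}")
```

The result (about 0.028) is a statement about the data under H0. It is not the probability that H0 is true.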
Six principles:
A 95% confidence interval means: if we repeated the study many times under identical conditions and calculated a CI each time, 95% of those intervals would, in the long run, contain the true parameter value.
It does NOT mean: there is a 95% probability that the true value lies within this specific interval (the true value either is or is not in the interval; we do not know which).
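
The repeated-sampling interpretation can be checked by simulation. The sketch below assumes normally distributed data with a known standard deviation, so a z-interval applies. The parameter values and the name `ci_coverage` are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(true_mean=10.0, sigma=2.0, n=30, repeats=2000,
                level=0.95, seed=0) -> float:
    """Fraction of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5           # known-sigma z-interval
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = fmean(sample)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / repeats

coverage = ci_coverage()
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The proportion should land close to 0.95. Each individual interval either contains the true mean or does not, and the 95% describes the procedure across repetitions.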
Step 1: What is the research question?
Step 2: How many groups?
Step 3: What type of data?
Step 4: Are assumptions met?
| Scenario | Test |
|---|---|
| One sample vs known value | One-sample t-test |
| Two independent groups, continuous | Independent samples t-test (Welch's preferred) |
| Two paired/matched groups | Paired t-test |
| >2 independent groups | One-way ANOVA |
| >2 related groups | Repeated measures ANOVA |
| Two factors | Two-way ANOVA |
| Continuous predictor/outcome | Pearson correlation, linear regression |
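
To illustrate why Welch's version is listed as preferred for two independent groups, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom, which do not assume equal variances. The function name `welch_t` and the sample values are made up for illustration. Converting the statistic to a p-value requires a t-distribution CDF, which the Python standard library does not provide, so that step is left out.

```python
from statistics import fmean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (fmean(a) - fmean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical groups with clearly unequal variances.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

With unequal variances the degrees of freedom fall below the pooled value of n1 + n2 - 2, which makes the test more conservative.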
| Scenario | Test |
|---|---|
| One sample vs known value | Wilcoxon signed-rank |
| Two independent groups | Mann-Whitney U (Wilcoxon rank-sum) |
| Two paired groups | Wilcoxon signed-rank |
| >2 independent groups | Kruskal-Wallis |
| >2 related groups | Friedman test |
| Correlation | Spearman rank correlation |
| Two categorical variables | Chi-square, Fisher's exact |
| Ordered categorical outcome | Cochran-Armitage trend test |
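
The two tables above can be read as a lookup from study design to test. The sketch below encodes a subset of them. The design keys and the function `suggest_test` are hypothetical names introduced here, and the single `assumptions_met` flag is a simplification of Step 4.

```python
# Each design maps to (parametric test, non-parametric alternative),
# taken from the two tables above.
TESTS = {
    "one_sample": ("One-sample t-test", "Wilcoxon signed-rank"),
    "two_independent": ("Independent samples t-test (Welch's)", "Mann-Whitney U"),
    "two_paired": ("Paired t-test", "Wilcoxon signed-rank"),
    "many_independent": ("One-way ANOVA", "Kruskal-Wallis"),
    "many_related": ("Repeated measures ANOVA", "Friedman test"),
    "correlation": ("Pearson correlation", "Spearman rank correlation"),
}

def suggest_test(design: str, assumptions_met: bool) -> str:
    """Pick the parametric test when its assumptions hold, else the rank-based one."""
    parametric, nonparametric = TESTS[design]
    return parametric if assumptions_met else nonparametric

print(suggest_test("two_independent", assumptions_met=True))
print(suggest_test("many_related", assumptions_met=False))
```

A lookup like this is a starting point. The research question and the data type from Steps 1 and 3 still decide which row applies.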
Tests for a difference in either direction. Used when an effect in either direction is possible and scientifically relevant.
Tests for a difference in one direction only.
A common abuse is switching to a one-sided test after seeing the data in order to convert a "non-significant" two-sided p-value (e.g., 0.07) into a "significant" one-sided result (0.035). Unless the one-sided test was pre-specified, this is alpha manipulation and scientifically dishonest.
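
The halving works only when the observed effect lies in the hypothesized direction. The sketch below shows this for a z statistic, assuming a standard normal reference distribution. The function name `z_p_values` and the value z = 1.81 are illustrative choices.

```python
from statistics import NormalDist

def z_p_values(z: float) -> dict:
    """Two-sided and both one-sided p-values for a standard normal z statistic."""
    std = NormalDist()
    return {
        "two_sided": 2 * (1 - std.cdf(abs(z))),
        "greater": 1 - std.cdf(z),   # H1: effect > 0
        "less": std.cdf(z),          # H1: effect < 0
    }

# z = 1.81 gives a two-sided p-value near 0.07.
p = z_p_values(1.81)
for name, value in p.items():
    print(f"{name}: {value:.4f}")
```

The "greater" p-value is half the two-sided one, while the "less" p-value is close to 1. Picking the direction after seeing the sign of z would therefore always select the smaller value, which is why the direction has to be fixed in advance.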
When multiple hypotheses are tested simultaneously, the probability of at least one Type I error exceeds alpha. With k independent tests at alpha = 0.05:
P(at least one false positive) = 1 - (1 - 0.05)^k
For 20 tests: P = 1 - 0.95^20 = 0.64 (64% chance of at least one false positive).
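
The formula is easy to tabulate. This short sketch evaluates it for a few values of k, under the same independence assumption as above. The function name `familywise_error_rate` is introduced here for illustration.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"k = {k:2d}: {familywise_error_rate(k):.3f}")
```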
See the dedicated multiple-comparisons skill for correction methods.