Overview

Hypothesis testing is the backbone of empirical research. It provides a principled framework for deciding whether observed differences in data reflect genuine effects or merely random variation. Misuse of hypothesis tests (p-hacking, ignoring assumptions, conflating statistical with practical significance) is a leading cause of irreproducible findings.

This guide covers the core hypothesis testing framework, the most commonly used tests across disciplines, assumption checking, effect size reporting, power analysis for sample size planning, and multiple comparison corrections. Each test is accompanied by Python code using scipy, statsmodels, and pingouin, ready to integrate into research workflows.

The goal is not just to help you run tests, but to help you run the right test correctly and report results following modern standards (APA 7th edition, journal best practices).

The Hypothesis Testing Framework

Step-by-Step Procedure

State hypotheses. Define H0 (null: no effect) and H1 (alternative: effect exists).
Choose a significance level. Typically alpha = 0.05, but justify your choice.
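
The procedure can be sketched end to end with a small exact permutation test, which needs only the standard library. This is a minimal illustration, not the guide's own method: the function name `exact_permutation_test`, the toy measurements, and the choice of a permutation test (rather than a t-test) are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_test(a, b):
    """Two-sided exact permutation test on the difference in group means.

    H0: group labels are exchangeable (no effect).
    H1: the group means differ.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        grp_a = [pooled[i] for i in chosen]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(grp_a) - mean(grp_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total

ALPHA = 0.05  # significance level, fixed before looking at the data

# Toy data invented for illustration only.
control = [12.1, 11.8, 12.5, 12.0, 11.9]
treated = [13.2, 13.5, 12.9, 13.8, 13.1]

observed_diff, p_value = exact_permutation_test(control, treated)
reject_h0 = p_value < ALPHA
print(f"|mean difference| = {observed_diff:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Exact enumeration is only practical for small samples (here 252 label assignments); for larger data, a parametric test or a random-sampling permutation test is the usual substitute.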

Common Errors

Error Type	Definition	Probability
Type I (False Positive)	Reject H0 when it is true	alpha (usually 0.05)
Type II (False Negative)	Fail to reject H0 when it is false	beta (usually 0.20)
Power	Probability of correctly detecting an effect	1 - beta (target: 0.80)
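
To show how alpha, beta, and power interact when planning a study, here is a rough sketch of the per-group sample size for a two-sided, two-sample comparison of means. It uses the normal approximation n = 2 * ((z_(1 - alpha/2) + z_(1 - beta)) / d)^2, where d is the standardized effect size (Cohen's d). The function name `n_per_group` is hypothetical, and the approximation is an assumption of this example; calculations based on the t distribution typically give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample test (normal approximation)."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = std_normal.inv_cdf(power)           # quantile for power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Lowering alpha or raising the target power both increase the required sample size, and halving the effect size roughly quadruples it.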

Test Selection Guide

Research Question	Data Type	Groups	Test
Two group means differ?	Continuous, normal	2 independent	Independent t-test
Before/after difference?	Continuous, normal	2 paired	Paired t-test
Multiple group means differ?	Continuous, normal	3+ independent	One-way ANOVA
Two group distributions differ?	Ordinal / non-normal	2 independent	Mann-Whitney U
Before/after (non-normal)?	Ordinal / non-normal	2 paired	Wilcoxon signed-rank
Multiple groups (non-normal)?	Ordinal / non-normal	3+ independent	Kruskal-Wallis
Association between categories?	Categorical	2 variables	Chi-square test
Correlation?	Continuous	2 variables	Pearson or Spearman
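
The selection table can also be encoded as a simple lookup, which is handy for validating an analysis plan in code. This is only a sketch: the key strings (`"continuous_normal"`, `"two_independent"`, and so on) and the function `select_test` are hypothetical names invented here, not part of scipy, statsmodels, or pingouin.

```python
# Keys are (data_type, design); values are the test names from the table.
TEST_TABLE = {
    ("continuous_normal", "two_independent"): "Independent t-test",
    ("continuous_normal", "two_paired"): "Paired t-test",
    ("continuous_normal", "three_plus_independent"): "One-way ANOVA",
    ("ordinal_or_non_normal", "two_independent"): "Mann-Whitney U",
    ("ordinal_or_non_normal", "two_paired"): "Wilcoxon signed-rank",
    ("ordinal_or_non_normal", "three_plus_independent"): "Kruskal-Wallis",
    ("categorical", "two_variables"): "Chi-square test",
    ("continuous", "two_variables"): "Pearson or Spearman",
}

def select_test(data_type, design):
    """Return the test the table suggests, or raise if the combination is not covered."""
    try:
        return TEST_TABLE[(data_type, design)]
    except KeyError:
        raise ValueError(f"No test listed for {data_type!r} with {design!r}") from None

print(select_test("ordinal_or_non_normal", "two_paired"))
```

A lookup like this does not replace assumption checking; whether data count as "normal" still has to be judged from the data themselves.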

Effect Size Reference Table

Effect Size	Small	Medium	Large
Cohen's d (t-test)	0.2	0.5	0.8
eta-squared (ANOVA)	0.01	0.06	0.14
Cramér's V (chi-square, min(rows, cols) = 2)	0.1	0.3	0.5
Pearson r (correlation)	0.1	0.3	0.5
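
As an illustration of how the Cohen's d benchmarks are applied, the sketch below computes d with a pooled standard deviation and maps it to the labels in the table. The helper names `cohens_d` and `label_d`, the "negligible" label for values below 0.2, and the toy data are assumptions of this example.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

def label_d(d):
    """Map |d| to the conventional benchmarks (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"  # label below the "small" threshold is an assumption here
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Toy data invented for illustration only.
group_a = [2, 4, 6, 8]
group_b = [1, 3, 5, 7]
d = cohens_d(group_a, group_b)
print(f"d = {d:.3f} ({label_d(d)})")
```

These benchmarks are conventions rather than rules; what counts as a meaningful effect depends on the field and the measurement.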
