Decision tree for statistical test selection, a per-test guide to assumptions, formulas, and interpretation, plus effect size and power analysis. Use this skill to choose a statistical analysis method for requests involving 'statistical test', 't-test', 'ANOVA', 'chi-squared', 'correlation analysis', 'p-value', 'hypothesis testing', 'normality test', 'nonparametric test', 'effect size', etc. Extends the analyst's statistical analysis capabilities. Note: data cleaning and visualization are outside this skill's scope.
A guide for selecting and interpreting appropriate statistical tests based on data type and analysis purpose.
What are you comparing?
├── Mean difference between two groups
│   ├── Independent samples → Normal? → Yes: Independent t-test
│   │                                 → No: Mann-Whitney U
│   └── Paired samples → Normal? → Yes: Paired t-test
│                                → No: Wilcoxon signed-rank
├── Mean difference among three or more groups
│   ├── Independent → Normal? → Yes: One-way ANOVA → Post-hoc: Tukey HSD
│   │                         → No: Kruskal-Wallis → Post-hoc: Dunn
│   └── Repeated measures → Repeated Measures ANOVA / Friedman
├── Relationship between two variables
│   ├── Continuous × Continuous → Linear? → Yes: Pearson correlation
│   │                                     → No: Spearman rank correlation
│   └── Categorical × Categorical → Chi-squared independence test
├── Proportion difference
│   ├── Two groups → Z-test (proportions)
│   └── Three or more groups → Chi-squared homogeneity test
└── Distribution testing
    ├── Normality → Shapiro-Wilk (n<5000) / K-S test
    └── Homogeneity of variance → Levene's test / Bartlett's test
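As an illustration of how one branch of the tree above could be automated, the following is a minimal sketch of the "three or more independent groups" path. The helper name `compare_independent_groups` and the `alpha` parameter are hypothetical and not part of this skill; the sketch assumes each group is a 1-D array and uses a per-group Shapiro-Wilk check as the only normality criterion, which is a simplification.

```python
import numpy as np
from scipy import stats


def compare_independent_groups(*groups, alpha=0.05):
    """Hypothetical helper: choose a test for 3+ independent groups.

    Runs one-way ANOVA when no group fails Shapiro-Wilk at `alpha`,
    otherwise falls back to Kruskal-Wallis.
    Returns (test_name, statistic, p_value).
    """
    if len(groups) < 3:
        raise ValueError("expected at least three groups")

    # Shapiro-Wilk per group; p >= alpha means normality is not rejected.
    all_normal = all(stats.shapiro(g)[1] >= alpha for g in groups)

    if all_normal:
        name = "One-way ANOVA"
        stat, p = stats.f_oneway(*groups)
    else:
        name = "Kruskal-Wallis"
        stat, p = stats.kruskal(*groups)
    return name, float(stat), float(p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # illustrative synthetic data only
    a = rng.normal(10.0, 2.0, size=30)
    b = rng.normal(11.0, 2.0, size=30)
    c = rng.normal(12.5, 2.0, size=30)
    name, stat, p = compare_independent_groups(a, b, c)
    print(f"{name}: statistic={stat:.3f}, p={p:.4f}")
```

A significant result only says that at least one group differs, so the post-hoc step from the tree still applies. Recent SciPy versions provide `stats.tukey_hsd` for the ANOVA path; Dunn's test is not part of SciPy and would need a separate package.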
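from scipy import stats
# group_a and group_b are 1-D arrays of observations from two independent groups.
# Assumption checks
# 1. Normality (Shapiro-Wilk, checked for each group)
_, shapiro_p_a = stats.shapiro(group_a)
_, shapiro_p_b = stats.shapiro(group_b)
print(f"Normality test: group A p={shapiro_p_a:.4f}, group B p={shapiro_p_b:.4f}")
# 2. Homogeneity of variance
_, levene_p = stats.levene(group_a, group_b)
print(f"Levene's test: p={levene_p:.4f}")
# Conduct test
if levene_p >= 0.05:
    t, p = stats.ttest_ind(group_a, group_b)  # Equal variance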