Name: Statistical Analysis
Author: winstonkoh87

Statistical Analysis Skill

Purpose: Structured pipeline for statistical analysis deliverables. Prevents assumption violations, missed effect sizes, and uninterpretable output. Origin: Created ahead of Assignment 19 (SPSS, Siva, $250, deadline Mar 8). No protocol coverage existed for this domain.

The 5-Step Pipeline

Step 1: DATA AUDIT

Load dataset (CSV, SPSS .sav, Excel)
Profile: N, variable types (nominal/ordinal/interval/ratio), missing data %, outliers
Check for:
- Missing data pattern (MCAR/MAR/MNAR) — Little's MCAR test if available
- Outliers (|z| > 3, or values beyond 1.5 × IQR from the quartiles)
- Variable coding (reverse-coded items, string-to-numeric conversion)
- Sample size adequacy per planned test (rule of thumb: 10–15 observations per predictor for regression)
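The audit checks above can be prototyped before the SPSS run. The following is a minimal sketch in plain Python 3.9+ (standard library only). It assumes that missing responses arrive as `None`, and the helper names are hypothetical. `statistics.quantiles` uses its default "exclusive" quartile method, which may not match SPSS's percentile definition exactly.

```python
import statistics
from typing import Optional, Sequence


def missing_pct(values: Sequence[Optional[float]]) -> float:
    """Percent of entries recorded as None (this sketch's stand-in for missing)."""
    return 100.0 * sum(v is None for v in values) / len(values)


def zscore_outliers(values: Sequence[float], cutoff: float = 3.0) -> list[int]:
    """Indices with |z| > cutoff, using the sample mean and sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > cutoff]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> list[int]:
    """Indices outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def regression_n_adequate(n: int, predictors: int, per_predictor: int = 15) -> bool:
    """Rule of thumb from the checklist: 10-15 observations per predictor."""
    return n >= per_predictor * predictors


scores = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(scores))           # [6]
print(zscore_outliers(scores))        # [] (n = 7, so |z| cannot exceed about 2.27)
print(regression_n_adequate(185, 4))  # True (185 >= 60)
```

The z-score result matters in practice. With the sample SD, no value can exceed |z| = (n - 1)/sqrt(n). This means that with 10 or fewer cases the |z| > 3 rule can never flag anything, so the IQR fences are the more useful screen for small groups.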

Step 2: ASSUMPTION MATRIX

Test Family	Assumptions	Check Method
Reliability (Cronbach's α)	Unidimensionality, interval/ratio data, ≥3 items per scale	Factor analysis / item-total correlations
Chi-Square (χ²)	Independent observations, categorical variables, expected frequency ≥ 5 in at least 80% of cells and no expected frequency below 1	Expected frequency table
Pearson Correlation	Linearity, normality (both vars), no significant outliers, interval/ratio	Scatter plot, Shapiro-Wilk
Spearman Correlation	Monotonic relationship, ordinal data or non-normal interval/ratio data	Scatter plot (monotonic check)
Multiple Regression	Linearity, independence (Durbin-Watson), homoscedasticity, normality of residuals, no multicollinearity (VIF < 10)	Residual plots, VIF table, Durbin-Watson
Independent t-test	Normality, homogeneity of variance (Levene's), interval/ratio DV	Shapiro-Wilk, Levene's
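One-way ANOVA	Normality, homogeneity of variance (Levene's), independence, interval/ratio DV	Shapiro-Wilk, Levene's; if the F-test is significant, run post-hoc tests (e.g., Tukey HSD, or Games-Howell if variances are unequal)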
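The chi-square row's expected-frequency rule can be checked by hand before trusting the test. The following sketch in Python 3.9+ (standard library only) builds the expected-count table. It applies the 80% rule plus the common extra condition that no expected count falls below 1, then computes the Pearson statistic and Cramér's V for later interpretation. The function names and the 2x2 counts are hypothetical. No continuity correction is applied, so for 2x2 tables the result will differ from SPSS's continuity-corrected row.

```python
import math


def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_frequency_ok(table: list[list[float]]) -> bool:
    """At least 80% of expected counts >= 5, and no expected count below 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= 0.8 and min(cells) >= 1


def pearson_chi_square(table: list[list[float]]) -> float:
    """Pearson chi-square statistic (no Yates continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def cramers_v(table: list[list[float]]) -> float:
    """Cramer's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    min_dim = min(len(table), len(table[0]))
    return math.sqrt(pearson_chi_square(table) / (n * (min_dim - 1)))


observed = [[20, 30], [30, 20]]  # hypothetical 2x2 counts
print(expected_frequency_ok(observed))  # True: every expected count is 25
print(pearson_chi_square(observed))     # 4.0
print(round(cramers_v(observed), 3))    # 0.2
```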

Step 4: INTERPRETATION

Effect Size	Small	Medium	Large
Cohen's d	0.2	0.5	0.8
r	0.1	0.3	0.5
R²	0.01	0.09	0.25
η²	0.01	0.06	0.14
Cramér's V (df=1)	0.1	0.3	0.5

Reliability	Poor	Questionable	Acceptable	Good	Excellent
Cronbach's α	< 0.6	0.6–0.7	0.7–0.8	0.8–0.9	> 0.9
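To apply the benchmarks consistently, the following sketch in Python 3.9+ (standard library only) computes Cohen's d with a pooled SD and labels any measure against the effect-size cut-offs. The `BENCHMARKS` dict mirrors the table. The helper names and the "negligible" label for values below "small" are assumptions, not part of the original protocol.

```python
import math
import statistics
from typing import Sequence

# (small, medium, large) cut-offs, copied from the effect-size table above.
BENCHMARKS = {
    "cohens_d": (0.2, 0.5, 0.8),
    "r": (0.1, 0.3, 0.5),
    "r_squared": (0.01, 0.09, 0.25),
    "eta_squared": (0.01, 0.06, 0.14),
    "cramers_v_df1": (0.1, 0.3, 0.5),
}


def label_effect(measure: str, value: float) -> str:
    """Map an effect size to a label; below 'small' is labelled 'negligible' (an assumption)."""
    small, medium, large = BENCHMARKS[measure]
    size = abs(value)  # sign gives direction, not magnitude
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"


def cohens_d(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)


d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(d, 3), label_effect("cohens_d", d))  # -0.632 medium
```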

Assignment 19 Quick Reference

Component	Count	Details
Reliability (Cronbach's α)	5	One per scale/construct
Chi-Square (χ²)	4	Independence tests (demographic × outcome)
Correlation	4	Bivariate (IV-DV pairs)
Regression	1	Multiple regression (4 IVs → 1 DV)
Total tests	14

Topic	Safety Training in SG Construction
N	185 survey responses
IVs	4 (to be identified from data)
DV	1 (to be identified from data)
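The quick reference plans five reliability tests, one per scale. As a cross-check on SPSS's Reliability output, the standard α formula can be sketched in Python 3.9+ (standard library only). The function name `cronbach_alpha` and the toy scale are hypothetical.

```python
import statistics
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

    `items` holds one list of scores per item, aligned by respondent.
    Reverse-coded items must already be recoded (see the Step 1 audit).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))


# Hypothetical two-item scale answered by four respondents
scale = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(round(cronbach_alpha(scale), 3))  # 0.75
```

Sample and population variances give the same α here, as long as the same one is used for both the items and the total. This sketch uses sample variances throughout.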
