# Statistical Review Protocol

Protocol for evaluating the statistical quality of a manuscript: test selection, assumption verification, multiple comparison correction, effect size reporting, and p-hacking detection.
## Test Selection

Review each statistical test used against the following criteria:
| Question | What to Check |
|---|---|
| Does the test match the outcome variable type? | Continuous outcome with t-test/ANOVA, categorical with chi-square/Fisher's, time-to-event with log-rank/Cox |
| Does the test match the number of groups? | Two groups: t-test/Mann-Whitney. Three or more: ANOVA/Kruskal-Wallis |
| Is the test appropriate for the data structure? | Paired data requires paired tests. Clustered data requires mixed models or GEE. Repeated measures requires RM-ANOVA or mixed models |
| Is a parametric test justified? | Check if normality was assessed and reported |
| Is the test two-sided? | One-sided tests require strong a priori justification |
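The decision rules in the table above can be sketched as a small helper. This is an illustrative mapping for simple, unclustered designs only (the function name and signature are hypothetical, not part of the protocol):

```python
def suggest_test(outcome: str, n_groups: int, parametric: bool = True,
                 paired: bool = False) -> str:
    """Suggest a default test for a simple, unclustered design.

    Illustrative sketch of the selection table; clustered or repeated-measures
    designs need mixed models / GEE and are deliberately not covered here.
    """
    if outcome == "time-to-event":
        return "log-rank / Cox regression"
    if outcome == "categorical":
        return "chi-square or Fisher's exact"
    if outcome == "continuous":
        if n_groups == 2:
            if paired:
                return "paired t-test" if parametric else "Wilcoxon signed-rank"
            return "t-test" if parametric else "Mann-Whitney U"
        if n_groups >= 3:
            return "ANOVA" if parametric else "Kruskal-Wallis"
    raise ValueError("design not covered by this sketch")

print(suggest_test("continuous", 2, parametric=False))  # Mann-Whitney U
```

The point of the sketch is that test choice is a deterministic function of outcome type, group count, pairing, and parametric justification; a reviewer can walk the same branches by hand.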
Common errors:

- Unpaired tests applied to paired or matched data
- Multiple pairwise t-tests where ANOVA with post-hoc comparisons is appropriate
- Parametric tests on small, clearly skewed samples with no normality check reported
- Ordinal data (e.g., Likert scales) analyzed as continuous without justification
- One-sided tests without a prespecified rationale
## Assumptions and Multiple Testing

For each statistical test, verify that its assumptions (e.g., normality, equal variances, independence, proportional hazards) were assessed and reported. Where multiple hypotheses are tested, check that the correction matches the scenario:
| Scenario | Expected Action |
|---|---|
| Multiple primary outcomes | Alpha adjustment (Bonferroni, Holm) or clearly designated single primary outcome |
| Multiple pairwise comparisons after ANOVA | Post-hoc test (Tukey, Dunnett, Games-Howell) |
| Subgroup analyses | Interaction test before subgroup comparisons; label as exploratory |
| Multiple secondary outcomes | FDR correction (Benjamini-Hochberg) or state as exploratory |
| Correlation matrices | FDR correction for number of comparisons |
Red flags:

- Many reported comparisons with no correction mentioned anywhere
- A "primary outcome" designated only after the results were known
- Subgroup results highlighted without an interaction test
- Corrections applied selectively to some outcomes but not others
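As an illustration of the alpha adjustments in the table above, a minimal Holm step-down correction can be implemented in a few lines (the function name is a hypothetical helper, not a library API):

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (controls family-wise error rate).

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are forced to be monotone nondecreasing and capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03]))
```

Holm is uniformly more powerful than plain Bonferroni while controlling the same family-wise error rate, which is why it is an acceptable alternative in the table.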
## Effect Size and CI Reporting

Every comparison or association should report an effect size alongside the p-value.
| Analysis Type | Expected Effect Size | Interpretation Aid |
|---|---|---|
| Two-group comparison (continuous) | Cohen's d or mean difference with CI | Small: 0.2, Medium: 0.5, Large: 0.8 |
| ANOVA | Eta-squared or partial eta-squared | Small: 0.01, Medium: 0.06, Large: 0.14 |
| Correlation | r or rho (already an effect size) | Small: 0.1, Medium: 0.3, Large: 0.5 |
| Chi-square | Cramer's V or phi | Depends on df |
| Logistic regression | Odds ratio with CI | Meaningful thresholds context-dependent |
| Cox regression | Hazard ratio with CI | Meaningful thresholds context-dependent |
| Linear regression | R-squared, standardized beta | Report adjusted R-squared for model |
Red flags:

- P-values reported with no accompanying effect size or confidence interval
- Statistically significant but trivially small effects interpreted as clinically meaningful
- Standardized effect sizes given without the raw differences needed to judge practical relevance
- Confidence intervals omitted for the primary outcome
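As a sketch of the most common entry in the effect size table, Cohen's d for two independent groups can be computed with a pooled standard deviation (the helper name is illustrative):

```python
from math import sqrt

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled SD.

    d = (mean_a - mean_b) / s_pooled, where s_pooled combines the two
    sample variances weighted by their degrees of freedom.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of b
    pooled_sd = sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd
```

With the conventional anchors in the table (0.2 / 0.5 / 0.8), a reviewer can quickly check whether a reported d is plausible given the group means and SDs in a manuscript's descriptive tables.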
## P-Hacking Detection

P-hacking is the practice of manipulating data collection or analysis until nonsignificant results become significant. Watch for:

- A cluster of reported p-values just below 0.05
- Outcomes, endpoints, or analyses that differ from the protocol or registration
- Sample size that was not prespecified, or data collection that stopped once significance was reached
- Covariates added or removed without justification
- Outliers excluded without a prespecified rule
- Subgroup analyses reported only when significant
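One crude screening heuristic, checking whether a manuscript's significant p-values cluster just under the 0.05 threshold, can be sketched as follows (the function name and window are illustrative assumptions; this is a flag for closer reading, not proof of p-hacking):

```python
def near_threshold_fraction(pvalues, low=0.04, high=0.05):
    """Fraction of significant p-values that fall in [low, high).

    A large fraction of significant results sitting just under 0.05 is one
    crude p-hacking indicator. Returns 0.0 when nothing is significant.
    """
    significant = [p for p in pvalues if p < high]
    if not significant:
        return 0.0
    return sum(low <= p < high for p in significant) / len(significant)
```

For example, if four of a paper's p-values are below 0.05 and two of those sit between 0.04 and 0.05, the function returns 0.5; whether that warrants concern depends on how many tests were run and how they were selected for reporting.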
Summarize the review using the following template:

## Statistical Review Report
### Overall Statistical Quality: [Adequate / Needs Improvement / Serious Concerns]
### Test Appropriateness
| Analysis | Test Used | Appropriate? | Issue (if any) |
|----------|----------|-------------|----------------|
| [description] | [test] | Yes/No | [issue] |
### Assumption Verification
- [Finding 1]
- [Finding 2]
### Effect Size and CI Reporting
- [Finding]
### Multiple Testing
- [Finding]
### Sample Size Adequacy
- [Finding]
### Missing Data
- [Finding]
### P-Hacking Indicators
- [None detected / Concerns identified: ...]
### Recommendations
1. **Critical:** [...]
2. **Major:** [...]
3. **Minor:** [...]