Evaluates the quality of experimental evidence: whether experiments distinguish competing hypotheses, whether baselines are properly tuned and compared, statistical rigour, reproducibility, and pre- vs post-hoc analysis. Produces structured findings for Section 3 of the critique report.
Evaluate the rigour and quality of the experimental evidence presented in the thesis. Strong evidence distinguishes between hypotheses: the expected results should differ clearly depending on which hypothesis is true, so that the observed outcome favours one hypothesis over the others.
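As a rough illustration of what "distinguishes between hypotheses" means, the sketch below compares how likely one observed result is under two competing hypotheses. It is a minimal sketch, not part of the checklist: the normal noise model, the function names `normal_pdf` and `likelihood_ratio`, and all numbers are assumptions made up for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed noise model, for illustration)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def likelihood_ratio(observed, mean_h1, mean_h2, sd):
    """How much more likely the observed result is under H1 than under H2."""
    return normal_pdf(observed, mean_h1, sd) / normal_pdf(observed, mean_h2, sd)


# Discriminating experiment: the hypotheses predict clearly different outcomes
# (hypothetical numbers), so the observation strongly favours one of them.
strong = likelihood_ratio(observed=2.0, mean_h1=2.0, mean_h2=0.0, sd=0.5)

# Weak experiment: both hypotheses predict almost the same outcome, so the
# same observation barely separates them.
weak = likelihood_ratio(observed=2.0, mean_h1=2.0, mean_h2=1.9, sd=0.5)

print(f"strong experiment ratio: {strong:.1f}")
print(f"weak experiment ratio:   {weak:.3f}")
```

A ratio near 1 means the experiment would have produced roughly the same result whichever hypothesis were true, which is the pattern this review is meant to flag.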
First, apply ./evidence_review_checklist.md from this skill folder. Use it rather than evaluating experiments from memory: the checklist ensures every experiment is assessed against the same criteria (design, baselines, statistics, reproducibility, pre/post-hoc). It also forces you to identify all experiments upfront, which prevents gaps in coverage. Report each issue in the following format:
[3.N] PRIORITY: HIGH | MEDIUM | LOW
Location: <file path> — section "<heading>" — near: "<≤20-word quote>"
Guideline: <one-sentence statement of the evidence rule violated>
Problem: <2–5 sentences: what is weak or missing and why it matters>
Fix:
1. <Concrete instruction>
2. <Additional step if needed>
Number issues 3.1, 3.2, 3.3, etc.
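The format above can be sketched as a small rendering helper. This is a hedged sketch under stated assumptions: the `Finding` class, the `render_findings` function, and the example file path, heading, and quote are all hypothetical and not defined by this skill; only the output layout follows the template.

```python
from dataclasses import dataclass, field

PRIORITIES = ("HIGH", "MEDIUM", "LOW")
SEP = " \u2014 "  # the dash separator used in the template's Location line


@dataclass
class Finding:
    """One evidence issue (hypothetical container, not part of the skill)."""
    priority: str
    file_path: str
    heading: str
    quote: str
    guideline: str
    problem: str
    fixes: list = field(default_factory=list)


def render_findings(findings, section=3):
    """Render findings in the template layout, numbered 3.1, 3.2, ..."""
    blocks = []
    for n, f in enumerate(findings, start=1):
        if f.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {f.priority!r}")
        if len(f.quote.split()) > 20:
            raise ValueError("quote must be at most 20 words")
        lines = [
            f"[{section}.{n}] PRIORITY: {f.priority}",
            f'Location: {f.file_path}{SEP}section "{f.heading}"{SEP}near: "{f.quote}"',
            f"Guideline: {f.guideline}",
            f"Problem: {f.problem}",
            "Fix:",
        ]
        # Fix steps are numbered from 1 within each finding.
        lines += [f"{i}. {step}" for i, step in enumerate(f.fixes, start=1)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical example data, for illustration only.
example = Finding(
    priority="HIGH",
    file_path="thesis/chapter4.md",
    heading="Results",
    quote="our method outperforms all baselines",
    guideline="Baselines must be tuned with the same budget as the proposed method.",
    problem="The baseline uses default hyperparameters while the method is tuned.",
    fixes=["Tune the baseline with an equal search budget.", "Report both runs."],
)
report = render_findings([example, example])
print(report)
```

Validating the priority value and the quote length at render time keeps malformed findings out of the report instead of leaving them to be caught on review.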