Name: Reproduce Paper
Author: LukasWallrich

Read the paper's data section to identify the exact dataset(s): name, version, study number, source archive (e.g., GESIS, ICPSR, UK Data Service)
Always use the exact dataset version cited by the authors. If the paper cites "ALLBUS Cumulation 1980-2008 (ZA4572)", use ZA4572 — do NOT substitute a newer cumulation (e.g., ZA4576) even if it covers the same period, because:
- Variable codes, value labels, and data cleaning may differ between versions
- Cumulative files may receive corrections or recoding between releases
- Using a different version introduces an unnecessary confound when interpreting deviations
If the exact version is unavailable and a different version must be used, this MUST be:
- Explicitly flagged to the user before proceeding, explaining what version is available vs. what was cited
- Prominently documented in the Analytical Decisions and Assumptions section of the report
- Noted as a possible explanation for any deviations observed
If the paper uses multiple data files (e.g., a cumulation covering most years + an individual wave for a later year), obtain ALL of them in the exact versions cited.
Check whether the data files are already present in the working directory's data/ subfolder. If the available files are a different version than what the paper cites, tell the user and ask whether to proceed with the available version or wait for the correct one.
If data is missing:
- Check if the data can be downloaded programmatically (many archives require registration)
- If it cannot be downloaded directly, ask the user to provide the data. Be specific:
  - Exact dataset name and study number (e.g., "ALLBUS Cumulation 1980-2008, GESIS ZA4572")
  - File format needed (.sav, .dta, .csv)
  - Where to place the files (e.g., data/filename.sav)
  - Download URL if available (e.g., https://search.gesis.org/research_data/ZA4572)
- Stop and wait for the user to confirm the data is available before proceeding
If the paper uses multiple data files (e.g., a cumulation + individual wave, or linked datasets), identify ALL of them upfront
Document the data sources in claim_result_mapping.md, including exact version numbers and any discrepancies between cited and used versions
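The presence check described above can be sketched in base R. This is a minimal illustration, not a prescribed implementation: the helper name check_data_files() and the file name "ZA4572.sav" are assumptions, and a file being present does not confirm that it is the cited version, which still has to be verified against the archive's study number and version information.

```r
# Hypothetical helper: report which expected data files exist in the data/ subfolder.
check_data_files <- function(expected, data_dir = "data") {
  paths <- file.path(data_dir, expected)
  status <- data.frame(
    file = expected,
    path = paths,
    present = file.exists(paths),
    stringsAsFactors = FALSE
  )
  missing <- status$file[!status$present]
  if (length(missing) > 0) {
    # Presence only; the dataset version must still be checked separately.
    message("Missing data files, ask the user to provide: ",
            paste(missing, collapse = ", "))
  }
  status
}

# Assumed file name for the ALLBUS cumulation used as the example above.
status <- check_data_files(c("ZA4572.sav"))
```

If any row has present equal to FALSE, the workflow stops and waits for the user, as described in the list above.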

Descriptive statistics comparison: If the paper reports descriptive statistics (means, SDs, frequencies) for the analysis sample, reproduce these and include a comparison table before the main analysis results. Discuss any notable discrepancies — they indicate sample composition differences that may explain downstream coefficient deviations.
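A descriptive comparison table of this kind might look as follows. The sketch uses a toy data frame and made-up paper values purely for illustration; the variable names (age, trust) and the object names are assumptions, and base data frames are used only to keep the example dependency-free.

```r
# Toy analysis sample; a real reproduction uses the cleaned analysis data.
analysis_sample <- data.frame(
  age = c(23, 35, 41, 58, 62),
  trust = c(2, 3, 3, 4, 5)
)

# Hypothetical values, standing in for the paper's descriptives table.
paper_desc <- data.frame(
  variable = c("age", "trust"),
  paper_mean = c(43.8, 3.4),
  paper_sd = c(16.2, 1.1),
  stringsAsFactors = FALSE
)

vars <- paper_desc$variable
desc_comparison <- data.frame(
  variable = vars,
  paper_mean = paper_desc$paper_mean,
  # Reproduced statistics are computed from the data, not typed in.
  repro_mean = unname(vapply(vars, function(v) mean(analysis_sample[[v]], na.rm = TRUE), numeric(1))),
  paper_sd = paper_desc$paper_sd,
  repro_sd = unname(vapply(vars, function(v) sd(analysis_sample[[v]], na.rm = TRUE), numeric(1))),
  stringsAsFactors = FALSE
)
desc_comparison$mean_diff <- desc_comparison$repro_mean - desc_comparison$paper_mean
desc_comparison$sd_diff <- desc_comparison$repro_sd - desc_comparison$paper_sd
```

Large values in mean_diff or sd_diff point to sample composition differences worth discussing before the main results.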
For every analytical decision where the paper is ambiguous or silent, document your choice and rationale in the 'Analytical Decisions and Assumptions' section of the report. Do not silently make assumptions — every inferred choice must be recorded.
For each claim-result pair:
- Write R code that produces the corresponding statistic
- Store both the paper's target value and the reproduced value
- Compute the deviation
Build comparison tables programmatically using tibble() and kable()
Build the deviation log programmatically — NEVER hardcode deviation tibbles with manually entered reproduced values. The deviation log must compute values from model objects or stored results. Structure the deviation log by claim (Claim column as first column).
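As a minimal sketch of a programmatic deviation log, the example below fits a stand-in model on the built-in mtcars data and pulls the reproduced values from the model object. The paper values and claim labels are hypothetical placeholders, and a base data frame is used to keep the sketch dependency-free; tibble() and knitr::kable() can be substituted for construction and rendering.

```r
# Stand-in model on a built-in dataset; a real reproduction fits the paper's model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
est <- coef(summary(fit))

# Hypothetical targets, standing in for values transcribed from the paper.
targets <- data.frame(
  Claim = c("1", "2"),
  Parameter = c("wt", "hp"),
  Paper = c(-3.88, -0.03),
  stringsAsFactors = FALSE
)

# Reproduced values come from the model object, never from manual entry.
deviation_log <- targets
deviation_log$Reproduced <- unname(est[targets$Parameter, "Estimate"])
deviation_log$Abs_dev <- deviation_log$Reproduced - deviation_log$Paper
deviation_log$Rel_dev_pct <- 100 * abs(deviation_log$Abs_dev) / abs(deviation_log$Paper)
```

Because Claim is the first column, the log can be sorted or grouped by claim before rendering, for example with knitr::kable(deviation_log).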
Deviation log completeness: The deviation log must include entries for all claim-relevant parameters from all reproduced models/tables/figures — i.e., the parameters that directly bear on the abstract claims (main predictors, interactions, moderators, sample sizes, fit statistics). Control variable coefficients do not need entries unless a claim depends on them. If a table has multiple models (e.g., Table 2 Model 1 and Model 2), ALL models must be in the log. If a parameter supports multiple claims, either duplicate the entry or use a compound claim label (e.g., "1, 3"). Do not reference results in the conclusion that are not in the deviation log.
Significance in the deviation log: The deviation log should report statistical significance when available. For each fixed-effect parameter (not variance components or fit statistics):
- Paper significance: Extract from the paper as reported. Look in three places, in order of priority:
  1. Tables: significance stars, explicit p-values, confidence intervals excluding zero/null, or stated significance thresholds. Record as-is (e.g., "p < .05", "***", "n.s.").
  2. Prose: If tables do not report significance (e.g., only coef and SE), check the paper's text for significance claims about specific parameters (e.g., "the time trend is significant", "the interaction was not statistically different from zero"). These textual claims are the paper's assertion and should be recorded as the paper's significance for that parameter.
  3. Neither: If neither tables nor prose make a significance claim for a parameter, leave the paper significance blank.
  Do NOT compute significance from the paper's rounded coefficient and SE using a normal/Wald approximation — papers may use different tests (likelihood ratio, Satterthwaite, F-test, permutation, etc.) and rounding of SEs makes such approximations unreliable. The paper's significance claims are what we reproduce, not what we can reverse-engineer from rounded numbers.
- Reproduced significance: Always compute from the reproduced model using the appropriate test for that model type. For lme4/lmerTest models, use lmerTest::lmer() (load library(lmerTest) instead of library(lme4)) to get Satterthwaite p-values. For glm, use the model summary p-values. For other packages, use whatever significance test the package provides.
- Display reproduced significance stars alongside reproduced values in the deviation log table.
- Conclusion changes based on significance: Flag a significance-based conclusion change if the paper claims a parameter is significant (or non-significant) — whether in a table or in the prose — and the reproduction contradicts this. If neither tables nor prose make a significance claim for a parameter, do not flag significance-based conclusion changes for it — only flag direction reversals.
- Include a footnote in the deviation log table explaining how reproduced significance was computed (e.g., "Satterthwaite p-values via lmerTest") and where paper significance was sourced from (table stars, prose claims, or not reported).
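The significance rules above can be sketched for a glm as follows. The star thresholds (.001, .01, .05), the 5% level used for flagging, the helper name p_to_stars(), and the paper's claims in Paper_claims_sig are all assumptions for illustration; the thresholds stated in the paper's table notes take precedence where they differ.

```r
# Map p-values to stars; thresholds are an assumed convention.
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", "n.s."),
    right = FALSE  # intervals are [lower, upper), so p = 0.05 counts as n.s.
  ))
}

# Stand-in model; p-values are taken from the model summary, as for any glm.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
coefs <- coef(summary(fit))
sig <- data.frame(
  Parameter = rownames(coefs),
  Reproduced = unname(coefs[, "Estimate"]),
  p = unname(coefs[, "Pr(>|z|)"]),
  stringsAsFactors = FALSE
)
sig$Repro_sig <- p_to_stars(sig$p)

# Hypothetical paper claims: TRUE = significant, FALSE = non-significant, NA = no claim made.
sig$Paper_claims_sig <- c(NA, TRUE)

# Flag only where the paper made a claim and the reproduction contradicts it.
sig$Sig_change <- !is.na(sig$Paper_claims_sig) &
  (sig$Paper_claims_sig != (sig$p < 0.05))
```

Rows with NA in Paper_claims_sig are never flagged on significance grounds, which matches the rule that only direction reversals count when the paper makes no significance claim.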

Category	Criterion	Action
Exact	Reproduced value rounds to the same reported value	None needed
Minor deviation	≤5% relative deviation (or ≤0.01 absolute when near-zero rule applies*)	Note in deviation log
Substantive deviation	>5% relative deviation (and >0.01 absolute when near-zero rule applies*)	Investigation required
Conclusion change	Statistical significance or effect direction differs	Full investigation + alternative reproduction
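The categories in the table can be turned into a small classifier along these lines. This is a sketch under stated assumptions: the function name classify_deviation() is hypothetical, and because the definition of the near-zero rule is not given here, the cutoff near_zero = 0.1 for the paper value is an assumed placeholder. Note also that R's round() may treat exact halves differently from the paper's software.

```r
# Classify one paper/reproduced pair; digits = decimals reported in the paper.
classify_deviation <- function(paper, reproduced, digits,
                               conclusion_change = FALSE,
                               near_zero = 0.1) {  # assumed cutoff, not from the source
  if (conclusion_change) return("Conclusion change")

  # Exact: the reproduced value rounds to the reported value.
  if (isTRUE(all.equal(round(reproduced, digits), round(paper, digits)))) {
    return("Exact")
  }

  abs_dev <- abs(reproduced - paper)
  # Relative rule: at most 5% of the paper value (undefined when the paper value is 0).
  rel_ok <- abs(paper) > 0 && abs_dev / abs(paper) <= 0.05
  # Near-zero rule: small paper values may instead pass on absolute deviation.
  abs_ok <- abs(paper) < near_zero && abs_dev <= 0.01

  if (rel_ok || abs_ok) "Minor deviation" else "Substantive deviation"
}

classify_deviation(0.50, 0.52, digits = 2)   # 4% relative deviation
classify_deviation(0.02, 0.05, digits = 2)   # near zero, absolute deviation 0.03
```

The function handles one pair at a time; mapply() can apply it across the rows of a deviation log.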

Review the rendered output from Stage 4 to assess all deviations and identify possible explanations. Use the rendered .md file (produced by keep-md: true in the YAML header) rather than the .html, as it is much smaller and easier to read while containing the same computed values.
Write the report following the exact section structure from the template:
- Abstract (callout) — verdict + summary
- Open Materials — availability table + note on whether/when replication materials were consulted
- Paper Overview — research question, key methods, claims from the abstract (numbered list)
- Data and Variable Construction
- Analytical Decisions and Assumptions — all inferred decisions documented
- Reproduction Results — subsection per analysis
- Deviation Log — programmatic, structured by claim, color-coded with background colors (green #d4edda = Exact, yellow #fff3cd = Minor, red #f8d7da = Substantive, dark red #f5c6cb = Conclusion change)
- Discrepancy Investigation — only if deviations exceed rounding error
- Conclusion — summary paragraph + claim-by-claim numbered assessment
- Session Info
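One possible way to apply the color coding listed for the Deviation Log is to map each category to its background color and wrap cell values in inline HTML. The helper name color_cell() is hypothetical, and this is only one approach; when rendering such cells with knitr::kable(), HTML escaping has to be turned off (escape = FALSE) for the styling to take effect.

```r
# Background colors per category, as listed for the Deviation Log.
category_colors <- c(
  "Exact" = "#d4edda",
  "Minor deviation" = "#fff3cd",
  "Substantive deviation" = "#f8d7da",
  "Conclusion change" = "#f5c6cb"
)

# Hypothetical helper: wrap a cell value in a span with the category's background color.
color_cell <- function(value, category) {
  stopifnot(all(category %in% names(category_colors)))
  sprintf(
    '<span style="background-color: %s;">%s</span>',
    unname(category_colors[category]),
    value
  )
}

cells <- color_cell(c("0.12", "0.40"), c("Exact", "Substantive deviation"))
```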
The Abstract callout MUST state the verdict directly (e.g., **Verdict**: QUALITATIVELY REPRODUCED). It MUST NOT defer to the Conclusion section. This is a hard structural requirement.
The Conclusion must contain:
- A summary paragraph with the overall assessment
- A claim-by-claim numbered assessment stating whether each claim was reproduced, with brief explanation
- The conclusion MUST NOT claim "all claims confirmed" or "all qualitative conclusions confirmed" if any claim has zero deviation log entries or any entry marked "Not tested". Add a programmatic check: for each claim number listed in the Paper Overview, count deviation log rows. If any claim has zero rows, the verdict paragraph must explicitly note which claims were not numerically tested and exclude them from the "all confirmed" language.
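The programmatic coverage check could be sketched as below. The claim numbers and log labels are hypothetical inputs, and the helper name rows_per_claim() is an assumption; compound labels such as "1, 3" are split so that one row counts toward each claim it supports. Entries marked "Not tested" would need a separate filter, which this sketch does not include.

```r
# Hypothetical inputs: claims from the Paper Overview and the log's Claim column.
overview_claims <- c("1", "2", "3", "4")
log_claims <- c("1", "1, 3", "2", "3")

# Count deviation log rows per claim, splitting compound labels like "1, 3".
rows_per_claim <- function(overview_claims, log_claims) {
  expanded <- trimws(unlist(strsplit(log_claims, ",")))
  vapply(overview_claims, function(cl) sum(expanded == cl), integer(1))
}

coverage <- rows_per_claim(overview_claims, log_claims)

# Claims with zero rows must be named in the verdict paragraph.
untested <- names(coverage)[coverage == 0]
if (length(untested) > 0) {
  message("Claims not numerically tested: ", paste(untested, collapse = ", "))
}
```

With these inputs, claim 4 has no rows, so the conclusion could not use "all claims confirmed" language.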
Set cache: false in the YAML header and do a clean final render
Verify the final HTML output is complete and self-contained
Final review checklist — read the entire rendered report end-to-end and check for:
- Section structure matches template exactly (no extra or missing sections, no "Executive Summary")
- Abstract states the verdict directly (not "See Conclusion section below")
- Inconsistencies between sections (e.g., abstract verdict that contradicts the deviation log or conclusion)
- Overclaimed causality (distinguish between established explanations and hypotheses)
- Statistical errors in prose (e.g., claiming that reference category choices affect other predictors' coefficients, which they do not unless the categorical variable is part of an interaction; or confusing significance with effect size)
- Placeholder text that was never filled in
- Numbers in narrative text that don't match the computed tables
- Deviations described without noting their substantive direction: always state whether deviations strengthen or weaken the paper's claims (e.g., "more negative correlations, i.e., stronger effects that reinforce the paper's conclusion" rather than just "more negative correlations")
- Deviation log is computed programmatically (not hardcoded) and structured by claim
- Deviation log uses background-color styling for categories
- Claim coverage: every claim in Paper Overview has ≥1 deviation log entry (or explicit "Not tested" entry)
- Claims consistency: the numbered claims in Paper Overview match exactly those in claim_result_mapping.md — no claims added, dropped, or renumbered between documents
- Phantom references: search the report text for references to figures or tables (e.g., "Figure 13", "Table 3") and verify that each referenced figure/table actually exists in the Reproduction Results section. Remove or qualify any reference to an analysis that was not performed.
- All inferred decisions and assumptions are documented in the Analytical Decisions section
- Deviation explanations use appropriately hedged language — no unsubstantiated causal claims
- Abstract hedging: The Abstract callout must use the same hedging standards as the Discrepancy Investigation section. Avoid "most likely due to" for unverified explanations — use "possibly due to" or "which may be related to."
- Abstract-deviation log consistency: For each claim described as "fully reproduced" or "reproduced" in the Abstract, verify that the deviation log contains zero conclusion changes for that claim. If any conclusion changes exist, the Abstract must acknowledge them — use "qualitatively reproduced" with caveats, not "fully reproduced."
- Conclusion does not overclaim: if any claim was not tested, the conclusion must not say "all claims confirmed"
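Some checklist items lend themselves to automation. As one example, the phantom-reference check might be sketched like this; the report text, the list of reproduced items, and the helper name find_phantom_refs() are hypothetical, and the pattern only catches references written as "Figure N" or "Table N". In practice the text could come from readLines() on the rendered .md file.

```r
# Hypothetical report text and the figures/tables actually present in Reproduction Results.
report_text <- c(
  "Table 2 reports the main models.",
  "The trend is visible in Figure 1 and Figure 13."
)
reproduced_items <- c("Table 2", "Figure 1")

# Return references that appear in the text but not among the reproduced items.
find_phantom_refs <- function(text, existing) {
  hits <- unlist(regmatches(text, gregexpr("(Figure|Table) [0-9]+", text)))
  setdiff(unique(hits), existing)
}

phantoms <- find_phantom_refs(report_text, reproduced_items)
```

Any entry in phantoms marks a reference to remove or qualify before the final render.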

Reproduce Paper

Skill: Reproduce Paper

Overview

Inputs

Workflow

Stage 0: Data Acquisition

Stage 1: Extract Claims

Stage 2: Identify Dataset & Methods

Stage 3: Write Reproduction Code

Stage 4: Test & Render

Stage 5: Assess Deviations

Stage 6: Investigate Discrepancies (if any)

Stage 7: Report

Stage 8: Automated Review

Deviation Assessment Categories

Overall Verdict Criteria

Notes
