Systematic review and meta-analysis pipeline for medical research. Covers protocol registration (PROSPERO), search strategy, screening, data extraction, risk of bias assessment (QUADAS-2/ROBINS-I), statistical synthesis (bivariate/HSROC for DTA, random-effects for intervention), and PRISMA-compliant reporting. Supports both DTA and intervention meta-analyses.
You are helping a medical researcher conduct a systematic review and meta-analysis. You support the full pipeline from protocol development to submission-ready manuscript, with specialized support for diagnostic test accuracy (DTA) meta-analyses.
Reference files (${CLAUDE_SKILL_DIR}/references/):
- ${CLAUDE_SKILL_DIR}/references/PROSPERO_template.md -- field-by-field guide with word limits, pitfalls checklist
- ${CLAUDE_SKILL_DIR}/references/icmje_coi_guide.md -- batch generation, python-docx pitfalls, form structure
- ${CLAUDE_SKILL_DIR}/references/r_templates.md
- ${CLAUDE_SKILL_DIR}/references/checklists/
  - PRISMA_DTA.md -- 27-item checklist
  - QUADAS2.md -- 4 domains + signalling questions
  - ROBINS_I.md -- 7 domains + pre-assessment + synthesis recommendation
  - RoB2.md -- 5 domains + signalling questions + overall judgment
  - PROBAST.md -- 4 domains + AI extension + validation studies
  - NOS.md -- Cohort (8 items) + Case-control (8 items) + star interpretation
  - JBI_Case_Series.md -- 10-item critical appraisal checklist for case series

| Type | RoB Tool | Statistical Model | Reporting Guideline |
|---|---|---|---|
| DTA (diagnostic test accuracy) | QUADAS-2 | Bivariate / HSROC | PRISMA-DTA |
| Intervention (treatment effect) | RoB 2 (RCT) / ROBINS-I (NRSI) | Random-effects (DL/REML) | PRISMA 2020 |
| Prognostic (prediction model) | QUIPS / PROBAST | Random-effects | PRISMA 2020 |
| Observational (prevalence/association) | NOS / JBI | Random-effects | MOOSE |
Auto-detect type from the research question or accept user specification.
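The auto-detection step could be approximated with a keyword heuristic like the following sketch (the function name and keyword lists are illustrative assumptions, not the skill's actual logic; always let the user override):

```python
def detect_ma_type(question: str) -> str:
    """Guess the meta-analysis type from a research question.

    Rules are checked in order; first match wins. Falls back to
    UNSPECIFIED so the user can be asked explicitly.
    """
    q = question.lower()
    rules = [
        ("DTA", ["sensitivity", "specificity", "diagnostic accuracy", "index test"]),
        ("Prognostic", ["prediction model", "prognostic", "risk score"]),
        ("Intervention", ["randomized", "treatment effect", "efficacy", "versus placebo"]),
        ("Observational", ["prevalence", "incidence", "association"]),
    ]
    for ma_type, keywords in rules:
        if any(k in q for k in keywords):
            return ma_type
    return "UNSPECIFIED"  # ask the user to specify
```

A keyword miss should always fall through to user specification rather than silently picking a default model.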
Goal: Produce a PROSPERO-ready protocol document.
Structure the research question:
Define eligibility criteria:
Plan the search:
Plan RoB assessment:
Plan synthesis:
Generate PROSPERO registration document:
See ${CLAUDE_SKILL_DIR}/references/PROSPERO_template.md for field-by-field guidance. Save output to 7_Submission/ or equivalent directory.

Goal: Develop and validate reproducible search strategies.
Build search blocks from PIRD/PICO:
Combine with Boolean operators:
Execute search per database using /search-lit:
Report search per PRISMA-S (Rethlefsen et al. 2021, PMID:33499930): Save search strategies as a structured document, one section per database, with date of search, number of results, and any limits applied.
Merge and deduplicate: Combine all database results into a single spreadsheet. Deduplicate by DOI first, then PMID. Save raw counts for PRISMA flow.
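The DOI-first, PMID-second deduplication can be sketched in Python (the `doi`/`pmid` field names are assumptions about the spreadsheet columns; records lacking both identifiers are kept for manual title matching):

```python
def deduplicate(records):
    """Drop duplicates by DOI first, then PMID; keep the first occurrence.

    DOIs are compared case-insensitively. Records with neither
    identifier pass through for manual title-based deduplication.
    """
    seen_doi, seen_pmid, unique = set(), set(), []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        pmid = (rec.get("pmid") or "").strip()
        if doi and doi in seen_doi:
            continue  # duplicate by DOI
        if pmid and pmid in seen_pmid:
            continue  # duplicate by PMID
        if doi:
            seen_doi.add(doi)
        if pmid:
            seen_pmid.add(pmid)
        unique.append(rec)
    return unique
```

Record `len(records)` before and `len(unique)` after so the raw and deduplicated counts are available for the PRISMA flow diagram.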
Goal: Systematic title/abstract and full-text screening with two independent reviewers.
- Round 1 (title/abstract, independent dual review): save round1_{date}.tsv with color-coded decisions.
- Round 2 (reconciliation): set round2_tag as INCLUDE / EXCLUDE / MAYBE based on R1+R2 agreement (MAYBE = disagreement OR either reviewer flagged uncertain). Save round2_{date}.tsv (adds round2_tag, round2_reason columns).
- Round 3 (adjudication): record round3_decision (INCLUDE/EXCLUDE) and round3_reason (only when overturning R2). Optional AI pre-screening via references/ai_pre_screening_template.py (customize per project) adds ai_suggestion (INCLUDE/EXCLUDE/UNCERTAIN/CONFIRM-INCLUDE) + ai_reason columns. Save round3_{date}.tsv with finalized round3_decision.
- Round 4 (full text): where round3_decision = INCLUDE, retrieve full-text PDFs (use /fulltext-retrieval).
- Track numbers at each stage for the PRISMA flow diagram (R1 → R2 → R3 → R4 → final included).
Use /make-figures to generate PRISMA flow diagram when numbers are finalized.
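The reconciliation rule above (MAYBE = disagreement OR either reviewer flagged uncertain) can be sketched as a small helper (function and label names are illustrative):

```python
def round2_tag(r1: str, r2: str) -> str:
    """Combine two reviewers' R1 decisions into a round-2 tag.

    Any disagreement, or an UNCERTAIN flag from either reviewer,
    yields MAYBE; unanimous decisions pass through unchanged.
    """
    if "UNCERTAIN" in (r1, r2) or r1 != r2:
        return "MAYBE"
    return r1  # unanimous INCLUDE or EXCLUDE
```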
Goal: Create standardized extraction forms and extract 2x2 or effect size data.
Generate a data extraction form with:
Output: Excel/CSV template for data entry.
When studies report outcomes only as Kaplan-Meier curves without raw event counts:
Digitise the KM curve: Use WebPlotDigitizer (https://automeris.io/WebPlotDigitizer/)
Export columns: time, cumulative_event_rate (or survival).
Extract number-at-risk: Record from the table below the KM plot at each time point.
Reconstruct IPD: Use the R IPDfromKM package (Guyot et al. 2012 method):
library(IPDfromKM)
dat <- read.csv("digitised_curve.csv")
preproc <- preprocess(dat, trisk, nrisk, totalpts, maxy = 1)
ipd <- getIPD(preproc, armID = 1) # armID starts at 1, NOT 0
Pitfalls: preprocess() does NOT accept a mateflag parameter (common error); armID starts at 1 (not 0).
Verify: Generate a reconstructed KM plot and visually compare it to the original figure.
Report in Methods: Cite Guyot et al. 2012 (doi:10.1186/1471-2288-12-9) and state which studies required reconstruction.
Alternative — Text-based extraction: When no subgroup-specific KM curve exists but the text reports "0% LTP at 12 months" or similar, extract directly from text. Document the page number and exact quote.
When a study's intervention is a composite of multiple techniques:
Always pre-specify a sensitivity analysis excluding composite-exposure studies. Document the extraction strategy in the data extraction form Notes column.
When comparing extraction results between independent reviewers (minimum 2), check:
Inter-reviewer agreement: Calculate and report screening agreement: % agreement or Cohen's kappa at title/abstract and full-text stages. If kappa was not calculated, report the exact number of discrepant records and the resolution method.
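If kappa is computed in-house rather than with a stats package, a minimal implementation of Cohen's kappa for two reviewers looks like this (a sketch; the result is undefined when both reviewers use only a single category, since chance agreement is then 1):

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two reviewers' categorical decisions.

    po = observed agreement, pe = chance agreement from each
    reviewer's marginal category frequencies.
    """
    assert len(r1) == len(r2) and r1, "paired, non-empty decision lists"
    n = len(r1)
    cats = set(r1) | set(r2)
    po = sum(a == b for a, b in zip(r1, r2)) / n
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)
```

For two reviewers who agree on 3 of 4 records with marginals 2/2 and 1/3, this gives kappa = 0.5, i.e. moderate agreement beyond chance.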
Denominator consistency: Verify sample sizes match between reviewers.
Watch for per-patient vs per-lesion/per-tumor unit confusion.
CRITICAL: The denominator may differ across outcomes within the same study
(e.g., LTP assessed only among treatment-naive nodules, but complications assessed
among all treated tumors). For each outcome, back-calculate: event ÷ denominator
must equal the percentage reported in the paper's Tables. If it does not match,
investigate the analysis population definition in the Methods section.
If denominators differ, return to the original paper's Tables/Flow diagram.
Arithmetic verification: Back-calculate proportions from event/total counts and cross-check against original text (e.g., 78/91 = 85.7%).
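The back-calculation can be automated with a simple check (the 0.1 percentage-point tolerance is an assumption; papers round to different precisions, so widen it if needed):

```python
def check_proportion(events: int, total: int, reported_pct: float,
                     tol: float = 0.1) -> bool:
    """True if events/total reproduces the percentage reported in the paper.

    A False result flags a possible extraction error or a different
    analysis population (denominator) than assumed.
    """
    return abs(events / total * 100 - reported_pct) <= tol
```

For the example above, `check_proportion(78, 91, 85.7)` passes, while a mismatched denominator such as `check_proportion(78, 91, 84.0)` is flagged for investigation.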
Kaplan-Meier estimate distinction: KM curve estimates differ from raw event counts. Always record the data source (Table vs KM curve vs text) during extraction.
Discrepancy resolution: List all discrepancies → verify against original text → reach consensus → if consensus fails, use third reviewer. Log all consensus decisions in {project}/consensus_log.md.
Dataset lock: After resolving all discrepancies, lock the final dataset. Any subsequent changes require documented justification with date.
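One simple way to make the lock verifiable is to record a checksum of the final file at lock time (a sketch; the path and workflow are assumptions, and the digest should be stored alongside the consensus log):

```python
import hashlib

def lock_dataset(path: str) -> str:
    """Return the SHA-256 digest of the locked dataset file.

    Any later edit to the file yields a different digest, so the
    recorded hash documents whether the locked version changed.
    """
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```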
Goal: Guide structured RoB assessment with the appropriate tool.
Select tool based on meta-analysis type (see table above), then read the corresponding checklist:
| Tool | Checklist File |
|---|---|
| QUADAS-2 (DTA) | ${CLAUDE_SKILL_DIR}/references/checklists/QUADAS2.md |
| RoB 2 (RCT) | ${CLAUDE_SKILL_DIR}/references/checklists/RoB2.md |
| ROBINS-I (NRSI) | ${CLAUDE_SKILL_DIR}/references/checklists/ROBINS_I.md |
| PROBAST (Prediction) | ${CLAUDE_SKILL_DIR}/references/checklists/PROBAST.md |
| NOS (Observational) | ${CLAUDE_SKILL_DIR}/references/checklists/NOS.md |
| JBI (Case Series) | ${CLAUDE_SKILL_DIR}/references/checklists/JBI_Case_Series.md |
For AI/ML prediction models, also apply PROBAST+AI extensions.
Output: Summary table + traffic light plot (use /make-figures).
Goal: Execute meta-analysis and generate publication-ready outputs.
IMPORTANT: Always use R for meta-analysis (packages: meta, metafor, mada).
See ${CLAUDE_SKILL_DIR}/references/r_templates.md for code templates.
library(mada) # bivariate model, forest/SROC plots
library(meta) # general meta-analysis utilities
library(metafor) # advanced models
# Bivariate model (recommended for DTA)
fit <- reitsma(data, formula = cbind(tsens, tfpr) ~ 1)
summary(fit)
# SROC curve with confidence and prediction regions
plot(fit, sroclwd = 2, main = "SROC Curve")
# Forest plot (paired: sensitivity + specificity)
forest(fit, type = "sens")
forest(fit, type = "spec")
Key outputs for DTA:
library(meta)
library(metafor)
res <- metagen(TE, seTE, data = dat, studlab = study,
method.tau = "REML", sm = "OR")
forest(res)
funnel(res)
summary(res) # I-squared, tau-squared, Q test
metabias(res, method.bias = "Egger")
metainf(res, pooled = "random") # leave-one-out
When both comparative and single-arm studies are available, use dual analysis (precedent: Lin 2025 PMID:41419890, Su 2026 PMID:41653198). The assignment of PRIMARY vs SECONDARY depends on the research question and available evidence:
| Scenario | Primary | Secondary | Rationale |
|---|---|---|---|
| Enough comparative studies (k≥8) | Comparative OR/RR | Pooled proportion | Direct comparison answers efficacy |
| Limited comparative (k<6), many single-arm | Pooled proportion | Comparative OR/RR | Insufficient power for comparative; pooled proportion provides descriptive evidence |
| Mixed (moderate k, each) | Discuss with co-authors | — | PI/methodologist decision |
The choice should be pre-specified in the PROSPERO protocol and remain consistent throughout the manuscript.
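The table's decision rule could be encoded as follows (the cutoffs k≥8 and k<6 come from the table above; the function name is illustrative, and the moderate-k case deliberately returns no automatic choice):

```python
def choose_primary(k_comparative: int) -> str:
    """Pick the primary analysis from the number of comparative studies.

    Mirrors the scenario table: enough comparative studies -> direct
    comparison; too few -> pooled proportion as descriptive evidence;
    the in-between zone is a PI/methodologist decision.
    """
    if k_comparative >= 8:
        return "comparative"        # direct comparison answers efficacy
    if k_comparative < 6:
        return "pooled_proportion"  # insufficient power for comparative
    return "discuss_with_coauthors"  # moderate k: pre-specify with the PI
```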
# Comparative MA (binary outcomes)
res_comp <- metabin(ei, ni, ec, nc, data = dat,
studlab = study, sm = "OR",
method = "Inverse", method.tau = "DL",
common = FALSE, random = TRUE,
method.random.ci = "HK", incr = 0.5)
# Single-arm pooled proportion
res_prop <- metaprop(event, n, data = dat_single,
studlab = study, sm = "PLOGIT",
method.tau = "DL", method.ci = "CP")
Key points:
- Single-arm proportions: metaprop() with logit transformation + Clopper-Pearson CI
- method = "Inverse", not "MH", to avoid a method.tau conflict
- method.tau = "DL" (DerSimonian-Laird) -- REML may not converge with sparse data
- method.random.ci = "HK" (Hartung-Knapp) instead of deprecated hakn = TRUE
- common = FALSE, random = TRUE instead of deprecated comb.fixed/comb.random
- incr = 0.5 continuity correction
- Leave-one-out sensitivity analysis (metainf())

Goal: Assess certainty of the body of evidence.
For DTA meta-analysis, apply GRADE-DTA framework:
For intervention meta-analysis, apply standard GRADE.
Output: Summary of Findings table.
Goal: Generate PRISMA-compliant manuscript sections.
- /check-reporting with PRISMA-DTA or PRISMA 2020
- /write-paper with meta-analysis type selected
- /make-figures for: PRISMA flow diagram, forest plots, SROC, funnel plots
| Pitfall | Problem | Solution |
|---|---|---|
| Separate pooling of Se/Sp | Ignores correlation | Use bivariate/HSROC model |
| Ignoring threshold effect | False heterogeneity | Check Spearman correlation, SROC plot |
| Standard funnel plot for DTA | Inappropriate | Use Deeks' funnel plot |
| I-squared only for heterogeneity | Doesn't capture threshold effect | Use prediction region on SROC |
| Missing GRADE | Common omission in DTA MA | Apply GRADE-DTA. If <4 studies, assess each domain narratively and state the limitation explicitly |
| Partial verification bias | Inflates sensitivity | Assess in QUADAS-2 Flow & Timing domain |
| Unevaluable results excluded | Biases accuracy estimates | Report intent-to-diagnose analysis |
When the number of included studies is small (< 10):
| When | Call | Purpose |
|---|---|---|
| Need literature search | /search-lit | PubMed/Semantic Scholar search with verified citations |
| Need statistical code | /analyze-stats | Execute R/Python analysis scripts |
| Need figures | /make-figures | PRISMA flow, forest plots, SROC, funnel plots |
| Need reporting check | /check-reporting | PRISMA-DTA / PRISMA 2020 compliance |
| Need manuscript writing | /write-paper | Full IMRAD manuscript generation |
| Need self-review | /self-review | Pre-submission quality check |
- Mark uncertain variable names with [VERIFY: variable_name] and ask the user to confirm against the data dictionary.
- Use /search-lit for all citations.