Use when the user wants to assess how strong the evidence behind a scientific claim is, based on study design, replication status, sample size, venue, and recency.
Evaluates the strength of evidence behind scientific claims based on study design, sample size, replication status, venue quality, and recency. Produces structured evidence grades that help teams know how much weight to put on any given finding.
evidence_grading.grade_paper(
    doi="10.1038/s41586-024-00001-0",
    domain="machine_learning",  # affects the grading rubric
    include_rationale=True,
)
# Returns: {"grade": "B+", "level": "replicated_benchmark", "rationale": "..."}
evidence_grading.grade_claim(
    claim="Scaling laws hold for code generation models",
    supporting_papers=claim_tracker.get_papers("claim_0099"),
    schema="custom",  # e.g. strong | moderate | preliminary | anecdotal
)
evidence_grading.suggest_language(
    claim="Our method outperforms baselines on protein fitness prediction",
    grade="preliminary",
    context="manuscript_results_section",
)
# Returns: "Our method demonstrates promising performance improvements over baselines..."
Returns grade, level label, confidence score, and rationale. For claim-level grading, returns aggregate grade with per-paper breakdown. Hedging language output is ready-to-use prose.
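The per-paper breakdown supports ordinal aggregation on the caller's side. A minimal sketch of one such policy (the tool's actual rubric is internal; `GRADE_ORDER` and `aggregate_grade` are hypothetical names, and the median rule is only an illustrative assumption):

```python
# Hypothetical: fold per-paper grades into a single claim-level grade.
# Assumes a simple ordinal letter scale; not the tool's internal rubric.
GRADE_ORDER = ["D", "C", "C+", "B", "B+", "A", "A+"]

def aggregate_grade(paper_grades: list[str]) -> str:
    """Take the median per-paper grade as a conservative claim-level grade."""
    ranks = sorted(GRADE_ORDER.index(g) for g in paper_grades)
    return GRADE_ORDER[ranks[len(ranks) // 2]]

print(aggregate_grade(["B+", "A", "C+"]))  # -> B+
```

A median resists being dragged up by a single strong outlier paper, which matches the conservative spirit of evidence grading; a minimum rule would be stricter still.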
- claim_auditor: for calibrating claim strength
- experiment_skeptic: when the paper's conclusion seems stronger than its evidence
- contradiction-detection: contradicted claims automatically receive a grade penalty