Judge how much trust a study or small set of studies deserves before drawing conclusions. Use when relevant papers have been found and the next step is to assess study design, evidence strength, limitations, and applicability rather than merely listing sources.
Assess how much evidentiary weight a study or source should carry.
The job of this skill is not to act like peer review. The job is to separate stronger from weaker evidence, make limitations explicit, and prevent low-quality studies from being treated as settled support.
This skill appraises study quality. It does not replace literature search, and it does not by itself produce a full cross-study synthesis.
Run this skill when:
- scientific-method-selector routes to appraisal

This is a strong checkpoint before:
In scope:
Out of scope by default:
Pause and keep the appraisal narrow when:
If those conditions persist, recommend literature-search, evidence-synthesizer, or a narrower claim before drawing stronger conclusions.
Prefer higher confidence when:
Reduce confidence when:
When quality matters, assess these certainty dimensions explicitly:
- risk_of_bias
- inconsistency
- indirectness
- imprecision
- publication_bias_or_reporting_limits

Use them as a lightweight certainty check, not as a formal full-review claim.
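As one way to keep the certainty check explicit, the five dimensions can be treated as a fixed checklist. The sketch below is illustrative only: the function name `check_certainty_notes` and the free-text notes format are assumptions, not part of this skill. It reports which dimensions were addressed and which were skipped, and deliberately computes no score.

```python
from typing import Dict, List

# The five certainty dimensions named in this skill.
CERTAINTY_DIMENSIONS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias_or_reporting_limits",
)


def check_certainty_notes(notes: Dict[str, str]) -> Dict[str, List[str]]:
    """Report which dimensions have a non-empty note, which are missing,
    and which keys are not recognized dimensions (hypothetical helper)."""
    addressed = [d for d in CERTAINTY_DIMENSIONS if notes.get(d, "").strip()]
    missing = [d for d in CERTAINTY_DIMENSIONS if d not in addressed]
    unknown = sorted(k for k in notes if k not in CERTAINTY_DIMENSIONS)
    return {"addressed": addressed, "missing": missing, "unknown": unknown}
```

Calling `check_certainty_notes({"risk_of_bias": "no blinding reported"})` would list the other four dimensions under `missing`, which is a prompt to assess them or to state that the source gives too little information to do so.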
If the source is a preprint or otherwise non-peer-reviewed:
Always return:
- study_or_source
- target_claim
- study_type
- quality_signals
- major_limitations
- applicability
- evidence_strength (strong, moderate, weak, very-weak)
- certainty_dimensions
- peer_review_status
- confidence_notes
- recommended_followup
- next_step

User request:
We found a small study suggesting cold exposure improves mood. How much should we trust it?
Expected shape of response:
- study_or_source: small mood-related cold exposure study
- target_claim: cold exposure improves mood or mental health
- study_type: likely small human intervention or observational study
- quality_signals: direct intervention but limited sample and likely short-term outcomes
- major_limitations: low power, heterogeneous protocol, short follow-up, possible proxy outcome issues
- applicability: partial for short-term mood effects, weak for broad mental health claims
- evidence_strength: weak
- confidence_notes: interesting signal but far from settled support for broad claims
- recommended_followup: evidence-synthesizer or broader literature search
- next_step: compare it against higher-quality or larger studies before making a claim-strength judgment
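If the appraisal is passed to another tool, a small completeness check can catch a dropped field or an off-list strength label. The following is a minimal sketch, assuming the appraisal arrives as a plain dictionary keyed by the output field names. The helper `validate_appraisal` is hypothetical and not part of this skill.

```python
from typing import Any, Dict, List

# Output fields this skill says to always return.
REQUIRED_FIELDS = (
    "study_or_source",
    "target_claim",
    "study_type",
    "quality_signals",
    "major_limitations",
    "applicability",
    "evidence_strength",
    "certainty_dimensions",
    "peer_review_status",
    "confidence_notes",
    "recommended_followup",
    "next_step",
)

# Allowed evidence_strength labels, as listed in the output fields.
EVIDENCE_STRENGTHS = ("strong", "moderate", "weak", "very-weak")


def validate_appraisal(appraisal: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape is complete."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in appraisal]
    strength = appraisal.get("evidence_strength")
    if strength is not None and strength not in EVIDENCE_STRENGTHS:
        problems.append(f"invalid evidence_strength: {strength!r}")
    return problems
```

The check covers only the shape of the output. It says nothing about whether the judgment inside each field is sound, which remains the work of the appraisal itself.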