Map a health AI tool through the 5-phase evidence chain from lab to deployment
Getting an AI model from "works in the lab" to "safe for patients" requires a chain of evidence across 5 distinct phases. Most health AI tools stall at Phase 1 (technical validation) and never complete Phases 3-5 (clinical utility, implementation, monitoring). This skill teaches you to map any health AI tool's evidence maturity and identify what's missing.
When a vendor says "our AI is 95% accurate," you need to ask: accurate in whose hands, on what population, compared to what, and does accuracy translate to better patient outcomes? The evidence chain framework gives you the vocabulary and structure to ask the right questions.
| Phase | Name | Key Question | Gold Standard |
|---|---|---|---|
| 1 | Technical Validation | Does the model perform well on held-out data? | AUROC, AUPRC on external test set |
| 2 | Clinical Validation | Does performance hold in clinical conditions? | Prospective study in clinical setting |
| 3 | Clinical Utility | Does using the model improve clinical decisions? | Randomized controlled trial or DCA |
| 4 | Implementation | Can the model be deployed safely in real workflows? | Deployment study with workflow integration |
| 5 | Monitoring | Does performance hold over time in production? | Continuous monitoring with drift detection |
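For reference during an audit, the table above can be mirrored as a small lookup structure. A minimal Python sketch (the structure and names are illustrative, not part of any tooling):

```python
# The five evidence-chain phases, mirroring the table above.
PHASES = {
    1: ("Technical Validation", "Does the model perform well on held-out data?"),
    2: ("Clinical Validation", "Does performance hold in clinical conditions?"),
    3: ("Clinical Utility", "Does using the model improve clinical decisions?"),
    4: ("Implementation", "Can the model be deployed safely in real workflows?"),
    5: ("Monitoring", "Does performance hold over time in production?"),
}

for num, (name, question) in sorted(PHASES.items()):
    print(f"Phase {num} ({name}): {question}")
```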
Choose a health AI tool from the `awesome-health-ai-evaluation` model registry. For each phase, search for and document the evidence:
```
Phase 1: Technical Validation
├── Published papers: [list with DOIs]
├── Test set description: [internal? external? multi-site?]
├── Key metrics: [AUROC, sensitivity, specificity, calibration]
├── Comparison to baseline: [what was the comparator?]
└── Assessment: [Strong / Adequate / Weak / Missing]
```
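AUROC, the headline Phase 1 metric, has a useful probabilistic reading: it is the chance that a randomly chosen positive case scores higher than a randomly chosen negative one. A self-contained sketch of that definition (toy scores, not from any real model):

```python
def auroc(labels, scores):
    """AUROC via the rank/probabilistic definition: the fraction of
    (positive, negative) pairs where the positive scores higher.
    Ties count as half. O(n^2), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of 4 (positive, negative) pairs are ranked correctly -> 0.75
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```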
```
Phase 2: Clinical Validation
├── Prospective studies: [list with DOIs]
├── Population: [who was studied? representative?]
├── Setting: [academic? community? LMIC?]
├── Sample size: [adequate for the prevalence?]
└── Assessment: [Strong / Adequate / Weak / Missing]
```
```
Phase 3: Clinical Utility
├── RCTs or comparative studies: [list with DOIs]
├── Outcome measures: [patient outcomes? process measures?]
├── Decision Curve Analysis: [done? results?]
├── Override/adoption rates: [do clinicians actually use it?]
└── Assessment: [Strong / Adequate / Weak / Missing]
```
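Decision Curve Analysis, named above as a Phase 3 method, scores a model by its net benefit at a chosen threshold probability pt: true positives per patient, minus false positives per patient weighted by the odds pt/(1-pt). A sketch with hypothetical counts (not from any published study):

```python
def net_benefit(tp, fp, n, pt):
    """Decision-curve net benefit at threshold probability pt:
    NB = TP/n - (FP/n) * (pt / (1 - pt)).
    False positives are discounted by the odds of the threshold."""
    return tp / n - (fp / n) * (pt / (1 - pt))

# Hypothetical audit: among 1000 patients the model flags 130 true
# positives and 70 false positives; clinical threshold pt = 0.10.
model = net_benefit(tp=130, fp=70, n=1000, pt=0.10)
treat_all = net_benefit(tp=150, fp=850, n=1000, pt=0.10)  # treat everyone
# A useful model should beat both treat-all and treat-none (NB = 0).
print(round(model, 4), round(treat_all, 4))
```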
```
Phase 4: Implementation
├── Deployment studies: [list with DOIs]
├── Workflow integration: [how was it embedded?]
├── User experience: [clinician feedback?]
├── Failure modes: [what went wrong?]
└── Assessment: [Strong / Adequate / Weak / Missing]
```
```
Phase 5: Monitoring
├── Post-deployment monitoring: [is it being tracked?]
├── Performance drift: [has performance changed over time?]
├── Demographic fairness: [different performance by subgroup?]
├── Feedback loop: [are corrections fed back to model?]
└── Assessment: [Strong / Adequate / Weak / Missing]
```
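Once every phase is filled in, the audit reduces to one assessment label per phase, and the tool's maturity is the highest contiguous phase with real evidence. A sketch (the labels come from the templates above; treating "Weak" as breaking the chain is an assumption, and the example values are hypothetical):

```python
def completed_phases(audit):
    """Highest contiguous completed phase, 0-5. The chain breaks at
    the first phase whose evidence is Weak or Missing (an assumed
    rule; adjust if Weak evidence should count)."""
    highest = 0
    for phase in sorted(audit):
        if audit[phase] in ("Strong", "Adequate"):
            highest = phase
        else:
            break
    return highest

# Hypothetical audit of a tool that stalled after clinical validation.
audit = {1: "Strong", 2: "Adequate", 3: "Missing", 4: "Missing", 5: "Missing"}
print(completed_phases(audit))
```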
| Score | Meaning |
|---|---|
| Phase 1 only | Lab-ready. Not clinically validated. |
| Phase 1-2 | Clinically validated. Not proven useful. |
| Phase 1-3 | Clinical utility demonstrated. Ready for deployment planning. |
| Phase 1-4 | Deployed. Needs monitoring plan. |
| Phase 1-5 | Full evidence chain. Gold standard. |
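The score-to-meaning mapping above can be expressed directly, keyed on the highest contiguous completed phase (the entry for 0 is an assumed label for tools with no evidence at all, which the table does not cover):

```python
# Meanings copied from the scoring table; key 0 is an assumption.
MEANINGS = {
    0: "No evidence chain.",
    1: "Lab-ready. Not clinically validated.",
    2: "Clinically validated. Not proven useful.",
    3: "Clinical utility demonstrated. Ready for deployment planning.",
    4: "Deployed. Needs monitoring plan.",
    5: "Full evidence chain. Gold standard.",
}

def score_label(highest_phase):
    """Map a highest contiguous completed phase (0-5) to its meaning."""
    return MEANINGS[highest_phase]

print(score_label(1))
```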
Most health AI tools score Phase 1 only. IDx-DR (diabetic retinopathy screening) is one of the few examples with a complete Phase 1-5 chain.
For the most critical missing phase, design the study that would fill the gap:
| Criterion | Meets Standard | Below Standard |
|---|---|---|
| All 5 phases assessed | Every phase documented with evidence or "Missing" | Phases skipped |
| Evidence accurately represented | Cited papers match claims | Misrepresentation of study findings |
| Gap identification | Critical gap identified with clinical reasoning | Superficial gap analysis |
| Study proposal | Feasible study design addressing the right gap | Unrealistic or misdirected proposal |
- `run-tripod-ai-checklist` — Phase 1-2 reporting quality assessment
- `decision-curve-analysis` — Phase 3 clinical utility quantification
- `model-card-generator` — Document the evidence chain in a model card
- `bridge-tbi-protocol` — Example of a Phase 1-3 evidence chain in TBI