Evaluate a research paper's computational methods, tools, and reusability for bioinformatics work. Use this skill when the user wants to assess whether to adopt a method, benchmark a tool, or evaluate computational contributions. Trigger on: "is this method worth adopting", "evaluate this tool paper", "triage this as a methods paper", "should I use their pipeline", "how does this tool compare", "is their code reusable", "benchmark this", "what tools did they use", "is this package worth trying", or when the user explicitly asks about computational approaches rather than biology. This is the engineering counterpart to paper-biology.
Evaluate a paper's computational contribution: is the method novel, is the code reusable, should you adopt their approach?
For shared definitions (controlled vocabulary, evidence levels, PipelineConfidence, connector behaviors), see paper-shared/SKILL.md.
**Phase 1: capture the raw facts.** Before evaluating, record what the paper actually documents. Stick to what is documented; mark anything inferred as "[inferred — not explicitly stated]".

**Phase 2: score and interpret.** Build on the Phase 1 facts using the report template below.
# Paper Triage (Methods): [Short Title or First 10 Words]
**PMID:** [if known]
**DOI:** [if known]
**Evidence Level:** FullText | Abstract
**PipelineConfidence:** High | Medium | Medium-Ambiguous | Low
---
## MethodsScore: [0-100]
## ReuseScore: [1-5]
## MethodsSummary
[2-3 sentences: (1) what computational problem, (2) their approach,
(3) key performance/novelty claim.]
## MethodsRole
[1 sentence. Examples:
"Novel spatial deconvolution algorithm benchmarked against 5 alternatives",
"Pipeline wrapper orchestrating existing tools with minimal novelty",
"Comprehensive benchmark of CNV inference tools on single-cell data"]
## ProblemSolved
[1-2 sentences: What gap does this fill? What was prior best practice?]
## TechnicalApproach
**Algorithm type:** [graph-based, probabilistic, deep learning, heuristic, etc.]
**Language/framework:** [R/Bioconductor, Python/scanpy, Nextflow, etc.]
**Input requirements:** [what data types it accepts]
**Output format:** [what it produces]
**Scalability claim:** [dataset sizes tested]
## BenchmarkSummary
[If benchmarked against other tools:
- Tools compared: [list]
- Datasets: [real vs. simulated, sizes]
- Metrics: [accuracy, runtime, memory]
- Winner: [which tool, under what conditions]
- Benchmark limitations: [cherry-picked data? favorable metrics?]
"No formal benchmark" if absent.]
## DataTypes
[From controlled vocabulary in paper-shared.]
## Group
[PI LastName or Lab.]
---
## Reusability Assessment
**ReuseScore rationale:** [1-2 sentences explaining the score.]
**What you could directly adopt:**
- [Concrete element — specific function, parameter, workflow pattern]
**What requires adaptation:**
- [Element useful but needs modification for your context]
**Integration effort:**
["Drop-in replacement for X" | "Needs wrapper code" | "Conceptual only"]
**Data & code availability:**
- Code: [GitHub/Zenodo URL, or "Not reported"]
- License: [MIT, GPL, proprietary, not stated]
- Container: [Docker, conda, renv, or "None"]
- Test data: [included? downloadable?]
- Documentation: [vignettes, README quality, API docs]
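The availability checklist above can be partially automated against a locally cloned repository. A minimal sketch; the filename patterns are common conventions (Bioconductor, conda, Docker), not guarantees, so adjust them per repo:

```python
from pathlib import Path

# Filename patterns commonly signaling each availability item.
# Conventions only; a repo may use different layouts.
SIGNALS = {
    "license":   ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
    "container": ["Dockerfile", "environment.yml", "renv.lock", "Singularity"],
    "test_data": ["tests", "testdata", "data/test", "inst/extdata"],
    "docs":      ["README.md", "vignettes", "docs"],
}

def availability_report(repo_dir: str) -> dict:
    """Return {signal: True/False} for a cloned repo directory."""
    root = Path(repo_dir)
    return {
        signal: any((root / name).exists() for name in names)
        for signal, names in SIGNALS.items()
    }
```

A `False` here only means the conventional filename is absent; confirm manually before writing "Not reported".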
**Maintenance signal:**
[Actively maintained ecosystem tool, or single-author side project?
Last commit date if visible. Community adoption indicators.]
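The maintenance signal can be screened mechanically from GitHub repository metadata (the REST endpoint `GET /repos/{owner}/{repo}` returns `pushed_at` and `stargazers_count`). A sketch; the 6-month and 100-star thresholds are arbitrary illustrations, not community standards:

```python
from datetime import datetime, timezone

def maintenance_signal(repo_info: dict, now=None) -> str:
    """Classify a repo from GitHub API metadata.

    Illustrative thresholds: <6 months since last push counts as
    active; >=100 stars counts as community adoption.
    """
    now = now or datetime.now(timezone.utc)
    pushed = datetime.fromisoformat(repo_info["pushed_at"].replace("Z", "+00:00"))
    months_stale = (now - pushed).days / 30
    stars = repo_info.get("stargazers_count", 0)
    if months_stale < 6 and stars >= 100:
        return "actively maintained, community adoption"
    if months_stale < 6:
        return "recently updated, limited adoption"
    return f"stale (~{months_stale:.0f} months since last push)"

# Fetching the metadata (uncomment to hit the live API):
# import json, urllib.request
# repo_info = json.load(urllib.request.urlopen(
#     "https://api.github.com/repos/OWNER/REPO"))
```

Stars and recency are weak proxies; a single-author side project can be excellent, and a starred repo can be abandoned.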
---
## Hype vs. Substance (Methods)
**Performance claims well-supported:**
- [Backed by appropriate benchmark — multiple datasets, fair comparison]
**Performance claims oversold:**
- [Simulated data only, favorable metrics, straw-man comparisons]
**Reproducibility concerns:**
- [Missing seeds, unclear preprocessing, version pinning issues]
- ["None identified" if solid]
---
## Relevance to Current Pipeline
[2-3 sentences: Does this solve a problem you have? Replace or complement
something you use? Integrate with R/ArchR/SnapATAC2 or require language switch?
If not relevant, say so.]
---

## MethodsScore Rubric

| Score | Tier | Criteria |
|---|---|---|
| 0 | 0 | Not a methods paper; pure biology with standard methods |
| 30-50 | 1 | Incremental tool improvement; web portal; standard pipeline |
| 55-69 | 1.5 | Useful utility; real problem solved but limited scope |
| 70-84 | 2 | New package for established task; framework extension; benchmark 3+ tools |
| 85-94 | 3 | Novel algorithm for unsolved problem; major ecosystem update; benchmark >5 tools |
| 95-100 | 4 | Fundamental breakthrough; redefines best practices |
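The rubric above maps MethodsScore to a tier as a step function. A sketch; note the table leaves gap scores (1-29, 51-54) undefined, and this code assumes they round down to the lower adjacent tier:

```python
def methods_tier(score: int) -> float:
    """Map a MethodsScore (0-100) to its rubric tier.

    Gap scores (1-29, 51-54) are assumed to take the lower
    adjacent tier; the rubric does not define them explicitly.
    """
    if not 0 <= score <= 100:
        raise ValueError("MethodsScore must be 0-100")
    # (threshold, tier) pairs from the rubric table, highest first.
    for threshold, tier in [(95, 4), (85, 3), (70, 2), (55, 1.5), (30, 1)]:
        if score >= threshold:
            return tier
    return 0
```

For example, `methods_tier(82)` falls in the 70-84 band and returns tier 2.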
## ReuseScore Rubric

| Score | Criteria |
|---|---|
| 1 | No data; methods lack detail; proprietary tools |
| 2 | Data may exist but no accessions; limited reproducibility |
| 3 | Standard availability (GEO); typical methods detail |
| 4 | Code repo + open protocols + multiple datasets |
| 5 | Full reproducibility package (data + code + container) |