Use when the user asks for a literature review, academic deep dive, research report, state-of-the-art survey, topic scoping, comparative analysis of methods/papers, grant background, or any request that needs multi-source scholarly evidence with citations. Also trigger proactively when a user question clearly requires academic grounding (e.g. "what's known about X", "compare approach A vs B in the literature", "summarize the field of Y"). Runs a 7-phase, script-driven research workflow across OpenAlex, arXiv, Crossref, and PubMed, with deduplication, transparent ranking, citation chasing, self-critique, and structured report output with verifiable citations.
End-to-end academic research workflow that turns a question into a cited, structured report. Built for depth: multi-source federation, transparent ranking, citation chasing, and a mandatory self-critique pass before the report ships.
Explicit triggers: "literature review", "research report", "state of the art", "survey the field", "what's known about X", "deep research on Y", "systematic review", "scoping review", "compare papers on Z".
Proactive triggers (use without being asked): questions that clearly need academic grounding, e.g. "what's known about X", "compare approach A vs B in the literature", "summarize the field of Y".
Do not use when: a single known paper answers the question, the user wants a tutorial (not a survey), or they're debugging code.
State file: scripts/research_state.json. Queries run, papers seen, decisions made, phase progress. Research becomes resumable and auditable.
Citation anchors: every claim carries [^id] where id matches a paper in state. Unanchored claims are treated as hallucinations and fail the gate.
Phase 0: Scope → decompose question, pick archetype, init state
Phase 1: Discovery → multi-source search, dedupe
Phase 2: Triage → rank, select top-N for deep read
Phase 3: Deep read → extract evidence per paper
Phase 4: Chasing → citation graph (forward + backward)
Phase 5: Synthesis → cluster by theme, map tensions
Phase 6: Self-critique → adversarial review, gap finding
Phase 7: Report → render archetype template, export bibliography
Each phase writes to research_state.json before advancing. If the user pauses or a session crashes, the next run reads the state and picks up from the last completed phase.
Step 0 — Check for skill updates. Before anything else, run:
python scripts/check_update.py
The script never fails the workflow — it always exits 0 and returns an envelope whose data.action field tells you what happened. Route on action as follows:
- up_to_date → continue silently. Do not mention it to the user.
- updated → tell the user in one line: [Skill updated: <from> → <to> (<commits_behind> commits). Continuing with new version.] If the envelope also has requirements_changed: true, append: Python deps changed — run `pip install -r requirements.txt` before next use.
- update_available → only appears with --dry-run; surface it the same way as updated but with "available, not applied."
- skipped_dirty → tell the user in one line: [Skill update skipped — you have local changes in <dirty_count> file(s). Review with `cd <skill_root> && git status`.] so they know their work is safe but they are running a stale version.
- skipped_disabled → continue silently. The user set SCHOLAR_SKIP_UPDATE_CHECK=1 on purpose to pin a version.
- not_a_git_repo → continue silently. The skill was installed via ClawHub / SkillsMP / a tarball; its package manager owns updates.
- check_failed → continue silently. Research takes priority over update checks; the user can always re-run check_update.py later.

Then proceed with the remaining Phase 0 steps below. Never block the workflow on a failed update check.
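For orchestrators that want to automate this routing, a minimal shell sketch, assuming only the data.action field documented above (the notice text is abbreviated; use the full templates from the list):

envelope=$(python scripts/check_update.py)
action=$(printf '%s' "$envelope" | python -c "import sys, json; print(json.load(sys.stdin).get('data', {}).get('action', 'check_failed'))")
case "$action" in
  updated|update_available)
    echo "[Skill updated / update available. Continuing.]" ;;   # surface in one line, per the templates above
  skipped_dirty)
    echo "[Skill update skipped: local changes present.]" ;;    # user's work is safe; version is stale
  *)
    : ;;   # up_to_date, skipped_disabled, not_a_git_repo, check_failed: continue silently
esac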
Before searching anything, decompose the question.
Pick one of the five report archetypes (section-by-section guidance in references/report_templates.md):
- literature_review — what is known about X (default)
- systematic_review — rigorous PRISMA-lite, comparison of many studies on one narrow question
- scoping_review — what has been studied and how (breadth over depth)
- comparative_analysis — X vs Y, head-to-head
- grant_background — narrative background + gap for a proposal

python scripts/research_state.py init \
--question "<restated question>" \
--archetype literature_review \
--output research_state.json
When in doubt about archetype, ask the user. The choice shapes everything downstream.
Run searches across all available sources in parallel. OpenAlex is primary; the others fill gaps.
# Primary (no API key, always available)
python scripts/search_openalex.py --query "<cluster 1>" --limit 50 --state research_state.json
python scripts/search_openalex.py --query "<cluster 2>" --limit 50 --state research_state.json
# Domain-specific (use when relevant)
python scripts/search_arxiv.py --query "<cluster>" --limit 50 --state research_state.json # CS/ML/physics
python scripts/search_pubmed.py --query "<cluster>" --limit 50 --state research_state.json # biomedical
python scripts/search_crossref.py --query "<cluster>" --limit 50 --state research_state.json # DOI-backed metadata
# Dedupe across sources (DOI-first, title-similarity fallback)
python scripts/dedupe_papers.py --state research_state.json
MCP enrichment (optional, run if available): call mcp__asta__search_papers_by_relevance and mcp__asta__snippet_search and feed results via scripts/research_state.py ingest. If the MCP call errors or times out, do not retry — move on.
Iterate. Read the state file. Are there keyword gaps? Are there authors appearing 3+ times whose other work you haven't pulled? Run another round. Stop when saturation hits — every source, not just the last one queried:
python scripts/research_state.py saturation --state research_state.json
# Returns { "per_source": {...}, "overall_saturated": true/false, ... }
overall_saturated is true only when every queried source has run at least --min-rounds (default 2) rounds AND each is individually below the new-paper percentage and new-citation thresholds. A source that has been queried only once cannot be declared saturated, which rules out the failure mode where a single quiet source falsely ends discovery. Use --source openalex to check one source in isolation.
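For example, after another OpenAlex round you can check that source in isolation (field names inside per_source are illustrative; read whatever the envelope actually returns):

# Check a single source against the round and threshold criteria
python scripts/research_state.py saturation --state research_state.json --source openalex
# overall_saturated still requires every queried source to pass, so keep iterating until the combined check is true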
Rank the deduplicated corpus and pick the top-N for deep reading.
python scripts/rank_papers.py \
--state research_state.json \
--question "<phase 0 question>" \
--alpha 0.4 --beta 0.3 --gamma 0.2 --delta 0.1 \
--top 20
The formula is transparent — the script prints it and writes the components to state so the report can cite its own methodology:
score = α·relevance + β·log10(citations+1)/3 + γ·recency_decay(half-life=5yr) + δ·venue_prior
Defaults target a literature review. For a scoping review prefer higher α (relevance) and lower β (citations). For a systematic review of a narrow question, lower α and higher β.
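As an illustration, a scoping-review run might shift weight toward relevance; the exact numbers below are a judgment call, not defaults shipped by the skill:

# Scoping review: relevance up, citation count down (weights still sum to 1.0)
python scripts/rank_papers.py \
  --state research_state.json \
  --question "<phase 0 question>" \
  --alpha 0.55 --beta 0.15 --gamma 0.2 --delta 0.1 \
  --top 20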
Write the top-N selection to state:
python scripts/research_state.py select --state research_state.json --top 20
For each paper in the top-N: get the best available full text, extract evidence, attach to state.
Use --doi when the paper's DOI is known (resolves and extracts in one step), or --input/--url for direct paths or links:
python scripts/extract_pdf.py --doi 10.1038/s41586-020-2649-2 --output paper.txt
python scripts/extract_pdf.py --input paper.pdf --output paper.txt
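# --url also works for a direct link to a PDF (placeholder URL shown):
python scripts/extract_pdf.py --url https://example.org/paper.pdf --output paper.txt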
DOI mode uses the paper-fetch skill (5-source OA chain) if installed, otherwise falls back to Unpaywall. The output envelope includes fetch_meta with title, authors, year, and source when using --doi.
For each paper, extract:
- question_or_hypothesis
- method (one sentence)
- key_findings (3-5 bullets, each with a page/section anchor)
- limitations
- relevance_to_question (how this paper moves the answer)
Abstract-only papers are marked depth: shallow in state — they can appear in the report but should not be the only source for any claim.
Take the top 5-10 highest-ranked papers and expand the graph.
# Preview the request count first — this is the most expensive command
python scripts/build_citation_graph.py \
--state research_state.json \
--seed-top 8 --direction both --depth 1 --dry-run
# Run with an idempotency key so a retry after a network blip is free
python scripts/build_citation_graph.py \
--state research_state.json \
--seed-top 8 --direction both --depth 1 \
--idempotency-key "chase-$(date -u +%Y%m%dT%H%M)"
The script pulls backward references (what did this paper cite?) and forward citations (who cited this paper?), deduplicates against existing state, and writes new candidate papers with discovered_via: citation_chase. Run rank + deep read again on any new high-scoring additions.
Idempotency. When --idempotency-key <k> is set, the first successful run writes {response, signature} to .scholar_cache/<hash>.json. A retried run with the same key replays the cached response without re-hitting OpenAlex or re-mutating state. Reusing the same key with different arguments returns idempotency_key_mismatch rather than silently serving stale data. Cache directory: SCHOLAR_CACHE_DIR env var, default .scholar_cache/.
Special case — a highly cited paper has never been challenged. If rank says a paper is top-3 by citations but no critiques appear in the corpus, search explicitly for "<first author> <year>" critique OR limitations OR reanalysis OR failed replication. This is the confirmation-bias backstop.
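A sketch of that backstop query against the primary source; author and year are placeholders to fill in from the paper in question:

python scripts/search_openalex.py \
  --query '"<first author> <year>" critique OR limitations OR reanalysis OR "failed replication"' \
  --limit 25 --state research_state.json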
No scripts here — this is where the agent earns its keep. Cluster the selected papers into themes, map tensions where papers disagree, and write both to state (themes, tensions).
This is not optional. Load assets/prompts/self_critique.md and run the full checklist against your draft (still unpublished). Among other things, the checklist asks:
- does every claim have a [^id] anchor?
- is any claim resting on a single source?
- is there a recency gap in the selection?
Write findings to research_state.json under self_critique and fix blockers before Phase 7. Findings go into the report appendix verbatim — the reader deserves to see what the research process doubted itself about.
Render the chosen archetype template from assets/templates/, filling from state:
# Export bibliography in the user's preferred format
python scripts/export_bibtex.py --state research_state.json --format bibtex --output refs.bib
python scripts/export_bibtex.py --state research_state.json --format csl-json --output refs.json
The report body uses [^id] anchors (the paper id from state). The bibliography section at the bottom lists each cited paper with DOI/URL. Any claim missing an anchor must be removed or cited.
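As an illustration, an anchored claim and its bibliography entry could look like this (placeholders in angle brackets; the id reuses the format from the state schema below):

In the body:         <claim sentence>[^doi:10.1038/nature12373]
In the bibliography: [^doi:10.1038/nature12373]: <authors> (<year>). <title>. <venue>. https://doi.org/10.1038/nature12373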
Save path convention: reports/<slug>_<YYYYMMDD>.md. The skill does not write outside the working directory unless the user specifies a path.
| Archetype | When to use | Primary output shape |
|---|---|---|
literature_review | User wants to know what's established about a topic | Thematic sections + synthesis + gap |
systematic_review | Narrow question, many studies, need rigorous comparison | PRISMA-lite flow + extraction table + pooled findings |
scoping_review | Broad topic, "what has been studied?" | Coverage map + methods inventory + research gap |
comparative_analysis | "A vs B" — methods, models, approaches | Axes of comparison + per-axis verdict + recommendation |
grant_background | Narrative for a proposal introduction | Problem significance + what's known + what's missing + why our approach |
Templates live in assets/templates/<archetype>.md. Load only the one you need.
| Script | Purpose |
|---|---|
check_update.py | Phase 0 Step 0 — fast-forward the skill against its origin; never blocks the workflow. |
research_state.py | Init, read, write, query the state file. Central to every phase. |
search_openalex.py | Primary search (no key, 240M works, citation counts). |
search_arxiv.py | arXiv API — preprints and CS/ML/physics. |
search_crossref.py | Crossref REST — authoritative DOI metadata. |
search_pubmed.py | NCBI E-utilities — biomedical corpus with MeSH. |
dedupe_papers.py | DOI normalization + title similarity merging across sources. |
rank_papers.py | Transparent scoring formula. Prints the formula and per-paper components. |
build_citation_graph.py | Forward/backward snowballing via OpenAlex. |
extract_pdf.py | Full-text extraction (pypdf). Accepts --input, --url, or --doi. DOI mode resolves via paper-fetch skill if installed, falls back to Unpaywall. Safe on scanned PDFs (skips, emits warning). |
export_bibtex.py | BibTeX / CSL-JSON / RIS export from state. |
All scripts accept --help, --schema, emit a structured JSON envelope on stdout, and use research_state.json as the single source of truth. Every script is idempotent on the state file (network-layer idempotency is P1 work).
Every script prints exactly one JSON envelope to stdout and exits with a code from the stable vocabulary below. No prose is ever mixed into stdout; diagnostics go to stderr.
Success envelope:
{ "ok": true, "data": { ... } }
Failure envelope:
{
"ok": false,
"error": {
"code": "snake_case_routing_key",
"message": "human sentence",
"retryable": true,
"...extra context fields...": "..."
}
}
Exit codes:
| Code | Meaning |
|---|---|
0 | success |
1 | runtime error (e.g. malformed upstream response, missing dependency) |
2 | upstream / network error (retryable) |
3 | validation error (bad input) |
4 | state error (missing, corrupt, or schema mismatch) |
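A minimal retry sketch that relies only on the exit-code vocabulary and envelope shape above (no script-specific fields assumed):

out=$(python scripts/search_openalex.py --query "<cluster>" --limit 50 --state research_state.json)
code=$?
if [ "$code" -eq 2 ]; then
  # exit 2 = upstream/network error and retryable: one retry is reasonable
  out=$(python scripts/search_openalex.py --query "<cluster>" --limit 50 --state research_state.json)
elif [ "$code" -ne 0 ]; then
  # exit 1/3/4: read error.code from the envelope and fix input/state rather than retrying
  printf '%s\n' "$out" >&2
fi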
Schema introspection. Every script supports --schema, which prints its full parameter schema (types, defaults, choices, required flags, subcommands where applicable) as JSON and exits 0. An agent discovering an unfamiliar script should run --schema before --help — it is machine-parseable and covers everything --help does.
python scripts/search_openalex.py --schema
python scripts/research_state.py --schema # includes every subcommand
Export bibliography exception. export_bibtex.py without --output writes raw BibTeX/RIS/CSL text to stdout for pipe compatibility (export_bibtex.py --format bibtex > refs.bib). Agents that need a structured response should always pass --output — that path returns {"ok": true, "data": {"output": "...", "format": "bibtex", "count": N}}.
Trust-boundary configuration — set once by the human or orchestrator. CLI flags override where present.
| Variable | Used by | Purpose |
|---|---|---|
SCHOLAR_STATE_PATH | every script that takes --state | Default path to research_state.json |
SCHOLAR_MAILTO | search_openalex.py, search_crossref.py, build_citation_graph.py | Polite-pool email for OpenAlex / Crossref — higher rate limits |
NCBI_API_KEY | search_pubmed.py | NCBI E-utilities API key — higher rate limits |
SCHOLAR_CACHE_DIR | build_citation_graph.py (any command that takes --idempotency-key) | Cache directory for idempotent-retry responses; default .scholar_cache/ in cwd |
PAPER_FETCH_SCRIPT | extract_pdf.py | Path to paper-fetch's fetch.py. If unset, auto-discovers across all known skill install paths (Claude Code, OpenCode, OpenClaw, Hermes, ~/.agents). If not found, falls back to Unpaywall |
SCHOLAR_SKIP_UPDATE_CHECK | check_update.py | Set to any non-empty value to pin the current version and skip Phase 0 Step 0's auto-update |
Agents should never set these themselves. They belong in the shell profile, a systemd unit, or the orchestrator's env injection.
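A typical shell-profile block, with placeholder values; set only the variables that apply to your environment:

export SCHOLAR_MAILTO="you@example.org"         # polite pool for OpenAlex / Crossref
export NCBI_API_KEY="<your-ncbi-api-key>"       # optional: higher PubMed rate limits
export SCHOLAR_STATE_PATH="research_state.json" # default state file path
export SCHOLAR_CACHE_DIR=".scholar_cache"       # idempotent-retry cache
# export SCHOLAR_SKIP_UPDATE_CHECK=1            # uncomment to pin the current skill version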
{
"schema_version": 1,
"question": "...",
"archetype": "literature_review",
"phase": 3,
"created_at": "...",
"updated_at": "...",
"queries": [{"source": "openalex", "query": "...", "hits": 42, "new": 30, "round": 1}],
"papers": {
"doi:10.1038/nature12373": {
"id": "doi:10.1038/nature12373",
"title": "...",
"authors": ["..."],
"year": 2013,
"venue": "Nature",
"citations": 523,
"abstract": "...",
"source": ["openalex", "crossref"],
"score": 0.81,
"score_components": {"relevance": 0.9, "citations": 0.8, "recency": 0.6, "venue": 1.0},
"selected": true,
"depth": "full",
"evidence": {"method": "...", "findings": ["..."], "limitations": "..."},
"discovered_via": "search"
}
},
"themes": [{"name": "...", "paper_ids": ["..."]}],
"tensions": [{"topic": "...", "sides": [{"position": "...", "paper_ids": ["..."]}]}],
"self_critique": {"findings": [], "resolved": [], "appendix": "..."},
"report_path": "reports/slug_20260411.md"
}
See scripts/research_state.py --help for the full schema.
Each phase has a gate. Do not advance until the gate passes.
| Phase | Gate |
|---|---|
| 0 → 1 | Question restated, archetype chosen, ≥3 keyword clusters, state initialized |
| 1 → 2 | Saturation hit on primary source AND ≥3 sources consulted |
| 2 → 3 | Top-N selected with score components recorded |
| 3 → 4 | ≥80% of top-N have depth: full (rest explicitly marked shallow) |
| 4 → 5 | Citation graph expanded ≥1 depth on top 5 seeds |
| 5 → 6 | ≥3 themes defined, ≥1 tension documented (or explicit "no tensions found") |
| 6 → 7 | Self-critique appendix written, all unanchored claims resolved |
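As a sketch, the 3 → 4 gate can be verified straight from the state file, using only the fields shown in the schema above:

python - <<'EOF'
import json

state = json.load(open("research_state.json"))
selected = [p for p in state["papers"].values() if p.get("selected")]
full = [p for p in selected if p.get("depth") == "full"]
ratio = len(full) / len(selected) if selected else 0.0
print(f"{len(full)}/{len(selected)} selected papers at depth: full ({ratio:.0%})")
print("gate 3 -> 4:", "PASS" if ratio >= 0.8 else "FAIL")
EOF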
If the session has Semantic Scholar (asta) or Brave Search MCP tools available, use them as enrichment:
- mcp__asta__search_papers_by_relevance — good for dense relevance ranking
- mcp__asta__get_citations — lighter weight than the citation graph for a few seed papers
- mcp__asta__snippet_search — grep-like search across abstracts

Treat MCP tools as unreliable by design — they may time out or be unavailable. Never place a phase-critical step behind an MCP call. Scripts are the spine; MCP is the skin.
Pitfalls (see references/pitfalls.md for detail): the non-negotiable one is that every claim must carry a [^id] pointing to a paper in state.

User: Give me a deep research report on CRISPR base editing for Duchenne muscular dystrophy.
Agent:
[Phase 0] Restating: "What is the current state and evidence for CRISPR base editing
as a therapeutic approach for Duchenne muscular dystrophy (DMD)?"
Archetype: literature_review (default; confirmed — broad overview requested)
Clusters:
- ("base editing" OR "adenine base editor" OR "cytosine base editor" OR ABE OR CBE)
- ("Duchenne muscular dystrophy" OR DMD OR dystrophin)
- (in vivo OR AAV OR "muscle delivery")
→ research_state.json initialized
[Phase 1] Running OpenAlex + PubMed + arXiv + Crossref across 3 clusters...
Round 1: 187 hits, 142 unique. Round 2: 94 hits, 31 new.
Saturation check: new=11%, max_new_citations=23 → SATURATED
143 unique papers in state.
[Phase 2] Ranking with default weights (literature review)...
Top 20 selected. Score components written to state.
[Phase 3] Fetching full text... 17/20 full, 3 abstract-only (flagged shallow).
Evidence extraction complete.
[Phase 4] Citation chasing on top 8 seeds, depth 1.
Added 24 candidates, 6 re-scored into top 20.
[Phase 5] Themes: (a) delivery platforms, (b) editing efficiency, (c) off-target/safety,
(d) pre-clinical outcomes, (e) clinical translation barriers.
Tensions: AAV serotype optimality (Theme a) — 3 papers disagree.
[Phase 6] Self-critique flagged 2 single-source claims and a recency gap
(no 2025 paper in top 20). Re-ran focused search; added 4 papers.
[Phase 7] Rendering literature_review template...
Report: reports/crispr-base-editing-dmd_20260411.md
Bibliography: reports/crispr-base-editing-dmd_20260411.bib (84 refs)
Modular documentation, loaded only when needed:
- references/search_strategies.md — Boolean clusters, PICO, snowballing, saturation math
- references/source_selection.md — which database for which question
- references/quality_assessment.md — CRAAP, journal tier, retraction check, preprint handling
- references/report_templates.md — the 5 archetypes with section-by-section guidance
- references/pitfalls.md — long-form version of the pitfalls list with examples