Search PubMed and bioRxiv for bioinformatics literature, synthesise results into a structured report, and build a citation graph — all locally, with a reproducibility bundle.
You are Lit Synthesizer, a specialised ClawBio agent for biomedical literature discovery and synthesis. Your role is to search PubMed and bioRxiv, summarise retrieved papers, and build a citation graph — all locally with a reproducibility bundle.
Fire this skill when the user says any of:
Do NOT fire when:
- … vcf-annotator)
- … pharmgx-reporter)

Without it: A researcher must manually search PubMed, download abstracts, read each one, spot connections across papers, and format everything by hand. This can take hours for a single topic.
With it: One command searches both PubMed and bioRxiv, summarises abstracts, identifies recurring themes, builds a citation graph, and outputs a formatted report with a reproducibility bundle — in under 30 seconds.
Why ClawBio: A general LLM will hallucinate paper titles, fabricate authors, and invent DOIs. This skill uses live API calls to real databases, so every paper it returns is real and verifiable.
commands.sh, environment.yml, SHA-256 checksums

This skill searches literature and synthesises results. It does not provide clinical recommendations, annotate variants, or replace a systematic review.
| Format | Description | Example |
|---|---|---|
| Free-text query | Any PubMed-compatible search string | "CRISPR off-target effects 2024" |
| Boolean query | PubMed boolean syntax | "BRCA1 AND breast cancer AND review" |
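
A query in either format above is sent to NCBI as the `term` of an esearch call, and the PMIDs that come back are passed to efetch. The following is a minimal sketch of that flow using only the standard library, not the skill's actual code: the function names, the `retmode` choices, and the placeholder XML record are assumptions made for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_request(query, max_results):
    """Build (but do not send) a POST request that searches PubMed."""
    params = {"db": "pubmed", "term": query,
              "retmax": max_results, "retmode": "json"}
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(f"{EUTILS}/esearch.fcgi", data=body)


def efetch_request(pmids):
    """Build (but do not send) a POST request that fetches records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(f"{EUTILS}/efetch.fcgi", data=body)


def _text(elem):
    """Flatten an element's text, including inline markup such as <i>."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def parse_efetch_xml(xml_text):
    """Extract title, authors, abstract and DOI from efetch XML."""
    root = ET.fromstring(xml_text.strip())
    papers = []
    for art in root.iter("PubmedArticle"):
        authors = []
        for author in art.findall("MedlineCitation/Article/AuthorList/Author"):
            last = author.findtext("LastName", default="")
            initials = author.findtext("Initials", default="")
            if last:
                authors.append(f"{last} {initials}".strip())
        doi = None
        for aid in art.findall("PubmedData/ArticleIdList/ArticleId"):
            if aid.get("IdType") == "doi":
                doi = (aid.text or "").strip()
                break
        abstract_parts = art.findall("MedlineCitation/Article/Abstract/AbstractText")
        papers.append({
            "pmid": art.findtext("MedlineCitation/PMID"),
            "title": _text(art.find("MedlineCitation/Article/ArticleTitle")),
            "abstract": " ".join(_text(p) for p in abstract_parts),
            "authors": authors,
            "doi": doi,
        })
    return papers


# Placeholder record for offline testing; not a real paper.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <ArticleTitle>Placeholder title with <i>inline</i> markup</ArticleTitle>
        <Abstract><AbstractText>Placeholder abstract.</AbstractText></Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><Initials>J</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pubmed">00000000</ArticleId>
        <ArticleId IdType="doi">10.0000/placeholder</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""

if __name__ == "__main__":
    print(esearch_request("BRCA1 AND breast cancer AND review", 5).data)
    print(parse_efetch_xml(SAMPLE_XML))
```

The request builders return `urllib.request.Request` objects rather than opening a connection, so the parsing logic can be exercised offline; passing a request to `urllib.request.urlopen` would perform the real call.
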
- esearch → get PMIDs, then efetch → get details
- report.md with paper summaries, citation graph, and reproducibility bundle

```bash
# Standard usage
python skills/lit-synthesizer/lit_synthesizer.py \
  --query "CRISPR off-target effects" \
  --output report/

# Limit results
python skills/lit-synthesizer/lit_synthesizer.py \
  --query "single cell RNA sequencing" \
  --max 5 \
  --output report/

# Demo mode (no network needed)
python skills/lit-synthesizer/lit_synthesizer.py \
  --demo --output /tmp/demo

# Via ClawBio runner
python clawbio.py run lit-synthesizer --query "BRCA1 variants" --output report/
python clawbio.py run lit-synthesizer --demo
```

```bash
python clawbio.py run lit-synthesizer --demo
```
Expected output: A report covering 3 demo papers on CRISPR genome editing, with a citation graph of 3 nodes and 3 edges, plus a full reproducibility bundle.
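
A node-edge citation graph of this kind can be sketched as follows. This is an illustration under assumptions, not the skill's implementation: the `id`, `title`, and `citations` keys, the `nodes`/`edges` layout, and the three placeholder papers are hypothetical and are not the skill's demo data. Edges are kept only when the cited paper is itself in the retrieved set.

```python
import json


def build_citation_graph(papers):
    """Return a node-edge graph limited to papers in the retrieved set."""
    retrieved_ids = {p["id"] for p in papers}
    nodes = [{"id": p["id"], "title": p["title"]} for p in papers]
    edges = [
        {"source": p["id"], "target": cited}
        for p in papers
        for cited in p.get("citations", [])
        # Drop self-citations and anything outside the search results.
        if cited in retrieved_ids and cited != p["id"]
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical placeholder papers; "paper-X" was not retrieved.
papers = [
    {"id": "paper-A", "title": "Placeholder A",
     "citations": ["paper-B", "paper-C", "paper-X"]},
    {"id": "paper-B", "title": "Placeholder B", "citations": ["paper-C"]},
    {"id": "paper-C", "title": "Placeholder C", "citations": []},
]

graph = build_citation_graph(papers)

if __name__ == "__main__":
    print(json.dumps(graph, indent=2))
```

With these placeholders the graph has three nodes and three edges, and the reference to the unretrieved `paper-X` is discarded.
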
- esearch): POST query to NCBI, receive list of PMIDs
- efetch): POST PMIDs, parse returned XML for title/authors/abstract/DOI
- https://api.biorxiv.org/details/biorxiv/{date_range}/0/json, filter by keywords
- citations field

Key parameters:

- --max)

# 🦖 ClawBio Lit Synthesizer Report
**Query**: `CRISPR off-target effects`
**Date**: 2026-04-12 10:30 UTC
**Sources**: PubMed (3 results) · bioRxiv (1 result)
**Total papers**: 4
---
## Summary
Across 4 retrieved papers, recurring themes include: **crispr**, **off-target**,
**base editing**, **cas9**, **guide rna**, **variant**.
The literature spans 2024 to 2025.
---
## Papers
### 1. CRISPR-Cas9 off-target effects: detection and mitigation strategies
| Field | Value |
|-------|-------|
| Source | PubMed |
| Authors | Zhang Y, Li X, Wang M |
| Journal | Nature Biotechnology |
| Year | 2024 |
| DOI | 10.1038/nbt.2024.001 |
**Abstract**: CRISPR-Cas9 genome editing tools have revolutionised molecular
biology. However, off-target cleavage remains a major safety concern...
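
The abstract in the sample above ends with an ellipsis because the report caps abstracts at 400 characters while the full text stays in results.json. A helper along these lines could produce that behaviour; the function name is hypothetical, and whether the skill counts the ellipsis towards the cap is not stated, so this sketch simply appends it after the cut.

```python
def truncate_abstract(text, limit=400):
    """Collapse whitespace and cap the abstract at `limit` characters."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Assumption: the ellipsis is added on top of the character cap.
    return text[:limit].rstrip() + "..."


if __name__ == "__main__":
    print(truncate_abstract("A short abstract."))
    print(len(truncate_abstract("x" * 500)))
```
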
```
output_directory/
├── report.md                # Full synthesis report
├── results.json             # All papers as structured JSON
├── citation_graph.json      # Node-edge citation graph
├── tables/
│   └── papers.csv           # Tabular paper list
└── reproducibility/
    ├── commands.sh          # Exact commands to reproduce
    ├── environment.yml      # Conda/pip environment
    └── checksums.sha256     # SHA-256 of all output files
```
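
The checksum manifest in the layout above could be generated with `hashlib` roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the `<hash>  <relative path>` line format is chosen to resemble `sha256sum` output rather than taken from the skill.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large outputs are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir):
    """Write reproducibility/checksums.sha256 covering every other output file."""
    out = Path(output_dir)
    manifest = out / "reproducibility" / "checksums.sha256"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in out.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_of(p)}  {p.relative_to(out).as_posix()}" for p in files]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Calling `write_checksums("report/")` after all other files are written would list each output with its digest; the manifest excludes itself so that re-running the function gives a stable result.
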
Required:

- biopython >= 1.83: Entrez utilities wrapper (optional; skill also works with pure urllib)
- urllib, xml.etree, json, csv, hashlib

Optional:

- matplotlib: for future citation graph visualisation
- networkx: for advanced graph analysis

bioRxiv API returns date-ordered results, not keyword-ranked: The skill filters by keyword locally after fetching. For very broad queries this may return zero bioRxiv results. Use a specific query to improve recall.

NCBI E-utilities rate limit: Without an API key you are limited to 3 requests/second. The skill enforces a 0.34 s sleep. Do NOT remove this sleep or you will receive HTTP 429 errors.

Abstract truncation in report: Abstracts are capped at 400 characters in the report for readability. Full text is in results.json.

Citation graph only covers internal cross-references: The graph only shows edges between papers that were also retrieved in the same search. It is not a global citation network.
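
The 0.34 s spacing between E-utilities requests keeps an unauthenticated client just under 3 requests/second. One way to enforce it is sketched below; the `Throttle` class is hypothetical and is a variant of a plain fixed sleep, in that it waits only for whatever part of the interval has not already elapsed. The clock and sleep functions are injectable so the behaviour can be checked without real delays.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval=0.34, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle()
    for pmid_batch in (["1"], ["2"], ["3"]):
        throttle.wait()  # call before each esearch/efetch request
        print("would request", pmid_batch)
```
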
The agent (LLM) dispatches the query and explains results. The skill (Python) executes the API calls and generates files. The agent must NOT invent paper titles, authors, or DOIs.
Trigger conditions: route here when the user mentions:
- pubmed, biorxiv, literature, papers, articles, citations, review

Chaining partners:

- pharmgx-reporter: A lit search on a drug gene (e.g. CYP2D6) can precede a PharmGx report
- semantic-sim: Lit Synthesizer output can feed into the Semantic Similarity Index for topic clustering

skills/_deprecated/ if NCBI discontinues E-utilities free tier