Use when the user wants to download a paper PDF from a DOI, title, or URL via legal open-access sources. Tries Unpaywall, arXiv, bioRxiv/medRxiv, PubMed Central, and Semantic Scholar in order. Never uses Sci-Hub or paywall bypass.
Fetch the legal open-access PDF for a paper given a DOI (or title). Tries multiple OA sources in priority order and stops at the first hit.
1. Unpaywall: query https://api.unpaywall.org/v2/{doi}?email=$UNPAYWALL_EMAIL and read best_oa_location.url_for_pdf
2. Semantic Scholar: query https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=openAccessPdf,externalIds
3. arXiv: if externalIds.ArXiv is present, fetch https://arxiv.org/pdf/{arxiv_id}.pdf
4. PubMed Central: fetch https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/
5. bioRxiv/medRxiv: if the DOI starts with 10.1101, use the claude_ai_bioRxiv MCP get_preprint tool, then fetch the PDF URL

If only a title is given, resolve it to a DOI first via Semantic Scholar search_paper_by_title (asta MCP) or Crossref.
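
The lookup order above can be sketched as a pure function over API responses that have already been fetched and decoded. This is a minimal sketch, not the code in scripts/fetch.py: `candidate_pdf_urls`, `needs_preprint_lookup`, the `pmcid` argument, and every source label other than "unpaywall" are hypothetical names chosen for illustration. It also assumes that Semantic Scholar's openAccessPdf field is an object with a `url` key.

```python
from typing import List, Optional, Tuple


def candidate_pdf_urls(
    doi: str,
    unpaywall: Optional[dict] = None,
    semantic_scholar: Optional[dict] = None,
    pmcid: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (source, url) pairs in the priority order listed above.

    `doi` is kept for symmetry with the CLI; the URLs here come from the
    decoded responses, not from the DOI itself.
    """
    candidates: List[Tuple[str, str]] = []

    # Unpaywall: best_oa_location may be null, so guard both levels.
    best = (unpaywall or {}).get("best_oa_location") or {}
    if best.get("url_for_pdf"):
        candidates.append(("unpaywall", best["url_for_pdf"]))

    # Semantic Scholar: openAccessPdf is assumed to carry a "url" key.
    s2 = semantic_scholar or {}
    oa_pdf = s2.get("openAccessPdf") or {}
    if oa_pdf.get("url"):
        candidates.append(("semantic_scholar", oa_pdf["url"]))

    # arXiv: only when externalIds.ArXiv is present.
    arxiv_id = (s2.get("externalIds") or {}).get("ArXiv")
    if arxiv_id:
        candidates.append(("arxiv", f"https://arxiv.org/pdf/{arxiv_id}.pdf"))

    # PubMed Central: the caller supplies the PMCID (hypothetical argument).
    if pmcid:
        url = f"https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/"
        candidates.append(("pmc", url))

    return candidates


def needs_preprint_lookup(doi: str) -> bool:
    """True for DOIs under the 10.1101 prefix (bioRxiv/medRxiv step)."""
    return doi.startswith("10.1101/")
```

A caller would try each returned URL in turn and stop at the first successful download, falling back to the preprint lookup only when `needs_preprint_lookup` is true.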
python scripts/fetch.py <DOI> [--out DIR] [--dry-run] [--format json|text]
python scripts/fetch.py --batch FILE [--out DIR] [--dry-run] [--format json|text]
| Flag | Default | Description |
|---|---|---|
| doi | — | DOI to fetch (positional, e.g. 10.1038/s41586-020-2649-2) |
| --batch FILE | — | File with one DOI per line for bulk download |
| --out DIR | pdfs | Output directory |
| --dry-run | off | Resolve sources without downloading; preview the PDF URL and filename |
| --format | json | Output format: json (for agents) or text (for humans) |

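
For readers who want to see how the flags in the table fit together, here is a minimal argparse sketch that accepts the same arguments and defaults. It is an assumption about how such a parser could look, not the parser in scripts/fetch.py; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags documented in the table above."""
    parser = argparse.ArgumentParser(prog="fetch.py")
    # The DOI is optional on the command line because --batch can replace it.
    parser.add_argument("doi", nargs="?", help="DOI to fetch")
    parser.add_argument("--batch", metavar="FILE",
                        help="file with one DOI per line")
    parser.add_argument("--out", metavar="DIR", default="pdfs",
                        help="output directory")
    parser.add_argument("--dry-run", action="store_true",
                        help="resolve sources without downloading")
    parser.add_argument("--format", choices=["json", "text"], default="json",
                        help="output format")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

argparse exposes `--dry-run` as the attribute `dry_run`, so a caller would check `args.dry_run` before downloading.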
stdout emits a single JSON object (when --format json):
Success:
{
  "ok": true,
  "data": {
    "results": [
      {
        "doi": "10.1038/s41586-020-2649-2",
        "success": true,
        "source": "unpaywall",
        "pdf_url": "https://...",
        "file": "pdfs/Author_2020_Title.pdf",
        "meta": {"title": "...", "year": 2020, "author": "Smith"}
      }
    ],
    "summary": {"total": 1, "succeeded": 1, "failed": 0}
  }
}
Failure:
{
  "ok": false,
  "error": {
    "code": "auth_missing",
    "message": "Set UNPAYWALL_EMAIL env var to your contact email",
    "retryable": false,
    "retry_after_auth": true
  }
}
stderr carries human-readable progress diagnostics (source attempts, download status).
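
An agent consuming the JSON envelope above might unpack it roughly as follows. This is a sketch under the assumption that stdout contains exactly one JSON object shaped like the success and failure examples; `FetchError` and `downloaded_files` are hypothetical names, not part of the script.

```python
import json
from typing import List


class FetchError(Exception):
    """Raised when the envelope reports ok == false (hypothetical helper)."""

    def __init__(self, code: str, message: str, retryable: bool) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = retryable


def downloaded_files(stdout_text: str) -> List[str]:
    """Return the saved file paths from a success envelope.

    Raises FetchError carrying the error code when the envelope is a failure.
    """
    envelope = json.loads(stdout_text)
    if not envelope.get("ok"):
        err = envelope.get("error") or {}
        raise FetchError(
            err.get("code", "unknown"),
            err.get("message", ""),
            bool(err.get("retryable", False)),
        )
    results = envelope["data"]["results"]
    # Keep only entries that succeeded and actually report a file path.
    return [r["file"] for r in results if r.get("success") and r.get("file")]
```

Only stdout should be passed to the parser, since progress diagnostics go to stderr.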
| Code | Meaning |
|---|---|
| 0 | All DOIs resolved successfully |
| 1 | Runtime error (some DOIs failed, network issues) |
| 2 | Auth error (UNPAYWALL_EMAIL not set) |
| 3 | Validation error (bad arguments, missing input) |

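
A caller that shells out to the script could map the exit codes above onto named constants. The enum and helper below are a hypothetical sketch; only the numeric values come from the table.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the table above; the member names are illustrative."""
    OK = 0
    RUNTIME_ERROR = 1
    AUTH_ERROR = 2
    VALIDATION_ERROR = 3


def describe_exit(returncode: int) -> str:
    """Translate a process return code into a short lowercase label."""
    try:
        return ExitCode(returncode).name.lower()
    except ValueError:
        # Anything outside the documented range is reported as unknown.
        return "unknown"
```

For example, `describe_exit(subprocess.run(cmd).returncode)` gives a label that is easier to log than the bare integer.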
| Code | Meaning | Retryable |
|---|---|---|
| auth_missing | UNPAYWALL_EMAIL not set | No (set env var first) |
| validation_error | Bad arguments or empty input | No |
| not_found | No open-access PDF found | No |
| download_failed | Source found but download failed | Yes |

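
The retryable column above suggests a simple retry policy. The sketch below is one possible policy, not something the script itself implements: `should_retry`, `backoff_seconds`, the attempt limit, and the exponential backoff schedule are all assumptions.

```python
# Retryability per error code, copied from the table above.
RETRYABLE = {
    "auth_missing": False,
    "validation_error": False,
    "not_found": False,
    "download_failed": True,
}


def should_retry(error: dict, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether to retry after the given 1-based attempt number."""
    if attempt >= max_attempts:
        return False
    # Prefer the explicit flag in the error payload when it is present.
    if "retryable" in error:
        return bool(error["retryable"])
    # Otherwise fall back to the table; unknown codes are not retried.
    return RETRYABLE.get(error.get("code"), False)


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))
```

With these defaults, a download_failed error is retried at most twice, waiting 1 and then 2 seconds.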
# Single DOI (JSON output for agents)
python scripts/fetch.py 10.1038/s41586-020-2649-2
# Dry-run preview
python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run
# Human-readable output
python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text
# Batch download
python scripts/fetch.py --batch dois.txt --out ./papers
Add export UNPAYWALL_EMAIL=<your contact email> to your shell profile (e.g. ~/.zshrc). The script exits with code 2 if it's not set.

PDFs are saved to ./pdfs/ by default. Filenames: {first_author}_{year}_{short_title}.pdf.
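
The filename pattern above could be produced along these lines. The sanitising rules here (keep ASCII letters and digits, join at most five title words with hyphens) are assumptions made for illustration, since the exact rules used by the script are not documented; `pdf_filename` is a hypothetical helper.

```python
import re


def pdf_filename(first_author: str, year: int, title: str,
                 max_title_words: int = 5) -> str:
    """Build {first_author}_{year}_{short_title}.pdf from paper metadata."""
    # Drop anything that is not an ASCII letter or digit from the author name.
    author = "".join(re.findall(r"[A-Za-z0-9]+", first_author))
    # Keep the first few alphanumeric words of the title, hyphen-separated,
    # so that underscores only ever separate the three fields.
    words = re.findall(r"[A-Za-z0-9]+", title)[:max_title_words]
    short_title = "-".join(words)
    return f"{author}_{year}_{short_title}.pdf"
```

Keeping underscores out of the title portion means the three fields can be recovered later by splitting on the first two underscores.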