Downloads academic papers via Sci-Hub, adds them to Zotero with full metadata, indexes them into the local RAG vector store, and optionally reads them via the reading skill. TRIGGER this skill whenever the user asks to: download a paper, get a PDF, acquire a source, add a paper to the library, "get me this paper", "download this DOI", "add this to Zotero", "I need the full text of X", "can you find the PDF for X", or any request that involves obtaining a paper that is not yet in the local library. Also trigger when the deep-research skill identifies papers to acquire, when a citation is needed but the paper is not indexed, or when the user provides a DOI, title, or paper ID and wants it in the system. If in doubt, trigger -- a failed download is cheap, a missing source is expensive.
Acquire papers, add to Zotero, index for RAG, and read. Every acquired paper MUST be added to Zotero -- no exceptions.
The deep-research skill identifies papers worth reading. The reading skill
evaluates them. But between discovery and reading there is a gap: the paper
must be downloaded, added to Zotero (so it appears in ref.bib via BBT
auto-export), and indexed into the RAG vector store (so it becomes searchable).
This skill owns that entire pipeline.
Every acquisition follows this exact sequence:
Use the RAG CLI to download the paper:
r2 rag lit-download "IDENTIFIER" \
--type TYPE --title "PAPER TITLE"
TYPE is one of doi, pmid, or title. DOI is always preferred. If you only have a title, search Semantic Scholar first to find the DOI:
r2 rag lit-search "TITLE" --focus broad -n 5
Then use the DOI from the search results.
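The choice of --type value can be sketched in Python. This is a minimal sketch, not part of the r2 CLI: the helper names classify_identifier and build_download_command are hypothetical, and the DOI and PMID patterns are common heuristics (an assumption), so a title made only of digits would be misclassified.

```python
import re

# Heuristic patterns (assumptions, not taken from the r2 CLI).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMID_RE = re.compile(r"^\d{1,8}$")


def classify_identifier(identifier: str) -> str:
    """Guess which --type value fits: 'doi', 'pmid', or 'title'."""
    text = identifier.strip()
    if DOI_RE.match(text):
        return "doi"
    if PMID_RE.match(text):
        return "pmid"
    return "title"


def build_download_command(identifier: str, title: str) -> list[str]:
    """Assemble argv for `r2 rag lit-download` using only the documented flags."""
    cleaned = identifier.strip()
    return [
        "r2", "rag", "lit-download", cleaned,
        "--type", classify_identifier(cleaned),
        "--title", title,
    ]
```

Returning an argv list rather than a shell string lets the command go to subprocess.run without quoting problems in titles.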
The lit-download command automatically:
Check the output for:
Zotero: added (ITEM_KEY) = success
Zotero: ...error... = partial failure -- item may exist without PDF
If Zotero fails, use the standalone script as fallback:
python .claude/scripts/zotero_add.py CITEKEY
Zotero addition is mandatory. Every downloaded paper must appear in Zotero
so that BBT auto-exports it to ref.bib. Without this, the paper cannot be
cited in the manuscript.
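The Zotero output check can be automated along these lines. This is a sketch that assumes the status appears on a line beginning with "Zotero:", as in the markers listed above; the function name zotero_status and the "unknown" fallback are hypothetical.

```python
def zotero_status(output: str) -> str:
    """Classify lit-download output as 'success', 'partial', or 'unknown'."""
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped.startswith("Zotero:"):
            continue
        if stripped.startswith("Zotero: added"):
            return "success"
        if "error" in stripped.lower():
            # The item may exist in Zotero without its PDF attached.
            return "partial"
    # No recognizable Zotero line: treat as unverified, not as success.
    return "unknown"
```

Because Zotero addition is mandatory, anything other than "success" should lead to the zotero_add.py fallback.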
The lit-download command auto-indexes into RAG after download. Check the
output for:
Indexed dir__FILENAME (N chunks) = success
Skipped (exists) = already indexed (OK)
Total entries: 0 = indexing failed (see Troubleshooting)
Verify the paper is searchable:
r2 rag search "KEY TERM FROM PAPER" -n 3
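The indexing markers can be checked the same way. This sketch assumes the output contains the literal strings shown above; the function name index_status and its return labels are hypothetical.

```python
import re

# Matches e.g. "Indexed dir__reny2021_apsr (42 chunks)".
INDEXED_RE = re.compile(r"Indexed (dir__\S+) \((\d+) chunks?\)")
# Matches a total of exactly zero, not 10 or 05.
EMPTY_RE = re.compile(r"Total entries: 0\b")


def index_status(output: str) -> str:
    """Return 'indexed', 'already-indexed', 'failed', or 'unknown'."""
    if INDEXED_RE.search(output):
        return "indexed"
    if "Skipped (exists)" in output:
        return "already-indexed"
    if EMPTY_RE.search(output):
        return "failed"
    return "unknown"
```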
After indexing, the paper is available for full-text RAG queries. If the paper was acquired for a specific purpose (e.g., to support a claim in the manuscript), read it immediately using the reading skill:
r2 rag query "SPECIFIC QUESTION" --citekey dir__FILENAME
Or trigger the reading skill for a comprehensive evaluation.
For multiple papers (e.g., from a deep-research "Papers to Index" table):
r2 rag lit-download-batch \
'[{"id": "10.xxxx/yyyy", "title": "Paper Title"}, {"id": "10.xxxx/zzzz", "title": "Another Paper"}]' \
--auto-index
Each paper in the batch is automatically added to Zotero. The --auto-index
flag indexes all downloaded PDFs at once.
Priority order: Download HIGH-priority papers first. Ask the user before downloading MEDIUM/LOW papers if the list is long (>5 papers).
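The priority rule can be sketched as a small planning step that builds the JSON payloads for lit-download-batch. The "priority" field, the function name plan_batch, and the reading of "long list" as more than five papers in total are assumptions. Only the "id" and "title" keys come from the batch example above.

```python
import json


def plan_batch(papers: list[dict]) -> tuple[str, str, bool]:
    """Split papers into a HIGH payload and a MEDIUM/LOW payload.

    Returns (high_json, rest_json, ask_user_first), where ask_user_first
    is True when the full list exceeds five papers and lower-priority
    papers are present.
    """
    def payload(items: list[dict]) -> str:
        # Keep only the keys the batch command is documented to accept.
        return json.dumps([{"id": p["id"], "title": p["title"]} for p in items])

    high = [p for p in papers if p["priority"] == "HIGH"]
    rest = [p for p in papers if p["priority"] != "HIGH"]
    ask_user_first = len(papers) > 5 and bool(rest)
    return payload(high), payload(rest), ask_user_first
```

The first payload can go to lit-download-batch immediately. The second waits for confirmation when ask_user_first is True.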
When you have a paper title but no DOI, find it before downloading:
lit-search "TITLE" --focus broad -n 5
lit-search "TITLE" --source oa -n 5
curl -s "https://api.crossref.org/works?query=TITLE&rows=3"
Always prefer DOI over title for downloads -- DOI-based downloads have higher success rates and better Zotero metadata.
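When a real title is substituted into the Crossref URL, spaces and punctuation must be URL-encoded, or the query string will be malformed. A minimal sketch using only the standard library (the function name crossref_query_url is hypothetical):

```python
from urllib.parse import urlencode


def crossref_query_url(title: str, rows: int = 3) -> str:
    """Build the Crossref works-search URL with a safely encoded title."""
    return "https://api.crossref.org/works?" + urlencode({"query": title, "rows": rows})
```

For example, crossref_query_url("Deep Learning & Markets") encodes the ampersand as %26 so it is not read as a parameter separator.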
Acquisition tries four strategies in order. If one tier fails, the next tier activates automatically. Do not flag a paper as failed until all four tiers have been exhausted.
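The fallback rule (try each tier in order, flag only after all are exhausted) can be sketched generically. The function name acquire and the convention that each strategy returns a PDF path or None are assumptions for illustration; the real tiers are described in the text.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A strategy is a (name, callable) pair; the callable returns a PDF path or None.
Strategy = Tuple[str, Callable[[], Optional[str]]]


def acquire(strategies: Sequence[Strategy]) -> Tuple[Optional[str], List[str]]:
    """Try each strategy in order; return (pdf_path, failure_log).

    pdf_path is None only when every strategy has failed, which is the
    only point at which the paper should be flagged for the user.
    """
    failures: List[str] = []
    for name, attempt in strategies:
        try:
            pdf_path = attempt()
        except Exception as exc:  # a failing tier must not stop later tiers
            failures.append(f"{name}: {exc}")
            continue
        if pdf_path:
            return pdf_path, failures
        failures.append(f"{name}: no PDF found")
    return None, failures
```

The failure log gives the user a record of what was tried if the paper does end up flagged.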
Tier 1 (lit-download)
The lit-download command tries three Sci-Hub strategies internally:
The Lightpanda fallback requires the binary at .venv/bin/lightpanda (already
installed). It uses fetch --dump html --with_frames mode to render pages with
full JavaScript execution, then extracts the PDF URL from the rendered DOM.
If Sci-Hub fails (all mirrors exhausted), search the web for an open-access version. Many papers are freely available on author pages, working paper series, or institutional repositories.
Search strategy (use WebSearch tool):
"PAPER TITLE" filetype:pdf
If that fails, try broader queries:
"FIRST AUTHOR LAST NAME" "SHORT TITLE" pdf
"PAPER TITLE" site:nber.org OR site:ssrn.com OR site:repec.org
Common open-access sources (prioritize in this order):
Download the PDF (use WebFetch or curl):
curl -L -o ".claude/rag/pdfs/FILENAME.pdf" "PDF_URL"
Verify the PDF is valid (not an HTML error page or login wall):
file ".claude/rag/pdfs/FILENAME.pdf" # should say "PDF document"
head -c 5 ".claude/rag/pdfs/FILENAME.pdf" # should start with %PDF-
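The same magic-byte check can be done in Python where the file and head commands are unavailable. This is a sketch (the function name looks_like_pdf is hypothetical) that checks only the leading %PDF- signature, so it catches HTML error pages and login walls but not truncated or corrupt PDFs.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the %PDF- signature."""
    try:
        with Path(path).open("rb") as handle:
            return handle.read(5) == b"%PDF-"
    except OSError:
        # A missing or unreadable file is treated as not a valid PDF.
        return False
```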
After successful web download, continue to Step 2 (Zotero) and Step 3 (RAG indexing) as normal. The paper still needs Zotero metadata and RAG indexing regardless of how it was obtained.
To add the downloaded PDF to Zotero and index into RAG:
# Add to Zotero with metadata
python .claude/scripts/zotero_add.py --doi "DOI" --pdf ".claude/rag/pdfs/FILENAME.pdf"
# Index into RAG
r2 rag index --source dir --pdf-dir .claude/rag/pdfs
Some papers have publisher-provided open-access links in Semantic Scholar:
r2 rag lit-paper "S2_PAPER_ID"
Check the output for an openAccessPdf URL. If present, download directly.
Only after Tiers 1-3 all fail, flag the paper for the user with:
RAG_ZOTERO_API_KEY and RAG_ZOTERO_LIBRARY_ID in .env
python .claude/scripts/zotero_add.py CITEKEY
pdf_dir was empty
r2 rag index --source dir --pdf-dir .claude/rag/pdfs
r2 rag remove dir__FILENAME && r2 rag index --force --source dir --pdf-dir .claude/rag/pdfs
zotero_add.py for missing entries.
--filename flag accepts citekey-style names (e.g., reny2021_apsr).