Search and analyze biomedical literature from PubMed using the free E-utilities API. Use when researching medical topics, discovering clinical papers, fetching article metadata by PMID, performing deep paper analysis, or downloading open-access PDFs from PubMed Central. Triggers: pubmed search, search biomedical literature, find medical papers, PMID lookup, pubmed metadata, clinical literature search, biomedical research, life sciences papers, PMC download, ncbi search.
Search and analyze biomedical literature from PubMed using the free NCBI E-utilities API.
triage-paper
If a semantic-scholar MCP or PubTator MCP is configured, prefer the MCP; it returns structured data with no rate-limit risk
google-scholar-search or semantic-scholar-search
If /tmp/<topic>-candidates.json already exists, reuse it
When available, prefer the PubTator MCP server over this script:
{
  "mcpServers": {
    "pubtator": {
      "type": "stdio",
      "command": "uvx",
      "args": ["pubtator-mcp-server"]
    }
  }
}
Search is discovery, not analysis. The goal is a structured candidate list.
ALWAYS check for a pubtator or pubmed MCP before running the script. If configured and reachable, prefer it.
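The MCP-first check can be sketched roughly as below. This is a minimal illustration, assuming the MCP configuration follows the `mcpServers` shape shown above and has already been loaded into a dict. The helper name `find_literature_mcp` and the name-matching rule are hypothetical, and the sketch only detects that a server is configured; it does not test whether the server is reachable.

```python
import json
from typing import Optional


def find_literature_mcp(config: dict) -> Optional[str]:
    """Return the name of the first configured PubTator/PubMed MCP server, if any.

    Hypothetical helper: matches on the server name only (assumption).
    """
    servers = config.get("mcpServers", {})
    for name in servers:
        lowered = name.lower()
        if "pubtator" in lowered or "pubmed" in lowered:
            return name
    return None


# Example using the same configuration shape as the snippet above.
example_config = json.loads(
    '{"mcpServers": {"pubtator": {"type": "stdio", "command": "uvx",'
    ' "args": ["pubtator-mcp-server"]}}}'
)
server = find_literature_mcp(example_config)
if server is not None:
    print(f"Prefer MCP server: {server}")
else:
    print("No literature MCP configured; fall back to the script")
```

If the helper returns None, falling back to the script is the expected path.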
See setup-and-troubleshooting.md for venv creation and dependency installation.
Activate the venv, then choose the appropriate subcommand. Optionally add --show-abstract to keyword searches for a richer preview.
# Basic keyword search
./scripts/pubmed_search.py search --keywords "CRISPR gene editing" --results 10
# Advanced: filter by author, journal, and date range
./scripts/pubmed_search.py search --term "cancer immunotherapy" --author "Smith" \
  --journal "Nature" --start-date "2021" --end-date "2024" --results 20
# Fetch metadata for a known PMID
./scripts/pubmed_search.py metadata --pmid "33303479" --format json
# Deep paper analysis
./scripts/pubmed_search.py analyze --pmid "33303479" --output analysis.md
# Download open-access PDF
./scripts/pubmed_search.py download --pmid "33303479" --output-dir ./papers/
# Export candidate list to JSON
./scripts/pubmed_search.py search --keywords "Alzheimer disease biomarkers" \
  --results 50 --format json --output /tmp/candidates.json
If the script returns HTTP 429, wait 30 s and retry once:
sleep 30 && ./scripts/pubmed_search.py search --keywords "<topic>" --results 10
NEVER retry in a tight loop. If still failing, set PUBMED_API_KEY in the environment.
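For callers that wrap the search in Python, the same retry-once rule might look like the sketch below. It assumes the caller supplies a hypothetical `run_search` callable that returns an HTTP status code; the function name `search_with_single_retry` and the injectable `sleep` parameter are illustrative and not part of the script.

```python
import time

HTTP_TOO_MANY_REQUESTS = 429


def search_with_single_retry(run_search, wait_seconds=30.0, sleep=time.sleep):
    """Run the search; on HTTP 429, wait once and retry exactly once.

    `run_search` is a hypothetical callable returning an HTTP status code.
    The second result is returned as-is, so a persistent 429 reaches the
    caller instead of triggering a loop.
    """
    status = run_search()
    if status != HTTP_TOO_MANY_REQUESTS:
        return status
    sleep(wait_seconds)
    return run_search()


# Demo with canned status codes and a recording stand-in for sleep,
# so nothing actually waits 30 s here.
responses = iter([429, 200])
waits = []
final_status = search_with_single_retry(lambda: next(responses), sleep=waits.append)
print(final_status, waits)  # prints: 200 [30.0]
```

If the second attempt also returns 429, the caller should stop and switch to an MCP or set PUBMED_API_KEY rather than call the function again.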
NEVER triage automatically — ALWAYS confirm with the user first:
Found N results. Would you like to triage any of these with triage-paper?
WHY: Discovery and triage are separate quality gates. Auto-triaging bypasses user review.
BAD Pass every result to triage-paper immediately. → GOOD Present the list; wait for the user to choose.
WHY: Repeated rapid retries worsen the block and extend the cooldown period.
BAD Loop search until it succeeds. → GOOD Retry once after 30 s; switch to MCP or API key on second failure.
WHY: Many PubMed articles have no PMC copy, and not every PMC article is in the open-access subset; for those, the download command will fail or return a redirect link.
BAD Call download for every PMID. → GOOD Check PMC availability and open-access status before downloading.
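One way to perform that check is to inspect the response of the PMC OA Web Service before calling download. The sketch below only parses a response that has already been fetched. The XML shape (an `error` element for non-open-access records, `link` elements with `format` and `href` attributes otherwise) is an assumption to verify against the service documentation, the sample strings are hand-written illustrations rather than real responses, and `open_access_pdf_link` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def open_access_pdf_link(oa_response_xml: str) -> Optional[str]:
    """Return the PDF href from an OA-service-style response, or None.

    Assumed response shape (not confirmed by this document):
    <OA><error .../></OA> or <OA><records><record><link .../></record></records></OA>
    """
    root = ET.fromstring(oa_response_xml)
    if root.find("error") is not None:
        return None  # e.g. the record is not in the open-access subset
    for link in root.iter("link"):
        if link.get("format") == "pdf":
            return link.get("href")
    return None  # open access, but no direct PDF link listed


# Hand-written illustrative samples (placeholder IDs and URLs).
NOT_OPEN_ACCESS = '<OA><error code="idIsNotOpenAccess">not Open Access</error></OA>'
OPEN_ACCESS = (
    '<OA><records><record id="PMC0000000">'
    '<link format="pdf" href="ftp://example.invalid/sample.pdf"/>'
    "</record></records></OA>"
)

for sample in (NOT_OPEN_ACCESS, OPEN_ACCESS):
    link = open_access_pdf_link(sample)
    print("download" if link else "skip", link)
```

Returning None for both the error case and the no-PDF case keeps the caller's decision simple: download only when a link is present.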
WHY: The Python script is the fragile fallback. Skipping the check needlessly risks rate-limiting.
BAD Invoke the script without checking for a PubTator or PubMed MCP. → GOOD ALWAYS check MCP availability first; only fall back to the script if no MCP is configured.
WHY: API keys committed to source are a production security risk and will be rotated or revoked.
BAD Set api_key = "abc123" inside the script. → GOOD ALWAYS use environment variables (PUBMED_API_KEY) or a .env file that is gitignored.
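As an illustration of the environment-variable approach, the sketch below builds an E-utilities ESearch URL and attaches the key only when PUBMED_API_KEY is set. How the script itself reads the key is not shown in this document, so the helper name `build_esearch_url` and the injectable `env` mapping are assumptions made for the example.

```python
import os
from typing import Mapping
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 10,
                      env: Mapping[str, str] = os.environ) -> str:
    """Build an ESearch URL; the API key comes from the environment only."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    api_key = env.get("PUBMED_API_KEY")  # never hard-coded in source
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Without a key the URL simply omits the api_key parameter.
print(build_esearch_url("CRISPR gene editing", env={}))
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.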
WHY: PubMed metadata contains only the abstract. Deep analysis based solely on abstracts is incomplete and undermines research quality.
BAD Mark a paper as "fully analysed" from analyze output alone. → GOOD Qualify analysis as "abstract-based" and recommend obtaining the full text for production use.