Query OpenAlex REST API for scholarly literature — 250M+ works, authors, institutions, journals, and concepts. Search by title/abstract keywords, author, DOI, ORCID, or OpenAlex ID. Filter by year, open access status, citation count, or field. Retrieve citations, references, and author disambiguation. Free, no authentication required. For PubMed biomedical search use pubmed-database; for bioRxiv preprints use biorxiv-database.
OpenAlex is a free, open-access index of 250M+ scholarly works, 90M+ authors, 110,000+ journals, and 10,000+ institutions. It succeeds Microsoft Academic Graph and provides rich metadata: abstracts, open-access URLs, citation counts, referenced works, disambiguated author IDs (linked to ORCID where available), and concept tags. The REST API requires no authentication for up to 100,000 requests/day. Adding a `mailto` parameter with your email joins the polite pool, which gets priority processing.
```bash
pip install requests pandas
```
```python
import requests

BASE = "https://api.openalex.org"

# mailto joins the polite pool; replace the placeholder with your own email address
r = requests.get(f"{BASE}/works",
                 params={"search": "CRISPR gene editing",
                         "filter": "publication_year:2023",
                         "per_page": 5,
                         "mailto": "you@example.com"},
                 timeout=30)
r.raise_for_status()
data = r.json()
print(f"Total: {data['meta']['count']}")
for work in data["results"][:3]:
    title = work["title"] or "(untitled)"  # title can be null for some works
    print(f"  {title[:80]} ({work['publication_year']}) cites={work['cited_by_count']}")
```
| Use Case | Endpoint / Pattern |
|---|---|
| Search works by keyword/filter | GET /works?search=...&filter=... |
| Lookup single work by DOI | GET /works/https://doi.org/{doi} |
| Search authors, resolve ORCID | GET /authors?search=... |
| Get all papers by author (ORCID) | GET /works?filter=author.orcid:{orcid} |
| Get citation network | GET /works/{id}?select=referenced_works (references); GET /works?filter=cites:{id} (citing works) |
| Concept/topic trend analysis | GET /works?filter=concepts.id:...&group_by=publication_year |
| Batch metadata by DOIs | Loop per DOI, or batch several IDs with OR syntax: `filter=openalex_id:W1\|W2\|...` |
For complete runnable code (6 query types, systematic search, collaboration network, batch DOI lookup, country analysis), see references/api_queries.md.
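The DOI rows in the table above can be sketched as small URL and parameter builders. This minimal sketch runs offline because it only builds URLs and query parameters. The helper names are hypothetical. It assumes the `doi` filter accepts full `https://doi.org/...` values joined with `|` (OR). It also treats 50 values per OR filter as a safe batch size, which is an assumption, so check the current OpenAlex limit before relying on it.

```python
from typing import Iterable, Iterator

BASE = "https://api.openalex.org"
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Reduce 'https://doi.org/10.x/y', 'doi:10.x/y' or '10.x/y' to lowercase '10.x/y'."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()  # DOIs are case-insensitive


def single_doi_url(doi: str) -> str:
    """URL for looking up one work by DOI (GET /works/https://doi.org/{doi})."""
    return f"{BASE}/works/https://doi.org/{normalize_doi(doi)}"


def doi_batches(dois: Iterable[str], size: int = 50) -> Iterator[dict]:
    """Yield /works query params that request up to `size` DOIs at once via OR (|) syntax.

    The default size of 50 is an assumption; adjust it to the documented OR-filter limit.
    """
    urls = [f"https://doi.org/{normalize_doi(d)}" for d in dois]
    for start in range(0, len(urls), size):
        chunk = urls[start:start + size]
        # per_page matches the chunk length so one request can return every matched work
        yield {"filter": "doi:" + "|".join(chunk), "per_page": len(chunk)}
```

You can pass each params dict to `requests.get(f"{BASE}/works", params=...)`. DOIs that are missing from the combined results are the ones OpenAlex does not index. For those, fall back to a title search as in the troubleshooting table.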
OpenAlex stores abstracts as inverted indexes (word → positions) due to copyright restrictions. Reconstruct with:
```python
inv = work.get("abstract_inverted_index")
if inv:
    words = {pos: w for w, ps in inv.items() for pos in ps}
    text = " ".join(words[i] for i in sorted(words))
```
Use cursor="*" to start, then read next_cursor from each response. Max 200 per page, up to 10,000 results via cursor pagination.
| Parameter | Default | Range / Options | Effect |
|---|---|---|---|
| search | none | text string | Full-text search across title+abstract |
| filter | none | field:value,field:value | Structured filters (comma-separated, AND logic) |
| per_page | 25 | 1–200 | Results per page |
| cursor | none | "*" to start, then meta.next_cursor | Cursor-based pagination |
| sort | relevance | e.g. cited_by_count:desc, publication_year:desc | Result ordering |
| select | all fields | comma-separated field names | Limit response fields (smaller, faster responses) |
| group_by | none | field name | Aggregate counts by field |
| mailto | none | email address | Polite pool (prioritized processing) |
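As an illustration of `group_by`, a year-trend query can be sketched as follows. The sketch assumes the response carries a top-level `group_by` list whose entries have a string `key` and an integer `count`. The function names and the placeholder email are hypothetical. The fetch function is injectable so the parsing can be tested offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict

BASE = "https://api.openalex.org"


def fetch_json(path: str, params: dict) -> dict:
    """GET BASE + path with URL-encoded params and decode the JSON body."""
    url = f"{BASE}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def counts_by_year(filter_expr: str,
                   fetch: Callable[[str, dict], dict] = fetch_json,
                   mailto: str = "you@example.com") -> Dict[int, int]:
    """Return {publication_year: work count} for works matching `filter_expr`, oldest first."""
    params = {"filter": filter_expr, "group_by": "publication_year", "mailto": mailto}
    data = fetch("/works", params)
    counts = {}
    for group in data.get("group_by", []):
        key = str(group["key"])
        if key.isdigit():  # skip non-year buckets such as an "unknown" group
            counts[int(key)] = group["count"]
    return dict(sorted(counts.items()))
```

A single grouped request replaces paging through every matching work, which is why `group_by` suits the topic-trend row in the endpoint table.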
- Always pass `mailto`. It joins the polite pool for prioritized processing.
- Use `select` for large paginations, e.g. `select=id,doi,title,cited_by_count`, to reduce response size.
- Check that `abstract_inverted_index` is not None before reconstructing. Not all works have abstracts.

| Problem | Cause | Solution |
|---|---|---|
| HTTP 429 Too Many Requests | Rate limit exceeded | Add time.sleep(0.15); use polite pool |
| Empty abstract_inverted_index | No abstract | Check for None before reconstructing |
| Cursor pagination returns duplicates | Cursor expired | Re-start with cursor="*" |
| DOI lookup 404 | DOI not indexed in OpenAlex | Try title search instead |
| Filter returns 0 | Syntax error | Use field:value with no spaces; verify field names |
| Stale cited_by_count | Counts refresh periodically | Use for trends, not exact figures |
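For the HTTP 429 row, you can complement a fixed sleep with exponential backoff. Below is a minimal standard-library sketch. The retry count and delays are arbitrary assumptions, not OpenAlex recommendations. `opener` and `sleep` are injectable only so the function can be tested offline. `get_json_with_backoff` is a hypothetical name.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable


def get_json_with_backoff(url: str,
                          opener: Callable[..., Any] = urllib.request.urlopen,
                          sleep: Callable[[float], None] = time.sleep,
                          retries: int = 5,
                          base_delay: float = 0.5) -> Any:
    """GET `url` and decode JSON, retrying only on HTTP 429 with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=30) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # Other HTTP errors (400, 404, ...) and the final failed attempt propagate.
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("unreachable: the loop always returns or raises")
```

Retrying only on 429 keeps genuine errors visible. For example, a 404 from a DOI lookup still fails immediately and can fall back to a title search.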
Related skills:

- pubmed-database: Biomedical literature with MeSH controlled vocabulary
- literature-review: Systematic review methodology and PRISMA framework
- scientific-brainstorming: Hypothesis generation using literature as input