Search arXiv for preprints in physics, math, CS, quantitative biology, quantitative finance, statistics, electrical engineering, economics. Use when: (1) finding preprints by topic, (2) searching by author, (3) browsing arXiv categories, (4) getting paper metadata/abstracts. NOT for: published journal articles (use crossref-search), biomedical (use pubmed-search).
Search arXiv preprints via public API. Covers physics, math, CS, q-bio, q-fin, statistics, electrical engineering, and economics.
curl -s "http://export.arxiv.org/api/query?search_query=all:transformer+attention&start=0&max_results=5"
Parameters: search_query= (required), id_list= (direct lookup by arXiv ID),
start= (pagination offset), max_results= (default 10, max 30000),
sortBy=relevance|lastUpdatedDate|submittedDate, sortOrder=ascending|descending.
Field prefixes: ti: title, au: author, abs: abstract, co: comment,
jr: journal ref, cat: category, all: all fields.
Boolean: AND, , . Example:
ORANDNOTcurl -s "http://export.arxiv.org/api/query?search_query=au:bengio+AND+cat:cs.LG+AND+ti:attention&max_results=10"
Physics: astro-ph (.CO/.EP/.GA/.HE/.IM/.SR), cond-mat (.dis-nn/.mes-hall/.mtrl-sci/.soft/.stat-mech/.str-el/.supr-con), hep-ex, hep-lat, hep-ph, hep-th, quant-ph, gr-qc, nucl-ex, nucl-th
CS: cs.AI, cs.CL (NLP), cs.CV, cs.LG (ML), cs.CR, cs.DB, cs.DS, cs.SE, cs.RO
Math: math.AG, math.AP, math.CO, math.PR, math.ST
Other: q-bio (.BM/.CB/.GN/.MN/.NC/.PE/.QM/.SC/.TO), q-fin (.CP/.EC/.GN/.MF/.PM/.PR/.RM/.ST/.TR), stat (.AP/.CO/.ME/.ML/.OT/.TH), eess (.AS/.IV/.SP/.SY), econ (.EM/.GN/.TH)
The API returns Atom XML. Parse with Python:
curl -s "http://export.arxiv.org/api/query?search_query=ti:large+language+model&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for entry in root.findall('a:entry', ns):
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
aid = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
pub = entry.find('a:published', ns).text[:10]
authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
print(f'[{aid}] {pub} | {title}')
print(f' Authors: {authors}\n')
"
# By ID
curl -s "http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971"
# Pagination
curl -s "http://export.arxiv.org/api/query?search_query=cat:cs.AI&start=0&max_results=25&sortBy=submittedDate&sortOrder=descending"
curl -s "http://export.arxiv.org/api/query?search_query=cat:cs.AI&start=25&max_results=25&sortBy=submittedDate&sortOrder=descending"
No official limit, but keep to 1 request per 3 seconds for bulk queries. For large-scale harvesting, use the OAI-PMH bulk access endpoint instead.
sortBy=submittedDate&sortOrder=descending for latest papers.cat: with keyword searches for targeted results.opensearch:totalResults in the response for total match count./abs/ with /pdf/ in the paper URL.id_list for direct lookups (faster and more reliable).+ in query terms.NEVER fabricate results from training data. Every paper title, author, DOI, PMID, citation count, and metadata detail presented to the user MUST come from an actual API response in this conversation. If the API returns no results or partial data, report exactly what was returned. Do not "fill in" missing details from memory.