Research literature assistant for searching, retrieving, and managing academic papers from arXiv.
Tools for searching, retrieving, downloading, extracting text, summarizing, and managing academic research papers from arXiv.
Use this skill immediately when the user asks any of:
Search arXiv for papers by topic.
paper_search(query="ti:transformer AND au:hinton", max_results=5, sort_by="relevance")
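A query string like the one above combines field-prefixed terms with a boolean operator. The following is a minimal sketch of assembling one programmatically. `build_arxiv_query` and `ALLOWED_FIELDS` are hypothetical names used for illustration, not part of ResearchBot, and the sketch assumes all terms are joined with a single operator.

```python
# Hypothetical helper: assembles a field-prefixed query string for paper_search.
# The prefixes are the ones this document uses (ti, au, abs, cat, all).
ALLOWED_FIELDS = ("ti", "au", "abs", "cat", "all")


def build_arxiv_query(terms, operator="AND"):
    """Join (field, value) pairs into a query such as 'ti:transformer AND au:hinton'."""
    parts = []
    for field, value in terms:
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported field prefix: {field!r}")
        value = value.strip()
        if not value:
            raise ValueError(f"empty value for field {field!r}")
        parts.append(f"{field}:{value}")
    return f" {operator} ".join(parts)


query = build_arxiv_query([("ti", "transformer"), ("au", "hinton")])
print(query)  # ti:transformer AND au:hinton
```

The resulting string could then be passed as the `query` argument of `paper_search`.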
Query syntax examples:
- ti:transformer - title contains "transformer"
- au:hinton - author is "hinton"
- abs:reinforcement learning - abstract contains "reinforcement learning"
- cat:cs.CL - category is "cs.CL"
- all:attention - search everywhere

Get detailed information about a specific paper.
paper_get(paper_id="2401.12345v2")
paper_get(url="https://arxiv.org/abs/2401.12345")
Download arXiv paper PDF to local storage.
paper_download_pdf(paper_id="2401.12345v2")
paper_download_pdf(url="https://arxiv.org/abs/2401.12345")
paper_download_pdf(paper_id="2401.12345", overwrite=True)
Note: If the PDF already exists locally and overwrite=False, the existing file is reused.
Extract text content from a local PDF.
paper_extract_text(paper_id="2401.12345")
paper_extract_text(pdf_path="/path/to/paper.pdf")
paper_extract_text(paper_id="2401.12345", overwrite=True)
Note: Requires the PDF to be downloaded first. If the text has already been extracted and overwrite=False, the existing text is reused.
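The reuse-unless-overwrite behavior described for the download and extraction tools can be pictured with a small sketch. This is not ResearchBot's implementation: `reuse_or_build` and `extract_text` are hypothetical stand-ins, and the "extraction" step simply writes placeholder text.

```python
import tempfile
from pathlib import Path


def reuse_or_build(path, build, overwrite=False):
    """Return (path, reused); build(path) runs only if the file is missing or overwrite=True."""
    path = Path(path)
    if path.exists() and not overwrite:
        return path, True
    path.parent.mkdir(parents=True, exist_ok=True)
    build(path)
    return path, False


def extract_text(pdf_path, txt_path, overwrite=False):
    """Stand-in for extraction: refuses to run until the PDF exists locally."""
    pdf_path = Path(pdf_path)
    if not pdf_path.exists():
        raise FileNotFoundError(f"download the PDF first: {pdf_path}")
    # Placeholder "extraction"; a real tool would parse the PDF here.
    return reuse_or_build(txt_path, lambda p: p.write_text("extracted text"), overwrite)


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "pdfs" / "2401.12345.pdf"
    txt = Path(tmp) / "extracted" / "2401.12345.txt"
    write_pdf = lambda p: p.write_bytes(b"%PDF-")
    _, reused_first = reuse_or_build(pdf, write_pdf)                    # file created
    _, reused_second = reuse_or_build(pdf, write_pdf)                   # existing file reused
    _, reused_forced = reuse_or_build(pdf, write_pdf, overwrite=True)   # regenerated
    _, text_reused = extract_text(pdf, txt)                             # first extraction
    print(reused_first, reused_second, reused_forced, text_reused)
```

Keeping the existence check in one helper means every cached artifact (PDF, extracted text) would follow the same `overwrite` rule.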
Save a paper to the local literature knowledge base.
paper_save(paper={...paper_object...}, topic="NLP", tags=["transformer", "attention"])
Generate a structured summary for a paper, using the full text when available.
paper_summarize(paper_id="2401.12345")
paper_summarize(paper_id="2401.12345", save=True)
paper_summarize(paper_id="2401.12345", overwrite=True)
Summary Priority:
- overwrite=True forces regeneration even if summary exists

Compare multiple papers across key dimensions (problem, method, findings, limitations). Can find relevant papers automatically by topic.
# Compare specific papers by ID
paper_compare(paper_ids=["2401.12345", "2301.45678", "2212.34567"])
# Compare papers given as objects
paper_compare(papers=[{...paper_object...}, {...}])
# Find relevant papers from local storage by topic and compare
paper_compare(topic="retrieval augmented generation", max_papers=5)
Behavior:
- paper_ids or papers provided → compares those directly
- topic provided → searches local literature/ for relevant papers

Generate a structured literature review for a research topic using locally saved papers. Saves to literature/reviews/<topic_slug>.json and .md.
# Generate literature review for a topic
paper_review(topic="retrieval augmented generation", max_papers=10)
# Overwrite existing review
paper_review(topic="RAG", overwrite=True)
Output includes:
Note: Only uses papers saved in local literature/ storage. Will not fetch new papers.
- paper_search or direct IDs
- paper_compare(topic="...") to auto-find and compare from local storage
- paper_review(topic="...") to generate full literature review

- paper_search and save with paper_save
- paper_download_pdf and extract with paper_extract_text
- paper_summarize (uses full text)
- paper_compare(paper_ids=[...]) for specific papers
- paper_review(topic="...") for comprehensive literature review

- paper_search to find relevant papers
- paper_get for full metadata
- paper_download_pdf to get local PDF
- paper_extract_text to get full text content
- paper_summarize - now enhanced with full text
- paper_save to persist everything

All tools return structured text with:
- paper_download_pdf
- paper_extract_text
- paper_summarize (uses full text)

ResearchBot now supports multiple paper metadata sources:
When you save an arXiv paper or search for papers, ResearchBot can automatically enrich the metadata:
To enable automatic enrichment, configure your API keys (see Configuration section).
Enrich existing paper metadata with Crossref and OpenAlex data.
# Enrich by arXiv ID
paper_enrich(paper_id="2401.12345")
# Enrich by DOI
paper_enrich(doi="10.1000/xyz123")
# Enrich by title
paper_enrich(title="Attention is All You Need")
# Enrich and save back to literature storage
paper_enrich(paper_id="2401.12345", save=True)
What it enriches:
Search Crossref directly for published papers.
# Search by query
crossref_search(query="transformer architecture")
# Search with filters
crossref_search(query="machine learning", year=2023, author="Hinton")
# Search by DOI
crossref_search(query="", doi="10.1000/xyz123")
Crossref is best for:
Search OpenAlex for papers with rich metadata.
# Search by query
openalex_search(query="attention mechanism")
# Search with filters
openalex_search(query="deep learning", year=2023, author="Bengio")
# Search by title
openalex_search(query="language models", title="BERT")
OpenAlex is best for:
For existing arXiv papers in your literature storage:
- paper_enrich(paper_id="2401.12345", save=True)
- literature/papers/2401.12345.json for enriched fields

Export paper citations in standard academic formats for use in manuscripts, reference managers, and bibliographies.
# Export single paper as BibTeX
paper_cite(paper_id="2401.12345", format="bibtex")
# Export multiple papers as RIS (for Zotero/EndNote)
paper_cite(paper_ids=["2401.12345", "2301.45678"], format="ris")
# Export all local papers as CSL-JSON
paper_cite(format="csl-json")
# Export to file
paper_cite(paper_id="2401.12345", format="bibtex", output="file", path="refs.bib")
# Export a paper dict directly (without loading from local storage)
paper_cite(paper={...paper_object...}, format="apa")
Supported formats:
| Format | Description | Typical Use |
|---|---|---|
| bibtex | BibTeX entries | LaTeX manuscripts |
| ris | RIS tag format | EndNote, Zotero, Mendeley |
| csl-json | CSL-JSON objects | Pandoc, Citeproc |
| apa | APA 7th edition | Social science papers |
| mla | MLA 9th edition | Humanities papers |
| gbt7714 | GB/T 7714-2015 | Chinese academic papers |
Parameters:
- paper_id: Export a single paper by ID
- paper_ids: Export multiple papers by ID list
- paper: Export a paper dict directly
- format: Output format (default: bibtex)
- output: text (default) or file
- path: File path for file mode (defaults to literature/citations/export.<ext>)

Citekey format: Deterministic keys like vaswani2017attention (first author last name + year + title words). DOI suffix appended for uniqueness.
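As an illustration of the citekey scheme, here is a minimal sketch that derives a key from the first author, year, and title. The details are assumptions rather than ResearchBot's actual rules: the stopword list, the use of one title word, the underscore before the DOI suffix, and the `make_citekey` name are all hypothetical.

```python
import re

# Assumed stopword list; the real tool may skip a different set of words.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "for", "and", "to", "is"}


def make_citekey(first_author, year, title, doi=None, title_words=1):
    """Build a key such as 'vaswani2017attention' from author, year, and title."""
    # Last name: final whitespace-separated token, ASCII letters only, lowercased.
    last_name = re.sub(r"[^a-z]", "", first_author.split()[-1].lower())
    words = [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS]
    key = f"{last_name}{year}{''.join(words[:title_words])}"
    if doi:
        # Assumed uniqueness suffix: alphanumerics from the DOI part after the slash.
        suffix = re.sub(r"[^a-z0-9]", "", doi.split("/", 1)[-1].lower())
        key = f"{key}_{suffix}"
    return key


print(make_citekey("Ashish Vaswani", 2017, "Attention Is All You Need"))
# vaswani2017attention
```

Because the key depends only on the paper's metadata, re-exporting the same paper yields the same key, which keeps existing `\cite{...}` commands in a manuscript valid.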
Workflow: When the user asks for citations, references, a bibliography, or an export:
- paper_cite with the desired format
- output="file"

- max_results=10 for broader discovery, max_results=3 for focused search
- submittedDate for latest papers, relevance for most related
- paper_save it with relevant topic/tags

- literature/papers/<paper_id>.json - metadata
- literature/papers/<paper_id>.md - readable markdown
- literature/pdfs/<paper_id>.pdf - downloaded PDF
- literature/extracted/<paper_id>.txt - extracted text
- literature/reviews/<topic_slug>.json - literature review (structured)
- literature/reviews/<topic_slug>.md - literature review (markdown)
- literature/indexes/search.sqlite3 - semantic search index (SQLite)

ResearchBot supports local semantic search over your saved papers. This enables:
Search local papers using semantic search.
# Basic semantic search
paper_search_local(query="machine learning security")
# With filters
paper_search_local(
    query="deep learning for graphs",
    topic="graph neural networks",
    tags=["GNN", "attention"],
    year_from=2022,
    year_to=2024,
    top_k=10,
    rerank=True
)
# Search by source
paper_search_local(query="transformer", source="arxiv")
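The filters in the calls above narrow the set of candidate papers. A minimal sketch of that narrowing follows, using the semantics given in this tool's parameter list (topic is a substring match, tags match when any tag overlaps). The `matches_filters` function, the metadata field names (`topic`, `tags`, `year`, `source`), and the case-insensitive comparison are assumptions for illustration only.

```python
def matches_filters(paper, topic=None, tags=None, year_from=None, year_to=None, source=None):
    """Return True if a paper metadata dict passes every filter that was provided."""
    # topic: substring match (case-insensitivity is an assumption)
    if topic is not None and topic.lower() not in paper.get("topic", "").lower():
        return False
    # tags: pass when at least one requested tag is present
    if tags:
        wanted = {t.lower() for t in tags}
        present = {t.lower() for t in paper.get("tags", [])}
        if not wanted & present:
            return False
    # year range: papers without a year fail any year filter
    year = paper.get("year")
    if year_from is not None and (year is None or year < year_from):
        return False
    if year_to is not None and (year is None or year > year_to):
        return False
    # source: exact match, e.g. "arxiv", "crossref", "openalex"
    if source is not None and paper.get("source") != source:
        return False
    return True


paper = {"topic": "Graph Neural Networks", "tags": ["GNN"], "year": 2023, "source": "arxiv"}
print(matches_filters(paper, topic="graph neural", tags=["GNN", "attention"],
                      year_from=2022, year_to=2024))  # True
```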
Parameters:
- query (required): Search query string
- top_k: Maximum results to return (default: 10)
- topic: Filter by topic (substring match)
- tags: Filter by tags (any match)
- year: Filter by exact year
- year_from / year_to: Year range filter
- categories: Filter by categories
- source: Filter by source (arxiv, crossref, openalex)
- rerank: Apply LLM reranking (default: True)

Manage the local search index.
# Rebuild entire index from all local papers
paper_index(rebuild=True)
# Index a specific paper
paper_index(paper_id="2401.12345")
The search index is automatically updated when you:
- save a paper (paper_save)
- summarize a paper (paper_summarize)
- enrich a paper (paper_enrich)

You normally don't need to manually rebuild the index unless:
The system distinguishes between "disabled" and "unavailable":
- Embedding not configured (embeddingApiKey is empty):
- sqlite-vec disabled in configuration (enableSqliteVec=false):
- sqlite-vec unavailable in the environment (although enableSqliteVec=true):
- Embedding service fails at runtime:
The semantic search is configured via the literature.semanticSearch section:
{
"literature": {
"semanticSearch": {
"sqliteDbPath": "literature/indexes/search.sqlite3",
"embeddingModel": "text-embedding-v4",
"embeddingProvider": "dashscope",
"embeddingApiKey": "your-api-key",
"embeddingApiBase": "",
"enableSqliteVec": true,
"enableRerank": true,
"rerankTopK": 20,
"hybridSearchRrfK": 60,
"lexicalWeight": 0.3,
"vectorWeight": 0.7
}
}
}
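Configuration notes:
- embeddingApiKey: if not set, vector search is not enabled; FTS5 search still works normally
- enableSqliteVec: if set to false, the sqlite-vec extension is not loaded at all
- rerankTopK: number of candidates passed to LLM reranking
- hybridSearchRrfK: the k value used for RRF fusion

Alibaba Cloud dashscope configuration example: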
export RESEARCHBOT_LITERATURE__SEMANTIC_SEARCH__EMBEDDING_PROVIDER="dashscope"
export RESEARCHBOT_LITERATURE__SEMANTIC_SEARCH__EMBEDDING_API_KEY="your-dashscope-api-key"
export RESEARCHBOT_LITERATURE__SEMANTIC_SEARCH__EMBEDDING_MODEL="text-embedding-v4"
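The variable names above appear to mirror the JSON keys: a `RESEARCHBOT_` prefix, double underscores between nesting levels, and upper snake case in place of camelCase. That mapping is inferred from the names rather than documented, so the following is only a sketch of how such a convention might be resolved; `env_to_config_path` and `apply_env` are hypothetical helpers.

```python
PREFIX = "RESEARCHBOT_"


def env_to_config_path(name):
    """Map an environment variable name to a list of nested camelCase config keys."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a ResearchBot variable: {name}")
    path = []
    for segment in name[len(PREFIX):].split("__"):
        head, *rest = segment.lower().split("_")
        path.append(head + "".join(word.capitalize() for word in rest))
    return path


def apply_env(config, environ):
    """Set each RESEARCHBOT_* variable at its nested path (mutates and returns config)."""
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        *parents, leaf = env_to_config_path(name)
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


print(env_to_config_path("RESEARCHBOT_LITERATURE__SEMANTIC_SEARCH__EMBEDDING_API_KEY"))
# ['literature', 'semanticSearch', 'embeddingApiKey']
```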
Minimal configuration (FTS5 only):
{
"literature": {
"semanticSearch": {
"sqliteDbPath": "literature/indexes/search.sqlite3"
}
}
}
- workspace/literature/indexes/search.sqlite3
- papers, papers_fts, paper_embeddings, paper_vectors (if sqlite-vec is available)
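The hybridSearchRrfK, lexicalWeight, and vectorWeight settings suggest that lexical (FTS5) and vector rankings are merged with a weighted form of reciprocal rank fusion (RRF). The exact formula ResearchBot uses is not stated here, so the sketch below assumes the common form, where each list contributes weight / (k + rank) to a paper's score. `weighted_rrf` is a hypothetical name.

```python
def weighted_rrf(lexical_ids, vector_ids, k=60, lexical_weight=0.3, vector_weight=0.7):
    """Fuse two ranked ID lists; ranks are 1-based and a higher fused score is better."""
    scores = {}
    for weight, ranking in ((lexical_weight, lexical_ids), (vector_weight, vector_ids)):
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + weight / (k + rank)
    # Sort by descending score, then by ID so ties resolve deterministically.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


lexical = ["2401.12345", "2301.45678", "2212.34567"]  # e.g. FTS5 order
vector = ["2212.34567", "2401.12345"]                 # e.g. embedding-similarity order
print(weighted_rrf(lexical, vector))
```

A larger k flattens the difference between adjacent ranks, so the weights matter more than the exact position within each list.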