Shared reference for all paper search skills. Defines the unified 3-source paper discovery strategy: local vec-db (96K top-venue papers), Semantic Scholar API (200M+ papers + citation graph), and AlphaXiv (free full-text reading). This skill is NOT user-invokable — it is a reference loaded by other skills (research-survey, gap-to-method, paper_related_works, idea_refinery, topic_survey).
This is a shared reference document. Other paper-related skills should follow these strategies.
What: LanceDB vector database with ~96K papers from CoRL, ICRA, IROS, RSS, NeurIPS, ICML, ICLR, CVPR, etc.
Best for: Finding highly relevant top-venue papers via semantic similarity.
Limitation: Only indexed conferences; no arXiv-only papers; no citation info.
cd /home/vla-reasoning/proj/litian-research/vec-db
npx tsx src/cli.ts search "<query>" --top 15
Tips:
What: Free academic search API with citation graph, covering all major publishers.
Best for: Keyword search across ALL papers; finding citing/cited papers; getting citation counts.
Limitation: No full text; rate limited (5000 req/5min unauthenticated).
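Search by keyword, most-cited first (`sort` is only honored by the bulk endpoint, which takes no `limit` and returns up to 1,000 results per call):
curl -s "https://api.semanticscholar.org/graph/v1/paper/search/bulk?query=<URL_ENCODED_KEYWORDS>&fields=title,year,authors,citationCount,externalIds,abstract&sort=citationCount:desc"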
Search recent papers:
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=<URL_ENCODED_KEYWORDS>&limit=20&fields=title,year,authors,citationCount,externalIds,abstract&year=2024-2026"
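Hand-encoding `<URL_ENCODED_KEYWORDS>` is easy to get wrong. A minimal sketch, assuming a POSIX shell and curl: `-G` combined with `--data-urlencode` lets curl percent-encode the query itself. The helper name `s2_search` and the `CURL` override variable are illustrative assumptions, not part of any existing skill.

```shell
# Hypothetical helper: keyword search against the relevance endpoint.
# $1 = raw (unencoded) keywords, $2 = optional year range such as 2024-2026.
# CURL may be overridden (for example with a stub) for dry runs.
s2_search() {
  query=$1
  year=${2:-}
  # Rebuild the positional parameters as the curl argument list.
  set -- -s -G "https://api.semanticscholar.org/graph/v1/paper/search" \
    --data-urlencode "query=$query" \
    --data "limit=20" \
    --data "fields=title,year,authors,citationCount,externalIds,abstract"
  if [ -n "$year" ]; then
    set -- "$@" --data "year=$year"
  fi
  "${CURL:-curl}" "$@"
}
```

Usage: `s2_search "vision language action" 2024-2026` for recent papers, or drop the second argument to search all years.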
Get citations of a paper (successors):
curl -s "https://api.semanticscholar.org/graph/v1/paper/ArXiv:<ID>?fields=title,year,citationCount,citations.title,citations.year,citations.authors,citations.externalIds,citations.citationCount"
Get references of a paper (predecessors):
curl -s "https://api.semanticscholar.org/graph/v1/paper/ArXiv:<ID>?fields=title,year,references.title,references.year,references.externalIds,references.citationCount"
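Because the API is rate limited, a burst of search and citation calls can start failing partway through a snowball. The sketch below is one way to cope, assuming a POSIX shell: it retries any command with exponential backoff. `retry_with_backoff` and the `RETRY_DELAY` variable are hypothetical names; pair it with curl's `--fail` flag so that HTTP errors such as 429 produce a non-zero exit status.

```shell
# Hypothetical helper: run a command up to $1 times, doubling the wait each time.
# RETRY_DELAY sets the initial wait in seconds (default 2).
retry_with_backoff() {
  max_attempts=$1
  shift
  delay=${RETRY_DELAY:-2}
  attempt=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$attempt" -ge "$max_attempts" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Usage: `retry_with_backoff 4 curl -sf "https://api.semanticscholar.org/graph/v1/paper/ArXiv:<ID>?fields=title,citations.title"`.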
What: Free structured Markdown rendering of arXiv papers.
Best for: Reading paper content without downloading PDF. Much faster than PDF parsing.
Limitation: Only arXiv papers; some papers may 404.
Structured overview (try first, faster):
WebFetch: https://alphaxiv.org/overview/<ARXIV_ID>.md
Full text (if overview lacks detail):
WebFetch: https://alphaxiv.org/abs/<ARXIV_ID>.md
Fallback: If AlphaXiv returns 404, download and read the PDF directly:
wget -q "https://arxiv.org/pdf/<ARXIV_ID>" -O "<ARXIV_ID>.pdf"
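The overview, full-text, and PDF steps above form a fallback chain. A minimal sketch of that chain follows, assuming the AlphaXiv `.md` URLs are also reachable with plain curl (the document itself only uses WebFetch). `read_paper` and `http_get` are hypothetical names; `http_get` is kept separate so the fetch step can be swapped out.

```shell
# Hypothetical wrapper: fetch a URL to stdout, non-zero exit on HTTP errors (e.g. 404).
http_get() { curl -sfL "$1"; }

# Try the overview, then the full text, then fall back to the PDF.
# Prints the name of the file that was written.
read_paper() {
  id=$1
  for url in \
    "https://alphaxiv.org/overview/$id.md" \
    "https://alphaxiv.org/abs/$id.md"
  do
    if http_get "$url" > "$id.md"; then
      echo "$id.md"
      return 0
    fi
  done
  rm -f "$id.md"   # discard the empty file left by the failed attempts
  wget -q "https://arxiv.org/pdf/$id" -O "$id.pdf" && echo "$id.pdf"
}
```

Usage: `read_paper <ARXIV_ID>` writes `<ARXIV_ID>.md` when either AlphaXiv endpoint responds, and `<ARXIV_ID>.pdf` otherwise.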
1. Vec-db semantic search (5-8 queries, --top 15 each) → top-venue papers
2. Semantic Scholar keyword search (2-3 queries) → broader coverage + recent
3. Web search for arXiv (to pick up the latest papers not yet indexed) → cutting-edge
4. Deduplicate by title similarity
5. Read top papers via AlphaXiv
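The deduplication step can be approximated in the shell. The sketch below assumes one title per line on stdin and treats two titles as the same paper when they match after lowercasing and stripping punctuation. This is a simplification of "title similarity" (it will not catch reworded titles), and `dedupe_titles` is a hypothetical name.

```shell
# Hypothetical helper: keep the first occurrence of each normalized title.
dedupe_titles() {
  awk '{
    key = tolower($0)
    gsub(/[^a-z0-9]+/, " ", key)     # punctuation and repeated spaces -> single space
    gsub(/^ +| +$/, "", key)         # trim leading and trailing spaces
    if (!(key in seen)) { seen[key] = 1; print }
  }'
}
```

Usage: `cat vecdb_titles.txt s2_titles.txt | dedupe_titles`, where both file names are placeholders for wherever the merged results are stored. Listing the vec-db results first keeps the top-venue entry when the same paper appears in both sources.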
1. Vec-db semantic search (per design dimension) → map the design space
2. Semantic Scholar keyword search (per dimension) → fill matrix gaps
3. Web search for very recent work → confirm no one has already published the method
4. Build literature matrix from combined results
5. Read gap-adjacent papers via AlphaXiv for evidence
1. Read the paper via AlphaXiv → understand it first
2. Semantic Scholar citations API → predecessors + successors
3. Vec-db search with paper's key concepts → find related top-venue work
4. Web search for concurrent/recent follow-ups → very latest
1. Vec-db search (idea's key concepts) → closest existing work
2. Semantic Scholar search + citation snowball → validate novelty
3. AlphaXiv to read key competitors → understand deeply
4. Web search for latest arXiv preprints → check no overlap