Conduct a deep literature survey across top conferences and arXiv preprints. Use when the user asks to 调研/survey/review a research direction, compile papers on a topic, analyze trends in a field, or find research gaps and ideas. Produces categorized paper tables, trend analysis, and actionable research ideas based on full-PDF deep reading of dozens to hundreds of papers.
End-to-end pipeline for producing a deep, citation-grounded survey on a research topic. Every paper is downloaded as PDF, converted to full text, and read by a sub-agent to extract structured fields — not just abstracts.
This is NOT a lightweight "list some papers" task. It produces categorized paper tables, trend analysis, and actionable research ideas grounded in full-PDF deep reading of dozens to hundreds of papers.
Typical phrases: "调研一下…" ("survey …"), "帮我看看…领域在做什么" ("take a look at what the … field is doing"), "给我想几个 idea about …" ("come up with a few ideas about …"), "最近 X 方向有什么新工作" ("any new work in direction X recently")
# 1. Clone papers-cool-downloader (for conference search via papers.cool)
git clone https://github.com/QWERTY0205/papers-cool-downloader /tmp/papers-cool-downloader
pip install requests
# 2. Install poppler-utils (for PDF text extraction via pdftotext)
apt-get install -y poppler-utils
# 3. Grant permissions in the workspace directory
# See the `scripts/setup_permissions.sh` helper
Follow these phases in order. Each phase has a dedicated helper script under scripts/.
Key design principle: The synthesis phase is split from paper analysis via structured JSON artifacts (lineages.json → findings.json → ideas.json → markdown). This prevents the context pollution that plagues "read all papers and freeform write ideas" workflows.
All work happens under /data/paper/<topic_slug>/. Create these subdirs:
/data/paper/<topic_slug>/
├── scripts/
├── pdfs/ (arxiv PDFs)
├── pdfs_conf/ (conference PDFs)
├── texts/ (pdftotext output for arxiv)
├── texts_conf/ (pdftotext output for conferences)
├── batches/ (sub-agent input batches)
├── results/ (sub-agent output per batch)
└── analysis/
Use papers-cool-downloader to search each venue in parallel. See scripts/search_conferences.py.
# Runs papers-cool-downloader for each venue in parallel.
# Output: /data/paper/<topic>/<venue_tag>_raw.json per venue.
python3 scripts/search_conferences.py \
--workspace /data/paper/<topic>/ \
--keywords "streaming video" "video LLM" "online video" \
--venues CVPR.2025 ICCV.2025 NeurIPS.2025 ICLR.2026 AAAI.2026 ACL.2025 ICML.2025
Use WebFetch on arxiv.org/search/ with multiple keyword queries (at least 3–5 different phrasings). Important: arXiv search returns only ~50 results per query, so multiple phrasings are needed to maximize coverage.
For each query, extract papers with submission date in the target range (usually last 6 months). Save to arxiv_candidates.json.
Typical queries: