Use this skill when the user wants a systematic literature review, survey, or synthesis across multiple academic papers on a topic. Also covers annotated bibliographies and cross-paper comparisons. Searches arXiv and outputs reports in APA, IEEE, or BibTeX format. Not for single-paper tasks — use academic-paper-review for reviewing one paper.
This skill produces a structured systematic literature review (SLR) across multiple academic papers on a research topic. Given a topic query, it searches arXiv, extracts structured metadata (research question, methodology, key findings, limitations) from each paper in parallel, synthesizes themes across the full set, and emits a final report with consistent citations.
Distinct from academic-paper-review: that skill does deep peer review of a single paper. This skill does breadth-first synthesis across many papers. If the user hands you one paper URL and asks "review this paper", route to academic-paper-review instead.
Use this skill when the user wants any of the following:
- A systematic literature review, survey, or synthesis across multiple papers on a topic
- An annotated bibliography
- A cross-paper comparison

Do not use this skill when:
- The user wants a deep review of a single paper (route to academic-paper-review)

The workflow has five phases. Follow them in order.
Before doing any retrieval, confirm the following with the user. If any of these are unclear, ask one clarifying question that covers all the missing pieces. Do not ask one question at a time.
- The topic and how many papers to include
- An arXiv category filter, if any (e.g. cs.CL, cs.CV)
- Citation format (APA, IEEE, or BibTeX)
- Where to save the report (default: /mnt/user-data/outputs/)

If the user says "50+ papers", politely cap it at 50 and explain that synthesis quality degrades quickly past that — for larger surveys they should split by sub-topic.
Call the bundled search script. Do not try to scrape arXiv by other means and do not write your own HTTP client — this script handles URL encoding, Atom XML parsing, and id normalization correctly.
```shell
python /mnt/skills/public/systematic-literature-review/scripts/arxiv_search.py \
  "<topic>" \
  --max-results <N> \
  [--category <cat>] \
  [--sort-by relevance] \
  [--start-date YYYY-MM-DD] \
  [--end-date YYYY-MM-DD]
```
IMPORTANT — extract 2-3 core keywords before searching. Do not pass the user's full topic description as the query. Before calling the script, mentally reduce the topic to its 2-3 most essential terms. Drop qualifiers like "in computer vision", "for NLP", "variants", "recent" — those belong in --category or --start-date, not in the query string.
Query phrasing — keep it short. The script wraps multi-word queries in double quotes for phrase matching on arXiv. This means:
- "diffusion models" → searches for the exact phrase → good, returns relevant papers.
- "diffusion models in computer vision" → searches for that exact 5-word phrase → too specific, likely returns 0 results because few papers contain that exact string.

Use 2-3 core keywords as the query, and use --category to narrow the field instead of stuffing field names into the query. Examples:
| User says | Good query | Bad query |
|---|---|---|
| "diffusion models in computer vision" | "diffusion models" --category cs.CV | "diffusion models in computer vision" |
| "transformer attention variants" | "transformer attention" | "transformer attention variants in NLP" |
| "graph neural networks for molecules" | "graph neural networks" --category cs.LG | "graph neural networks for molecular property prediction" |
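For intuition on why short phrases work, here is a minimal sketch of how a quoted-phrase query URL for arXiv's export API can be built. The `all:` and `cat:` field prefixes are standard arXiv API syntax, but the exact URL the bundled script constructs is an assumption — do not use this in place of arxiv_search.py:

```python
from urllib.parse import quote

def build_arxiv_query_url(phrase, category=None, max_results=20):
    """Sketch only: build an arXiv export-API URL for an exact-phrase search.

    The real arxiv_search.py may construct its URL differently; this just
    illustrates the phrase-quoting behavior described above.
    """
    # Wrapping the phrase in double quotes asks arXiv for exact-phrase matching.
    query = f'all:"{phrase}"'
    if category:
        # Narrow by field with cat: instead of stuffing field names into the phrase.
        query += f" AND cat:{category}"
    return (
        "http://export.arxiv.org/api/query"
        f"?search_query={quote(query)}&max_results={max_results}"
    )

url = build_arxiv_query_url("diffusion models", category="cs.CV")
```

A 2-word phrase like this matches many abstracts; the 5-word phrase from the "bad query" column would be sent the same way and match almost none.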
The script prints a JSON array to stdout. Each paper has: id, title, authors, abstract, published, updated, categories, pdf_url, abs_url.
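As an illustration of that output shape, here is a hypothetical one-element array with the listed fields and how you might read it — the field values are illustrative, not real script output:

```python
import json

# Illustrative sample of the script's stdout; the real output has one
# object per paper, with exactly the fields listed above.
sample_stdout = json.dumps([{
    "id": "1706.03762",
    "title": "Attention Is All You Need",
    "authors": ["Ashish Vaswani", "Noam Shazeer"],
    "abstract": "The dominant sequence transduction models are based on ...",
    "published": "2017-06-12",
    "updated": "2017-08-02",
    "categories": ["cs.CL", "cs.LG"],
    "pdf_url": "https://arxiv.org/pdf/1706.03762",
    "abs_url": "https://arxiv.org/abs/1706.03762",
}])

# Parse the JSON array and pull out the fields needed for later phases.
papers = json.loads(sample_stdout)
titles = [p["title"] for p in papers]
```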
Sort strategy:
- Default to relevance sorting — arXiv's BM25-style scoring ensures results are actually about the user's topic. submittedDate sorting returns the most recently submitted papers in the category regardless of topic relevance, which produces mostly off-topic results.
- For "recent papers" requests, use --sort-by relevance combined with --start-date to constrain the time range while keeping results on-topic. For example, "recent diffusion model papers" → --sort-by relevance --start-date 2024-01-01, not --sort-by submittedDate.
- submittedDate sorting is only appropriate when the user explicitly asks for chronological order (e.g. "show me papers in the order they were published"). This is rare.
- lastUpdatedDate is rarely useful; ignore it unless the user asks.

Run the search exactly once. Do not retry with modified queries if the results seem imperfect — arXiv's relevance ranking is what it is. Retrying with different query phrasings wastes tool calls and risks hitting the recursion limit. If the results are genuinely empty (0 papers), tell the user and suggest they broaden their topic or remove the category filter.
If the script returns fewer papers than requested, that is the real size of the arXiv result set for the query. Do not pad the list — report the actual count to the user and proceed.
If the script fails (network error, non-200 from arXiv), tell the user which error and stop. Do not try to fabricate paper metadata.
Do not save the search results to a file — the JSON stays in your context for Phase 3. The only file saved during the entire workflow is the final report in Phase 5.
You MUST delegate extraction to subagents via the task tool — do not extract metadata yourself. This is non-negotiable. Specifically, do NOT do any of the following:
- Run python -c "papers = [...]" or any Python/bash script to process papers
- Skip task for this phase

Instead, you MUST call the task tool to spawn subagents. The reason: extracting 10-50 papers in your own context consumes too many tokens and degrades synthesis quality in Phase 4. Each subagent runs in an isolated context with only its batch of papers, producing cleaner extractions.
Split papers into batches of ~5, then for each batch, call the task tool with subagent_type: "general-purpose". Each subagent receives the paper abstracts as text and returns structured JSON.
Concurrency limit: at most 3 subagents per turn. The DeerFlow runtime enforces MAX_CONCURRENT_SUBAGENTS = 3 and will silently drop any extra dispatches in the same turn — the LLM will not be told this happened, so strictly follow the round strategy below.
Round strategy — use this decision table, do not compute the split yourself:
| Paper count | Batches of ~5 papers | Rounds | Per-round subagent count |
|---|---|---|---|
| 1–5 | 1 batch | 1 round | 1 subagent |
| 6–10 | 2 batches | 1 round | 2 subagents |
| 11–15 | 3 batches | 1 round | 3 subagents |
| 16–20 | 4 batches | 2 rounds | 3 + 1 |
| 21–25 | 5 batches | 2 rounds | 3 + 2 |
| 26–30 | 6 batches | 2 rounds | 3 + 3 |
| 31–35 | 7 batches | 3 rounds | 3 + 3 + 1 |
| 36–40 | 8 batches | 3 rounds | 3 + 3 + 2 |
| 41–45 | 9 batches | 3 rounds | 3 + 3 + 3 |
| 46–50 | 10 batches | 4 rounds | 3 + 3 + 3 + 1 |
Never dispatch more than 3 subagents in the same turn. When a row says "2 rounds (3 + 1)", that means: first turn dispatches 3 subagents in parallel, wait for all 3 to complete, then second turn dispatches 1 subagent. Rounds are strictly sequential at the main-agent level.
If the paper count lands between rows (e.g. 23 papers), round up to the next row's layout but only dispatch as many batches as you actually need — the decision table gives you the shape, not a rigid prescription.
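As a sanity check on that shape, the table can be reproduced mechanically: chunk papers into batches of ~5, then dispatch batches in rounds of at most 3. A small sketch (the decision table remains the authority — this only illustrates its logic):

```python
def plan_rounds(paper_count, batch_size=5, max_concurrent=3):
    """Illustrates the decision table: batches of ~5 papers,
    dispatched in rounds of at most 3 subagents each."""
    num_batches = -(-paper_count // batch_size)  # ceiling division
    rounds = []
    remaining = num_batches
    while remaining > 0:
        # Each round dispatches up to max_concurrent subagents in parallel.
        rounds.append(min(remaining, max_concurrent))
        remaining -= max_concurrent
    return rounds

# e.g. 23 papers -> 5 batches -> rounds of 3 + 2 subagents
```

For 23 papers this yields two rounds of 3 and 2 subagents, matching the 21-25 row of the table.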
Do the batching at the main-agent level: you already have every paper's abstract from Phase 2, so each subagent receives pure text input. Subagents should not need to access the network or the sandbox — their only job is to read text and return JSON. Do not ask subagents to re-run arxiv_search.py; that would waste tokens and risk rate-limiting.
What each subagent receives, as a structured prompt:
Execute this task: extract structured metadata and key findings from the
following arXiv papers.
Papers:
[Paper 1]
arxiv_id: 1706.03762