Use this skill whenever the user wants to deeply read, summarize, compare, or synthesize academic papers. Trigger on phrases like: "synthesize papers", "compare papers", "read and summarize [paper list]", "generate comparison table", "write summary of related work papers", "extract key results from papers", "deep read these papers", "summarize what these papers do", "make a table comparing approaches", "write the related work section based on these papers", "what do these papers say about [topic]", "contrast our approach with [paper]", "help me understand this paper in context", "what are the key takeaways from the to-read list", "summarize my reading list". Run this skill after paper-search-and-triage has populated the "to-read" list, or when the user directly provides paper IDs, PDF links, or titles to analyze.
This skill performs deep reading and synthesis of academic papers, producing:
- One .md file per paper containing the extraction card
- manifest.md at the topic root — the lightweight index with narrative synthesis and cross-paper analysis (the only file an AI needs to load for orientation)
- {TOPIC}_table.tex containing the ACL-format LaTeX comparison table

This structure keeps individual paper files small and AI-friendly (each fits in one context window), while the manifest provides the high-level view.
```
literature/synthesis/{TOPIC}/
├── manifest.md           ← index, cluster map, narrative synthesis, open problems
├── {TOPIC}_table.tex     ← ACL LaTeX comparison table
├── {cluster_slug}/
│   ├── {paper_slug}.md   ← one extraction card per paper
│   └── {paper_slug}.md
├── {cluster_slug}/
│   └── {paper_slug}.md
└── ...
```
Naming conventions:
- {TOPIC} — short label for the synthesis batch (e.g., llm_vuln_repair, apr_baselines). Derived in Step 1. Becomes the directory name under literature/synthesis/.
- {cluster_slug} — lowercased, underscored cluster name derived from the papers themselves (e.g., llm_based_repair, classical_apr, benchmarks). Derived in Step 4a.
- {paper_slug} — {firstauthor}_{year} form, lowercased (e.g., xia_2023, sobania_2023). If two papers share the same slug, append a letter: xia_2023a, xia_2023b.

Updated tracker: literature/papers.csv (status → synthesized)
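One way these slugs could be derived mechanically (a sketch; the function names are illustrative, not part of the skill):

```python
import re
from collections import defaultdict

def cluster_slug(name: str) -> str:
    """Lowercase a cluster name and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def assign_paper_slugs(papers: list[tuple[str, int]]) -> list[str]:
    """Map (first_author_surname, year) pairs to {firstauthor}_{year} slugs.

    When several papers share a base slug, each one gets a letter suffix
    (xia_2023a, xia_2023b), matching the convention above.
    """
    bases = [re.sub(r"[^a-z0-9]", "", author.lower()) + f"_{year}"
             for author, year in papers]
    counts = defaultdict(int)
    for base in bases:
        counts[base] += 1
    seen = defaultdict(int)
    slugs = []
    for base in bases:
        if counts[base] == 1:
            slugs.append(base)            # unique: no suffix needed
        else:
            slugs.append(base + "abcdefghijklmnopqrstuvwxyz"[seen[base]])
            seen[base] += 1
    return slugs
```

For example, a batch containing two Xia 2023 papers and one Sobania 2023 paper yields xia_2023a, sobania_2023, xia_2023b in input order.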
Two modes:
Mode A — From tracker (default):
Read literature/papers.csv. Filter rows where status = 'to-read'. Present the list to the user:
"Found N papers with status 'to-read'. Synthesizing all of them, or a subset? (Reply with 'all' or list arXiv IDs / row numbers to include.)"
Mode B — User-provided:
User provides arXiv IDs, PDF URLs, or paper titles directly. Look up each in papers.csv first.
For papers not in the tracker, add them with status to-read before proceeding.
For papers with no true arXiv ID (tracker entries whose ID carries an s2:... prefix), use the url field to access the paper.
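Mode A's tracker filtering can be sketched with the standard csv module (the column names follow the fields mentioned in this document; the real schema of papers.csv may differ):

```python
import csv

def to_read_papers(tracker_path: str = "literature/papers.csv") -> list[dict]:
    """Return tracker rows still marked 'to-read', ready to present to the user."""
    with open(tracker_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row.get("status") == "to-read"]
```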
Record the final paper list. Assign a short TOPIC label (e.g., llm_vuln_repair,
apr_baselines, fuzzing_ml) based on the cluster of papers selected. This label becomes the
directory name for all synthesis outputs under literature/synthesis/{TOPIC}/.
For each paper in the synthesis list:
Check if a PDF URL is available in papers.csv (url field). If it points to a PDF
(ends in .pdf or contains arxiv.org/pdf), use WebFetch to retrieve it.
If no direct PDF URL: try constructing the arXiv PDF URL:
https://arxiv.org/pdf/{arxiv_id}
If the paper is paywalled with no open-access version: fall back to the Semantic Scholar page
https://www.semanticscholar.org/paper/{paperId} and mark the card [ABSTRACT ONLY - no PDF access].
After retrieving the content, extract the text. For PDFs fetched via WebFetch, the content will be HTML or text — parse out the paper body text.
See skills/deep-paper-synthesis/references/synthesis-template.md for the exact paper card format.
For each paper, draft a paper extraction card in memory. Do not write files yet — cluster assignments (Step 4a) determine the directory paths. Hold all cards until Step 4a is complete.
The card contains these sections:
One concise paragraph (2-4 sentences):
2-5 bullet points or a short paragraph covering:
For LLM-based papers: note which models are used (GPT-4, Claude, open-source), prompting style (zero-shot, few-shot, chain-of-thought), and any fine-tuning.
For domain-specific papers (e.g., APR): note the repair operator set, search strategy, oracle type, or other domain-specific design choices relevant to the paper.
Always capture:
Format as a small table if multiple metrics are reported:
| Metric | This Paper | Best Baseline |
|---|---|---|
| Correct patch rate | 43.2% | 28.1% (GPT-4-base) |
| Plausibility rate | 71.4% | 65.0% |
2-4 bullet points identifying:
1-3 sentences written from the perspective of the paper being written. Answer:
Note the gap phrase; it will be written to gap_notes in papers.csv in Step 9.
Do this before writing any files. Scan all drafted cards and assign each paper to a cluster. This determines the directory structure.
Identify 3-5 thematic clusters across the papers. Derive cluster names from the papers themselves — do not assume a fixed set. Common cluster types that apply across domains:
For the specific domain, derive more precise cluster names from the paper abstracts.
Assign each paper to its primary cluster (cluster field). A paper may appear in a
secondary cluster too (record in secondary_clusters front matter field).
After assigning clusters, derive {cluster_slug} and {paper_slug} for every paper.
Trace the chronological progression of key ideas in the domain:
Write a 3-4 sentence paragraph tracing this arc. This goes in manifest.md.
From the limitations sections of all cards, enumerate open problems that multiple papers share:
This goes in manifest.md.
Build a LaTeX comparison table. Derive the comparison axes from the papers themselves
and from project/research-focus.md (if it exists). Standard axes that work across domains:
Always include:
Domain-specific axes — select from the following based on what differentiates papers in the set (typically 4-6 axes total):
Use the LaTeX table template from references/latex-table-patterns.md.
Hold the table in memory for now; it will be written to disk in Step 8.
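A minimal instance of such a table, assuming booktabs and ACL-style citations (the axes, cite keys, and values here are placeholders, not taken from the referenced template):

```latex
\begin{table}[t]
\centering
\small
\begin{tabular}{lcccc}
\toprule
\textbf{Approach} & \textbf{Model} & \textbf{Oracle} & \textbf{Benchmark} & \textbf{Correct \%} \\
\midrule
\citet{xia2023}     & GPT-4 & Tests & Defects4J & 43.2 \\
\citet{sobania2023} & Codex & Tests & QuixBugs  & 38.9 \\
\midrule
Ours                & --    & --    & --        & --   \\
\bottomrule
\end{tabular}
\caption{Comparison of prior approaches along the derived axes.}
\label{tab:comparison}
\end{table}
```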
Suggested column grouping:
Write 3-5 paragraphs of narrative synthesis. This content goes in manifest.md.
Paragraph 1 — Chronological arc:
Begin with the earliest relevant work and trace the evolution to the most recent. Cite papers
inline using \citet{} or \citep{} ACL style. Cover how the field progressed from early
approaches to more recent ones. Adapt this arc to the domain; do not assume a fixed narrative.
Example opening structure:
[Field] has been studied since \citet{...}, who first demonstrated ... Over time, the community shifted toward [later paradigm]. Most recently, [dominant current approach] has emerged as the leading direction \citep{...}.
Paragraph 2 — Thematic synthesis by cluster: Group papers by the clusters identified in Step 4a. Describe each cluster in 2-3 sentences, citing the key papers. Highlight the dominant approach within each cluster and its assumptions.
Paragraph 3 — Most relevant sub-field (zoom in): Zoom into the cluster most directly related to the paper being written. Contrast its assumptions, oracles, and evaluation setups with those of the other clusters. Identify the specific gap or limitation shared by most papers in this cluster that the paper addresses.
Paragraph 4 — Gap and positioning (required): Explicitly state what no existing work addresses. Example structure:
Despite progress in [field], no existing work [gap 1]. Furthermore, [gap 2]. Our work addresses both gaps by [method contribution].
This paragraph feeds directly into the gap_map.md produced by research-gap-mapper.
Paragraph 5 (optional) — Evaluation methodology comparison: If the papers use very different evaluation setups, add a paragraph comparing benchmark characteristics (dataset size, bug type distribution, oracle strength).
Write in academic English, past tense for describing prior work, present tense for our claims. Target 400-600 words for the full narrative. Avoid excessive hedging.
Save literature/synthesis/{TOPIC}/manifest.md with this structure:
# Synthesis Manifest: {TOPIC}
**Date**: {DATE} | **Papers**: {N} | **Clusters**: {K}
**Skill version**: deep-paper-synthesis 2.0.0
## Papers in This Synthesis
### {Cluster Display Name} (`{cluster_slug}/`)
- [{Title}](./{cluster_slug}/{paper_slug}.md) — {Authors}, {Venue} {Year} | Relevance: {score}/5
- ...
### {Cluster Display Name} (`{cluster_slug}/`)
- ...
## Comparison Table
See [{TOPIC}_table.tex](./{TOPIC}_table.tex)
## Cross-Paper Analysis
### Idea Evolution Timeline
{3-4 sentence paragraph from Step 4b}
### Open Problems
{bullet list or table from Step 4c}
### Conflicting Claims
{list contradictions; omit section if none}
## Narrative Synthesis
{3-5 paragraphs from Step 6}
---
## Examined but Excluded
{papers rejected after reading; omit section if none}
- {Title} — {reason}
The manifest is the entry point for any AI working with this literature set. Individual paper files are loaded on demand when detail about a specific paper is needed.
After the manifest is written, write all paper .md files and the LaTeX table. Write files in
parallel where possible.
Per-paper file path: literature/synthesis/{TOPIC}/{cluster_slug}/{paper_slug}.md
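Writing one card to that path might look like the following pathlib sketch (card_markdown is assumed to hold the drafted card text from Step 3):

```python
from pathlib import Path

def write_card(topic: str, cluster_slug: str, paper_slug: str,
               card_markdown: str) -> Path:
    """Write one extraction card under literature/synthesis/{TOPIC}/{cluster_slug}/."""
    path = Path("literature/synthesis") / topic / cluster_slug / f"{paper_slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)  # create cluster dir on demand
    path.write_text(card_markdown, encoding="utf-8")
    return path
```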
Per-paper file format:
---