Use when the user asks to read, summarize, or analyze a research paper (PDF or text). Triggers on keywords like "read paper", "summarize paper", "paper summary", "literature review", "analyze this paper".
A structured approach to reading and summarizing scientific research papers. Automatically identifies paper type (empirical/theoretical/survey/systems), selects the appropriate template, screenshots important figures, and embeds them in the summary document.
Not for: tutorial papers, textbooks, or non-research documents.
digraph paper_reading {
rankdir=TB;
"Receive paper" -> "Get PDF file";
"Get PDF file" -> "Read PDF content";
"Read PDF content" -> "Identify paper type";
"Identify paper type" -> "Prepare output directory";
"Prepare output directory" -> "Extract figures (pymupdf4llm)";
"Extract figures (pymupdf4llm)" -> "Filter & rename images";
"Filter & rename images" -> "Fill type-specific template";
"Fill type-specific template" -> "Write markdown file";
}
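All papers are processed as PDF; there is no HTML/ar5iv path. The only exception is a non-arXiv URL that does not serve a PDF, which falls back to WebFetch for text.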
| Source | Detection | Action |
|---|---|---|
| Local PDF | File path ends with .pdf | Use directly |
| arXiv URL | Contains arxiv.org | Extract paper ID → download https://arxiv.org/pdf/XXXX.XXXXX |
| Other URL | Default | Try downloading as PDF; if not a PDF, use WebFetch for text |
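The detection rules in the table can be sketched in Python as follows. This is a minimal sketch rather than part of the skill: `classify_source` is a hypothetical helper, and the regular expression only covers new-style arXiv IDs (for example `2301.12345`), so old-style IDs such as `hep-th/9901001` would fall through to the "Other URL" branch.

```python
import re

# Assumption: new-style arXiv identifiers only (YYMM.NNNNN, optional version).
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def classify_source(source: str) -> tuple[str, str]:
    """Return (kind, target) following the Source / Detection / Action table."""
    is_url = source.startswith(("http://", "https://"))
    if not is_url and source.lower().endswith(".pdf"):
        return ("local_pdf", source)  # use the file directly
    if "arxiv.org" in source:
        match = ARXIV_ID.search(source)
        if match:
            # Keep the version suffix if the user supplied one.
            paper_id = match.group(1) + (match.group(2) or "")
            return ("arxiv", f"https://arxiv.org/pdf/{paper_id}")
    return ("other_url", source)  # try a PDF download, else WebFetch
```

The returned `kind` selects the action; the `target` is either the local path or the URL to download.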
# For arXiv: extract ID and download PDF
curl -L -o <output_dir>/paper.pdf "https://arxiv.org/pdf/XXXX.XXXXX"
# For other URLs: try direct download
curl -L -o <output_dir>/paper.pdf "<url>"
# Verify it's a valid PDF: file <output_dir>/paper.pdf should show "PDF document"
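The same validity check can be done without the `file` utility by inspecting the first bytes of the download. A minimal sketch, assuming only that a well-formed PDF starts with the `%PDF-` marker; `looks_like_pdf` is a hypothetical helper name.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file begins with the PDF magic bytes."""
    try:
        with open(path, "rb") as fh:
            head = fh.read(5)
    except OSError:
        # Missing or unreadable file: treat as "not a PDF".
        return False
    return head == b"%PDF-"
```

An HTML error page saved as `paper.pdf` fails this check, which is the usual signal to fall back to WebFetch for non-arXiv URLs.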
Use the Read tool to read the PDF file. Claude natively supports reading PDF files and extracting text content. For large PDFs (>10 pages), read in page ranges (e.g., pages: "1-10", then pages: "11-20").
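The page-range strings for a long PDF can be generated mechanically. A small sketch, assuming 1-based inclusive ranges in the `"start-end"` form shown above; `page_ranges` is a hypothetical helper and the chunk size of 10 simply mirrors the example.

```python
def page_ranges(total_pages: int, chunk: int = 10) -> list[str]:
    """Split pages 1..total_pages into inclusive "start-end" range strings."""
    if total_pages < 1 or chunk < 1:
        return []
    ranges = []
    for start in range(1, total_pages + 1, chunk):
        end = min(start + chunk - 1, total_pages)
        ranges.append(f"{start}-{end}")
    return ranges


print(page_ranges(23))  # ['1-10', '11-20', '21-23']
```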
After reading the title, abstract, and introduction, determine paper type:
| Type | Identification Signals |
|---|---|
| Empirical | Proposes new method/model, has experimental comparisons, includes baselines |
| Theoretical | Theorem/proof-driven, math-heavy derivations, few or no experiments |
| Survey | Many citations (>100), taxonomy/classification, "survey"/"review" keywords |
| Systems | System design, engineering implementation, benchmarks, deployment experience |
When uncertain, default to the Empirical template.
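As a rough first pass, the signals in the table can be turned into a keyword score. This is only an illustrative heuristic under assumed keyword lists, not a substitute for actually reading the title, abstract, and introduction; `guess_paper_type` and its patterns are hypothetical.

```python
import re


def guess_paper_type(text: str, num_references: int = 0) -> str:
    """Rough guess from title + abstract + introduction text."""
    lowered = text.lower()

    def count(*patterns: str) -> int:
        return sum(len(re.findall(p, lowered)) for p in patterns)

    scores = {
        "Survey": count(r"\bsurvey\b", r"\breview\b", r"\btaxonomy\b")
        + (2 if num_references > 100 else 0),
        "Theoretical": count(r"\btheorem\b", r"\blemma\b", r"\bproof\b", r"\bbound\b"),
        "Systems": count(r"\bdeployment\b", r"\bthroughput\b", r"\blatency\b",
                         r"\bsystem design\b"),
        "Empirical": count(r"\bbaselines?\b", r"\bablation\b", r"\boutperforms?\b",
                           r"\bwe propose\b"),
    }
    best = max(scores, key=scores.get)
    top = scores[best]
    # No signal, or a tie between types: default to the Empirical template.
    if top == 0 or list(scores.values()).count(top) > 1:
        return "Empirical"
    return best
```

Treat the result as a suggestion to confirm, since a paper that merely mentions "related surveys" can trip a keyword match.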
mkdir -p <output_dir>/images
| Priority | Figure Type | When to Capture |
|---|---|---|
| Must | System architecture / overall framework | If available |
| Must | Main experiment results table/chart | If available |
| Recommended | Core algorithm flowchart | If available |
| Recommended | Ablation study charts | If available |
| Optional | Visualization / qualitative results | If space allows |
| Optional | Auxiliary illustrations | As needed |
Capture 3-8 key figures per paper. If the paper has few or no meaningful figures (e.g., theoretical papers, short workshop papers), skip figure extraction and produce a text-only summary.
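Once each candidate image has been assigned a priority by hand, the cap on figure count can be applied mechanically. A minimal sketch: `select_figures` is a hypothetical helper, the priority labels come from the table above, and the default cap of 8 is the upper end of the suggested range.

```python
PRIORITY_ORDER = {"Must": 0, "Recommended": 1, "Optional": 2}


def select_figures(candidates: list[tuple[str, str]], max_figures: int = 8) -> list[str]:
    """candidates holds (filename, priority) pairs; returns the filenames to keep."""
    # sorted() is stable, so figures with equal priority keep their paper order.
    ranked = sorted(candidates, key=lambda item: PRIORITY_ORDER.get(item[1], 3))
    return [name for name, _ in ranked[:max_figures]]
```

An empty result is a valid outcome and corresponds to the text-only summary described above.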
Use pymupdf4llm to extract all images and vector graphics in one call. This handles raster images (photos, embedded figures) AND vector graphics (plots, diagrams, flowcharts) automatically.
Prerequisites: pymupdf and pymupdf4llm must be installed in the conda base environment. The listing and filtering scripts also import PIL, so Pillow must be available there as well.
# Install if needed (one-time)
source $HOME/anaconda3/etc/profile.d/conda.sh && conda activate base
pip install pymupdf4llm pillow
Run extraction:
source $HOME/anaconda3/etc/profile.d/conda.sh && conda activate base && python3 << 'PYEOF'
import os

import pymupdf4llm
from PIL import Image

pdf_path = "<pdf_path>"
image_dir = "<output_dir>/images"
os.makedirs(image_dir, exist_ok=True)

# Extract all figures, tables, and vector graphics as images
md_result = pymupdf4llm.to_markdown(
    pdf_path,
    write_images=True,
    image_path=image_dir,
    image_format="png",
    dpi=200,
    use_ocr=False,  # Disable OCR (avoids tesseract dependency)
)

# List extracted images with their pixel dimensions
for f in sorted(os.listdir(image_dir)):
    if f.lower().endswith(('.png', '.jpg', '.jpeg')):
        try:
            with Image.open(os.path.join(image_dir, f)) as img:
                print(f"{f}: {img.size[0]}x{img.size[1]}")
        except OSError:
            print(f"{f}: (size unknown)")
PYEOF
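The Markdown string returned by the extraction can help map each image to nearby caption text before the visual review. The sketch below assumes `md_result` contains standard Markdown image references of the form `![alt](path)` and that a caption tends to follow the image; both are assumptions to verify on real output, and `image_refs_with_context` is a hypothetical helper.

```python
import re

# Assumption: image paths contain no spaces or closing parentheses.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def image_refs_with_context(markdown: str, context_chars: int = 200) -> list[tuple[str, str]]:
    """Return (image_path, following_text) pairs in document order."""
    results = []
    for match in IMAGE_REF.finditer(markdown):
        following = markdown[match.end():match.end() + context_chars]
        # Collapse whitespace so the snippet prints on one line.
        results.append((match.group(1), " ".join(following.split())))
    return results
```

Calling `image_refs_with_context(md_result)` after the extraction script would give a quick index of which file probably corresponds to which figure or table.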
pymupdf4llm extracts ALL graphical regions, including logos, watermarks, and decorative elements. Filter and rename:
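Step 4a: Filter out noise. Remove images that are too small (both width and height under 100 px) or too narrow (aspect ratio above 10:1); these are typically logos, rules, and other decorative elements: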
source $HOME/anaconda3/etc/profile.d/conda.sh && conda activate base && python3 << 'PYEOF'
import os
from PIL import Image

image_dir = "<output_dir>/images"
IMAGE_EXTS = ('.png', '.jpg', '.jpeg')

def image_size(path):
    # Open, read the size, and close again so the file can be deleted safely
    with Image.open(path) as img:
        return img.size

removed = []
for f in sorted(os.listdir(image_dir)):
    path = os.path.join(image_dir, f)
    try:
        w, h = image_size(path)
    except OSError:
        continue  # not a readable image; leave it alone
    ratio = max(w, h) / max(min(w, h), 1)
    if w < 100 and h < 100:
        os.remove(path)
        removed.append(f"{f} (too small: {w}x{h})")
    elif ratio > 10:
        os.remove(path)
        removed.append(f"{f} (too narrow: {w}x{h})")

print(f"Removed {len(removed)} noise images:")
for r in removed:
    print(f"  - {r}")

kept = [f for f in sorted(os.listdir(image_dir)) if f.lower().endswith(IMAGE_EXTS)]
print(f"\nKept {len(kept)} images:")
for f in kept:
    w, h = image_size(os.path.join(image_dir, f))
    print(f"  {f}: {w}x{h}")
PYEOF
Step 4b: Visual review & rename. Use the Read tool to view each remaining image. Based on content, discard anything that is still noise and rename the images you keep to descriptive names such as figure_1_overview.png, table_2_main_results.png, etc.

If pymupdf4llm misses a specific figure or table (rare), use direct pymupdf clip rendering:
source $HOME/anaconda3/etc/profile.d/conda.sh && conda activate base && python3 << 'PYEOF'
import fitz
doc = fitz.open("<pdf_path>")
page = doc[PAGE_NUM] # 0-indexed
# Render a specific region at high resolution
clip = fitz.Rect(x0, y0, x1, y1) # coordinates from page layout
mat = fitz.Matrix(3, 3) # 3x zoom
pix = page.get_pixmap(matrix=mat, clip=clip)
pix.save("<output_dir>/images/figure_N_desc.png")
doc.close()
PYEOF
To find coordinates: use page.get_text("dict") to find text blocks containing "Figure N" or "Table N", then estimate the figure region nearby.
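The caption search can be sketched as a pure function over the dictionary returned by `page.get_text("dict")`. This is a minimal sketch: `find_caption_boxes` and `region_above` are hypothetical helpers, and the region guess assumes the figure sits directly above its caption with a fixed height, which will not hold for every layout (table captions, for instance, usually sit above the table).

```python
import re


def find_caption_boxes(page_dict: dict, label: str) -> list[tuple]:
    """Return bboxes of text blocks whose text starts with e.g. 'Figure 3'."""
    pattern = re.compile(rf"{re.escape(label)}\b")  # 'Figure 1' must not match 'Figure 10'
    boxes = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        text = " ".join(
            "".join(span["text"] for span in line.get("spans", []))
            for line in block.get("lines", [])
        ).strip()
        if pattern.match(text):
            boxes.append(tuple(block["bbox"]))
    return boxes


def region_above(caption_box, page_width: float, height: float = 250.0, margin: float = 36.0):
    """Guess a clip rectangle directly above a caption (layout assumption)."""
    x0, y0, x1, y1 = caption_box
    return (margin, max(y0 - height, 0.0), page_width - margin, y0)
```

With a real page, the inputs would be `page.get_text("dict")` and `page.rect.width`, and the resulting tuple can be passed to `fitz.Rect(*region)` as the `clip` in the rendering script above. Always view the rendered crop and adjust `height` by hand.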
Format: figure_N_<brief_desc>.png / table_N_<brief_desc>.png
Examples:
- figure_1_overview.png: system overview
- figure_2_architecture.png: model architecture
- table_3_comparison.png: main results table
- figure_4_ablation.png: ablation study

Depth-first: For every section, ask "why" and "how", not just "what".
| Shallow writing (prohibited) | Deep writing (required) |
|---|---|
| "Proposes a new method" | "Addresses bottleneck Y in problem X via mechanism Z" |
| "Achieves SOTA results" | "Improves X% over method B on dataset A, primarily because of Y" |
| "Uses a Transformer" | "Uses L-layer Transformer with input dim D, H attention heads, key modification is Z" |
| "Has some limitations" | "Only validated in scenario X, does not account for distribution shift Y, assumption Z may not hold in practice" |
All types share these sections:
## Basic Information
- **Title:**
- **Authors:**
- **Affiliation:** (optional)
- **Published:**
- **Link:**
- **Paper Type:** [Empirical / Theoretical / Survey / Systems]
- **One-line summary:** What was done + how + what was the result
## Research Problem
- **What problem does it solve?** Identify the specific gap in existing methods
- **Key assumptions:** What constraints/limitations frame the research
- **Why is it important?** The practical impact on the field
- **Positioning among related work:** What are the 2-3 closest prior works? What is the key difference?
Empirical paper template:
## Basic Information
[shared section]
## Research Problem
[shared section]
- **Mathematical formulation:** (optional)
<!-- Insert problem definition/motivation figure here if available -->
## Key Insight
> Distill the paper's core new idea in 2-3 sentences. Not "what was done", but "what insight makes this method work".
> Example: Rather than predicting frame-by-frame, first establish long-term 3D point tracking, then leverage temporal consistency for joint optimization.
## Technical Method
### Overall Framework and Principles
<!-- Insert architecture diagram -->
<!-- Insert figure here:  -->
- Overall system architecture description
- Modules/components and their responsibilities
- Signal/data flow direction
- **Why this design?** Advantages over the intuitive/naive approach
### Core Component Details
<!-- Insert algorithm flowchart here if available -->
- Model/algorithm architecture details (layers, dimensions, input/output)
- Training objective and loss function (write key equations)
- Training data source (synthetic/real/mixed, dataset names and scale)
- Key tricks and design decisions
- **Motivation for each design choice:** Why use A instead of B? Does the paper provide justification?
## Experimental Results
<!-- Insert experiment result figures/tables -->
<!-- Insert figure here:  -->
### Results (Facts)
- **Experimental setup:** Environment, hardware, hyperparameters
- **Baselines compared:** List specific method names and sources
- **Key results:** Quantitative improvement margins (specific numbers + percentages)
- **Ablation study:** Component contributions (removing X decreases performance by Y%)
- **Surprising findings:** Any counterintuitive results
### Analysis (Interpretation)
- Authors' explanation and attribution of results
- Which scenarios/datasets show best performance? Worst?
- Root cause of performance gains (authors' claims vs actual evidence)
<!-- Insert ablation or visualization result figures here if available -->
## Critical Analysis
### Strengths
- Specific improvements over prior work (not just "good results")
### Limitations
- **Acknowledged by authors:**
- **My observations:** Issues not mentioned in the paper
- Do assumptions hold in practice?
- Are compute/data requirements reasonable?
- Are evaluation metrics comprehensive?
### Reproducibility Assessment
- Is code open-sourced? Is data available?
- Are key implementation details sufficiently described?
## Summary
[shared section]
Theoretical paper template:
## Basic Information
[shared section]
## Research Problem
[shared section]
- **Mathematical formulation:** Formal problem definition
## Key Insight
> Distill the paper's core theoretical contribution in 2-3 sentences. What new mathematical tool/perspective makes this result possible?
## Theoretical Framework
### Problem Formalization
- Symbol definitions and notation conventions
- Core mathematical definitions
### Main Theorems and Proof Sketches
- **Theorem 1:** Statement + key proof idea (not full proof, but key steps and key lemmas)
- **Theorem 2:** ...
- Key techniques used in proofs: Why does this technique work? Is there a more intuitive explanation?
### Theoretical Analysis
- Implications and intuitive interpretation of results (restate in non-mathematical language)
- Tightness of upper/lower bounds
- Relationship to and comparison with known results: Which bound was improved? Which assumption was relaxed?
## Validation (if experiments exist)
- Experimental setup
- Comparison of theoretical predictions vs actual results
- How is the gap between theory and experiments explained?
## Critical Analysis
### Strengths
- Importance and novelty of the theoretical contribution
### Limitations
- **Acknowledged by authors:**
- **My observations:** Reasonableness of assumptions, practical utility, difficulty of generalization
## Summary
[shared section]
Survey paper template:
## Basic Information
[shared section]
- **Coverage:** Number of papers surveyed, time span
## Research Problem
[shared section]
## Key Insight
> What is the core contribution of this survey? What classification perspective was proposed, or what important trends were identified?
## Taxonomy
<!-- Insert taxonomy/classification figure -->
<!-- Insert figure here:  -->
- Main classification dimensions and rationale for their selection
- Category definitions and representative works
### Direction 1: [Name]
- Key methods and advances
- Representative works (author, year)
- Pros and cons
- **Current bottleneck:** The core challenge facing this direction
### Direction 2: [Name]
- ...
### Method Comparison
| Method Type | Strengths | Weaknesses | Representative Works | Best Use Case |
|-------------|-----------|------------|---------------------|---------------|
| ... | ... | ... | ... | ... |
## Open Problems and Trends
- Current major challenges in the field
- Emerging trends and directions
- Authors' predictions and recommendations
- **Most promising direction:** Based on the survey analysis, which direction deserves most attention? Why?
## Critical Analysis
- Is the survey's coverage comprehensive? Any important directions missed?
- Is the taxonomy reasonable? Could it be organized better?
- Do the authors' opinions/biases affect the survey's objectivity?
## Summary
[shared section]
Systems paper template:
## Basic Information
[shared section]
## Research Problem
[shared section]
- **Design goals:** Key requirements the system must meet
## Key Insight
> What is the core design insight of this system? What trade-off or observation makes this design superior to existing solutions?
## System Design
### Architecture Overview
<!-- Insert system architecture diagram -->
<!-- Insert figure here:  -->
- Overall architecture and component breakdown
- Component responsibilities and interfaces
### Key Design Decisions
- Decision 1: What choice was made, why (and not the alternative)
- Decision 2: Trade-offs considered (performance vs complexity vs maintainability)
- Key differences from existing systems
### Implementation Details
- Key tech stack/dependencies
- Optimization techniques
- Fault tolerance / scalability design
## Performance Evaluation
<!-- Insert performance comparison figures/tables -->
<!-- Insert figure here:  -->
### Experimental Facts
- **Benchmark setup:** Environment, hardware, workloads
- **Compared systems:**
- **Key metrics:** Throughput, latency, resource usage (specific numbers)
- **Scalability:** Performance as scale increases
### Result Interpretation
- Under what conditions does it perform best? When does it degrade?
- Root cause of performance advantages
## Deployment Experience (if available)
- Real-world production performance
- Problems encountered and solutions
## Critical Analysis
### Strengths
- Design elegance and practicality
### Limitations
- **Acknowledged by authors:**
- **My observations:** Deployment assumptions, hardware dependencies, generalizability
## Summary
[shared section]
Shared Summary section (fills the [shared section] placeholder under Summary in every template):
## Summary and Evaluation
### Three-Perspective Conclusion (Andrew Ng Framework)
**Authors' conclusion:** What do the authors claim to have accomplished? What goals were achieved?
**Personal assessment:**
- Did the authors truly achieve their claimed goals? Is the evidence sufficient?
- How much does this work actually advance the field?
**Overall evaluation:**
- **Core idea:** One sentence summarizing the core contribution
- **Main highlight:** What stands out compared to prior work
- **Future directions:** Natural next steps
- **Rating:** [Breakthrough / Important / Valuable / Incremental]
### Comprehension Verification (Self-check after writing)
1. What were the authors trying to accomplish?
2. What are the key elements of the approach?
3. What can I use in my own research?
4. What references are worth reading further?
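The templates reference common content through `[shared section]` placeholders under a `##` heading. One way to expand them mechanically is sketched below; `expand_shared` is a hypothetical helper, and the `SHARED` bodies are abbreviated stand-ins for the full shared sections, not replacements for them.

```python
# Abbreviated placeholder bodies; the real ones are the full shared sections.
SHARED = {
    "Basic Information": "- **Title:**\n- **Authors:**\n- **Paper Type:**",
    "Research Problem": "- **What problem does it solve?**\n- **Key assumptions:**",
    "Summary": "### Three-Perspective Conclusion\n### Comprehension Verification",
}


def expand_shared(template: str, shared: dict[str, str] = SHARED) -> str:
    """Replace each [shared section] line with the body for its enclosing heading."""
    out, current = [], None
    for line in template.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()  # track the most recent level-2 heading
        if line.strip() == "[shared section]" and current in shared:
            out.append(shared[current])
        else:
            out.append(line)
    return "\n".join(out)
```

Type-specific bullets that follow a placeholder, such as the extra "Mathematical formulation" line, are left untouched and end up directly after the shared body.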
| Mistake | Correction |
|---|---|
| Copying abstract verbatim | Synthesize in your own words, distill the key insight |
| Missing key assumptions | Explicitly state what the method assumes |
| Vague architecture description | Include specific dimensions and layer types |
| Ignoring failure cases | Note where method underperforms and on which datasets |
| Skipping mathematical notation | Include key LaTeX equations when available |
| Not screenshotting paper figures | Must capture architecture and main result figures |
| Rendering entire PDF pages as images | Use pymupdf4llm write_images=True for automatic precise extraction |
| Misplaced image insertion | Images should be adjacent to corresponding text |
| Vague critiques | Must name specific limitations (scenario, data, assumptions) |
| Wrong paper type classification | Read abstract and intro fully before classifying; default to Empirical |
| Giving up after screenshot failure | Use pymupdf4llm auto-extraction first, fall back to manual pymupdf clip |
| Writing only "what" without "why" | Every design choice should explain the motivation and justification |
| Mixing results and conclusions | Separate experimental facts (Results) from author interpretation (Analysis) |
| Missing related work positioning | Must compare against 2-3 closest prior works |
| Key Insight too vague or missing | Key Insight must be a specific, actionable new idea |
| Evaluation missing three perspectives | Separately write authors' conclusion, personal assessment, overall evaluation |
| Not distinguishing author-acknowledged vs self-discovered limitations | Critical Analysis must separate the two types of limitations |
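A few of the mistakes in the table can be caught with a mechanical self-check on the finished summary. This is a sketch under assumed rules: `lint_summary`, the required-heading list, and the shallow-phrase list are illustrative choices, and passing the check does not mean the summary is deep enough.

```python
REQUIRED_HEADINGS = [
    "## Basic Information",
    "## Research Problem",
    "## Key Insight",
    "## Critical Analysis",
]
SHALLOW_PHRASES = ("proposes a new method", "achieves sota results", "has some limitations")


def lint_summary(markdown: str) -> list[str]:
    """Return a list of human-readable problems found in a summary."""
    problems = []
    lines = [line.strip() for line in markdown.splitlines()]
    for heading in REQUIRED_HEADINGS:
        if heading not in lines:
            problems.append(f"missing section: {heading}")
    if "[shared section]" in markdown:
        problems.append("unexpanded [shared section] placeholder")
    # Only templates with a Limitations subsection split the two kinds.
    if "### Limitations" in lines:
        for label in ("**Acknowledged by authors:**", "**My observations:**"):
            if label not in markdown:
                problems.append(f"limitations not separated: missing {label}")
    lowered = markdown.lower()
    for phrase in SHALLOW_PHRASES:
        if phrase in lowered:
            problems.append(f"shallow phrasing: '{phrase}'")
    return problems
```

An empty list means none of these specific checks fired; the remaining rows of the table still need a manual read-through.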