Download, split, and deeply read academic PDFs. Use when asked to read, review, or summarize an academic paper. Splits PDFs into 4-page chunks, reads them in small batches, and produces structured reading notes — avoiding context window crashes and shallow comprehension.
CRITICAL RULE: Never read a full PDF. Never. Only read the 4-page split files, and only 3 splits at a time (~12 pages). Reading a full PDF will either crash the session with an unrecoverable "prompt too long" error — destroying all context — or produce shallow, hallucinated output. There are no exceptions.
The user wants you to read, review, or summarize an academic paper. The input is either:
- A local file path (e.g., `./articles/smith_2024.pdf`)
- A search query or paper title (e.g., "Gentzkow Shapiro Sinkinson 2014 competition newspapers")

Important: You cannot search for a paper you don't know exists. The user MUST provide either a file path or a specific search query — an author name, a title, keywords, a year, or some combination that identifies the paper. If the user invokes this skill without specifying what paper to read, ask them. Do not guess.
If a local file path is provided:
- If the file is not already in `./articles/`, copy it there (do not move — preserve the original location)

If a search query or paper title is provided:
- Search for and download the PDF, saving it to `./articles/` in the project directory (create the directory if needed)

CRITICAL: Always preserve the original PDF. The downloaded or provided PDF in `./articles/` must NEVER be deleted, moved, or overwritten at any point in this workflow. The split files are derivatives — the original is the permanent artifact.
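A minimal sketch of the staging step, assuming a local source path (the helper name `stage_pdf` and both paths are illustrative, not part of the skill's interface):

```python
import shutil
from pathlib import Path

def stage_pdf(source: str, articles_dir: str = "./articles") -> Path:
    """Copy (never move) a PDF into ./articles/, creating the directory if needed."""
    dest_dir = Path(articles_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(source).name
    if Path(source).resolve() != dest.resolve():
        shutil.copy2(source, dest)  # copy2 also preserves file metadata
    return dest
```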
Create a subdirectory for the splits and run the splitting script:
```python
from PyPDF2 import PdfReader, PdfWriter
import os, sys

def split_pdf(input_path, output_dir, pages_per_chunk=4):
    os.makedirs(output_dir, exist_ok=True)
    reader = PdfReader(input_path)
    total = len(reader.pages)
    # Name each chunk after the source file, e.g. smith_2024_pp1-4.pdf
    prefix = os.path.splitext(os.path.basename(input_path))[0]
    for start in range(0, total, pages_per_chunk):
        end = min(start + pages_per_chunk, total)
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])
        out_name = f"{prefix}_pp{start+1}-{end}.pdf"
        out_path = os.path.join(output_dir, out_name)
        with open(out_path, "wb") as f:
            writer.write(f)
    # -(-a // b) is ceiling division: the number of chunks written
    print(f"Split {total} pages into {-(-total // pages_per_chunk)} chunks in {output_dir}")

if __name__ == "__main__":
    split_pdf(sys.argv[1], sys.argv[2])
```
Directory convention:
articles/
├── smith_2024.pdf # original PDF — NEVER DELETE THIS
└── split_smith_2024/ # split subdirectory
├── smith_2024_pp1-4.pdf
├── smith_2024_pp5-8.pdf
└── ...
If PyPDF2 is not installed, install it: `pip install PyPDF2`. (PyPDF2 is no longer maintained; its successor `pypdf` keeps the same `PdfReader`/`PdfWriter` API, so only the import line would change.)
Before committing to a full read, triage the paper from its first split (pages 1-4).
Read only the first split file (abstract + introduction). Then make a relevance judgment:
| Score | Decision | Action |
|---|---|---|
| 4-5 | HIGH relevance | Proceed to full continuous read (Step 3) |
| 3 | MEDIUM relevance | Proceed to full read, but flag as "background cite only" |
| 1-2 | LOW relevance | Write a SHORT triage note to `notes.md` and STOP |
Triage criteria — score relevance relative to your study (defined in `study-parameters.md`).
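As a sketch, the table's score-to-action mapping could be encoded like this (the function name and return strings are illustrative, not prescribed by the skill):

```python
def triage(score: int) -> str:
    """Map a 1-5 relevance score to the action in the triage table."""
    if score >= 4:
        return "full continuous read"             # HIGH: proceed to Step 3
    if score == 3:
        return "full read, background cite only"  # MEDIUM
    return "short triage note, then stop"         # LOW (1-2)
```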
Read exactly 3 split files at a time (~12 pages). After each batch, append what you learned to the notes file (`notes.md` in the split subdirectory); a sketch of the batch ordering follows below.

CONTINUOUS MODE: Read all batches without stopping. The only pause point is AFTER all splits are read, when the notes file is finalized.
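One pitfall: lexicographic sorting puts `_pp13-16.pdf` before `_pp5-8.pdf`. A minimal sketch of numeric ordering and batching, assuming the `_ppN-M.pdf` naming produced by the script above:

```python
import re
from pathlib import Path

def batches(split_dir, batch_size=3):
    """Yield the split PDFs in page order, three files (~12 pages) at a time."""
    start_page = lambda p: int(re.search(r"_pp(\d+)-", p.name).group(1))
    files = sorted(Path(split_dir).glob("*_pp*.pdf"), key=start_page)
    for i in range(0, len(files), batch_size):
        yield files[i:i + batch_size]
```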
As you read, collect information along these dimensions (data sources, variable names, sample sizes, coefficient estimates, standard errors) and write it into `notes.md`:
Output is `notes.md` in the split subdirectory:

`articles/split_smith_2024/notes.md`
Updated incrementally after each batch. By the time all splits are read, notes should contain specific data sources, variable names, sample sizes, coefficient estimates, and standard errors.
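A minimal sketch of the incremental update, assuming one append per batch (the heading format is illustrative):

```python
from pathlib import Path

def append_notes(split_dir, batch_label, text):
    """Append one batch's findings to notes.md in the split subdirectory."""
    notes = Path(split_dir) / "notes.md"
    with notes.open("a", encoding="utf-8") as f:
        f.write(f"\n## {batch_label}\n\n{text}\n")
```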