Split and deeply read academic .tex files. Use when asked to read, review, or summarize an academic paper from a .tex file. Splits .tex files into chunks, reads them in small batches, and produces structured reading notes — avoiding context window crashes and shallow comprehension.
CRITICAL RULE: Never read a full .tex file. Never. Only read the split chunks, and only 3 chunks at a time. Reading a full .tex file will either crash the session with an unrecoverable "prompt too long" error — destroying all context — or produce shallow, hallucinated output. There are no exceptions.
The user wants you to read, review, or summarize an academic paper from a .tex file. The input MUST be a file path to a local .tex file (e.g., ./articles/smith_2024.tex).
If they want the same for the supplementary materials, the path is ./articles/SI.tex. Give the SI the same careful reading as the main text.
Important: This skill only works with provided file paths. If no file path is provided, do not proceed. Do not search for or download .tex files.
Verify the provided .tex file exists. If it is not already in ./articles/, copy it there (do not move it; preserve the original location). If the corresponding split subdirectory (e.g., ./articles/split_main) already exists, proceed to Step 3; if not, proceed to Step 2. The same applies to the SI, where the subdirectory to be created is ./articles/split_SI.

CRITICAL: Always preserve the original .tex file. The provided .tex file in ./articles/ must NEVER be deleted, moved, or overwritten at any point in this workflow. The split files are derivatives; the original is the permanent artifact. Do not clean up, do not remove, do not tidy. The original stays.
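The prepare step can be sketched in shell. Here `smith_2024.tex` is an illustrative stand-in for whatever path the user actually provides:

```shell
# Stand-in for the user-provided paper (illustration only)
touch smith_2024.tex

# Copy (never move) the original into ./articles/
mkdir -p ./articles
cp smith_2024.tex ./articles/

# Create the split subdirectory for this paper's chunks
mkdir -p ./articles/split_smith_2024
```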
Create a subdirectory for the splits and run the splitting script, which uses TexSoup:
```python
from TexSoup import TexSoup
import os
import re

def split_tex(input_path, output_dir, approx_lines_per_chunk=200):
    os.makedirs(output_dir, exist_ok=True)
    with open(input_path, 'r', encoding='utf-8', errors='ignore') as f:
        all_lines = f.readlines()

    # Find the start of the abstract
    start_index = 0
    for i, line in enumerate(all_lines):
        if '\\begin{abstract}' in line:
            start_index = i
            break

    # Filter lines: drop % comments and content inside
    # \begin{comment} ... \end{comment}, starting from the abstract
    lines = []
    in_comment = False
    for line in all_lines[start_index:]:
        stripped = line.strip()
        if stripped.startswith('%'):
            continue
        if '\\begin{comment}' in line:
            in_comment = True
            continue
        if '\\end{comment}' in line:
            in_comment = False
            continue
        if not in_comment:
            lines.append(line)

    # Parse with TexSoup to get section names
    content = ''.join(lines)
    soup = TexSoup(content)
    sections = soup.find_all('section')
    section_names = [str(sec.args[0]) if sec.args else 'Unnamed' for sec in sections]

    # Split into section blocks (any preamble is folded into the first block)
    section_indices = [i for i, line in enumerate(lines) if re.match(r'\\section', line.strip())]
    section_blocks = []
    start = 0
    for end in section_indices[1:] + [len(lines)]:
        section_blocks.append(lines[start:end])
        start = end

    # Group section blocks into chunks of approx_lines_per_chunk
    chunks = []
    current_chunk = []
    current_lines = 0
    for block in section_blocks:
        block_lines = len(block)
        if current_lines + block_lines > approx_lines_per_chunk and current_lines > 0:
            chunks.append(current_chunk)
            current_chunk = block[:]
            current_lines = block_lines
        else:
            current_chunk.extend(block)
            current_lines += block_lines
    if current_chunk:
        chunks.append(current_chunk)

    # Write chunks
    prefix = os.path.splitext(os.path.basename(input_path))[0]
    for i, chunk in enumerate(chunks, 1):
        chunk_content = ''.join(chunk)
        out_name = f"{prefix}_chunk{i}.tex"
        out_path = os.path.join(output_dir, out_name)
        with open(out_path, 'w', encoding='utf-8') as f:
            f.write(chunk_content)
        print(f"Created chunk {i}: {len(chunk)} lines")
    print(f"Split into {len(chunks)} chunks in {output_dir}")
    print(f"Section names: {section_names}")

if __name__ == "__main__":
    # Example invocation; adjust to the actual file name
    split_tex("./articles/smith_2024.tex", "./articles/split_smith_2024")
```
Directory convention:
articles/
├── smith_2024.tex # original .tex — NEVER DELETE THIS
└── split_smith_2024/ # split subdirectory
├── smith_2024_chunk1.tex
├── smith_2024_chunk2.tex
├── smith_2024_chunk3.tex
└── ...
The original .tex remains in articles/ permanently. The splits are working copies. If anything goes wrong, you can always re-split from the original.
If TexSoup is not installed, install it: pip install TexSoup
Read exactly 3 split files at a time. After each batch:
1. Update the reading notes (notes.md in the split subdirectory).
2. Pause and report: "I have finished reading chunks [X-Y] and updated the notes. I have [N] more chunks remaining. Would you like me to continue with the next 3?"
Do NOT read ahead. Do NOT read all chunks at once. The pause-and-confirm protocol is mandatory.
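The batch discipline above can be sketched in Python. The chunk-file naming follows the splitter script; the directory name is illustrative:

```python
import os
import re

def list_batches(split_dir, batch_size=3):
    """Return the chunk files in numeric order, grouped into batches of at most batch_size."""
    # Sort numerically by chunk index so chunk10 sorts after chunk9, not after chunk1
    def chunk_index(name):
        m = re.search(r"chunk(\d+)\.tex$", name)
        return int(m.group(1)) if m else 0

    chunks = sorted(
        (f for f in os.listdir(split_dir) if f.endswith(".tex")),
        key=chunk_index,
    )
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
```

Reading then proceeds one batch at a time, with a pause for user confirmation between batches.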
As you read, collect information along these dimensions and write them into notes.md:
These dimensions capture what a researcher needs to build on or replicate the work: a structured extraction that is more detailed and specific than a typical summary.
The output is notes.md in the split subdirectory:
articles/split_smith_2024/notes.md
This file is updated incrementally after each batch. Structure it with clear headers for each of the 8 dimensions. After each batch, update whichever dimensions have new information — do not rewrite from scratch.
By the time all chunks are read, the notes should contain specific data sources, variable names, equation references, sample sizes, coefficient estimates, and standard errors. Not a summary — a structured extraction.
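One way to keep the updates incremental is to append new findings under the header for whichever dimension changed, leaving the rest of the file untouched. A minimal sketch; "Data sources" and "Methods" here are hypothetical header names, not the actual dimensions:

```python
def update_dimension(notes_path, header, new_items):
    """Append bullet points under an existing '## header' in notes.md,
    creating the header at the end of the file if it is not present yet."""
    try:
        with open(notes_path, "r", encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []
    target = f"## {header}"
    if target not in lines:
        lines.append(target)
    # Insert right before the next '## ' header (or at end of file)
    idx = lines.index(target) + 1
    while idx < len(lines) and not lines[idx].startswith("## "):
        idx += 1
    lines[idx:idx] = [f"- {item}" for item in new_items]
    with open(notes_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```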
| Step | Action |
|---|---|
| Prepare | Verify and copy .tex to ./articles/ |
| Split | Chunks into ./articles/split_<name>/ |
| Read | 3 chunks at a time, pause after each batch |
| Write | Update notes.md with structured extraction |
| Confirm | Ask user before continuing to next batch |
For detailed explanation of why this method works, see methodology.md.