Recursive Language Model (RLM) skill for processing arbitrarily large inputs. Based on "Recursive Language Models" (Zhang, Kraska & Khattab, MIT CSAIL, 2026). Use when inputs exceed context limits or require processing all/most of a large file. Implements Algorithm 1: InitREPL → code generation → execution → until FINAL_ANSWER. Two modes: Full RLM (with sub-LM calls via Task tool) and RLM-lite (REPL-only, no sub-calls).
Based on Recursive Language Models — Zhang, Kraska & Khattab (MIT CSAIL, 2026)
procedure RLM(prompt P, query Q)
    E ← InitREPL()                    # E = Bash/Python environment
    E.set("context", P)               # prompt lives as VARIABLE, not in attention
    E.set("query", Q)                 # query also externalized
    loop:
        code ← LLM.generate(E.state)  # you write code based on env state
        output ← E.execute(code)      # Bash/Python runs it
        if "FINAL_ANSWER" in E.vars:
            return E.get("FINAL_ANSWER")
    end loop
end procedure
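The loop above can be sketched in runnable Python. This is a toy illustration, not the paper's implementation: `generate` stands in for the root LM, and the minimal `REPL` class stands in for the Bash/Python environment.

```python
class REPL:
    """Toy environment: variables live here, not in the LM's context."""

    def __init__(self):
        self.vars = {}

    def set(self, name, value):
        self.vars[name] = value

    def get(self, name):
        return self.vars[name]

    def execute(self, code):
        # Run generated code with the environment's variables in scope;
        # any assignments (e.g. FINAL_ANSWER) land back in self.vars.
        exec(code, {}, self.vars)


def rlm(prompt, query, generate):
    env = REPL()
    env.set("context", prompt)            # prompt is a variable, not attention
    env.set("query", query)
    while True:
        code = generate(env.vars)         # root LM writes code from env state
        env.execute(code)
        if "FINAL_ANSWER" in env.vars:    # Algorithm 1 exit condition
            return env.get("FINAL_ANSWER")
```

A `generate` that inspects `context` through code and eventually assigns `FINAL_ANSWER` terminates the loop.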
You are the root LLM. Your REPL is Bash + Python. Your sub-LM calls are the Task tool. Your variables are files on disk.
Never load large inputs into your context window. The prompt is a file path — a variable in the environment. You interact with it through code.
# WRONG: Read the whole file into context
cat huge_file.txt
# RIGHT: Keep it external, probe with code
wc -l huge_file.txt
head -50 huge_file.txt
grep -c "error" huge_file.txt
Use the Task tool as your lm_query() function. Each sub-agent processes a chunk and returns structured results to a file — not to your context.
lm_query(chunk, question) → Task tool sub-agent → writes result to file
Never try to hold all results in context. Write everything to files. Build the final answer by aggregating file contents programmatically.
# Sub-agent writes: results/chunk_0001.json
# Sub-agent writes: results/chunk_0002.json
# You aggregate with code, then read the small summary
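A minimal sketch of that file convention, with the sub-agent's write simulated by a plain function (`save_result` is illustrative, not one of the scripts shipped here):

```python
import json
import os


def save_result(results_dir, index, findings, summary=""):
    """Write one chunk's result to its own JSON file."""
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, f"chunk_{index:04d}.json")
    with open(path, "w") as f:
        json.dump({"chunk_index": index, "findings": findings,
                   "summary": summary, "relevant": bool(findings)}, f)
    return path  # only this small path string ever enters your context
```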
| Condition | Mode |
|---|---|
| Single file > 50KB | RLM-lite (no sub-calls) |
| Collection > 200KB | Full RLM |
| Needle-in-haystack (find specific item) | RLM-lite with rlm_search.py |
| Linear scan (process everything once) | Full RLM, Map-Reduce |
| Pairwise comparison (O(N²)) | Full RLM, Classify-Then-Compute |
| Input > 1M tokens | Full RLM, Hierarchical |
Don't use RLM for files < 50KB or tasks where a single grep answers the question.
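The decision table can be sketched as a helper. The size thresholds come from the table; the ~4 bytes-per-token conversion and the returned labels are assumptions for illustration:

```python
def choose_mode(size_bytes, is_collection=False, task="scan"):
    """Map input size and task shape to a mode label (labels illustrative)."""
    if size_bytes > 4_000_000:               # roughly 1M tokens at ~4 B/token
        return "Full RLM, Hierarchical"
    if task == "needle":                     # find one specific item
        return "RLM-lite + rlm_search.py"
    if task == "pairwise":                   # O(N^2) comparisons
        return "Full RLM, Classify-Then-Compute"
    if is_collection and size_bytes > 200_000:
        return "Full RLM, Map-Reduce"
    if size_bytes > 50_000:
        return "RLM-lite"
    return "no RLM"                          # a single grep will do
```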
Before writing any processing code, understand what you're working with.
# Size and structure
wc -l -c INPUT_PATH
file INPUT_PATH
head -100 INPUT_PATH
# For directories
find DIR -type f -name "*.EXT" | wc -l
find DIR -type f | xargs wc -c | sort -rn | head -20
# For structured data
python3 -c "import json; d=json.load(open('f.json')); print(type(d), len(d))"
Outcome: you now know the size, format, and structure. Pick your strategy accordingly.
This is the biggest efficiency win. Use code to eliminate irrelevant content BEFORE any LLM processing. The paper found that filtering with model priors (e.g., keyword matching) dramatically reduces cost.
# Fast regex pre-filter
python3 scripts/rlm_search.py INPUT "keyword1|keyword2" --context 5
# Structure-based filter
grep -rl "class.*Controller" ./src/
find ./repo -name "*.py" -path "*/api/*"
# Statistical filter
python3 -c "
import re
with open('INPUT') as f:
    text = f.read()
sections = re.split(r'\n## ', text)
relevant = [s for s in sections if 'TARGET_TOPIC'.lower() in s.lower()]
print(f'Filtered {len(sections)} → {len(relevant)} sections')
for i, s in enumerate(relevant):
    open(f'filtered_{i}.txt', 'w').write(s)
"
# Auto-detect best strategy
python3 scripts/rlm_chunker.py auto INPUT_PATH --output ./chunks
# Manual control
python3 scripts/rlm_chunker.py file large.txt --method lines --size 500
python3 scripts/rlm_chunker.py file doc.md --method separator --sep "\n## "
python3 scripts/rlm_chunker.py dir ./src --ext .py .js --size 60000
Target: 30K–80K chars per chunk for sub-agent calls. Check chunks/manifest.json for the plan.
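A quick sanity check on the plan might look like this. Note the manifest schema assumed here (a JSON list of `{"path", "size"}` entries) is a guess for illustration, not the documented output of `rlm_chunker.py`:

```python
import json


def check_manifest(path, lo=30_000, hi=80_000):
    """Count chunks and flag any outside the 30K-80K char target band."""
    chunks = json.load(open(path))
    out_of_range = [c for c in chunks if not lo <= c["size"] <= hi]
    return len(chunks), out_of_range
```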
This is lm_query() from Algorithm 1. Use the Task tool to process chunks in parallel.
Critical rules:
- Pass each chunk as a file path; never paste chunk content into the sub-agent prompt.
- Each sub-agent writes its result to its own file and returns only a short status.
- Launch sub-agents for independent chunks in parallel.
Sub-agent prompt template:
Read the file at {chunk_path}.
TASK: {specific_question}
Write your answer as JSON to {result_path}:
{
  "chunk_index": N,
  "findings": ["finding1", "finding2"],
  "summary": "1-2 sentence summary",
  "relevant": true/false
}
If nothing relevant, set "relevant": false and "findings": [].
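A small validator for that schema catches malformed sub-agent output before aggregation (a sketch, assuming exactly the fields shown above):

```python
def valid_result(r):
    """Check one sub-agent result dict against the expected schema."""
    return (isinstance(r.get("chunk_index"), int)
            and isinstance(r.get("findings"), list)
            and isinstance(r.get("summary"), str)
            and isinstance(r.get("relevant"), bool))
```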
For RLM-lite (no sub-calls): Skip this step. Use rlm_search.py regex results directly.
# Collect all results
python3 scripts/rlm_aggregate.py ./results --output final_answer.json
# Or manually
python3 -c "
import json, glob
results = []
for p in sorted(glob.glob('./results/*.json')):
    results.append(json.load(open(p)))
relevant = [r for r in results if r.get('relevant')]
findings = [f for r in relevant for f in r.get('findings', [])]
json.dump(findings, open('aggregated.json', 'w'), indent=2)
print(f'{len(relevant)}/{len(results)} chunks relevant, {len(findings)} findings')
"
Then do a final synthesis (as sub-agent if aggregated results > 30K chars, or in your own context if small enough):
ORIGINAL QUERY: {query}
AGGREGATED FINDINGS (from {N} chunks):
{findings}
Synthesize a comprehensive answer.
Write the final answer to a file — this is FINAL_ANSWER from Algorithm 1.
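The routing decision above can be sketched as a one-liner (the 30K threshold matches the text; the labels are illustrative):

```python
import os


def synthesis_route(aggregated_path, threshold=30_000):
    """Decide where the final synthesis runs based on aggregate size."""
    if os.path.getsize(aggregated_path) > threshold:
        return "sub-agent"      # too big to read directly
    return "own context"        # small enough to synthesize in place
```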
Probe → Chunk → Map (parallel sub-agents) → Reduce (aggregate) → FINAL_ANSWER
Best for: summarization, classification, extraction across all content.
Probe → Filter (regex/grep) → Process (sub-agent on filtered content) → FINAL_ANSWER
Best for: finding specific information. RLM-lite often sufficient here.
Probe → Chunk → Classify (sub-agents label each) → Compute pairs (code) → FINAL_ANSWER
Best for: cross-referencing, relationship finding, comparisons.
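A sketch of the pattern, with a plain function standing in for the classifying sub-agent: N semantic calls assign labels, then cheap code does the O(N²) pairing.

```python
from itertools import combinations


def classify_then_compute(chunks, classify, compare):
    labels = [classify(c) for c in chunks]    # N semantic (sub-agent) calls
    return [(i, j)
            for (i, li), (j, lj) in combinations(enumerate(labels), 2)
            if compare(li, lj)]               # N^2 cheap label comparisons
```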
Probe → Mega-chunk (~10 groups) → Sub-agent per mega-chunk (each further chunks internally) → Aggregate → FINAL_ANSWER
Best for: inputs so large that chunking alone produces 100+ pieces.
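The grouping step can be sketched as follows; round-robin assignment is one arbitrary choice, and each group would go to one sub-agent that chunks further internally.

```python
def make_mega_chunks(paths, n_groups=10):
    """Fold many chunk paths into at most n_groups mega-chunks."""
    groups = [[] for _ in range(min(n_groups, len(paths)))]
    for i, p in enumerate(paths):
        groups[i % len(groups)].append(p)   # round-robin assignment
    return groups
```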
RLM costs are comparable to the base model's at the median, but show high variance at the tail because some trajectories run long.
Add this to any project to activate RLM:
## Large Input Processing (RLM)
When inputs exceed 50KB or tasks require processing all/most of a large file:
1. Prompt is a VARIABLE (file path), not in context
2. Probe first: wc, head, file, find
3. Filter with grep/regex before any LLM processing
4. Chunk into 30K-80K char pieces (scripts/rlm_chunker.py)
5. Delegate semantic work to sub-agents (Task tool = lm_query)
6. Store all intermediate results in files, never context
7. Aggregate programmatically → FINAL_ANSWER in a file
8. Code for deterministic work, sub-agents for semantic work only