Master skill for Recursive Language Model thinking — orchestrates long-context reasoning by treating prompts as environments, not inputs. Spawns sub-agents and manages recursive decomposition.
Your core shift: The prompt is not input to read. The prompt is an environment to explore.
You are the Manager in a Mini-Model Economy. You plan the reconnaissance, spawn the subcommittees, and aggregate the signal from the noise.
Without RLM:
User Query + Massive Context → Stuff it all in → Hope for the best → Context Rot → Wrong answer

With RLM:
User Query → Store context as ENVIRONMENT → Probe structure → Identify relevant chunks → Spawn focused sub-queries → Aggregate findings → Correct answer
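The RLM flow can be sketched end to end. This is a minimal illustration, not the skill's prescribed implementation: `llm_query` is the sub-agent call used throughout this skill, passed in here as a parameter so the sketch stays self-contained, and the fixed-size slicing stands in for the real decomposition phase described below.

```python
def rlm_answer(query, context, llm_query, chunk_size=2000):
    """Probe -> decompose -> delegate -> aggregate, in miniature."""
    # Decompose: never hand the whole context to a single call
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]
    # Delegate: one focused sub-query per chunk
    findings = [llm_query(f"QUERY: {query}\nSECTION:\n{c}") for c in chunks]
    # Aggregate: the final call sees only findings, never the raw context
    return llm_query(f"QUERY: {query}\nFINDINGS:\n" + "\n".join(findings))
```

Note that the synthesis call receives only the per-chunk findings, so its context stays small no matter how large the original input was.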
| Trigger | Condition | Action |
|---|---|---|
| Large Context | Input > 50K tokens (or approaching limits) | Switch to RLM mode |
| Information Dense | Query requires synthesizing many parts | Decompose and delegate |
| Needle-in-Haystack | Finding specific info in massive text | Reconnaissance first |
| Multi-Hop Reasoning | Answer requires connecting 2+ distant facts | Parallel sub-queries |
| Aggregation Tasks | Counting, comparing, listing across data | Chunk and map-reduce |
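The Large Context trigger from the table can be approximated with a simple gate. A sketch: the 50K-token threshold comes from the table, while the rough 4-characters-per-token ratio is an assumption of this example.

```python
def should_use_rlm(context: str, approx_chars_per_token: int = 4,
                   token_limit: int = 50_000) -> bool:
    """Heuristic gate: switch to RLM mode when the estimated token
    count exceeds the Large Context threshold from the trigger table."""
    est_tokens = len(context) // approx_chars_per_token
    return est_tokens > token_limit
```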
Goal: Understand the shape of the data without reading all of it.
Before consuming content into your precious context window:
Related Skill: See rlm-context-scout/SKILL.md for detailed reconnaissance techniques.
Example Reconnaissance:
```python
# Don't read everything. Understand the shape first.
print(f"Total length: {len(context)} characters")
print(f"First 500 chars: {context[:500]}")

# Count blank-line separators (likely section breaks). Hoisted out of the
# f-string: backslash escapes inside f-strings are a SyntaxError before
# Python 3.12.
section_breaks = context.count("\n\n")
print(f"Number of double-newlines (likely sections): {section_breaks}")

# Find structural markers
import re
headers = re.findall(r"^#+\s+.+", context, re.MULTILINE)
print(f"Found {len(headers)} markdown headers")
```
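Reconnaissance pays off most when the probe results are kept as an index for later phases. A sketch along those lines (the helper name `map_sections` is illustrative, not part of the skill): it records each header's character offset so sub-queries can slice out one section at a time.

```python
import re

def map_sections(context: str):
    """Index markdown headers by character offset so later phases can
    slice out a single section instead of re-reading the whole context."""
    matches = list(re.finditer(r"^#+\s+.+", context, re.MULTILINE))
    sections = []
    for i, m in enumerate(matches):
        start = m.start()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(context)
        sections.append((m.group().strip(), start, end))
    return sections
```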
Goal: Break the problem into chunks that can be processed with full attention (avoiding Context Rot).
Key Principles:
Decomposition Strategies:
| Strategy | When to Use | Example |
|---|---|---|
| Semantic Chunking | Structured documents | Split by headers, chapters, sections |
| Fixed Chunking | Unstructured text | Split into N equal chunks |
| Targeted Extraction | Needle-in-haystack | Use regex/keywords to filter first |
| Hierarchical | Very large inputs | First-pass summary → second-pass detail |
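The Fixed Chunking row can be sketched in a few lines. The small overlap between slices is an implementation choice of this example, not part of the table: it keeps facts that straddle a chunk boundary intact in at least one chunk.

```python
def fixed_chunks(text: str, size: int = 400_000, overlap: int = 2_000):
    """Fixed chunking for unstructured text: equal-size slices with a
    small overlap so boundary-straddling facts survive whole."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```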
Example Sub-Query Pattern:
```python
import re

# Break into semantic chunks at level-2+ markdown headers
chunks = re.split(r"\n#{2,}\s+", context)

# Process each chunk with focused attention
findings = []
for i, chunk in enumerate(chunks):
    finding = llm_query(f"""
You are analyzing section {i + 1} of {len(chunks)}.

QUERY: {original_query}

SECTION CONTENT:
{chunk}

Extract any information relevant to the query. If nothing relevant, say "No relevant information."
Be concise but complete.
""")
    findings.append(finding)
```
Goal: Combine sub-query results into a coherent, verified final answer.
Aggregation Patterns:
Map-Reduce: When counting, listing, or comparing
```python
final = llm_query(f"""
You have received findings from {len(findings)} document sections.

FINDINGS:
{chr(10).join(findings)}

ORIGINAL QUERY: {original_query}

Synthesize these findings into a complete answer.
If findings conflict, note the conflict.
If information is incomplete, note what's missing.
""")
```
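When the task is pure counting, the reduce step can be deterministic code rather than another LLM call. A sketch, assuming the sub-query prompt asked each section to report a `count: <n>` line (that format is an assumption of this example, not part of the skill):

```python
import re

def reduce_counts(findings):
    """Sum per-chunk counts like 'count: 7' reported by sub-queries.
    Findings without a count line (e.g. 'No relevant information')
    contribute zero."""
    total = 0
    for f in findings:
        m = re.search(r"count:\s*(\d+)", f)
        if m:
            total += int(m.group(1))
    return total
```

A deterministic reduce like this cannot hallucinate a total, which matters for aggregation tasks where arithmetic accuracy is the whole point.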
Verification Loop: When accuracy is critical
```python
# First aggregation
answer = llm_query(f"Combine findings: {findings}")

# Verification with smaller, focused context
verified = llm_query(f"""
PROPOSED ANSWER: {answer}

KEY EVIDENCE: {relevant_chunks}

Verify this answer is correct based on the evidence.
If incorrect, provide the correct answer.
""")
```
Variable Accumulation: When building long outputs
```python
accumulated = []
for chunk in chunks:
    processed = llm_query(f"Process: {chunk}")
    accumulated.append(processed)

# Return the accumulated variable, not a new synthesis
FINAL_VAR(accumulated)
```
Not all work requires the biggest brain. Deploy cheaper models for the high-volume, lower-stakes work:
| Task Type | Model Tier | Why |
|---|---|---|
| Orchestration | Highest (GPT-5 class) | Strategic decisions, complex synthesis |
| Chunk Analysis | Medium (GPT-4 class) | Per-section processing, good enough |
| Simple Extraction | Smallest (Mini class) | Regex-like tasks, keyword search |
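The tier table can be wired up as a simple dispatch. A sketch with placeholder tier names rather than real model identifiers:

```python
# Tier names are illustrative placeholders, not real model identifiers.
MODEL_TIERS = {
    "orchestration": "highest",   # strategic decisions, complex synthesis
    "chunk_analysis": "medium",   # per-section processing, good enough
    "extraction": "smallest",     # regex-like tasks, keyword search
}

def pick_model(task_type: str) -> str:
    """Route a task to the cheapest adequate tier per the table,
    defaulting to the safest (highest) tier for unknown task types."""
    return MODEL_TIERS.get(task_type, "highest")
```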
Cost Optimization Rules:
When you've completed the recursive process:
- FINAL(your answer here) — when you can state the answer directly
- FINAL_VAR(variable_name) — when you've built up the answer in a REPL variable

Critical: Do NOT output FINAL() until you are truly done. Don't confuse plans with answers.
```python
# BAD: Just dump it all in
answer = llm_query(f"Here's 5 million characters: {entire_document}. Answer: {query}")
# This WILL cause Context Rot
```

```python
# BAD: 1000 LLM calls for 1000 lines
for line in lines:  # 1000 lines
    result = llm_query(f"Classify: {line}")  # 1000 calls = $$$ and slow

# GOOD: 5 LLM calls for 1000 lines
chunk_size = 200
for i in range(0, len(lines), chunk_size):
    batch = "\n".join(lines[i:i + chunk_size])
    result = llm_query(f"Classify each line:\n{batch}")
```

```python
# BAD: Returning answer from "memory" instead of the accumulated variable
# (This caused failures in the OOLONG-Pairs benchmark)
FINAL("The answer is probably X")  # Wrong!

# GOOD
FINAL_VAR(accumulated_answer)  # Right — use what you actually computed
```
This skill works in concert with:
| Skill | Purpose | When to Reference |
|---|---|---|
| rlm-context-scout/SKILL.md | Deep dive on reconnaissance techniques | Phase 1 (probing, filtering) |
| rlm-repl-environment/SKILL.md | REPL setup and code patterns | Technical implementation |
Skill Loop Pattern: When implementing RLM thinking:
- Load rlm-context-scout for reconnaissance details
- Load rlm-repl-environment for code patterns

```
┌───────────────────────────────────────────────────────────────┐
│                     RLM ORCHESTRATOR FLOW                     │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  1. RECOGNIZE THE PATTERN                                     │
│     → Is context large? Is task complex? → Activate RLM mode  │
│                                                               │
│  2. RECONNAISSANCE (Don't read — probe)                       │
│     → Sample, count, pattern-match                            │
│     → Build mental map of data structure                      │
│                                                               │
│  3. DECOMPOSE (Divide the problem)                            │
│     → Semantic chunks? Fixed chunks? Targeted extraction?     │
│     → Each chunk < 500K chars                                 │
│                                                               │
│  4. DELEGATE (Spawn sub-queries)                              │
│     → Clear, focused prompts                                  │
│     → Return high-signal only                                 │
│                                                               │
│  5. AGGREGATE (Synthesize findings)                           │
│     → Combine results                                         │
│     → Verify if critical                                      │
│     → Use FINAL_VAR for accumulated answers                   │
│                                                               │
│  REMEMBER: Context is an ENVIRONMENT, not an INPUT.           │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
"Data-processing systems with a small but fast main memory can process far larger datasets by cleverly managing how data is fetched into memory." — The RLM Paper, on Out-of-Core Algorithms
RLMs apply this systems principle to language model reasoning:
The fundamental insight: An RLM has strictly more representation capacity than an LLM. It can always degrade to a simple LLM call if needed, but it can also scale to handle 10M+ tokens that would be impossible otherwise.
The practical outcome: 91% accuracy on 11M-token tasks where SOTA models score 0%.
When you face the impossible — think recursively.