Use when you have N≥3 raw research artifacts (notes, podcast summaries, deep-research dumps, daily intel, paper analyses) on one topic and want to lift them into a single structured pack with cross-source claims and provenance — instead of one-shot summarization that loses 90% of intermediate evidence. Treats the N sources as an environment a lite aggregator agent navigates with `inspect` / `search` / `synthesize` tools, rather than concatenating into one prompt.
A protocol for agentic aggregation of long-horizon research material. Inverts the standard "concat all → ask LLM to summarize" pipeline: instead, an aggregator agent navigates the N source files with three tools, building a notes scratchpad with full path:line provenance, and finally writes a structured pack (brief / findings / sources / aggregation log).
Core principle: don't read everything upfront. Don't merge final answers. Treat the N sources as a queryable environment.
Three traditional ways to aggregate N parallel research outputs all fail on long-horizon, open-ended tasks:
❌ concat all sources into one prompt
→ 200K+ token explosion, attention collapse on long context
❌ summarize each, then merge summaries
→ ~90% of intermediate evidence (the "I noticed X but..." asides) is lost
❌ LLM-as-judge picks the single best source
→ discards the other N-1 sources' independent findings
These failure modes show up clearly on open-ended research tasks where there's no ground-truth verifier. The alternative: treat the N sources as an environment, and send a lite agent in to inspect / search / synthesize on demand. Cost ≈ a single rollout, recall is materially higher, and cross-source contradictions get surfaced explicitly.
This skill is the protocol. No Python, no MCP — a pure markdown protocol that any harness with Read + Grep can execute.
Every claim carries path:line provenance, not vibes. (If you only need Q&A over one corpus, reach for a wiki-ask-style skill instead.)

Trajectories-as-environment
╔════════════════════════════════════════╗
║ ║
║ src_1 src_2 src_3 ... src_N ║
║ [..] [..] [..] [..] ║
║ [..] [..] [..] [..] ║
║ ║
╚═══════════════════╤════════════════════╝
│
│ not concatenated.
│ not summarized.
│ navigated.
│
▼
┌────────────────────────────────────────┐
│ AGGREGATOR (lite agent) │
│ ┌──────────────────────────────────┐ │
│ │ inspect_file / inspect_section │ │
│ │ search_sources │ │
│ │ cross_pack_check │ │
│ └──────────────────────────────────┘ │
│ │
│ scratch state: │
│ notes = [] # {claim, evidence, │
│ # source, line_ref} │
│ budget = 25 # tool calls │
│ subtopics = derived from skim pass │
│ │
│ loop until: subtopic coverage met, │
│ OR budget = 0, │
│ OR 2 zero-info calls │
└───────────────────┬────────────────────┘
│
▼
┌─────────────────────────────┐
│ pack/ │
│ brief.md │
│ findings.md ← claims │
│ sources.tsv ← S-IDs │
│ _aggregation_log.md │
└─────────────────────────────┘
If the target pack already exists, read its brief.md + findings.md so you know what already exists. For each source, do one cheap read:
Build an in-memory source map:
S1 | path/to/source_1.md | what it covers (1-2 lines) | rough_topics
S2 | path/to/source_2.md | ... | ...
This pass costs ~N reads, each bounded. Do not skip — the source map is what makes Phase 3 efficient.
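The bounded skim pass could be sketched as one cheap read per source (a sketch only: `skim_lines` and the map shape are assumptions, and turning the excerpt into a 1-2 line gist remains the agent's own judgment):

```python
from pathlib import Path

def build_source_map(source_paths, skim_lines=40):
    # Cheap pass: read only the first `skim_lines` lines of each source,
    # so cost stays ~N bounded reads regardless of source length.
    source_map = {}
    for i, path in enumerate(source_paths, start=1):
        head = Path(path).read_text(encoding="utf-8").splitlines()[:skim_lines]
        source_map[f"S{i}"] = {
            "path": str(path),
            "head": head,  # bounded excerpt the agent summarizes into a gist
        }
    return source_map
```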
Tool inventory (use whatever your harness provides — Read, Grep are sufficient):
| Verb | Implementation | When to use |
|---|---|---|
| inspect_file(path) | Read whole file | Source < 200 LOC and you need full content |
| inspect_section(path, line_range) | Read with offset + limit | Drilling into a specific span of a long source |
| search_sources(pattern) | Grep over the N source paths only | Finding a keyword / theme across sources |
| cross_pack_check(pattern) | Grep over your wider knowledge base, excluding the target pack and the raw sources | Avoiding duplicate claims with existing packs |
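Under a plain-Python harness with no Grep tool, `search_sources` could be sketched like this; the `(path, line_no, line)` return shape is an assumption, chosen so every hit already carries path:line provenance:

```python
import re
from pathlib import Path

def search_sources(pattern, source_paths):
    # Grep restricted to the N source files only; it never searches the
    # wider repo (that is cross_pack_check's job, with inverted scope).
    rx = re.compile(pattern)
    hits = []
    for path in source_paths:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
        for n, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append((str(path), n, line))
    return hits
```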
Loop discipline:
state.notes = []
state.budget = 25 (or user-specified)
while state.budget > 0:
pick highest-value next action:
drill — a subtopic has a hot lead in one source
cross_search — a claim from S1 should be cross-checked against others
dedup_check — a claim looks novel; verify no existing pack covers it
resolve — two sources disagree; inspect both passages
explore — a subtopic has zero notes after Phase 2; broaden search
DONE — coverage threshold met
record note → {claim, evidence_quote, source_id, line_ref, confidence}
state.budget -= 1
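The note record the loop accumulates could be typed like this (a sketch; the field names mirror the pseudocode above, and the types are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Note:
    claim: str            # one-line claim
    evidence_quote: str   # exact quote or tight paraphrase
    source_id: str        # e.g. "S3"
    line_ref: str         # e.g. "path/to/src.md:L45-50"
    confidence: str       # "high" | "medium" | "low"
```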
Stopping criteria — declare DONE when ANY holds:
- subtopic coverage threshold met
- tool-call budget exhausted (state.budget = 0)
- 2 consecutive zero-information tool calls
Hard rule: every note MUST have a source_id + line_ref (path + line range).
No provenance, no claim. This is what makes the pack auditable.
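The hard rule is mechanically checkable. A minimal validator might look like this (the regex is an assumption matching the path:L<lines> style used in this pack format):

```python
import re

LINE_REF = re.compile(r".+:L\d+(-\d+)?$")  # e.g. path/to/src.md:L120-128

def is_auditable(note):
    # No provenance, no claim: a note without both a source_id and a
    # path:line ref must not reach findings.md.
    return bool(note.get("source_id")) and bool(LINE_REF.match(note.get("line_ref") or ""))
```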
Output location: <pack-name>/. If updating, merge with existing files (preserve original sources for existing claims, add new claims, flag superseded ones).
Files:
brief.md — 200-400 word executive overview. Subtopic skeleton. Reading order suggestion.
findings.md — claims, one block per finding, grouped under subtopic headers:
## Claim: <one-line claim>
Status: supported | contradicted | uncertain
Confidence: high | medium | low
Sources: S1, S3, S7
Evidence:
- "exact quote or paraphrase" — S1 (path/to/source.md:L120-128)
- "..." — S3 (path/to/other.md:L45-50)
Notes: <optional — e.g., "S3 contradicts S7 on date">
sources.tsv — S-ID mapping:
id path type captured_at url_or_origin
S1 path/to/source_1.md podcast-notes 2026-04-12 https://...
S2 path/to/source_2.md daily-intel 2026-04-13 ...
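Writing the mapping through the csv module keeps embedded tabs from corrupting rows (a sketch; the column names follow the header row above):

```python
import csv

FIELDS = ["id", "path", "type", "captured_at", "url_or_origin"]

def write_sources_tsv(path, rows):
    # One row per source; the row count must equal N.
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        w.writeheader()
        w.writerows(rows)
```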
_aggregation_log.md — always written. Audit trail:
# Aggregation Log
Date: YYYY-MM-DD
Topic: <topic>
Sources: N=<N>
Tool calls: X / budget Y
Cross-pack overlaps: <list or "none">
Subtopics covered: <list>
Skipped sources (no relevant content): <list>
Stopping criterion triggered: <which one>
Append a one-line entry to your pack index (do not trigger a full reindex — that's a different skill's job).
Print:
Pack written: <pack-name>/
Sources processed: N
Aggregator tool calls: X / budget Y
Subtopics: K
Claims extracted: M (high: a, medium: b, low: c)
Cross-pack overlaps: <list or "none">
Sources with low yield: <list>
Suggested next: <reindex command> && <lint command>
| Excuse the agent will invent | Rebuttal |
|---|---|
| "I'll just read all N files in Phase 2 to be safe" | That's the V1 mistake this skill exists to fix. Long-context attention degrades; you'll lose information you "read." Stay disciplined: cheap-pass first, drill on demand. |
| "Skipping cross_pack_check — it's a small repo" | Repos grow. Duplicate claims accumulate silently. One Grep per novel claim costs almost nothing. |
| "I have a great quote but I don't remember the line number" | Then the note is invalid. Re-Read to get path:L<lines>. No provenance, no claim — refuse to write findings.md while any note lacks provenance. |
| "Only 2 sources matched the glob — I'll proceed anyway" | No. Hard stop at N < 3. Either collect more or write a summary by hand. The protocol overhead is wasted on small N. |
| "All sources got 'low yield' — I'll write findings from my prior knowledge" | No. The pack is supposed to reflect what's in the sources. If yield is low, the brief is empty + log says so. Don't fabricate. |
| "I'll skip writing _aggregation_log.md, it's just paperwork" | No. The log is what makes the next run reproducible. It's also the audit trail when someone questions a claim months later. |
A successfully completed run produces:
- [ ] <pack-name>/brief.md exists, ≤ 400 words, organized by subtopic
- [ ] <pack-name>/findings.md exists; every ## Claim: block has ≥1 Evidence: line with path:L<lines> provenance
- [ ] <pack-name>/sources.tsv exists with N rows matching N sources
- [ ] <pack-name>/_aggregation_log.md exists with tool-call count and stopping reason

If any checkbox fails, the run is incomplete — do not declare DONE.
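The completion checks can be enforced with a small script (a sketch: the claim/evidence counting is deliberately crude, and a real linter would parse claim blocks properly):

```python
from pathlib import Path

REQUIRED = ["brief.md", "findings.md", "sources.tsv", "_aggregation_log.md"]

def pack_complete(pack_dir):
    # All four files must exist, and findings.md must carry at least one
    # path:L provenance ref per "## Claim:" block.
    d = Path(pack_dir)
    if not all((d / name).is_file() for name in REQUIRED):
        return False
    text = (d / "findings.md").read_text(encoding="utf-8")
    claims = text.count("## Claim:")
    refs = text.count(":L")
    return refs >= claims
```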
- debug-hypothesis — same disciplined-loop pattern, applied to bug investigation rather than research synthesis
- spec-driven-dev — same explicit-exit-criteria philosophy, applied to building software end-to-end