Use when you have N≥3 raw research artifacts (notes, podcast summaries, deep-research dumps, daily intel, paper analyses) on one topic and want to lift them into a single structured pack with cross-source claims and provenance — instead of one-shot summarization that loses 90% of intermediate evidence. Treats the N sources as an environment a lite aggregator agent navigates with `inspect` / `search` / `synthesize` tools, rather than concatenating into one prompt.
A protocol for agentic aggregation of long-horizon research material. Inverts the standard "concat all → ask LLM to summarize" pipeline: instead, an aggregator agent navigates the N source files with three tools, building a notes scratchpad with full path:line provenance, and finally writes a structured pack (brief / findings / sources / aggregation log).
Core principle: don't read everything upfront. Don't merge final answers. Treat the N sources as a queryable environment.
Three traditional ways to aggregate N parallel research outputs all fail on long-horizon, open-ended tasks:
❌ concat all sources into one prompt
→ 200K+ token explosion, attention collapse on long context
❌ summarize each, then merge summaries
→ ~90% of intermediate evidence (the "I noticed X but..." asides) is lost
❌ LLM-as-judge picks the single best source
→ discards the other N-1 sources' independent findings
These failure modes show up clearly on open-ended research tasks where there's no ground-truth verifier. The alternative: treat the N sources as an environment, and send a lite agent in to inspect / search / synthesize on demand. Cost ≈ a single rollout, recall is materially higher, and cross-source contradictions get surfaced explicitly.
This skill is the protocol. No Python, no MCP — a pure markdown protocol that any harness with Read + Grep can execute.
Every claim carries path:line provenance, not vibes. (If you only need Q&A over one corpus, reach for a wiki-ask-style skill instead.)

Trajectories-as-environment
╔════════════════════════════════════════╗
║ ║
║ src_1 src_2 src_3 ... src_N ║
║ [..] [..] [..] [..] ║
║ [..] [..] [..] [..] ║
║ ║
╚═══════════════════╤════════════════════╝
│
│ not concatenated.
│ not summarized.
│ navigated.
│
▼
┌────────────────────────────────────────┐
│ AGGREGATOR (lite agent) │
│ ┌──────────────────────────────────┐ │
│ │ inspect_file / inspect_section │ │
│ │ search_sources │ │
│ │ cross_pack_check │ │
│ └──────────────────────────────────┘ │
│ │
│ scratch state: │
│ notes = [] # {claim, evidence, │
│ # source, line_ref} │
│ budget = 25 # tool calls │
│ subtopics = derived from skim pass │
│ │
│ loop until: subtopic coverage met, │
│ OR budget = 0, │
│ OR 2 zero-info calls │
└───────────────────┬────────────────────┘
│
▼
┌─────────────────────────────┐
│ pack/ │
│ brief.md │
│ findings.md ← claims │
│ sources.tsv ← S-IDs │
│ _aggregation_log.md │
└─────────────────────────────┘
If the target pack already exists, read its brief.md + findings.md so you know what already exists. For each source, do one cheap read:
Build an in-memory source map:
S1 | path/to/source_1.md | what it covers (1-2 lines) | rough_topics
S2 | path/to/source_2.md | ... | ...
This pass costs ~N reads, each bounded. Do not skip — the source map is what makes Phase 3 efficient.
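The bounded skim pass could be sketched as one cheap read per source (a sketch only: `skim_lines` and the map shape are assumptions, and turning the excerpt into a 1-2 line gist remains the agent's own judgment):

```python
from pathlib import Path

def build_source_map(source_paths, skim_lines=40):
    # Cheap pass: read only the first `skim_lines` lines of each source,
    # so cost stays ~N bounded reads regardless of source length.
    source_map = {}
    for i, path in enumerate(source_paths, start=1):
        head = Path(path).read_text(encoding="utf-8").splitlines()[:skim_lines]
        source_map[f"S{i}"] = {
            "path": str(path),
            "head": head,  # bounded excerpt the agent summarizes into a gist
        }
    return source_map
```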
Tool inventory (use whatever your harness provides — Read, Grep are sufficient):
| Verb | Implementation | When to use |
|---|---|---|
| inspect_file(path) | Read whole file | Source < 200 LOC and you need full content |
| inspect_section(path, line_range) | Read with offset + limit | Drilling into a specific span of a long source |
| search_sources(pattern) | Grep over the N source paths only | Finding a keyword / theme across sources |
| cross_pack_check(pattern) | Grep over your wider knowledge base, excluding the target pack and the raw sources | Avoiding duplicate claims with existing packs |
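Under a plain-Python harness with no Grep tool, `search_sources` could be sketched like this; the `(path, line_no, line)` return shape is an assumption, chosen so every hit already carries path:line provenance:

```python
import re
from pathlib import Path

def search_sources(pattern, source_paths):
    # Grep restricted to the N source files only; it never searches the
    # wider repo (that is cross_pack_check's job, with inverted scope).
    rx = re.compile(pattern)
    hits = []
    for path in source_paths:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
        for n, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append((str(path), n, line))
    return hits
```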
Loop discipline:
state.notes = []
state.budget = 25 (or user-specified)
while state.budget > 0:
pick highest-value next action:
drill — a subtopic has a hot lead in one source
cross_search — a claim from S1 should be cross-checked against others
dedup_check — a claim looks novel; verify no existing pack covers it
resolve — two sources disagree; inspect both passages
explore — a subtopic has zero notes after Phase 2; broaden search
DONE — coverage threshold met
record note → {claim, evidence_quote, source_id, line_ref, confidence}
state.budget -= 1
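The note record the loop accumulates could be typed like this (a sketch; the field names mirror the pseudocode above, and the types are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Note:
    claim: str            # one-line claim
    evidence_quote: str   # exact quote or tight paraphrase
    source_id: str        # e.g. "S3"
    line_ref: str         # e.g. "path/to/src.md:L45-50"
    confidence: str       # "high" | "medium" | "low"
```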
Stopping criteria — declare DONE when ANY holds:
- subtopic coverage threshold met
- tool-call budget exhausted (state.budget = 0)
- 2 consecutive zero-information tool calls
Hard rule: every note MUST have a source_id + line_ref (path + line range).
No provenance, no claim. This is what makes the pack auditable.
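The hard rule is mechanically checkable. A minimal validator might look like this (the regex is an assumption matching the path:L<lines> style used in this pack format):

```python
import re

LINE_REF = re.compile(r".+:L\d+(-\d+)?$")  # e.g. path/to/src.md:L120-128

def is_auditable(note):
    # No provenance, no claim: a note without both a source_id and a
    # path:line ref must not reach findings.md.
    return bool(note.get("source_id")) and bool(LINE_REF.match(note.get("line_ref") or ""))
```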
Output location: <pack-name>/. If updating, merge with existing files (preserve original sources for existing claims, add new claims, flag superseded ones).
Files:
brief.md — 200-400 word executive overview. Subtopic skeleton. Reading order suggestion.
findings.md — claims, one block per finding, grouped under subtopic headers:
## Claim: <one-line claim>
Status: supported | contradicted | uncertain
Confidence: high | medium | low
Sources: S1, S3, S7
Evidence:
- "exact quote or paraphrase" — S1 (path/to/source.md:L120-128)
- "..." — S3 (path/to/other.md:L45-50)
Notes: <optional — e.g., "S3 contradicts S7 on date">
sources.tsv — S-ID mapping:
id path type captured_at url_or_origin
S1 path/to/source_1.md podcast-notes 2026-04-12 https://...
S2 path/to/source_2.md daily-intel 2026-04-13 ...
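Writing the mapping through the csv module keeps embedded tabs from corrupting rows (a sketch; the column names follow the header row above):

```python
import csv

FIELDS = ["id", "path", "type", "captured_at", "url_or_origin"]

def write_sources_tsv(path, rows):
    # One row per source; the row count must equal N.
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        w.writeheader()
        w.writerows(rows)
```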
_aggregation_log.md — always written. Audit trail:
# Aggregation Log
Date: YYYY-MM-DD
Topic: <topic>
Sources: N=<N>
Tool calls: X / budget Y
Cross-pack overlaps: <list or "none">
Subtopics covered: <list>
Skipped sources (no relevant content): <list>
Stopping criterion triggered: <which one>
Append a one-line entry to your pack index (do not trigger a full reindex — that's a different skill's job).
Print:
Pack written: <pack-name>/
Sources processed: N
Aggregator tool calls: X / budget Y
Subtopics: K
Claims extracted: M (high: a, medium: b, low: c)
Cross-pack overlaps: <list or "none">
Sources with low yield: <list>
Suggested next: <reindex command> && <lint command>
| Excuse the agent will invent | Rebuttal |
|---|---|
| "I'll just read all N files in Phase 2 to be safe" | That's the V1 mistake this skill exists to fix. Long-context attention degrades; you'll lose information you "read." Stay disciplined: cheap-pass first, drill on demand. |
| "Skipping cross_pack_check — it's a small repo" | Repos grow. Duplicate claims accumulate silently. One Grep per novel claim costs almost nothing. |
| "I have a great quote but I don't remember the line number" | Then the note is invalid. Re-Read to get path:L<lines>. No provenance, no claim — refuse to write findings.md while any note lacks provenance. |
| "Only 2 sources matched the glob — I'll proceed anyway" | No. Hard stop at N < 3. Either collect more or write a summary by hand. The protocol overhead is wasted on small N. |
| "All sources got 'low yield' — I'll write findings from my prior knowledge" | No. The pack is supposed to reflect what's in the sources. If yield is low, the brief is empty + log says so. Don't fabricate. |
| "I'll skip writing _aggregation_log.md, it's just paperwork" | No. The log is what makes the next run reproducible. It's also the audit trail when someone questions a claim months later. |
A successfully completed run produces:
- [ ] <pack-name>/brief.md exists, ≤ 400 words, organized by subtopic
- [ ] <pack-name>/findings.md exists; every ## Claim: block has ≥1 Evidence: line with path:L<lines> provenance
- [ ] <pack-name>/sources.tsv exists with N rows matching N sources
- [ ] <pack-name>/_aggregation_log.md exists with tool-call count and stopping reason

If any checkbox fails, the run is incomplete — do not declare DONE.
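The completion checks can be enforced with a small script (a sketch: the claim/evidence counting is deliberately crude, and a real linter would parse claim blocks properly):

```python
from pathlib import Path

REQUIRED = ["brief.md", "findings.md", "sources.tsv", "_aggregation_log.md"]

def pack_complete(pack_dir):
    # All four files must exist, and findings.md must carry at least one
    # path:L provenance ref per "## Claim:" block.
    d = Path(pack_dir)
    if not all((d / name).is_file() for name in REQUIRED):
        return False
    text = (d / "findings.md").read_text(encoding="utf-8")
    claims = text.count("## Claim:")
    refs = text.count(":L")
    return refs >= claims
```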
- debug-hypothesis — same disciplined-loop pattern, applied to bug investigation rather than research synthesis
- spec-driven-dev — same explicit-exit-criteria philosophy, applied to building software end-to-end