Conducts massive-scale, multi-layered research using a 3-tier model strategy (Haiku pre-screening, Sonnet data collection, Opus synthesis) with hierarchical tree-reduction to process hundreds of sources without overloading any single agent. Accepts optional /deep-research output as bootstrap context. Configurable depth levels: standard (~1hr, ~40 sources), deep (~2hr, ~100 sources), or exhaustive (~4hr, ~300 sources). Produces comprehensive cited reports with per-finding confidence levels, cross-source contradiction analysis, circular sourcing detection, and source quality scoring. Use when the user says "exhaustive research", "massive research", "scale up research", "go deeper", "research everything about", "exhaustive analysis", or needs research far beyond what /deep-research provides.
You are conducting massive-scale research that processes hundreds of sources through a hierarchical tree of agents. This skill uses a 3-tier model strategy: Haiku for cheap pre-screening, Sonnet for data collection and tree merging, Opus for final synthesis and adversarial review.
Rate source credibility using references/source-evaluation.md. Use depth parameters from references/depth-config.md. Use tree-reduction algorithm from references/tree-reduction.md. Use screening rubric from references/screening-rubric.md. Use report template from references/report-template.md. Use checkpointing and progress reporting from references/checkpointing.md.
The orchestrator's context window fills rapidly during Exhaustive runs (~1,001K tokens without mitigation — overflows 1M context). To prevent overflow and "lost in the middle" quality degradation:
1. Save-and-release pattern: After each agent wave completes in Phases 2-7:
   - Keep only a manifest line in context: `batch_N.md — [1-sentence summary of key findings]`
2. Load-on-demand pattern: When a downstream phase needs prior reports:
3. Synthesis receives summaries, not full reports: Phase 8 Opus synthesis should receive:
4. Context budget targets (see references/depth-config.md):
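The save-and-release / load-on-demand pair above can be sketched as two small helpers. This is a minimal sketch: the directory layout and manifest-line format come from this skill, while the function names are illustrative.

```python
import os

def save_and_release(save_dir, name, report_text, summary, manifest):
    """Write a full report to disk; keep only a one-line manifest entry in context."""
    path = os.path.join(save_dir, "mini-reports", f"{name}.md")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(report_text)
    # Only this line stays in the orchestrator's active context
    manifest.append(f"{name}.md — {summary}")
    return path

def load_on_demand(save_dir, name):
    """Re-read a released report only when a downstream phase needs it."""
    with open(os.path.join(save_dir, "mini-reports", f"{name}.md")) as f:
        return f.read()
```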
Before starting, check if {save_dir}/checkpoints/ exists from a prior run:
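A minimal resume check, assuming checkpoint files follow the `phaseN_*.json` naming used throughout this skill (the helper name is illustrative):

```python
import glob
import json
import os

def latest_checkpoint(save_dir):
    """Return (phase_number, data) for the most recent checkpoint, or (0, None)."""
    ckpt_dir = os.path.join(save_dir, "checkpoints")
    best_phase, best_data = 0, None
    for path in glob.glob(os.path.join(ckpt_dir, "phase*_*.json")):
        name = os.path.basename(path)
        # e.g. "phase2_search_results.json" -> 2
        phase = int(name[len("phase"):name.index("_")])
        if phase > best_phase:
            with open(path) as f:
                best_phase, best_data = phase, json.load(f)
    return best_phase, best_data
```

If a checkpoint exists, resume from the phase after it rather than restarting.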
Check if the user provided /deep-research output (pasted text or file path).
If bootstrap provided:
If no bootstrap:
Check if the user specified depth (e.g., deep: climate change). If not, ask.
Ask: "Where should I save the final report?" Offer:
- `./research-output/<topic-slug>-<date>/` (default — current working directory)
- `~/Desktop/<topic-slug>-<date>/`

Store the confirmed path for Phase 9. Immediately create the output directory structure:
- `{save_dir}/mini-reports/` — Phase 4 reader outputs
- `{save_dir}/intermediate/` — Phase 5 merge outputs
- `{save_dir}/round2/` — Phase 7 collector + skeptic outputs
- `{save_dir}/checkpoints/` — Phase checkpoint JSONs

Smart-skip: If the user's initial prompt already specifies depth level, scope/focus, and target decision (e.g., "exhaustive research on X for Y purpose, focusing on Z"), skip clarification questions entirely and proceed to Phase 1d. Only ask questions for genuinely missing pieces. If bootstrap context is rich AND the original prompt specifies depth, skip Phase 1c.
Ask 2-4 targeted questions based on the topic and any bootstrap context (only for missing information):
If bootstrap context is rich, reduce to 2 questions (depth level + priority angles).
Wait for user response before continuing.
Decompose the topic into 6-10 research themes (more than deep-research's 4-6). For each theme, define 2-3 search perspectives (STORM persona pattern). Present the plan briefly, then proceed.
Spawn 1 Sonnet deep-researcher agent with model: "sonnet" for all depth levels. The single agent receives:
The query-gen agent prompt MUST end with: "Return only a numbered list of search query strings. Do NOT run searches yourself. Do NOT use Agent, Skill, WebSearch, or WebFetch tools. Just produce the query list."
Sub-timeout: If the query-gen agent doesn't return within 5 minutes, proceed with whatever queries the orchestrator can generate from the research plan themes directly.
Collect all queries. Deduplicate (exact match + semantic similarity). Cap at depth-level maximum from references/depth-config.md.
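The dedup step can be sketched as follows. Token-overlap (Jaccard) similarity stands in for semantic similarity here; a real run could substitute an embedding comparison. The threshold is an assumption, not part of this skill.

```python
def dedupe_queries(queries, cap, overlap_threshold=0.8):
    """Exact-match plus near-duplicate query dedup, capped at the depth maximum."""
    kept = []
    for q in queries:
        tokens = set(q.lower().split())
        is_dup = False
        for k in kept:
            kt = set(k.lower().split())
            union = tokens | kt
            # Jaccard overlap as a cheap proxy for semantic similarity
            if union and len(tokens & kt) / len(union) >= overlap_threshold:
                is_dup = True
                break
        if not is_dup:
            kept.append(q)
        if len(kept) == cap:
            break
    return kept
```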
Run all search queries using WebSearch. Collect all result snippets (URL + title + snippet text). Deduplicate by URL (exact match). This produces the candidate pool for screening.
If the candidate pool is smaller than expected (< 50% of depth target), generate 10-20 additional queries with broader terms and search again. Do this at most once. Track this retry with a flag in the Phase 2 checkpoint: "search_retry_executed": true. On resume from Phase 2: check this flag before retrying — if already true, skip retry and proceed to Phase 3.
Zero results abort: If total search results < 10 across all queries (including retry), ABORT: "Search returned near-zero results. The topic may be too niche for web research. Suggest broadening the topic or trying a different angle."
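The retry-once guard and zero-results abort reduce to this shape. `run_queries` and `broaden` are hypothetical callables standing in for WebSearch and query broadening; the thresholds come straight from the Phase 2 text.

```python
def search_with_retry(run_queries, broaden, queries, depth_target, checkpoint):
    """One broadening retry when the pool is thin; hard abort near zero results."""
    pool = run_queries(queries)
    if len(pool) < 0.5 * depth_target and not checkpoint.get("search_retry_executed"):
        # Flag persists in the Phase 2 checkpoint so a resume never retries twice
        checkpoint["search_retry_executed"] = True
        pool = pool + run_queries(broaden(queries))
    if len(pool) < 10:
        raise RuntimeError("ABORT: search returned near-zero results; broaden the topic")
    return pool
```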
Checkpoint: Save search results to checkpoints/phase2_search_results.json. Emit Phase Recap (see references/checkpointing.md).
Context release (Deep/Exhaustive): After saving search results to disk, keep only the deduplicated candidate list in context (URL + title + composite score — ~1 line per source). Release full snippet text from active context. Phase 3 screeners will receive snippets by reading from phase2_search_results.json.
Spawn Haiku deep-researcher agents with model: "haiku", one per batch of ~40-50 snippets. Each agent uses the condensed screening rubric from references/screening-rubric.md (the short version, not the full reference doc).
Each screener evaluates every snippet for relevance (1-10), credibility (HIGH/MEDIUM/LOW), and information density (HIGH/MEDIUM/LOW), returning a verdict: PASS, BORDERLINE, or FAIL.
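The verdict mapping can be sketched like this. The composite weights and cutoffs below are purely illustrative; the real rubric lives in references/screening-rubric.md.

```python
LEVEL = {"HIGH": 3, "MEDIUM": 2, "LOW": 1}

def screen_verdict(relevance, credibility, density):
    """Map a screener's three ratings to a PASS / BORDERLINE / FAIL verdict."""
    # Composite ranges 4..19 with these (assumed) weights
    composite = relevance + 2 * LEVEL[credibility] + LEVEL[density]
    if relevance <= 3 or credibility == "LOW":
        return "FAIL", composite
    if composite >= 13:
        return "PASS", composite
    return "BORDERLINE", composite
```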
Wave-batching for Exhaustive depth: At Exhaustive depth (8-12 screeners), launch agents in waves of 6. After each wave: collect all returned results, note any non-returning agents as timed-out (coverage gap), then start the next wave. For Standard and Deep depths (≤6 agents), launch all at once.
After all screeners (or waves) return:
Timeout: If screener agents don't return within phase timeout, proceed with whatever results are available. If fewer than 3 screeners returned, fall back to search-engine ranking (top N by position).
Early termination checks:
Checkpoint: Save screening verdicts to checkpoints/phase3_screening_verdicts.json. Emit Phase Recap.
Context release (Deep/Exhaustive): After saving verdicts, keep only the PASS source list in context (URLs + composite scores). Release BORDERLINE and FAIL verdict details from active context.
Early-start optimization (Deep/Exhaustive only): Phase 4 readers may begin spawning as soon as the first screening wave returns at least 5 PASS sources. Do not wait for all screeners to complete before starting the first reader wave. Track which sources have been assigned to readers to avoid double-reading. Continue screening remaining waves concurrently with reader waves.
Divide PASS sources into batches of 5-7. Each batch will be assigned to one Sonnet reader agent using the Level 0 reader prompt from references/tree-reduction.md.
Each reader:
- Writes its output to `{save_dir}/mini-reports/batch_{N}.md`

Wave-batching (required for Exhaustive depth; optional for Deep; not needed for Standard):
For Exhaustive depth (25-50 reader agents), launch readers in waves of 10:
For each wave of up to 10 agents:
1. Spawn all agents in the wave in a single parallel message
2. Collect results as they return
3. Once every other agent in the wave has returned, treat the wave as complete. Any agent still outstanding is considered TIMED-OUT — do not wait further. (The non-returning agent had the entire duration of all the other agents' processing to respond.)
4. Mark timed-out agents as TIMED-OUT, note coverage gap, do NOT retry now
5. Write mini-reports to disk for all completed agents in this wave
6. **Context release**: After writing mini-reports to disk, release full report text from active context.
Keep only a manifest line per report: `batch_{N}.md — [1-line summary: top claim + source count]`
Do NOT reference the full mini-report text again until Phase 5 reads it from disk.
7. Emit a wave-level progress note: "Wave [N]/[total] complete: [X]/[10] agents returned"
8. Start next wave immediately — do not wait for timed-out agents
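The wave loop in steps 1-8 reduces to roughly this shape. `spawn_wave` is a hypothetical stand-in for one parallel agent spawn returning `{batch_id: report_text}` for the agents that came back; everything else follows the steps above.

```python
def run_in_waves(batches, spawn_wave, wave_size=10):
    """Launch reader batches in fixed-size waves; never wait on stragglers."""
    manifests, timed_out = [], []
    total = (len(batches) + wave_size - 1) // wave_size
    for w in range(total):
        wave = batches[w * wave_size:(w + 1) * wave_size]
        results = spawn_wave(wave)
        for batch in wave:
            if batch["id"] in results:
                # After writing to disk, keep only a manifest line in context
                manifests.append(f"batch_{batch['id']}.md — {results[batch['id']][:60]}")
            else:
                timed_out.append(batch["id"])  # coverage gap; do NOT retry now
        print(f"Wave {w + 1}/{total} complete: {len(results)}/{len(wave)} agents returned")
    return manifests, timed_out
```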
For Standard depth (≤8 agents) and Deep depth (≤20 agents): launch all agents at once in a single message (no wave overhead needed).
At Standard depth, validation gates are OPTIONAL — the orchestrator may skip Phase 4b entirely to save cost, since downstream merge agents will catch structural issues during cross-referencing. At Deep/Exhaustive depth, validation gates remain active.
After all readers return, divide mini-reports into batches of 5. Spawn 1 deep-researcher validator per batch with model: "haiku" (model: "sonnet" at Exhaustive depth) to validate using the validation gate prompt from references/tree-reduction.md. Validators check STRUCTURAL properties only: citation URLs present? metadata block complete? word count in expected range (200-800)? response parseable into expected sections?
Reports marked REJECT are discarded. Reports marked FLAG are kept with warnings attached. Do NOT auto-reject based on semantic quality judgments — structural checks only, since automated semantic quality detection is only ~53% accurate and false positives can cascade.
Validator sub-timeout: 5 min (Standard), 8 min (Deep), 12 min (Exhaustive). If validators don't return by sub-timeout, skip validation and mark all reports UNVALIDATED. Proceed to Phase 5.
Timeout: With wave-batching, timed-out agents are handled wave by wave (see above) — never wait indefinitely. After all waves complete, if fewer than 50% of all reader agents returned, note the coverage gap. If 0 readers return across all waves (100% failure): save partial report from prior phases, notify user: "Phase 4 complete failure — all reader agents failed. Saving partial report." Do not continue.
Do NOT retry entire batches — each agent's output file is an implicit checkpoint; only agents with missing output files need re-running, and only once.
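The implicit-checkpoint rule can be expressed directly (helper name illustrative):

```python
import os

def batches_needing_rerun(save_dir, batch_ids, already_retried):
    """Each output file is an implicit checkpoint: re-run only missing ones, once."""
    missing = []
    for bid in batch_ids:
        path = os.path.join(save_dir, "mini-reports", f"batch_{bid}.md")
        if not os.path.exists(path) and bid not in already_retried:
            missing.append(bid)
    return missing
```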
Checkpoint: Save mini-report file paths to checkpoints/phase4_mini_reports.json. Emit Phase Recap with Agent Recap showing per-batch outcomes.
Early-start optimization (Exhaustive only): Phase 5 Level 1 merge agents may begin spawning as soon as 5 mini-reports are available from completed reader waves. Do not wait for all readers to complete. As more mini-reports complete, spawn additional merge agents. Track which mini-reports have been assigned to merge agents to avoid double-processing. Continue reader waves concurrently with merge waves.
Group mini-reports into batches of 5. For each batch:
- Input: `{save_dir}/mini-reports/batch_{N}.md` through `batch_{N+4}.md`
- Output: `{save_dir}/intermediate/group_{N}.md`
- Manifest line kept in context: `group_{N}.md — [1-line summary]`

Each merger:
Wave-batching for Exhaustive depth: At Exhaustive depth (6-10 merge agents), launch in waves of 5. After each wave: collect results, mark non-returning agents as timed-out, write intermediate/group_{N}.md for completed agents, start next wave. For Standard and Deep depths (≤5 agents), launch all at once.
Divide intermediate reports into batches of 5. Spawn 1 deep-researcher validator per batch with model: "haiku" (model: "sonnet" at Exhaustive depth) to check intermediate reports. Sub-timeout: 5 min (Standard), 8 min (Deep), 12 min (Exhaustive). If exceeded, skip validation and proceed.
If more than 5 intermediate reports remain, repeat: group into batches of 5, spawn Sonnet deep-researcher merge agents with model: "sonnet", produce condensed reports.
Hard cap: Tree depth NEVER exceeds 3 levels. If more than 5 reports remain after Level 2, pass them all directly to the Opus root synthesis in Phase 8.
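The reduction loop with its hard cap can be sketched as follows. `merge_batch` is a hypothetical callable standing in for one Sonnet merge agent.

```python
def tree_reduce(reports, merge_batch, fan_in=5, max_levels=3):
    """Merge reports bottom-up in batches of fan_in, hard-capped at 3 levels."""
    level = 0
    while len(reports) > fan_in and level < max_levels:
        reports = [merge_batch(reports[i:i + fan_in])
                   for i in range(0, len(reports), fan_in)]
        level += 1
    # <= fan_in reports, or more if the level cap was hit; either way they
    # go straight to the Opus root synthesis in Phase 8
    return reports
```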
If any merge agent doesn't return within phase timeout, skip it. If fewer than 50% of merge agents return, the parent concatenates available mini-reports inline as a fallback (lower quality but functional). If 0 merge agents return (100% failure): save partial report from Phase 4 data, notify user. Do not continue.
High failure check: If >30% of agents in Phase 5 fail, pause and ask: "Over 30% of merge agents failed. Continue with partial results, or abort and save what we have?"
Checkpoint: Save intermediate report file paths to checkpoints/phase5_intermediate.json. Emit Phase Recap.
Budget check: Before Phase 6, estimate total tokens consumed so far (from checkpoint agent_stats). If consumed > 60% of depth-level token budget (Standard=750K, Deep=2.25M, Exhaustive=6M), warn user: "Token consumption is tracking high ([X]% of budget used before Round 2). Options: (a) Continue — may exceed budget, (b) Skip Round 2 and synthesize now with Round 1 data, (c) Abort." This is a pause, not a hard stop.
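The budget check reduces to a simple sum over checkpoint agent_stats. The budget numbers come from the text above; the agent_stats shape (a list of dicts with a "tokens" field) is an assumption.

```python
BUDGETS = {"standard": 750_000, "deep": 2_250_000, "exhaustive": 6_000_000}

def budget_status(agent_stats, depth):
    """Sum per-agent token counts and flag >60% consumption of the depth budget."""
    used = sum(s.get("tokens", 0) for s in agent_stats)
    pct = 100 * used / BUDGETS[depth]
    warn = pct > 60  # pause and ask the user, not a hard stop
    return used, round(pct, 1), warn
```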
Spawn a single deep-researcher agent with model: "sonnet" for all depth levels — to analyze all intermediate reports. (Gap analysis identifies gaps and claims for Round 2; it does not require Opus-level reasoning.)
Context-aware loading (Deep/Exhaustive): Do NOT pass all intermediate reports from context memory. Instead:
The agent's prompt MUST include the reports read from disk and these instructions:
"Analyze the intermediate research reports and identify:
Return:
IMPORTANT: You are analyzing reports already collected. Do NOT use WebSearch, WebFetch, Agent, or Skill tools. Your gap analysis is based solely on the reports provided here. Note gaps in your output — the orchestrator (not you) will dispatch Round 2 agents to fill them.
CRITICAL: You are a leaf-node agent in a pre-built research pipeline.
Timeout: If the gap-analysis agent doesn't return within phase timeout, skip Round 2 entirely. Proceed to Phase 8 synthesis with Round 1 data only. Note the limitation in the report.
Checkpoint: Save gap analysis and Round 2 targets to checkpoints/phase6_gap_analysis.json. Emit Phase Recap.
Based on the gap analysis, spawn two types of agents concurrently. Spawn collector waves AND skeptic agents in separate messages (never mix Opus skeptics with Sonnet collectors in the same spawn message). Skeptics receive Phase 6 claims (not collector output), so they run independently and in parallel with collectors. If a skeptic hangs, it does not block collectors, and vice versa.
Spawn 3-10 deep-researcher agents with model: "sonnet" (count based on depth level). At Exhaustive depth (8-10 collectors), use waves of 5: spawn 5, collect results (marking any non-returning agents as timed-out), then spawn the next 5. For Standard/Deep (≤6 collectors), launch all at once.
Each collector agent prompt MUST follow this template:
"Topic: [TOPIC-SLUG]. You are filling a specific research gap identified in Round 1.
GAP: [specific gap from Phase 6] WHY IT MATTERS: [reason from Phase 6] CONTEXT FROM ROUND 1: [relevant findings that should inform your search] ALREADY-READ URLs (do NOT re-fetch these): [list of Phase 4 source URLs]
Instructions:
Keep response under 800 words.
CRITICAL: You are a leaf-node agent in a pre-built research pipeline.
Spawn 1-2 deep-researcher agents with model: "opus" (2 for Exhaustive depth) in a separate message from collectors. If Opus skeptic agents don't return by the time all collector waves have completed and been processed: skip skeptics, note limitation ("Skeptic review skipped — agent did not return"), and proceed to Phase 8. Do NOT wait indefinitely for Opus.
Each skeptic receives specific claims from Phase 6 and this instruction:
"You are a skeptic. Your job is NOT to confirm — it is to challenge. For each claim:
Every challenge MUST reference a specific source URL. Unsupported challenges will be discarded. Keep response under 600 words.
CRITICAL: You are a leaf-node agent in a pre-built research pipeline.
Timeout: If Round 2 agents don't complete within phase timeout, proceed with available results. If Round 1 completed with 3+ intermediate reports, synthesis can proceed even if Round 2 fails entirely.
Checkpoint: Save collector + skeptic results to checkpoints/phase7_round2.json. Emit Phase Recap.
Spawn a single deep-researcher agent with model: "opus" to produce the final synthesis.
Context-aware loading: The agent receives (read from disk, NOT from orchestrator context memory):
- Gap analysis (`checkpoints/phase6_gap_analysis.json`)
- Collector reports (`{save_dir}/round2/collector_{N}.md`)
- Skeptic output (`{save_dir}/round2/skeptic.md`)

This keeps Phase 8 input to ~40-60K tokens instead of ~200K+.
"Synthesize all research into a final report. Perform:
Format each finding as:
`N. **[Claim]** [Confidence: HIGH/MEDIUM/LOW] [Sources: N] — [Evidence sentence] URL1 URL2`

Format each contradiction as a table row:
`[Claim] | [Position A + source URL] | [Position B + source URL] | [Evidence strength] | [Circular: Yes/No] | [Resolution]`
Return: 20-40 key findings in the format above, contradiction table, knowledge gaps, and confidence assessment.
IMPORTANT: Synthesize only from the data provided here. Do NOT use WebSearch, WebFetch, Agent, or Skill tools. If you identify knowledge gaps, list them in your output — the orchestrator will handle follow-up. Do NOT attempt to fill gaps by fetching or researching further.
CRITICAL: You are a leaf-node agent in a pre-built research pipeline.
Synthesis validation: If Opus returns fewer than 3 key findings or an empty/malformed response, fall back to using the highest-level intermediate reports from Phase 5 as the basis for the report. Log: "Opus synthesis produced insufficient output; using tree-merged results instead."
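A structural check is enough for this fallback decision, assuming findings follow the numbered `N. **[Claim]**` format from the synthesis prompt (helper name illustrative):

```python
import re

def synthesis_fallback_needed(opus_output, min_findings=3):
    """Detect empty/malformed Opus synthesis and trigger the tree-merge fallback."""
    if not opus_output or not opus_output.strip():
        return True
    # Count numbered findings like "1. **Claim** ..."
    findings = re.findall(r"^\s*\d+\.\s+\*\*", opus_output, flags=re.MULTILINE)
    return len(findings) < min_findings
```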
Timeout / no-return fallback: If Opus synthesis does not return (rate limit, context overflow, or other failure), do NOT wait indefinitely. Instead:
- Mark the report title with `[PARTIAL — Opus synthesis did not complete]`

Using the Opus synthesis output, generate the full report following references/report-template.md.
Before saving, verify:
Write 3 files to the confirmed save location from Phase 1:
- `report.md` — The full research report
- `sources.md` — Complete source list with quality scores and metadata
- `methodology.md` — Detailed methodology (agent counts, timing, tree structure, failures)

If pipeline failed partway, save partial report with [PARTIAL] in the title.
Display in chat:
Every spawned agent MUST use the deep-researcher subagent type — this structurally prevents sub-agent spawning and skill invocation. No exceptions.