Systematic methodology for gathering, validating, and synthesizing external knowledge using Chain of Knowledge (CoK) graph-based expansion until saturation, executed via the Δ1-Δ7 web search protocol with domain-specific source tiering.
This skill runs in two distinct environments with different activation mechanics:
/deep-research is a first-class slash command with argument parsing.
/deep-research swift concurrency
/deep-research kubernetes cost optimization --file infra.md
/deep-research --file existing-playbook.md (re-researches and updates)
argument-hint, user-invocable, and $ARGUMENTS only work here.
All other environments ignore these frontmatter fields.
Argument Parsing — parse $ARGUMENTS before doing anything:
- If $ARGUMENTS contains --file <path> — extract it as the output path.
- If a token ends with .md or contains / — treat it as the output path.
- Everything else is the topic.
- Auto-generate path: <topic-slug>.md in the current working directory.
- If $ARGUMENTS is empty — show help and stop:

Usage:
/deep-research swift concurrency
/deep-research kubernetes cost optimization --file infra.md
/deep-research --file existing-playbook.md (re-researches and updates)
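The parsing order above can be sketched in Python. This is a minimal illustration, not the skill's actual internals — function and key names are chosen for this example:

```python
def parse_arguments(arguments: str) -> dict:
    """Sketch of the $ARGUMENTS parsing order: --file, path-like tokens, topic."""
    tokens = arguments.split()
    if not tokens:
        return {"mode": "help"}                    # empty -> show usage and stop
    path, topic_words, skip_next = None, [], False
    for i, tok in enumerate(tokens):
        if skip_next:
            skip_next = False
            continue
        if tok == "--file" and i + 1 < len(tokens):
            path, skip_next = tokens[i + 1], True  # explicit output path
        elif tok.endswith(".md") or "/" in tok:
            path = tok                             # bare token that looks like a path
        else:
            topic_words.append(tok)                # everything else is the topic
    topic = " ".join(topic_words) or None
    if path:
        return {"mode": "file", "path": path, "topic": topic}
    return {"mode": "inline", "topic": topic}
```

A path with no topic (the third usage form) yields `topic = None`, which signals the re-research-and-update case.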
Output mode:
mode = file — write playbook (see File Output section).
mode = inline — answer in conversation, standard Δ1-Δ7.

In environments without native skill support, this skill is delivered via an
MCP bridge such as SkillPort (skillport-mcp) or Skillz.
There is no /deep-research slash command and no $ARGUMENTS.
How it works:
search_skills + load_skill tools.
The agent searches for "deep research" → loads this skill on demand.
Progressive disclosure: metadata only until load_skill is called.

Zed setup (SkillPort):
{
"context_servers": {
"skillport": {
"source": "custom",
"command": "uvx",
"args": ["skillport-mcp"],
"env": {
"SKILLPORT_SKILLS_DIR": "/path/to/your/skills"
}
}
}
}
Zed setup (Skillz, simpler):
{
"context_servers": {
"skillz": {
"source": "custom",
"command": "uvx",
"args": ["skillz@latest", "/path/to/your/skills"]
}
}
}
In MCP mode, topic and output mode are inferred from the conversation.
The agent reads the full skill body and applies the Disambiguation Step,
Δ1-Δ7 protocol, and File Output section based on what the user asked for —
no argument parsing required. $ARGUMENTS blocks are simply skipped.
(Applies in all environments. In Claude Code, runs after argument parsing. In MCP environments, runs based on conversation context.)
Before executing, use AskUserQuestion if ANY of these apply:
- mode = file and the file already exists — confirm update or overwrite.

Skip if the topic is specific and unambiguous.
| Signal | Condition | Action |
|---|---|---|
| SATURATED | All core requirements covered, or depth ≥ max, or relevance < 0.3, or budget exhausted | ✓ STOP — synthesize and present findings |
| NO_TOOLS | Zero search/fetch tools available after Δ1 scan | Degrade — use training knowledge with explicit disclaimer |
| EMERGENCY | External verification becomes impossible mid-protocol | Mark uncertainty — never present as authoritative |
This skill adapts to available tooling rather than halting:
The skill always produces output. The confidence level varies.
⚠ CRITICAL: Some domains carry life-affecting consequences.
A wrong answer in medicine can cause death. A wrong answer in psychology can contribute to suicide. A wrong answer in legal advice can result in imprisonment. A wrong answer in pharmacology can cause poisoning. A wrong answer in structural engineering can cause building collapse.
High-stakes domains (deep research ALWAYS required): Medical, Psychology, Pharmacology, Legal (advisory), Structural/Civil engineering, Nutrition (medical), Childcare/Parenting, Financial (advisory).
Detection heuristic: If the answer could plausibly influence a decision that affects someone's physical health, mental health, legal standing, financial security, or physical safety — treat as high-stakes. When uncertain, escalate.
Mandatory protocol when high-stakes detected:
Deep research is MANDATORY, not optional. Use the strongest available research tool (deep research agents, comprehensive multi-query search). If no deep research tool exists, compensate with 5-8 targeted searches constrained to T1 sources (peer-reviewed journals, clinical guidelines, regulatory text, official standards).
Forward-consequence CoK is MANDATORY.
Standard CoK fills knowledge gaps: (subject, relation, ?).
High-stakes CoK ALSO fills consequence and contraindication gaps:
- (recommendation, interacts_with, ?) — what conflicts exist?
- (advice, contraindicated_for, ?) — who should NOT follow this?
- (solution, assuming, ?) — what must be true for this to hold?
- (approach, if_wrong, ?) — what happens if this is incorrect?
- (recommendation, superseded_by, ?) — has this been updated?
- (treatment, withdrawn_in, ?) — any regulatory actions?

T1 sources ONLY for high-stakes claims. T2-T4 sources may inform research direction but NEVER override T1 for life-affecting recommendations.
Safety disclaimers ALWAYS in output. "Consult a qualified [professional] before acting on this information." Mark confidence level explicitly. Expose contradictions between sources — NEVER silently resolve them for high-stakes domains.
CoK depth minimum L0-L4. High-stakes domains require full expansion depth. Do not stop at L2 even if surface requirements appear covered — the forward-consequence fills often reveal critical gaps at L3-L4.
See @references/domain-knowledge-matrix.md for domain-specific high-stakes protocol details and forward-fill patterns.
CoK is systematic knowledge expansion using graph-based reasoning until saturation: build linked triples (subject, relation, object), identify gaps, and fill them via targeted search.
Known triples:
(Next.js, supports, SSR)
(SSR, improves, SEO)
Gaps (? = unknown → each becomes a search query):
(SSR, requires, ?) → search: server configuration
(SEO, measured_by, ?) → search: metrics, tools
(Next.js, competes_with, ?) → search: alternatives
Action: Fill each ? via targeted search → expand graph → repeat
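The gap-fill loop can be sketched as follows. `search` stands in for whatever search tool is available; the helper names are illustrative:

```python
def find_gaps(triples):
    """Every triple whose object is '?' is a knowledge gap -> one search query."""
    return [(s, r) for (s, r, o) in triples if o == "?"]

def expand(triples, search):
    """One expansion round: fill each gap via targeted search, keep the rest."""
    out = []
    for (s, r, o) in triples:
        if o == "?":
            o = search(f"{s} {r.replace('_', ' ')}")  # e.g. query "SSR requires"
        out.append((s, r, o))
    return out

known = [
    ("Next.js", "supports", "SSR"),
    ("SSR", "improves", "SEO"),
    ("SSR", "requires", "?"),
    ("SEO", "measured_by", "?"),
]
```

Each `expand` pass fills the current gaps; newly discovered objects then seed the next round of triples until saturation.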
Standard forward-fill discovers what IS. Forward-consequence fill discovers what COULD GO WRONG. This pattern is MANDATORY for high-stakes domains and recommended for any domain where consequences matter.
Standard fill:
(medication X, treats, condition Y)
(condition Y, symptoms, ?) → fill: what symptoms
Forward-consequence fill:
(medication X, interacts_with, ?) → fill: drug interactions
(medication X, contraindicated_for, ?) → fill: who should NOT take this
(medication X, superseded_by, ?) → fill: newer alternatives
(medication X, withdrawn_in, ?) → fill: regulatory actions
(treatment, assuming, ?) → fill: what must be true
(advice, if_wrong, ?) → fill: consequences of error
This pattern generalizes across all domains:
Any domain:
(recommendation, interacts_with, ?) → conflicts and side effects
(approach, contraindicated_for, ?) → who/what should NOT use this
(solution, assuming, ?) → hidden preconditions
(advice, superseded_by, ?) → newer knowledge
(method, fails_when, ?) → failure conditions
L0: Initial topic → direct triples (relevance 1.0)
L1: First expansion → related concepts (relevance ~0.7)
L2: Second expansion → supporting details (relevance ~0.5)
L3: Third expansion → peripheral context (relevance ~0.3)
L4: Predicted relevance < 0.3 → STOP expanding
Stop when ANY is true:
- All core requirements covered
- Depth ≥ max (3-5 levels depending on budget)
- Relevance drops below 0.3
- Token budget exhausted
- Circular references detected (triples repeat)
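The stop conditions combine with OR — any one of them ends expansion. A minimal sketch (parameter names are illustrative):

```python
def should_stop(depth, max_depth, relevance, budget_left, core_covered, seen, new_triples):
    """True when ANY stop condition from the list above holds."""
    # circular: every triple produced this round was already in the graph
    circular = bool(new_triples) and all(t in seen for t in new_triples)
    return (core_covered
            or depth >= max_depth
            or relevance < 0.3
            or budget_left <= 0
            or circular)
```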
| Tier | Description | Default Confidence | Weight in Conflicts |
|---|---|---|---|
| T1 | Peer-reviewed / official docs / RFCs | HIGH | Strongest |
| T2 | Expert blogs / established sources | MED | Strong |
| T3 | Community forums / Stack Overflow | LOW | Weak |
| T4 | Opinions / unverified claims | LOW | Weakest |
When sources conflict, higher tier + more recent = stronger evidence. Annotate every cited source with its tier.
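The tier-plus-recency rule reduces to a simple ordering. A sketch, assuming each claim carries its text, tier, and publication date:

```python
from datetime import date

TIER_WEIGHT = {"T1": 4, "T2": 3, "T3": 2, "T4": 1}

def resolve(claims):
    """Strongest claim wins: higher tier first, recency breaks ties.
    Each claim is a (text, tier, published) tuple."""
    return max(claims, key=lambda c: (TIER_WEIGHT[c[1]], c[2]))
```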
Seven steps from tool discovery to validated output. Each step builds on the previous. Skip none.
Scan available tools and identify search/fetch capabilities. If zero external tools are available, switch to degraded mode (training knowledge with explicit disclaimer — see Graceful Degradation).
Get the current date (mandatory — NO hardcoded years):
current_date = now(timezone="local")
current_year = extract year from current_date
Check for high-stakes domain (see High-Stakes Domain Escalation above). If high-stakes detected:
Plan 3+ searches covering different angles (5-8 for high-stakes domains):
Select tools per domain. See @references/domain-knowledge-matrix.md for domain-specific tool selection, query patterns, and CoK depth guidance.
For each planned search:
execute with temporal qualifiers (use {current_year}, never hardcode),
record each finding with source URL + date + tier (T1-T4).
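Temporal qualifiers should be substituted at runtime, never baked into query strings. A minimal sketch:

```python
from datetime import datetime

def build_queries(templates):
    """Substitute the runtime year -- never hardcode one into a query."""
    year = datetime.now().year
    return [t.format(current_year=year) for t in templates]
```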
SOURCES: [Source]:[Finding](T#) — at least 3
CONSENSUS: [What sources agree on]
CONTRADICTIONS: [Where sources disagree and why]
GAPS: [What remains unclear]
Resolve conflicts using source tier priority. Higher tier + more recent = stronger evidence.
Based on [N] sources (searched: {current_date}):
[ANSWER]
Evidence:
- [Claim] (Source: [cite], T#)
- [Claim] (Source: [cite], T#)
[IF contested] ⚠ Sources vary. Consult [authority].
[IF degraded] ⚠ Training knowledge only. No external verification.
Before presenting results, verify:
If any item fails, fix before presenting.
For comprehensive research requiring broad coverage (20-50+ references). Use when: novel/niche topic, systematic review needed, user says "deep dive," or HIGH-STAKES domain where deep research is mandatory. For well-known topics or single questions, standard Δ1-Δ7 is sufficient.
1. Broad sweep: 20-50 references via multiple search tools
2. Filter: top 10 candidates by relevance and tier
3. Summarize: extract key points from each candidate
4. Select: top 3-5 highest-quality sources
5. Extract: full content from each selected source
6. Synthesize: build CoK triples → produce comprehensive cited answer
Use whatever search, summarize, and scrape tools are available. The pipeline adapts to available tooling — if only one search tool exists, use it for the broad sweep and skip the multi-tool parallelism.
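The six-step funnel can be sketched as a pipeline. `search`, `summarize`, and `extract` stand in for whatever tools the environment provides; the `tier`/`relevance` record shape is an assumption for this example:

```python
def deep_research(topic, search, summarize, extract):
    """Funnel sketch of steps 1-5: sweep -> filter -> summarize -> select -> extract."""
    refs = search(topic)                                          # 1. broad sweep (20-50 refs)
    ranked = sorted(refs, key=lambda r: (r["tier"], -r["relevance"]))
    candidates = ranked[:10]                                      # 2. filter to top 10
    summaries = [(r, summarize(r["url"])) for r in candidates]    # 3. summarize each
    selected = summaries[:5]                                      # 4. keep top 3-5
    return [(r["url"], extract(r["url"])) for r, _ in selected]   # 5. extract full content
```

Step 6 (CoK synthesis) then runs over the extracted content, so raw pages never need to persist in context.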
⚠ MANDATORY: When research covers 2+ independent subjects and a sub-agent mechanism is available (see orchestration rules), dispatch each subject to a dedicated sub-agent rather than researching sequentially in the master context.
Why — Context Engineering:
Decision gate:
Research request received →
How many independent subjects?
1 subject → run Δ1-Δ7 inline (standard)
2+ subjects, NO dependencies between them →
sub-agent available? → YES → fan-out (one agent per subject or group)
→ NO → sequential with context fencing
2+ subjects, WITH dependencies →
sequential chain (output of A feeds B)
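The decision gate above is a small routing function. A sketch with illustrative return labels:

```python
def plan_dispatch(subjects, has_dependencies, subagents_available):
    """Route research per the decision gate above."""
    if len(subjects) <= 1:
        return "inline"                  # standard Δ1-Δ7 in the master context
    if has_dependencies:
        return "sequential-chain"        # output of A feeds B
    if subagents_available:
        return "fan-out"                 # one sub-agent per subject or group
    return "sequential-fenced"           # no sub-agents: sequential with context fencing
```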
Dispatch pattern:
FOR EACH independent subject (or logical group of 2-3 related subjects):
spawn_agent(
label = "[subject-name] research"
message = "Research [subject]. Report on: [dimensions].
Use web search and scraping tools.
Cite sources with tiers (T1-T4).
Max [N] words per section.
Return structured findings only."
)
→ collect all reports
→ master synthesizes: compare, rank, identify gaps, produce unified answer
Grouping heuristic:
| Subject count | Strategy |
|---|---|
| 1 | Inline Δ1-Δ7 in master context |
| 2-3 | One sub-agent per subject |
| 4-6 | Group related subjects (2-3 per agent) |
| 7+ | Group into 3-5 agents by affinity; set per-section word limits |
Output budget per sub-agent: Target ~3,000 words per agent report. When a single subject is expected to produce >3,000 words, constrain with explicit section word limits in the prompt (e.g., "max 500 words per section"). When grouping 2-3 subjects per agent, the natural constraint of covering multiple subjects produces balanced output without explicit limits.
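The grouping table can be sketched as a partitioning function. Real grouping should cluster by topical affinity; round-robin below is a placeholder for that step:

```python
def group_subjects(subjects):
    """Apply the grouping heuristic: 1 inline, 2-3 solo agents,
    4-6 in pairs/triples, 7+ into 3-5 groups."""
    n = len(subjects)
    if n <= 1:
        return [subjects] if subjects else []
    if n <= 3:
        return [[s] for s in subjects]            # one agent per subject
    if n <= 6:
        size = 2 if n <= 4 else 3
        return [subjects[i:i + size] for i in range(0, n, size)]
    k = min(5, max(3, (n + 2) // 3))              # 7+ subjects -> 3-5 agents
    groups = [[] for _ in range(k)]
    for i, s in enumerate(subjects):
        groups[i % k].append(s)                   # affinity clustering would go here
    return groups
```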
Token savings estimate:
| Approach | Master context consumed |
|---|---|
| Sequential inline (all subjects) | Raw HTML + search results + synthesis = 80-150K tokens |
| Sub-agent dispatch | Only synthesized reports = 10-25K tokens |
| Savings | 70-85% reduction in master context usage |
Integration with Δ1-Δ7: Each sub-agent independently executes the full Δ1-Δ7 protocol for its assigned subject(s). The master does NOT re-run Δ1-Δ7 on the same subjects. The master's role is synthesis, comparison, and gap identification.
When using asynchronous deep research tools
(e.g., deep_researcher_start → deep_researcher_check),
the research runs in the background and must be polled for results.
⚠ MANDATORY: Wait minimum 30 seconds between status checks.
∆1: Start deep research with clear, specific instructions
→ receives research_id immediately
∆2: Wait AT LEAST 30 seconds before first status check
→ do NOT poll immediately — the research needs time to run
→ use this wait time productively (run other searches in parallel)
∆3: Check status with research_id
→ status: "processing" → wait another 30+ seconds, check again
→ status: "completed" → extract results
→ status: "failed" → fall back to multi-query manual search
∆4: Keep polling with 30+ second intervals until completed or failed
→ typical completion times:
fast model: 15-30 seconds
balanced: 30-60 seconds
pro/deep: 60-180 seconds (be patient!)
→ do NOT give up after 1-2 checks — research takes time
→ maximum patience: 3-5 minutes for complex queries
∆5: On completion → extract findings, integrate into CoK graph
Why 30 seconds minimum: Polling too frequently wastes API calls without accelerating results. Deep research agents need time to search, read, and synthesize. Premature polling produces "still processing" responses that consume tokens without providing value.
Parallel work during wait: While waiting for deep research results, run standard Δ3 searches in parallel. The deep research results will supplement — not replace — standard search findings. Both contribute to saturation.
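The polling discipline above reduces to a bounded loop. `check_status` stands in for the environment's status tool (e.g. deep_researcher_check); its `(status, result)` return shape is an assumption for this sketch:

```python
import time

def poll_deep_research(check_status, research_id, interval=30, max_checks=10):
    """Poll with >=30s gaps until completed/failed, or give up after max_checks."""
    for _ in range(max_checks):
        time.sleep(interval)                 # never check before the interval passes
        status, result = check_status(research_id)
        if status == "completed":
            return result
        if status == "failed":
            return None                      # fall back to multi-query manual search
    return None                              # patience exhausted (~interval*max_checks s)
```

With the defaults this allows up to ~5 minutes of patience, matching the 3-5 minute ceiling above; standard Δ3 searches can run during each sleep.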
Model selection guidance:
| Research need | Model to use | Expected wait |
|---|---|---|
| Simple factual lookup | fast (~15s) | 15-30 seconds |
| Multi-source comparison | balanced (~30-45s) | 30-60 seconds |
| HIGH-STAKES / comprehensive | pro/deep (~60-180s) | 1-3 minutes |
| Novel/niche topic | pro/deep | 1-3 minutes |
For HIGH-STAKES domains, ALWAYS use the deepest/pro model. The extra wait time is justified by the quality of evidence gathered.
Check local sources — such as llms*.txt in the current directory (domain patterns) — before reaching for the web. For LLM pattern file handling and domain-specific tool selection, see @references/domain-knowledge-matrix.md.
User: "What's the best state management for React?"
Δ1: Available tools: web search ✓, scrape ✓
Δ2: Strategy (date from now()):
- S1: "React state management {current_year} comparison"
- S2: "React state management production usage"
- S3: "Redux vs Zustand vs Jotai benchmark {current_year}"
Δ3: Execute → 6 sources gathered (T1: 2, T2: 3, T3: 1)
Δ4: Consensus: Zustand growing rapidly, Redux dominant in enterprise
Contradictions: T3 "Redux is dead" vs T1 official adoption data
Δ5: Weight: T1 > T3 → Redux dominant, Zustand rising fastest
Δ6: Output with 6 cited sources, tiers annotated
Δ7: Validation checks pass ✓
When a file path is resolved from $ARGUMENTS, write findings to a persistent
markdown playbook instead of answering inline.
# [Title derived from topic]
## [Section Name]
[Tight synthesis paragraph + bullet insights]
[Claim]. — [Source Name](URL)
---
## Sources
- [URL 1]
- [URL 2]
---
_Captured: YYYY-MM-DD_
Rules:
- Format each finding as: [Insight]. — [Source Name](URL)
- End the file with the *Captured:* date; refresh it on update.
After writing, read the file and verify:
Then report:
## Playbook [Created | Updated]
File: [absolute path]
Mode: [create | update]
Topic: [what was researched]
Sections [created | modified]: N
Sources: N
Findings integrated: N
Findings dropped (no source / duplicate): N
✗ Hardcoded years in search queries ("React 2024 state management")
✓ Dynamic year from now() ("React {current_year} state management")
✗ Single search, single source → presenting as authoritative
✓ 3+ searches, 3+ sources with tier annotations
✗ Skipping source tiers → treating blog post same as official docs
✓ T1-T4 tier annotation on every cited source
✗ Ignoring contradictions between sources
✓ Explicitly exposing contradictions with tier-weighted resolution
✗ Halting because no search tools are available
✓ Degrade gracefully: use training knowledge with explicit disclaimer
✗ Deep research pipeline for simple factual lookups
✓ Deep research only when comprehensive coverage needed
✗ Reading entire llms*.txt file into context
✓ Index sections first → extract targeted section only
✗ Presenting training knowledge as current fact without verification
✓ Search first, cite sources, mark uncertainty when present
✗ Polling deep research status every 5 seconds (wastes API calls, no faster results)
✓ Wait minimum 30 seconds between deep research status checks
✗ Giving up on deep research after 1-2 polling attempts
✓ Be patient — pro/deep research can take 1-3 minutes. Keep polling at 30s intervals
✗ Blocking on deep research instead of doing parallel work
✓ Run standard Δ3 searches while waiting for deep research results
✗ Standard-depth search for medical, psychology, or legal questions
✓ HIGH-STAKES: deep research mandatory, T1 only, forward-consequence CoK,
safety disclaimers — a wrong answer can cause real-world harm
✗ Skipping forward-consequence CoK for high-stakes domains
✓ Always ask: what goes wrong if this advice is followed incorrectly?
Who should NOT follow this? Has this been superseded?
✗ Silently resolving contradictions in high-stakes domains
✓ EXPOSE all contradictions — let the human decide, with professional guidance
✗ Treating all domains equally — culinary ≠ medical
✓ Detect domain stakes early, escalate research depth accordingly
✗ Researching 3+ independent subjects sequentially in master context
✓ Fan-out to sub-agents: one per subject, each runs Δ1-Δ7 independently
✗ Loading raw scraped HTML from 5+ subjects into master context
✓ Sub-agents consume raw content; master receives only synthesized reports
✗ No word limits on sub-agent prompts → unbounded output floods master
✓ Set explicit per-section word limits (~500 words) or per-agent budget (~3K words)
✗ Grouping dependent subjects into parallel agents
✓ Only independent subjects parallelize; dependent chains go sequential