Deep evidence-first research with broad discovery, verification, and traceable citations. Prefer invoking via research-workflow. TRIGGER when (MANDATORY — you MUST invoke this skill, no exceptions): user message contains ANY of these keywords or synonyms — 调研/研究/对比/综述/文献/证据/机制/根因/为什么/可行性/路线图/分析/探索, or research/investigate/compare/survey/literature/evidence/mechanism/root-cause/why/feasibility/roadmap/analyze/explore — or asks to verify claims, analyze tradeoffs, scope a new topic, or conduct literature review. Also use this skill as the default gateway for external search. Skipping when keywords match is a routing violation. DO NOT TRIGGER when: user asks for paper-writing output (use paper-writing), experiment launch (use experiment-execution), or plan-only without evidence (use research-plan).
Produce a deeply researched, evidence-grounded answer with clear provenance and actionable conclusions, and act as the default gateway for external search in research runs.
All external search during non-trivial research runs must enter through deep-research.
Rules:
deep-research with ad hoc direct search when fresh outside evidence is neededdeep-research may choose a lighter or deeper execution depth internally, but it may not silently skip actual searchdeep-research run must perform real WebSearch calls and keep an auditable query traildr_skip_reason with explicit date windows and source countsChoose a primary research_type early:
idea-explorationdebug-investigationdesign-decisionimplementation-strategyconflict-resolutionIf templates do not fit exactly, adapt structure freely but keep depth, verification, and citations.
Before selecting depth or running any WebSearch queries:
intake_checkpoint_complete=YEShuman-checkpointmoderate or detailed, prefer built-in user-question tool (request_user_input)Every deep-research run must begin with a frontier-first scout before final depth selection.
Scout requirements:
bleeding-edge topic queriesfrontier topic queriesScout rules:
deep or downgrading to lightIterate until evidence quality is sufficient:
As of: YYYY-MM-DDWhen the topic has implementation, benchmark, reproduction, or planning implications, also apply references/codebase-and-data-research-rules.md.
When called during an ongoing run:
dr_skip_reason with explicit date windows and source countsWhen deep research is used for open-ended scoping (idea-exploration), hand off findings to research-plan as the required default next step. Skip only if the user explicitly opts out.
Handoff expectations:
Do not output final conclusions until all gate checks pass.
Before synthesis, print:
intake_checkpoint_complete=YES|NOintake_channel=request_user_input|plain-text-fallback|nonesearch_entry=deep-researchfrontier_first_scout=YES|NOselected_depth=light|default-auditable|deepdepth_reason=dr_degrade_reason=total_queries=scout_queries=bleeding_edge_queries=frontier_queries=recent_queries=mid_term_queries=classic_queries=degrade_used=YES|NOgate_pass=YES|NOIf degrade_used=YES, also print:
degrade_from=degrade_to=degrade_gap=degrade_queries_run=degrade_reason=If gate_pass=NO, continue searching and do not finalize.
Support three execution depths:
light
bleeding-edge >= 3, frontier >= 3, recent >= 2, mid-term >= 1, classic >= 1default-auditable
bleeding-edge >= 12, frontier >= 10, recent >= 10, mid-term >= 8, classic >= 6deep
bleeding-edge >= 28, frontier >= 22, recent >= 20, mid-term >= 16, classic >= 10Selection rules:
default-auditabledeep when scope is broad, open-ended, contradiction-heavy, or asks for landscape plus recipe or mechanism analysislight is not a default modelight is allowed only when scout confirms the task is narrow, directly verifiable, and low-ambiguitylightlight unless the user explicitly forces itPrint this mini-check immediately after selecting depth:
depth_candidate=light_disqualifiers_hit=open_ended_exploration=YES|NOpaper_centric=YES|NOdepth_sanity_pass=YES|NORules:
light_disqualifiers_hit is non-empty and depth_candidate=light, set depth_sanity_pass=NO and reselect before more searchopen_ended_exploration=YES and user did not explicitly force light, do not use lightpaper_centric=YES and the user asks for mechanisms/recipes/comparisons, do not use lightafter:/before: behavior for final stage assignmentdays_from_as_of for each source and map to exactly one stage using the stage boundary rules belowbleeding-edge, then frontier, then recent whenever the user cares about the latest or fastest-moving evidenceUse five mandatory evidence stages and record source counts for each.
Define days_from_as_of = as_of_date - published_date (integer days). Stages are mutually exclusive:
bleeding-edge (0-90 days): 0 <= days_from_as_of <= 90frontier (91-180 days): 91 <= days_from_as_of <= 180recent (181-365 days): 181 <= days_from_as_of <= 365mid-term (366-730 days): 366 <= days_from_as_of <= 730classic (>730 days): days_from_as_of > 730Freshness floor:
bleeding-edge + frontier >= 35% for normal runsbleeding-edge + frontier + recent >= 60% for all finalized runsPer stage, run at least these query families:
Use dynamic query-family expansion:
Minimum rounds by depth:
light: bleeding-edge/frontier/recent >= 1, mid-term/classic >= 1default-auditable: bleeding-edge/frontier/recent >= 3, mid-term >= 2, classic >= 1deep: bleeding-edge/frontier/recent >= 4, mid-term >= 3, classic >= 2If a stage minimum is not met, allow controlled degradation only after an exhaustion pass.
Exhaustion pass minimums per deficit stage:
light: at least 6 additional stage-targeted queriesdefault-auditable: at least 18 additional stage-targeted queriesdeep: at least 32 additional stage-targeted queriesDegrade rules:
bleeding-edge -> frontier, frontier -> recent, recent -> mid-termbleeding-edge + frontier >= 30% and bleeding-edge + frontier + recent >= 60%insight/procedure memory when it can reduce redundant search or contradiction costdefault-auditable was not neededAlways include:
Type-specific emphasis:
debug-investigation
design-decision
implementation-strategy
conflict-resolution
idea-exploration
Trigger this policy when user asks for any of:
When triggered, include a dedicated Key Works Deep Dive section and meet minimum coverage:
light: 3-5 key worksdefault-auditable: 6-10 key worksdeep: 10-15 key worksFor each key work, provide:
[[S#]](#ref-s#)Finalize only when:
Depth Sanity Check<codex-cwd>/logs/runs/<run_id>/reports/deep-research-<slug>.md