Use when conducting research before planning or implementation. Enforces structured investigation (dynamic area coverage from catalog), source triangulation, evidence-over-assumption discipline, and synthesis quality gates that prevent shallow or skipped research.
Research Methodology prevents Claude from skipping research, doing shallow research, or treating research as a checkbox activity. It applies across contexts — from deep multi-agent phase research to lightweight 1-agent focused research. The methodology scales by depth, but the discipline is constant.
Two responsibilities:
RESEARCH BEFORE DECISIONS. EVIDENCE BEFORE ASSUMPTIONS.
Your training data is stale. Your "knowledge" is a guess.
The codebase has patterns you haven't read. The ecosystem has changed.
If you cannot cite a SOURCE for a claim, it is an assumption, not a finding.
Never recommend a technology, pattern, or approach without investigating alternatives.
Never skip research because the answer "seems obvious."
This is non-negotiable. No time pressure, no "I already know this," no user request overrides this.
USER = Decision Maker. Approves scope, reviews findings, makes final choices.
CLAUDE = Researcher. Investigates, synthesizes, presents options with evidence.
NEVER present a single option as "the answer."
NEVER assume user wants what you'd recommend. Present trade-offs.
"I recommend X" is fine. "Here's X" without alternatives is not.
This skill is used by commands at different depth levels. Match depth to context.
| Context | Depth | Areas | Agents | Output |
|---|---|---|---|---|
| /st:phase-research | Deep | Dynamic areas from catalog | Parallel per wave + 1 synthesizer | Area files + SUMMARY.md |
| /st:init | Deep | Dynamic areas from catalog | Parallel per wave + 1 synthesizer | research/ directory |
| /st:brainstorm | Medium | 2 rounds (broad → focused) | AI direct (no subagents) | Inline findings |
| /st:plan (optional) | Light | 1 focused area | 1 researcher | 1-2 page inline |
The methodology applies at ALL depths. Even light research must triangulate sources and present alternatives. Deep research just does it more thoroughly.
Research areas are selected dynamically from the catalog (references/research-catalog.md), not hardcoded. Each area has a specific scope to prevent agents from overlapping.
Area selection: Load references/research-catalog.md → evaluate trigger and brownfield conditions → select relevant areas → group into waves by dependency.
Core areas (evaluate for every project): STACK, LANDSCAPE, ARCHITECTURE, PITFALLS. Domain-specific areas (include when relevant): SECURITY, PERFORMANCE, ACCESSIBILITY, DATA, INTEGRATION. Custom areas: AI may propose with justification — always requires user confirmation.
For detailed per-area guidance (search strategies, comparison templates, scope boundaries), see references/research-areas.md.
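The wave-grouping rule can be made concrete with a small sketch. This is illustrative only: the area names and dependency map below are hypothetical, and real dependencies come from the catalog's `needs` fields.

```python
def assign_waves(deps: dict[str, list[str]]) -> dict[str, int]:
    """Assign each research area to a wave: wave = max(wave[deps]) + 1.

    Areas with no dependencies land in Wave 1. Assumes deps form a DAG.
    """
    waves: dict[str, int] = {}

    def wave_of(area: str) -> int:
        if area not in waves:
            parents = deps.get(area, [])
            # No parents → Wave 1; otherwise one wave after the latest dependency.
            waves[area] = 1 if not parents else max(wave_of(p) for p in parents) + 1
        return waves[area]

    for area in deps:
        wave_of(area)
    return waves

# Hypothetical dependency map (not from the catalog):
deps = {
    "STACK": [],
    "LANDSCAPE": [],
    "ARCHITECTURE": ["STACK"],
    "PITFALLS": ["STACK", "ARCHITECTURE"],
}
# assign_waves(deps) → STACK and LANDSCAPE in Wave 1, ARCHITECTURE in Wave 2, PITFALLS in Wave 3
```

All areas sharing a wave number are spawned in parallel in one message; the next wave starts only after the previous one completes.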
Phase 1: SCOPE DEFINITION
↓ (gate: research question is specific and bounded)
Phase 2: MULTI-SOURCE INVESTIGATION
↓ (gate: 3+ independent sources consulted per key claim)
Phase 3: EVIDENCE EVALUATION
↓ (gate: findings ranked by evidence quality)
Phase 4: SYNTHESIS & PRESENTATION
Gate: Cannot proceed until the research question is specific enough to be answered. "Research authentication" is too broad. "Compare JWT vs session-based auth for this Express API with these constraints" is specific.
Three source categories. Minimum 2 of 3 required for key claims.
| Category | Sources | Strength |
|---|---|---|
| Web | Official docs, tutorials, issue trackers, benchmarks, blog posts | Current, external |
| Codebase | Existing patterns, conventions, dependencies already in use | Proven in this project |
| Ecosystem | npm/pip/cargo stats, GitHub stars/issues, release cadence, community activity | Adoption signals |
For each finding: record the source, the claim, and the confidence level.
Gate: Cannot proceed with fewer than 3 independent data points for key claims. Single-source claims must be flagged as low confidence.
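The triangulation gate can be made mechanical by recording each finding with its sources and deriving confidence from the count of independent sources. The record shape below is a sketch; the field names are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    sources: list[str] = field(default_factory=list)  # one entry per independent source

    @property
    def confidence(self) -> str:
        # Gate: fewer than 3 independent data points → flag as low confidence.
        return "high" if len(self.sources) >= 3 else "low"

f = Finding(
    claim="Library X dropped support for the project's Node version",
    sources=["official changelog", "issue tracker entry", "release notes"],
)
# f.confidence → "high"; a single-source Finding would report "low"
```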
Gate: Every recommendation must cite at least one strong evidence source. Recommendations with only weak evidence must be flagged.
| Strong evidence | Weak evidence |
|---|---|
| Official documentation (version-matched) | Blog post without benchmarks |
| Benchmarks with methodology | "In my experience..." |
| Existing codebase pattern (you read the code) | Training data memory ("I know that...") |
| Issue tracker with reproduction steps | Stack Overflow answer (may be outdated) |
| Package download stats + release dates | GitHub stars alone |
| Verified working example in codebase | "Should work" without testing |
| Changelog/release notes | Second-hand reports |
Training data is ALWAYS weak evidence. Your training data is a snapshot of the internet at a point in time. Verify externally before presenting as fact.
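The strong/weak split above can be applied as a simple flagging check: a recommendation is flagged when none of its evidence falls in a strong category. The category labels are illustrative condensations of the table, not a fixed taxonomy.

```python
# Condensed from the strong-evidence column above (labels are illustrative).
STRONG = {
    "official-docs", "benchmark", "codebase-pattern", "issue-repro",
    "download-stats", "verified-example", "changelog",
}

def needs_flag(evidence_kinds: list[str]) -> bool:
    """True when a recommendation rests on weak evidence only.

    Training data is always weak, so it never satisfies the gate by itself.
    """
    return not any(kind in STRONG for kind in evidence_kinds)

# needs_flag(["training-data", "blog-post"]) → True (weak only: flag it)
# needs_flag(["blog-post", "benchmark"]) → False (at least one strong source)
```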
Check every source for staleness (publication date, version match against the project's actual dependencies), and guard against these research biases:
| Bias | Trap | Antidote |
|---|---|---|
| Confirmation | Searching for evidence that supports your initial preference | Search for evidence AGAINST your top recommendation first |
| Familiarity | Recommending tools/patterns you "know" from training data | Include at least one option you haven't previously recommended |
| Authority | Treating popular opinion as truth ("React is best for...") | Evaluate on project-specific criteria, not general reputation |
| Anchoring | First technology found becomes the default, others compared to it | Evaluate each option independently before comparing |
| Recency | Newest library/version assumed best | Check stability, community size, production readiness |
| Survivorship | Only looking at successful projects using X | Search for failure stories, migration-away-from posts |
When multiple research agents produce output (phase-research, init), the synthesizer must:
1. Read all outputs completely. Not just summaries or first paragraphs.
2. Identify agreements. Where all agents converge → high confidence.
3. Surface conflicts. Where agents disagree → present both sides with evidence.
4. Cross-validate. If Stack recommends X but Pitfalls warns about X → highlight the tension.
5. Rank recommendations by evidence strength, not by word count or confidence language.
6. Build Impact Analysis. Compare research findings against the original approach (from context inputs: PROJECT.md, ROADMAP.md, REQUIREMENTS.md). For each key aspect, categorize as KEEP (compatible), REPLACE (incompatible — state alternative), ADD (needed but not in original), or REMOVE (in original but no longer needed). This gives users immediate visibility into what changed, what stayed, and why — not just what went wrong.
Produce SUMMARY.md with THREE clearly separated sections:
## Findings (Reference Material)
[Key findings, evidence, comparisons — informational only]
## Impact Analysis
| # | Aspect | Original Approach | Research Recommends | Change | Reason |
|---|--------|-------------------|---------------------|--------|--------|
| 1 | [aspect] | [from PROJECT/ROADMAP] | [finding] | KEEP/REPLACE/ADD/REMOVE | [why] |
## Decisions Requiring Confirmation
[Each decision that needs user choice before it can be applied to REQUIREMENTS.md or ROADMAP.md]
| # | Decision | Research Recommends | Alternatives | Status |
|---|----------|-------------------|-------------|--------|
| 1 | Project structure | Turborepo monorepo | Single app, Polyrepo | Pending user choice |
| 2 | Database | Supabase PostgreSQL | PlanetScale, Neon | Pending user choice |
The calling command (init, phase-research) is responsible for presenting these decisions to the user. SUMMARY.md only extracts and lists them.
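The KEEP/REPLACE/ADD/REMOVE categorization can be sketched as a comparison between the original aspects and the researched recommendations. This is a simplification: real KEEP-vs-REPLACE calls are a judgment about compatibility, not string equality, and the aspect names below are hypothetical.

```python
def impact(original: dict[str, str], recommended: dict[str, str]) -> dict[str, str]:
    """Categorize each aspect as KEEP, REPLACE, ADD, or REMOVE.

    `original` maps aspect → approach from PROJECT.md/ROADMAP.md;
    `recommended` maps aspect → approach from the research findings.
    """
    changes: dict[str, str] = {}
    for aspect in original.keys() | recommended.keys():
        if aspect not in recommended:
            changes[aspect] = "REMOVE"   # in original, no longer needed
        elif aspect not in original:
            changes[aspect] = "ADD"      # needed, but not in original plan
        elif original[aspect] == recommended[aspect]:
            changes[aspect] = "KEEP"     # compatible with the original approach
        else:
            changes[aspect] = "REPLACE"  # incompatible; alternative stated
    return changes

# Hypothetical aspects:
# impact({"db": "SQLite", "auth": "sessions"},
#        {"db": "SQLite", "auth": "JWT", "cache": "Redis"})
# → db: KEEP, auth: REPLACE, cache: ADD
```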
Frame output as findings, not instructions. Follow research-boundaries rules for output framing, language choice (descriptive not prescriptive), and header templates. See core-principles/references/research-boundaries.md for full rules and examples.
Gate: SUMMARY.md must address conflicts. If it mentions none, the synthesizer missed something — there are always trade-offs.
These thoughts mean you are about to violate the methodology:
| Thought | What to do instead |
|---|---|
| "I already know the best approach" | Your training data is a guess. Verify against current sources. |
| "Research isn't needed for this" | The command already decided research is needed. Do it properly. |
| "Let me quickly mention a few options" | Quick = shallow. Follow the protocol. Investigate each option. |
| "X is the industry standard" | Says who? Cite the source. Industry standards change. |
| "Everyone uses X" | Popularity is not evidence of fitness. Check trade-offs for THIS project. |
| "Based on my knowledge..." | Your knowledge is training data. Find a current source. |
| "I'll research this later / in more detail" | Research happens NOW. This IS the research step. |
| "The user probably wants X" | Present options. User decides. Your job is to inform, not assume. |
| "This technology is better because it's newer" | Newer is not better. Compare on actual criteria. |
| "Let me skip this area, it's obvious" | Selected areas are not skippable. Even "obvious" domains have surprises. |
| "This finding is strong enough to be a MUST requirement" | Research findings are suggestions. See core-principles/references/research-boundaries.md. |
| Excuse | Reality |
|---|---|
| "Simple project doesn't need research" | The command determined research is needed. Depth varies, methodology doesn't. |
| "I'm an AI, I already know this domain" | You know what your training data said. The ecosystem may have changed. |
| "User is in a hurry" | Shallow research leads to wrong decisions. 20 min research saves days of rework. |
| "There's really only one good option" | Then research will confirm that quickly. If there's truly one option, proving it takes 5 minutes. |
| "Research is done, just need to write it up" | Writing IS research. Synthesis reveals gaps. If you can't write it clearly, you haven't understood it. |
| "The codebase already uses X, so we should keep using X" | Consistency has value, but verify X is still the right choice. Present the trade-off. |
| "I found a great article that covers everything" | One source is not research. Cross-reference with at least 2 more. |
| "The official docs say to do it this way" | Docs describe one way. Are there alternatives? What are the trade-offs? |
IRON LAW:
RESEARCH BEFORE DECISIONS. EVIDENCE BEFORE ASSUMPTIONS.
Training data is not evidence. Cite sources or flag as assumption.
AREAS (dynamic from catalog):
Select areas based on trigger + brownfield conditions.
Core: STACK, LANDSCAPE, ARCHITECTURE, PITFALLS
Domain: SECURITY, PERFORMANCE, ACCESSIBILITY, DATA, INTEGRATION
Custom: max 2, requires user confirmation, max 8 total
WAVE ORDER (dynamic from dependencies):
wave = max(wave[deps]) + 1
Areas with no deps → Wave 1. Spawn all per wave in one message.
PROTOCOL:
1. Scope (bound the question)
→ gate: specific research question
2. Investigate (3+ sources, 2+ categories)
→ gate: 3+ independent data points per key claim
3. Evaluate (rank evidence, surface conflicts)
→ gate: every recommendation cites strong evidence
4. Synthesize (options, trade-offs, unknowns)
DEPTH:
Deep (phase-research, init): dynamic areas from catalog, parallel agents, full output
Medium (brainstorm): 2 rounds, inline
Light (plan): 1 area, focused, 1-2 pages
SOURCES (min 2 of 3 categories):
Web (docs, benchmarks, issues) | Codebase (patterns, deps) | Ecosystem (stats, releases)
NEVER:
Single option without alternatives | Claims without sources |
Skip areas at Deep depth | Trust training data alone |
Resolve conflicts silently | Skip codebase scan
| Mistake | Fix |
|---|---|
| Recommending a technology without checking alternatives | Always present 2-3 options with trade-offs. Even if one is clearly better, show why. |
| Citing training data as fact ("React 18 introduced...") | Verify via web search or docs. Training data may be wrong or outdated. |
| Skipping codebase scan | Existing patterns are the strongest evidence. The project already uses tools — check them first. |
| Research output is a knowledge dump, not actionable | Every finding must answer: "So what? What should the user DO with this?" |
| Resolving conflicts between sources silently | Surface conflicts explicitly. User decides. Research informs, doesn't decide. |
| All recommendations are the same technology/approach | Check for familiarity bias. Force yourself to evaluate at least one unfamiliar option. |
| Research is broad but shallow (mentions many things, investigates none) | Better to deeply investigate 3 options than shallowly mention 10. |
| Pitfalls section is generic ("watch out for performance") | Pitfalls must be specific to THIS stack, THIS architecture, THIS domain. |
| Landscape section only lists competitors without analysis | Compare on criteria relevant to the project, not just list names. |
| SUMMARY.md has no conflicts section | There are ALWAYS trade-offs. No conflicts = missed something. |
| Research uses MUST/SHOULD as if setting requirements | See core-principles/references/research-boundaries.md for output language rules |
This section defines the complete orchestration flow for deep research (used by /st:init and /st:phase-research). Commands delegate to this flow instead of implementing their own.
The orchestration accepts parameters from the calling command:
- `context_inputs`: files to read for research context (e.g. PROJECT.md, CONTEXT.md, ROADMAP.md, REQUIREMENTS.md, ARCHITECTURE.md)
- `output_dir`: where to save research results (e.g. `.superteam/research/`, `.superteam/phases/[name]/research/`)
- `research_context`: label for the research session (e.g. "init", "phase 3: Authentication", "Q2 Stack Review")
- `commit_message`: format for the final commit (e.g. "research: {research_context} — {areas}")

Planning steps:
1. Read `context_inputs` files: extract domain, tech decisions, constraints, greenfield/brownfield status
2. Select areas from the catalog (`references/research-catalog.md`), evaluating trigger conditions and `needs` fields
3. Group areas into waves by dependency: `wave = max(wave[deps]) + 1`
4. Save `RESEARCH-PLAN.md` to `output_dir` before presenting to user:

# Research Plan
Created: [date]
Context: [research_context]
Status: planning
## Selected Areas
| Area | Focus | Wave | Status |
|------|-------|------|--------|
| STACK | [focus] | 1 | pending |
| ARCHITECTURE | [focus] | 2 | pending |
...
## Wave Structure
Wave 1: [areas] → Wave 2: [areas] → ...
## Decisions
- [area] included because: [reason]
- [area] skipped because: [reason]
RESEARCH PLAN — [research_context]
Wave [N] (parallel, [M] agents):
├─ [AREA]: [focus description]
└─ [AREA]: [focus description]
...
Total: [X] agents, [Y] waves
Saved: [output_dir]/RESEARCH-PLAN.md
Adjust areas or proceed?
If `config.research_auto_approve` is true: display the plan and proceed immediately (EXCEPT: if custom areas are proposed, always pause for confirmation).

Execution steps:
1. Update `RESEARCH-PLAN.md` status to `in-progress`
2. Spawn each wave's agents in parallel. Never use `run_in_background: true`. All agents MUST run in foreground so the orchestrator can read outputs immediately.
3. Give each agent: `context_inputs`, relevant prior wave outputs, and its specific focus area
4. After each wave, update `RESEARCH-PLAN.md` — set completed areas' status to `done`
5. Spawn the synthesizer, which writes `SUMMARY.md` to `output_dir` (with both "Findings" and "Decisions Requiring Confirmation" sections)
6. Present the summary to the user:

RESEARCH SUMMARY — [research_context]
Areas researched: [list of areas]
Key findings (reference material — auto-saved):
1. [finding]
2. [finding]
Decisions requiring confirmation (NOT yet applied):
1. [decision] — [recommended option] vs [alternatives]
2. [decision] — [recommended option] vs [alternatives]
Conflicts found:
[if any: describe + suggest resolution]
Present findings as reference material. Decisions are listed but NOT confirmed here — the calling command (init step 5.5, phase-research) is responsible for presenting each decision individually for user choice.
Wait for user review, answer follow-up questions if needed.
Completion steps:
1. All area outputs were already saved to `output_dir` during execution
2. Update `RESEARCH-PLAN.md` status to `completed`
3. Commit via `superteam:atomic-commits`, using the `commit_message` format provided by the calling command

Resume: if `RESEARCH-PLAN.md` exists with status `in-progress`, compare area statuses (`done` vs `pending`) and re-run only the pending areas. This makes research resilient to session interruptions — the plan on disk is the source of truth.
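As a sketch, resuming reduces to filtering the plan's area rows by status. The row shape assumes `RESEARCH-PLAN.md` has been parsed into records with `area` and `status` keys, which is an assumption about the plan format, not a spec.

```python
def areas_to_resume(plan_rows: list[dict]) -> list[str]:
    """Given parsed RESEARCH-PLAN.md area rows, return the areas still pending.

    Completed areas ("done") are never re-spawned; everything else is re-run.
    """
    return [row["area"] for row in plan_rows if row["status"] != "done"]

# With STACK done and ARCHITECTURE pending, only ARCHITECTURE is re-spawned:
rows = [
    {"area": "STACK", "status": "done"},
    {"area": "ARCHITECTURE", "status": "pending"},
]
# areas_to_resume(rows) → ["ARCHITECTURE"]
```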
| File | When to Load | Trigger |
|---|---|---|
| SKILL.md | Always | Skill invocation (via init, phase-research, brainstorm, or plan) |
| references/research-catalog.md | When planning research | Area selection: triggers, dependencies, brownfield, guardrails |
| references/research-areas.md | On demand | Deep research execution guidance: search strategies, templates, scope |
Rule: Light research (/st:plan) resolves with SKILL.md alone. Deep research (/st:phase-research, /st:init) loads references/research-catalog.md for area selection and references/research-areas.md for execution guidance.
Used by:
- /st:init — dynamic-wave research (areas selected from catalog, grouped by dependencies)
- /st:phase-research — dynamic research agents from catalog + synthesizer
- /st:brainstorm — 2-round inline research (broad → focused)
- /st:plan — optional focused research when AI recommends it

Skills that pair with research-methodology:
- superteam:project-awareness — provides framework detection for codebase-aware research
- superteam:wave-parallelism — parallel research agents follow the wave protocol (dynamic waves from catalog dependencies)
- superteam:verification — research findings verified before feeding into plans
- superteam:handoff-protocol — research state (sources, findings, conflicts) captured on pause

Agents:
- phase-researcher — spawned by phase-research and init. Each instance covers one research area. Follows this skill's methodology.
- research-synthesizer — spawned after all researchers complete. Follows the synthesis protocol from this skill.