Deep research agent skill. Use when the user needs thorough, scientific, truth-seeking research on any topic -- investigating claims, finding primary sources, synthesizing evidence, producing cited reports. Triggers on: 'research this', 'investigate', 'deep dive', 'find sources', 'what does the evidence say', 'literature review', 'fact check', 'analyze the research on', any request requiring multi-source investigation with citations.
Systematic, evidence-based research that produces cited, source-backed reports. Uses the filesystem as working memory -- scraped content, extracted notes, and source metadata are saved to disk rather than held in context. This keeps the context window lean and makes research resumable, searchable, and reusable.
research/{topic-slug}/
plan.md # Research plan with sub-questions
sources-index.md # URL registry: what's been scraped, metadata
sources/ # Raw scraped content (one file per source)
001-source-slug.md
002-source-slug.md
...
notes/ # Extracted findings per sub-question
question-1.md
question-2.md
...
report.md # Final compiled report
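Every name in the layout above is a slug ({topic-slug}, NNN-source-slug). The skill does not prescribe a slugifier; a minimal sketch of one possible implementation:

```python
import re

def slugify(text: str, max_len: int = 60) -> str:
    """Lowercase, collapse non-alphanumeric runs to hyphens, trim length."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")

slugify("What Does the Evidence Say?")  # -> "what-does-the-evidence-say"
```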
Core principle: Write to disk aggressively, read back selectively. Never hold raw scraped content in context longer than it takes to extract findings. The LLM context should contain only the current working set -- not the entire research corpus.
Estimate: breadth=3, depth=2 -> ~12-20 web searches, ~8-15 pages read, 3-8 minute runtime.
For quick fact-checks: breadth=2, depth=1. For exhaustive reviews: breadth=5, depth=3.
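These estimates follow from the recursion rule used in the research loop (follow-ups at depth - 1 and ceil(breadth / 2)). A sketch of the per-sub-question search budget, assuming each level runs all its queries before recursing:

```python
from math import ceil

def search_budget(breadth: int, depth: int) -> int:
    """Searches for one sub-question: `breadth` queries now, plus a
    follow-up round at ceil(breadth / 2) while depth > 0."""
    if depth == 0:
        return breadth  # final round: search, then stop
    return breadth + search_budget(ceil(breadth / 2), depth - 1)

search_budget(3, 2)  # -> 6
```

With the typical 2-3 sub-questions, breadth=3/depth=2 gives 12-18 searches, matching the ~12-20 estimate above.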
Create the research directory:
mkdir -p research/{topic-slug}/sources research/{topic-slug}/notes
Search local filesystem first. Before any web search, check if relevant content already exists:
- research/ directories in the working directory
- grep, glob, or semantic search (lss) as appropriate for the task

Initialize sources-index.md:

# Sources Index
| # | URL | Title | Date | Type | Credibility | File |
|---|-----|-------|------|------|-------------|------|
Analyze the query. Decompose it into 2-5 independently researchable sub-questions.
Write plan.md with the sub-questions, search strategy, and scope decisions.
Create a todo list tracking each sub-question.
For each sub-question, execute this loop:
Generate breadth distinct queries. Vary:
- source type (academic site:scholar.google.com, government site:gov, news, industry)

Batch search with search_depth: "advanced":
web_search("query1 ||| query2 ||| query3", search_depth="advanced")
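The ||| separator packs several queries into a single web_search call. Building that batched string from a list is trivial, but worth pinning down (a sketch, assuming web_search takes the joined string as shown above):

```python
def batch_queries(queries: list[str]) -> str:
    """Join distinct queries with the ' ||| ' separator used by web_search."""
    return " ||| ".join(q.strip() for q in queries)

batch_queries(["query1", "query2 ", "query3"])  # -> "query1 ||| query2 ||| query3"
```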
For each promising search result:
Check sources-index.md first. If the URL is already scraped, skip it. Do not re-scrape pages already processed in this session.
Scrape the page. Batch URLs for efficiency:
scrape_webpage("url1, url2, url3")
Save raw content to disk immediately:
# Write scraped content to sources/NNN-slug.md
Include a header with the URL, title, and scrape date. This gets the raw content out of your context.
Extract key findings from the scraped content and append to notes/question-N.md:
Update sources-index.md with the new source entry:
| 003 | https://example.com/article | Article Title | 2025-06-15 | news | medium | sources/003-article-title.md |
Free your context. After extracting findings and saving to disk, you do not need to retain the raw scraped content. Move on to the next source.
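The per-source steps above (save raw content with a header, register it in the index, then drop it from context) can be sketched as one helper. The function name and signature are hypothetical; the file layout matches the structure defined earlier:

```python
from datetime import date
from pathlib import Path

def save_source(root: Path, num: int, url: str, title: str, content: str,
                kind: str = "news", cred: str = "medium") -> Path:
    """Write raw scraped content to sources/NNN-slug.md and append a
    row to sources-index.md, so the raw text can leave the context."""
    slug = "-".join(title.lower().split())[:50]
    path = root / "sources" / f"{num:03d}-{slug}.md"
    today = date.today().isoformat()
    # Header carries URL, title, and scrape date for later citation building.
    path.write_text(f"# {title}\n\nURL: {url}\nScraped: {today}\n\n---\n\n{content}")
    row = (f"| {num:03d} | {url} | {title} | {today} "
           f"| {kind} | {cred} | sources/{path.name} |\n")
    with open(root / "sources-index.md", "a") as idx:
        idx.write(row)
    return path
```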
After processing results for a sub-question:
- Review notes/question-N.md to assess coverage.
- If depth > 0: recurse with follow-up questions at depth - 1 and ceil(breadth / 2).
- If depth == 0: stop and move to the next sub-question.

After all sub-questions are researched:
- Read the notes/*.md files (not raw sources -- the notes are already distilled).
- Compile report.md using findings from notes and metadata from sources-index.md.
# [Research Title]
## Executive Summary
[2-3 paragraph overview of key findings and conclusions]
## Key Findings
### [Finding 1 Title]
[Discussion with inline citations [1][2]]
### [Finding 2 Title]
[Discussion with inline citations [3][4]]
## Analysis
[Cross-cutting patterns, contradictions, consensus areas]
## Limitations & Caveats
[What this research couldn't determine, methodological limitations]
## Conclusions
[Evidence-based conclusions with confidence levels]
## Sources
[1] Author (Date). "Title." *Publication*. URL
[2] Author (Date). "Title." *Publication*. URL
Build citations from sources-index.md at report time. During the search phase, you only needed to track {url, title, source_number}. Now format the full bibliography by reading back source metadata from the index and source files as needed.
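Parsing the index table back into numbered citations is mechanical. A sketch, assuming the column layout defined earlier (#, URL, Title, Date, Type, Credibility, File); the index does not track authors, so this emits a simplified form of the bibliography template:

```python
def bibliography(index_md: str) -> list[str]:
    """Turn sources-index.md table rows into numbered citation lines."""
    entries = []
    for line in index_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 7 or not cells[0].isdigit():
            continue  # skip the title, header, and separator lines
        num, url, title, when = cells[0], cells[1], cells[2], cells[3]
        entries.append(f'[{int(num)}] ({when}). "{title}". {url}')
    return entries
```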
- Write report.md in the research directory.
- Use inline citations [1], [2] that map to the Sources section.
- Use > blockquote for direct quotes.

web_search("topic aspect1 ||| topic aspect2 ||| topic counter-evidence", search_depth="advanced")
scrape_webpage("https://source1.com/article, https://source2.com/study")
# After scraping, immediately write to file and extract findings
# This is critical -- do not hold raw scraped content in context
Load the openalex-paper-search skill for full API reference. Quick patterns:
# Find highly-cited papers on a topic
curl -s "https://api.openalex.org/works?search=topic+keywords&filter=cited_by_count:>50,type:article,has_abstract:true&sort=cited_by_count:desc&per_page=15&select=id,display_name,publication_year,cited_by_count,doi,authorships,abstract_inverted_index&mailto=you@example.com"
# Find recent preprints
curl -s "https://api.openalex.org/works?search=topic&filter=type:preprint,publication_year:2025&sort=publication_date:desc&per_page=10&mailto=you@example.com"
# Find review/survey papers
curl -s "https://api.openalex.org/works?search=topic&filter=type:review,cited_by_count:>20&sort=cited_by_count:desc&per_page=10&mailto=you@example.com"
# Follow citation chains: who cites a seminal paper?
curl -s "https://api.openalex.org/works?filter=cites:WORK_ID&sort=cited_by_count:desc&per_page=10&mailto=you@example.com"
Save paper metadata to sources-index.md and raw API responses to sources/ for later processing.
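OpenAlex returns abstracts as abstract_inverted_index (word -> list of positions) rather than plain text, so decode before writing notes. A minimal decoder:

```python
def decode_abstract(inverted: dict[str, list[int]]) -> str:
    """Rebuild plain text from OpenAlex's abstract_inverted_index."""
    positions = {pos: word for word, idxs in inverted.items() for pos in idxs}
    return " ".join(positions[i] for i in sorted(positions))

decode_abstract({"deep": [0], "research": [1, 3], "in": [2]})
# -> "deep research in research"
```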
web_search("topic site:scholar.google.com ||| topic site:arxiv.org ||| topic systematic review OR meta-analysis", search_depth="advanced")
web_search("topic statistics site:data.gov ||| topic data site:worldbank.org ||| topic survey results", search_depth="advanced")
web_search("claim fact check ||| claim evidence ||| claim debunked OR confirmed", search_depth="advanced")
# Search past research
grep -r "keyword" research/
# Semantic search if available
lss "research question" -p /workspace --json -k 10