End-to-end deep web research with scientific rigor. Combines multi-source search (Tavily, Semantic Scholar, CrossRef), academic paper processing (three-pass reading, PDF archiving, arXiv protocol), citation management (BibTeX, DOI extraction), bookmark management (Raindrop auto-categorization, rich notes), critical evaluation (evidence hierarchy, bias detection, GRADE), and Zettelkasten knowledge synthesis. Use when researching any topic, reading papers, conducting literature reviews, evaluating evidence, building bibliographies, processing arXiv papers, bookmarking sources, or synthesizing research into actionable knowledge. Triggers: "research X", "find papers on", "literature review", "investigate", "evaluate evidence", "what does the research say about", "read this paper", "deep dive into".
Conduct systematic, evidence-based research from discovery through knowledge synthesis. Every claim is backed by sources. Every source is bookmarked. Every insight is captured in Zettelkasten. Every paper gets a BibTeX entry.
Core Principle: Search → Acquire → Process → Synthesize → Expand.
| Mode | Time | Scope | Output |
|---|---|---|---|
| Quick | ~30 min | Discovery + first pass + bookmark | Bookmarked sources, fleeting notes |
| Standard | ~2 hr | Full reading + source zettel + BibTeX | Literature notes, bibliography |
| Deep | ~5+ hr | Everything + permanent notes + synthesis | Complete knowledge graph with MOC |
Goal: Find relevant sources across academic and web domains.
Academic Search (Semantic Scholar):
1. autocomplete_query("topic") → refine terms
2. search_papers(query, year=YYYY, limit=30, fields=[title, authors, year, abstract, citationCount, url])
3. get_paper(paper_id, fields=[...references, citations]) → for key papers
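The search_papers call above maps directly onto Semantic Scholar's public Graph API; a minimal sketch of the equivalent request URL (direct REST use is an assumption here, the skill itself goes through MCP tools):

```python
from urllib.parse import urlencode

# Public Semantic Scholar Graph API paper-search endpoint.
BASE = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_url(query: str, year: int, limit: int = 30) -> str:
    """Build a paper-search URL mirroring the search_papers call above."""
    params = {
        "query": query,
        "year": year,
        "limit": limit,
        "fields": "title,authors,year,abstract,citationCount,url",
    }
    return f"{BASE}?{urlencode(params)}"

url = build_search_url("multi-agent task decomposition", year=2023)
```

The same fields list feeds directly into the first-pass assessment (title, abstract, citation count) without a second round trip.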
Web Search (Tavily):
1. tavily_search(query, search_depth="advanced", include_raw_content=true, max_results=20)
2. tavily_extract(urls=[...], extract_depth="advanced") → full content from top results
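The tavily_search parameters above correspond to a JSON payload for Tavily's REST endpoint (`https://api.tavily.com/search`); a minimal sketch, with a placeholder API key:

```python
import json

def tavily_payload(query: str, max_results: int = 20) -> str:
    """Serialize a search request mirroring the tavily_search call above."""
    return json.dumps({
        "api_key": "tvly-XXXX",          # placeholder, not a real credential
        "query": query,
        "search_depth": "advanced",
        "include_raw_content": True,     # needed for first-pass reading
        "max_results": max_results,
    })

payload = tavily_payload("GRADE evidence assessment")
```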
Evidence Quality Hierarchy (prioritize sources):
Cross-Verification: 2+ independent sources for critical claims. Check original sources, not secondary citations.
Output: Candidate source list with quality tier annotations.
For each valuable source, acquire metadata and archive content.
Auto-detect source type and extract accordingly:
| Source Type | Action |
|---|---|
| arXiv paper | Full arXiv protocol (see below) |
| DOI | CrossRef API → BibTeX |
| PMID | PubMed E-utilities → BibTeX |
| Web URL | Tavily extract → bookmark |
| PDF URL | Download + bookmark |
Generated BibTeX entries go to `notes/references.bib`. Citekey format: `@lastnameYEAR` (e.g., `@hong2023`).
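The `@lastnameYEAR` convention can be derived mechanically from paper metadata; a small sketch (the helper name is illustrative, not part of the skill's tooling):

```python
import re
import unicodedata

def citekey(first_author_last_name: str, year: int) -> str:
    """Derive a citekey in the @lastnameYEAR convention, e.g. hong2023."""
    # Strip accents and non-letters so the key stays ASCII and BibTeX-safe.
    ascii_name = unicodedata.normalize("NFKD", first_author_last_name)
    ascii_name = ascii_name.encode("ascii", "ignore").decode()
    ascii_name = re.sub(r"[^a-z]", "", ascii_name.lower())
    return f"{ascii_name}{year}"

key = citekey("Hong", 2023)   # → "hong2023"
```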
```shell
# 1. Extract metadata
tavily_extract(["https://arxiv.org/abs/[ID]"], extract_depth="advanced")

# 2. Download the PDF
curl -L -o "notes/literature/Author-Year-Title.pdf" "https://arxiv.org/pdf/[ID].pdf"

# 3. Read the FULL PDF (not just the abstract)
pdftotext "notes/literature/Author-Year-Title.pdf" -

# 4. Verify the download
file "notes/literature/Author-Year-Title.pdf"
```
Why full PDF: Abstracts miss 80%+ of value (methodology, limitations, figures, ablations).
Every source gets a bookmark via mcp_raindrop_create_bookmarks.
Auto-categorize using collection matcher — see bookmark-collections.md.
Bookmark format:
```json
{
  "link": "URL",
  "title": "Paper Title - Author Year",
  "excerpt": "2-3 sentence summary (MAX 250 chars for Raindrop)",
  "note": "## Key Insights\n- ...\n## Key Extracts\n> \"...\"\n## Connections\n- ...",
  "collection": COLLECTION_ID
}
```
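Since Raindrop silently truncates long excerpts, clamping the summary to the 250-character limit before the create_bookmarks call avoids mid-sentence cut-offs; a small sketch (the helper name is illustrative):

```python
def clamp_excerpt(summary: str, limit: int = 250) -> str:
    """Trim a summary to Raindrop's excerpt limit, breaking at a word boundary."""
    if len(summary) <= limit:
        return summary
    cut = summary[: limit - 1].rsplit(" ", 1)[0]
    return cut + "…"

excerpt = clamp_excerpt("word " * 100)
```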
Storage Strategy:
Read and analyze each source using evidence-based methods.
| Pass | Time | Focus | Output |
|---|---|---|---|
| 1st | 5-10 min | Title, abstract, headings, figures, conclusions | Five Cs assessment, continue/stop decision |
| 2nd | 30-60 min | Full content grasp, key excerpts, AIC extraction | Draft source zettel |
| 3rd | 1-5 hr | Deep understanding, virtual re-implementation | Complete source zettel |
Five Cs Assessment (always do on first pass):
AIC Content Abstraction (Pacheco-Vega):
For detailed reading methodology, see reading-methods.md.
Every reading produces a comprehensive literature note via zk_create_note(note_type="literature").
Use the template at source-zettel-template.md.
Minimum requirements:
Apply proportionate to source importance:
Quick check (all sources): Evidence tier, cross-verification status, bias red flags.
Standard evaluation (key sources): Methodology critique, bias detection, argument structure.
Deep evaluation (critical sources): Full GRADE assessment, statistical validity, logical fallacy check.
For detailed evaluation frameworks, see evidence-hierarchy.md.
Transform information into interconnected knowledge.
Extract atomic concepts from literature notes into permanent notes:
```
zk_create_note(
    title="Declarative Concept Title",  # e.g., "Role Specialization Improves Multi-Agent Task Decomposition"
    content="[Concept in own words + evidence from multiple sources]",
    note_type="permanent",
    tags="concept-type, domain, application"
)
```
Quality gate: Single idea, own words, self-contained, declarative title, 3-7 tags, linked to 2+ notes.
Connect notes using typed relationships:
| Link Type | Use When | Weight |
|---|---|---|
| supports | Evidence for a claim | High |
| contradicts | Opposing views | High |
| extends | Building on a concept | High |
| refines | Clarifying/improving | Medium |
| questions | Raising doubts | Medium |
| reference | Simple citation | Low |
| related | Generic connection | Low |
Always set `bidirectional=true`, and always include a `description` explaining WHY the notes connect.
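Assuming a link-creation tool alongside zk_create_note (the exact signature below is illustrative, not confirmed by this skill), a typed connection would look like:

```
zk_create_link(
    source_id="20240101-role-specialization",   # illustrative note IDs
    target_id="20240102-task-decomposition",
    link_type="supports",
    bidirectional=true,
    description="Specialization evidence supports the decomposition claim"
)
```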
Create Maps of Content (MOCs) when 7+ notes cluster around a theme:
```
zk_create_note(
    title="MOC: Topic Name",
    note_type="structure",
    content="## Core Concepts\n- [[id1]]: Concept A\n- [[id2]]: Concept B\n..."
)
```
Every source maintains three linked records:
Bookmark (Raindrop) ⟷ BibTeX (@citekey in references.bib) ⟷ Literature Note (Zettelkasten)
Grow the knowledge graph through citation mining and gap identification.
Citation Mining:
get_paper_references(paper_id) → foundational work (backward)
get_paper_citations(paper_id) → subsequent developments (forward)
Knowledge Discovery:
zk_find_similar_notes(note_id, threshold=0.3) → semantic neighbors
zk_find_central_notes(limit=10) → knowledge hubs
zk_find_orphaned_notes() → unconnected notes to link or delete
Gap Analysis: Identify unanswered questions, contradictions, and areas needing deeper investigation. Document in structure notes.
```markdown
# Research Report: [Topic]

## Executive Summary
[2-3 paragraphs: key findings and recommendations]

## Research Question(s)
## Methodology (sources, date range, quality threshold)
## Key Findings (with evidence, cross-verification, confidence)
## Alternative Approaches Considered
## Recommended Approach (with rationale)
## Zettelkasten Integration (notes created, links established)
## Bookmarks (collection URL, total sources)
## References (academic, professional, documentation)
## Gaps and Future Research
```
Discovery → Get metadata → Download PDF (if arXiv) → Bookmark →
First pass (Five Cs) → Second pass (excerpts) → Third pass (if important) →
Source zettel → BibTeX entry → Link to existing notes
Seed search → Identify 3-5 quality papers → Citation mining →
Batch first-pass all candidates → Filter → Second-pass selected →
Third-pass key papers → Create MOC → Map themes and gaps
Define requirements → Academic search → Web search for implementations →
Bookmark all → Extract features → Create comparison notes →
Link: supports/contradicts → Recommend with evidence
Fast first pass (5 min) → Five Cs → Targeted read (10-20 min) →
Decision (deep dive / bookmark / discard) → Fleeting note if keeping
PDFs are stored in `notes/literature/` with standardized names; BibTeX entries live in `notes/references.bib`.

| Pitfall | Solution |
|---|---|
| Abstract-only reading | Always read full PDF for arXiv papers |
| Single-source reliance | Require 2+ for critical claims |
| Confirmation bias | Search for contradicting evidence |
| Bookmark without notes | Use rich note template every time |
| Outdated information | Use time_range filters, check dates |
| Synthesis paralysis | Set time limits, start notes after 10-15 sources |
| Citation chain errors | Check original sources, not secondary |
This skill is the knowledge foundation for the Deep Research Orchestrator agent system. Agents that load this skill:
| Agent | Role | Uses From This Skill |
|---|---|---|
| research-web-track | Web search via Tavily | Search strategies, evidence hierarchy, bookmark creation |
| research-scholar-track | Academic papers via Semantic Scholar | Three-pass reading, arXiv protocol, citation formats |
| research-evidence-evaluator | Source credibility assessment | Evidence hierarchy, CRAAP test, bias detection |
| research-citation-manager | BibTeX and citation networks | Citation formats, BibTeX generation, citekey conventions |
| research-synthesis-writer | Narrative writing | Research report template, synthesis patterns |
Orchestrator: deep-research-orchestrator coordinates all agents.
Quick Start: See docs/deep-research-quick-start.md for usage guide.
| File | Content | Load When |
|---|---|---|
| search-strategies.md | Tavily, Semantic Scholar, CrossRef query patterns | Complex searches |
| reading-methods.md | Three-pass, SQ3R, AIC, synthetic notes detail | Deep paper reading |
| evidence-hierarchy.md | GRADE, bias detection, quality assessment | Evaluating evidence |
| citation-formats.md | BibTeX types, metadata APIs, validation | Building bibliography |
| bookmark-collections.md | Raindrop collection IDs, URL pattern matching | Every bookmark creation |
| Template | Purpose |
|---|---|
| source-zettel-template.md | Comprehensive reading documentation |
| bookmark-note-template.md | Rich bookmark note structure |