Deep extraction pipeline for content research. Routes content types to the right
extraction tools — web articles, scholarly papers, PDFs, ebooks, code repos. Use when
asked to "research this topic", "extract from this source", "summarize this paper",
or "deep dive into this article". Maximizes signal density from any source format.
## Tool Inventory

### Discovery (find sources)

| Tool | What It Does | When To Use |
|---|---|---|
| `mcp__exa__web_search_exa` | Neural web search, content extraction | Blog posts, articles, general web research |
| `mcp__exa__get_code_context_exa` | Code docs, API references, library examples | Technical documentation, SDK references |
| `mcp__paper-search__search_arxiv` | Search arXiv by query | Academic papers — CS, ML, AI, math, physics |
| `mcp__paper-search__search_semantic` | Semantic Scholar search | Academic papers — broader coverage, citation data |
| `mcp__paper-search__search_crossref` | CrossRef search | DOI resolution, journal articles |
| `mcp__paper-search__search_pubmed` | PubMed search | Biomedical, health data, clinical research |
| `mcp__openalex__search_works` | OpenAlex (240M+ works) | Broad scholarly search, comprehensive coverage |
| `mcp__openalex__search_authors` | Author search and profiles | Finding an author's full body of work |
| `mcp__openalex__get_trending_topics` | Trending research topics | Landscape mapping, trend detection |
### Extraction (get content from sources)

| Tool | What It Does | When To Use |
|---|---|---|
| `mcp__arxiv-latex__get_paper_prompt` | Fetch LaTeX source from arXiv | Always use for arXiv papers — lossless math, tables, structure |
| `mcp__paper-search__read_arxiv` | Read arXiv paper content | Fallback if LaTeX source unavailable |
| `mcp__paper-search__download_arxiv` | Download arXiv PDF | When you need the PDF file itself |
| `summarize` CLI | Extract web page content | Blog posts, Substack, Medium articles |
| `pandoc` CLI | EPUB-to-markdown conversion | Best for EPUBs — direct, no dependencies, preserves structure |
| `pdftotext -layout` | PDF-to-text conversion | Fast fallback for PDFs when Marker stalls |
| `marker_single` CLI | PDF-to-markdown (ML-based) | High-fidelity PDFs with complex layout — but first run downloads ~1GB of models |
| Read tool (built-in) | Read PDF files directly | Quick look at short PDFs (<10 pages) |
| WebFetch | Fetch URL content | Direct URL access when `summarize` isn't needed |
### Citation & Network (map the landscape)

| Tool | What It Does | When To Use |
|---|---|---|
| `mcp__openalex__get_work_citations` | Papers that cite a given work | "Who built on this paper?" |
| `mcp__openalex__get_work_references` | Papers a given work cites | "What does this paper build on?" |
| `mcp__openalex__get_citation_network` | Full citation graph | Mapping a research area |
| `mcp__openalex__get_related_works` | Related works by topic | Finding adjacent research |
| `mcp__openalex__get_author` | Author profile + works | Deep dive on a specific researcher |
## Routing Decision Tree

```
Source type?
│
├─ arXiv paper (has arXiv ID like 2512.24601)
│   ├─ Get LaTeX source: mcp__arxiv-latex__get_paper_prompt
│   └─ Get citation network: mcp__openalex__get_work (by DOI) → get_work_citations
│
├─ Academic paper (non-arXiv, has DOI)
│   ├─ Find it: mcp__paper-search__search_semantic OR mcp__openalex__search_works
│   ├─ If PDF available: marker_single paper.pdf --output_dir /tmp/marker_out --output_format markdown
│   └─ Citation network: mcp__openalex__get_work → get_work_citations / get_work_references
│
├─ EPUB (ebook)
│   ├─ Convert: pandoc book.epub -t markdown --wrap=none -o output.md
│   ├─ If EPUB is an expanded directory: zip it first (zip -X0 out.epub mimetype && zip -Xr out.epub *)
│   └─ Note: pandoc is fast, dependency-free, and handles EPUB→md natively
│
├─ PDF (ebook, report, vendor doc)
│   ├─ Fast path: pdftotext -layout file.pdf output.md (always works, rough formatting)
│   ├─ High quality: marker_single file.pdf --output_dir /tmp/marker_out --output_format markdown
│   ├─ For complex tables/math: add --use_llm (requires GOOGLE_API_KEY or --claude_api_key)
│   ├─ ⚠️ First run: Marker downloads ~1GB of models — may appear to hang
│   └─ Read output: Read /tmp/marker_out/<filename>/<filename>.md
│
├─ Web article (Substack, Medium, blog)
│   ├─ Extract: summarize "<url>" --extract-only --json
│   └─ If paywall/JS-heavy: use Exa (has content extraction built in)
│
├─ GitHub repo
│   ├─ Code context: mcp__exa__get_code_context_exa
│   └─ Deep analysis: clone + scout/oracle agent with specific extraction brief
│
├─ Research landscape question ("what's the state of X?")
│   ├─ Academic: mcp__openalex__search_works + get_trending_topics
│   ├─ Industry: mcp__exa__web_search_exa
│   └─ Both: parallel agents — one academic, one industry
│
└─ "What cites this?" / "What does this build on?"
    └─ mcp__openalex__get_work_citations / get_work_references
```
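The expanded-EPUB branch above ends with a two-step `zip` invocation. The same invariant (the `mimetype` entry must come first in the archive and be stored uncompressed) can be sketched with Python's stdlib; this is a minimal sketch with hypothetical paths, equivalent in spirit to the `zip` commands, not a replacement for them:

```python
import zipfile
from pathlib import Path

def rezip_epub(src_dir: str, out_epub: str) -> None:
    """Re-pack an expanded EPUB directory into a valid .epub.
    The EPUB container format requires 'mimetype' to be the first
    archive entry, stored without compression (the same invariant
    as `zip -X0 out.epub mimetype && zip -Xr out.epub *`)."""
    src = Path(src_dir)
    out = Path(out_epub).resolve()
    with zipfile.ZipFile(out_epub, "w") as zf:
        # mimetype must be the first entry, with no compression
        zf.write(src / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src).as_posix()
            # skip the mimetype (already written) and the output file itself
            if path.is_file() and rel != "mimetype" and path.resolve() != out:
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

After re-zipping, the result can go straight into the `pandoc` conversion step.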
## Extraction Brief Templates

Quality of extraction depends on quality of the brief. Never say "summarize." Always specify what to extract.
### For Academic Papers

```
Read this paper. Extract:
1. Core claim (one sentence)
2. Methodology — what did they actually do?
3. Key findings — specific numbers, not descriptions
4. Limitations the authors acknowledge
5. Limitations they DON'T acknowledge
6. How this connects to [specific thesis/question]
7. Citation count and year (for recency weighting)
```
### For Industry Articles / Blog Posts

```
Read this article. Extract:
1. Core claim (one sentence)
2. Evidence quality — is this opinion, anecdote, or data?
3. Author's incentive structure — are they selling something?
4. Specific metrics or data points cited (with their sources)
5. How this connects to [specific thesis/question]
6. What's genuinely new vs. repackaged conventional wisdom?
```
### For Landscape Mapping (5+ sources)

```
Broad sweep first — for each source, extract in 3-5 lines:
1. Author and affiliation
2. Core claim
3. Evidence type (data/opinion/case study)
4. Cluster assignment (which theme does this belong to?)

Then identify: which clusters have the most signal? Which sources
disagree? Where are the gaps — questions nobody is asking?
```
### For Ebooks / Long PDFs

```
After converting with marker_single, extract:
1. Table of contents with chapter summaries (1-2 lines each)
2. Key frameworks or models introduced
3. Practitioner advice that's specific enough to act on
4. Claims that can be verified against other sources
5. Which chapters are most relevant to [specific question]?
```
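One way to keep these briefs reusable is to store them as parameterized strings, so an agent never receives a bare "summarize." This is a sketch; the dict keys and the `{focus}` placeholder are illustrative, not part of the pipeline:

```python
# Brief templates with a {focus} slot for the [specific thesis/question].
BRIEFS = {
    "academic": (
        "Read this paper. Extract:\n"
        "1. Core claim (one sentence)\n"
        "2. Methodology — what did they actually do?\n"
        "3. Key findings — specific numbers, not descriptions\n"
        "4. Limitations the authors acknowledge\n"
        "5. Limitations they DON'T acknowledge\n"
        "6. How this connects to {focus}\n"
        "7. Citation count and year (for recency weighting)\n"
    ),
    "industry": (
        "Read this article. Extract:\n"
        "1. Core claim (one sentence)\n"
        "2. Evidence quality — is this opinion, anecdote, or data?\n"
        "3. Author's incentive structure — are they selling something?\n"
        "4. Specific metrics or data points cited (with their sources)\n"
        "5. How this connects to {focus}\n"
        "6. What's genuinely new vs. repackaged conventional wisdom?\n"
    ),
}

def build_brief(kind: str, focus: str) -> str:
    """Fill the thesis/question slot before dispatching an extraction agent."""
    return BRIEFS[kind].format(focus=focus)
```

Filling the slot at dispatch time keeps the templates stable while the research focus changes per session.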
## Marker CLI Reference

```bash
# Basic conversion (no LLM, no API key needed)
marker_single /path/to/file.pdf \
  --output_dir /tmp/marker_out \
  --output_format markdown

# Maximum quality (uses Claude for complex tables/math)
marker_single /path/to/file.pdf \
  --output_dir /tmp/marker_out \
  --output_format markdown \
  --use_llm \
  --llm_service marker.services.claude.ClaudeService \
  --claude_api_key $ANTHROPIC_API_KEY

# Batch convert a directory
marker /path/to/pdfs/ \
  --output_dir /tmp/marker_out \
  --output_format markdown
```

Output location: `/tmp/marker_out/<filename>/<filename>.md`
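The output path convention is easy to get wrong when scripting around Marker. A tiny helper that encodes the `<out_dir>/<stem>/<stem>.md` layout documented above (a sketch of this skill's convention, not part of Marker's CLI):

```python
from pathlib import Path

def marker_output_path(pdf: str, out_dir: str = "/tmp/marker_out") -> Path:
    """Marker writes <out_dir>/<stem>/<stem>.md for each input file."""
    stem = Path(pdf).stem
    return Path(out_dir) / stem / f"{stem}.md"
```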
## Workflow: Scholarly Deep Read

Complete workflow for processing an academic paper:

1. **DISCOVER**
   - `mcp__paper-search__search_arxiv("context engineering semantic layer", max_results=10)`, or
   - `mcp__openalex__search_works("knowledge graph agent memory", per_page=10)`
2. **EVALUATE** (quick scan of abstracts/titles)
   - Which papers are worth deep reading?
   - Check: citation count, recency, author credibility, relevance to our thesis
3. **EXTRACT**
   - If arXiv: `mcp__arxiv-latex__get_paper_prompt(arxiv_id)`
   - If PDF: `marker_single paper.pdf --output_dir /tmp/marker_out --output_format markdown`
4. **READ** with an extraction brief
   - Oracle agent with a specific brief (see templates above), or
   - direct Read if the paper is short enough for context
5. **MAP CITATIONS**
   - `mcp__openalex__get_work_citations(work_id)` — who built on this?
   - `mcp__openalex__get_work_references(work_id)` — what foundation does this rest on?
6. **CONNECT**. How does this paper relate to:
   - our active beliefs (`beliefs.md`)?
   - our open questions (`reading-wants.md`)?
   - resources already in the library (`resources.yaml`)?
7. **STORE**
   - Update `beliefs.md` if evidence changes confidence
   - Update `reading-wants.md` if new threads emerge
   - Add to `resources.yaml` if it belongs in the library
## Workflow: Ebook Mining Pipeline

For extracting structured knowledge from books (EPUBs, PDFs):

1. **CONVERT**
   - EPUB → `pandoc book.epub -t markdown --wrap=none -o output.md`
   - PDF → `pdftotext -layout book.pdf output.md` (fast), or `marker_single book.pdf --output_dir /tmp/marker_out` (high quality, slow first run)
   - Expanded EPUB dir → zip first, then pandoc
2. **TRIAGE**
   - Check line counts (`wc -l *.md`) to gauge scope
   - Read the first 80 lines of each to see TOC/structure
   - Prioritize by relevance to the extraction goal
3. **PARALLEL EXTRACTION**
   - Dispatch scout agents (model: sonnet), one per book
   - Each agent gets a targeted extraction brief: what specific knowledge to extract (not "summarize"), how to organize the output (headers, bullet points), and what to flag (tensions, surprises, quotable insights)
4. **SYNTHESIS**
   - Opus reads all extraction outputs
   - Cross-references claims across books (where do authors agree? disagree?)
   - Produces final resource files organized by purpose, not by source
5. **INTEGRATION**
   - Wire resource files into the relevant skill
   - Update `reading-wants.md` and `beliefs.md` if findings shift understanding

Validated: 2026-02-25 — converted 6 books (45K lines) in parallel and dispatched 6 extraction agents simultaneously. pandoc for EPUBs, pdftotext for PDFs. Total conversion time: ~30 seconds.
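The TRIAGE step above can be sketched as a small helper that ranks converted books by size and grabs their opening lines (usually the TOC). A sketch assuming the converted books sit as `.md` files in one directory; the function name and return shape are mine:

```python
from pathlib import Path

def triage(md_dir: str, head_lines: int = 80) -> list[tuple[str, int, str]]:
    """Rank converted books for step 2 (TRIAGE).
    Returns (filename, line_count, head) tuples, largest first,
    where `head` is the first `head_lines` lines (typically the TOC),
    so an agent can prioritize before any deep reading."""
    results = []
    for md in Path(md_dir).glob("*.md"):
        lines = md.read_text(errors="ignore").splitlines()
        results.append((md.name, len(lines), "\n".join(lines[:head_lines])))
    return sorted(results, key=lambda r: -r[1])
```

The line counts mirror `wc -l *.md`; the head snippets replace reading each file manually.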
## Anti-Patterns

- **Don't OCR what has LaTeX source.** If it's on arXiv, use `arxiv-latex-mcp`. The source is right there.
- **Don't Read 200-page PDFs 20 pages at a time.** Convert with Marker first, then read the markdown.
- **Don't search Exa for academic papers.** Use paper-search-mcp or OpenAlex — they have structured metadata, citation counts, and DOIs.
- **Don't skip the citation graph.** A paper in isolation is half the picture. Who cites it tells you whether the field agreed.
- **Don't deep-read before triaging.** Broad sweep first (categorize into clusters), then go deep on the signal. Validated across 3+ sessions.
- **Don't trust abstracts.** An abstract is a sales pitch. The methodology section is where the truth lives.