Academic research lifecycle for AI/ML papers — discover, deep-read, discuss/brainstorm, cite (BibTeX via DBLP→CrossRef→S2), write paper sections. Integrates Semantic Scholar / arXiv / AlphaXiv / OpenReview / DBLP, runs venue-quality gates (CCF, impact factor), and auto-injects expert knowledge from 85 bundled AI/ML domain skills across 21 categories (architectures, fine-tuning, RAG, inference serving, alignment, interpretability, evaluation, MoE, long-context, multimodal, etc.) — users do NOT invoke those directly; this skill routes to them. Trigger on "find papers on…", "what's new in…", "let's brainstorm research…", "is this paper any good?", "give me the bibtex", "help write related work", "reviews for <paper>", "citation graph of…". Not for product/framework how-tos, rewriting a doc into paper format, cleaning messy BibTeX, or vendor blogs / whitepapers / tutorials.
Full academic research lifecycle: discover, discuss, read, cite, write.
On every invocation, before doing anything else:
1. Read phases/skill-router.md for domain skill mapping.
2. Parse any --domain or --domain-only flags from user input.
3. Parse user intent from /research <args> and route to the appropriate phase module.
| Input pattern | Phase | Module |
|---|---|---|
| /research discover "topic" | discover (consolidated) | phases/discover.md |
| /research discuss | discuss (current session) | phases/discuss.md |
| /research discuss <paper> | discuss (from specific paper) | phases/discuss.md |
| /research read <paper> | read (standalone) | phases/read.md |
| /research cite <paper> | cite | phases/cite.md |
| /research write <section> | write | phases/write.md |
| Ambiguous input | Ask user to clarify | — |
<paper> accepts: arXiv ID, DOI, or paper title (with clarify flow if ambiguous). See Unified Input Parsing section below.
All commands support optional --domain <categories> or --domain-only <categories> flags. See Unified Input Parsing section for details.
On first invocation, create .research-workspace/ in the current working directory if it doesn't exist:
mkdir -p .research-workspace/sessions
echo '{"sessions": [], "current_session": null}' > .research-workspace/state.json
Each discover invocation creates a session: .research-workspace/sessions/{topic-slug}-{date}/
Contents:
- discover.json — search results with verdicts + landscape summary
- discuss/brief.json — research brief from discuss phase
- read/{paper_id}.json — structured paper analyses
- cite/{paper_id}.bib — verified BibTeX entries
- cite/cite-log.json — citation metadata and sources
- write/{section}.md — generated section text with metadata (output format, citations used, review gates applied)
- cache/{paper_id}/ — raw paper content cache (see Paper Cache)
- checkpoints/ — phase completion checkpoints (see State Persistence)

Fetched paper content is expensive (network latency, rate limits, AlphaXiv/arXiv availability). After context compaction, all prior reads are lost. The cache stores raw content locally so it never needs to be re-fetched.
.research-workspace/sessions/{slug}/cache/{paper_id}/
├── overview.md # AlphaXiv structured overview (if available)
├── fulltext.md # AlphaXiv full text (if available)
├── paper.pdf # arXiv or publisher PDF (if downloadable)
├── supplementary.pdf # Supplementary/appendix PDF (if found)
├── openreview/
│ ├── reviews.json # OpenReview official reviews (if venue uses OpenReview)
│ ├── rebuttal.json # Author rebuttals (if available)
│ ├── meta_review.json # AC/SAC meta-review (if available)
│ ├── decision.json # Accept/reject decision (if available)
│ └── discussion.json # Threaded replies: author comments, reviewer follow-ups, ethics reviews
└── cache_meta.json # What was cached, when, from where
{
"paper_id": "s2_id or arxiv_id",
"arxiv_id": "2401.12345",
"doi": "10.xxxx/...",
"cached_at": "ISO 8601",
"contents": {
"overview": { "source": "alphaxiv", "status": "cached|404|not_attempted" },
"fulltext": { "source": "alphaxiv", "status": "cached|404|not_attempted" },
"pdf": { "source": "arxiv|publisher|s2_open_access", "status": "cached|404|not_attempted" },
"supplementary": { "source": "publisher|arxiv", "status": "cached|404|not_attempted" },
"openreview": { "source": "openreview_api", "status": "cached|not_found|private|auth_failed|no_credentials|error|not_attempted", "venue": "ICLR 2024" }
}
}
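The cache-consultation rules in the next section can be sketched as a small helper. This is illustrative only (the function name and signature are assumptions, not part of the skill's scripts); it follows the cache_meta.json schema above, including the 7-day retry window for 404s:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def needs_fetch(cache_dir, item):
    """Decide whether a content item (e.g. "overview", "pdf") must be
    fetched from the network, per the cache_meta.json status field."""
    meta_path = Path(cache_dir) / "cache_meta.json"
    if not meta_path.exists():
        return True                      # no cache yet -> fetch
    meta = json.loads(meta_path.read_text())
    status = meta.get("contents", {}).get(item, {}).get("status", "not_attempted")
    if status == "cached":
        return False                     # read the local file, never re-fetch
    if status == "404":
        # content genuinely missing; retry only after 7 days
        cached_at = datetime.fromisoformat(meta["cached_at"])
        age = datetime.now(timezone.utc) - cached_at
        return age.days > 7              # AlphaXiv may add new papers later
    return True                          # not_attempted / unknown -> fetch
```

The 404 branch assumes cached_at is stored with an explicit UTC offset so the age arithmetic stays timezone-safe.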
On any paper content fetch (read, discover quick-read, discuss knowledge gap):
1. Check .research-workspace/sessions/{slug}/cache/{paper_id}/. If cache_meta.json exists, read it to determine what's available.
2. If status: "cached", read from the local file. Do not re-fetch.
3. If status is "not_attempted" or the cache directory doesn't exist, fetch from the network and save to cache.
4. If a prior fetch returned a permanent miss (status: "404"), do not retry — the content genuinely doesn't exist. Exception: retry after 7 days (AlphaXiv may add new papers).

For papers published at venues that use OpenReview, attempt to fetch review records. This is a best-effort enhancement — all other features work without it.
Authentication required: The OpenReview API v2 (https://api2.openreview.net) requires a Bearer token. Obtain via POST /login with username/password. If no credentials are configured, skip entirely (log warning, set status: "no_credentials"). If /login returns mfaPending, log warning and skip (MFA accounts are not supported — they need an interactive challenge).
Credentials config: Set OPENREVIEW_USER and OPENREVIEW_PASS in .claude/settings.json under "env".
Fetch flow (when credentials are available — run every step with a short Python one-liner via python3 -c "...", using urllib.request and json.loads; no external deps):
1. POST https://api2.openreview.net/login with Content-Type: application/json and body {"id": "<OPENREVIEW_USER>", "password": "<OPENREVIEW_PASS>"}. Parse response JSON — if mfaPending is truthy, set status: "auth_failed" and stop (MFA not supported). Otherwise extract token (valid 24h). Keep the token in memory for subsequent requests; if it must be persisted, use a tempfile.NamedTemporaryFile(mode='w', delete=False) with os.chmod(path, 0o600) and clean up on exit.
2. GET https://api2.openreview.net/notes/search?term=<url_encoded_title>&source=forum&limit=3 with Authorization: Bearer <token>. Match by normalized title equality (lowercase, strip whitespace). If the venue ID is known, add &content.venueid=<venue_id> to narrow results.
3. GET https://api2.openreview.net/notes?forum=<forum_id>&limit=1000 with the same Bearer header. Classify notes using a dual-signal approach — check invitations[] first (strongest signal when present), then fall back to signatures + content keys (needed because invitation is often null for published papers):
- Review: invitation ending in Official_Review OR (signatures contains Reviewer_* AND content has review-like keys: summary, strengths, weaknesses, rating, confidence, soundness — field names vary by venue, e.g., review, main_review, recommendation)
- Meta-review: invitation ending in Meta_Review OR (signatures contains Area_Chair_*/Senior_Area_Chair_* AND content has metareview key)
- Decision: invitation ending in Decision OR (signatures contains Program_Chairs AND content has decision key)
- Rebuttal: signatures contains Authors AND replyto points to a note already classified as a review
- Threading: notes with replyto equal to the forum ID are direct replies. Notes with replyto pointing to another note are threaded discussion.

Content values live at note.content.{field}.value (not note.content.{field} directly). For example, rating at note.content.rating.value (e.g., "8: accept, good paper"), decision at note.content.decision.value (e.g., "Accept (poster)"). Review text may span multiple fields depending on venue — common keys include summary, strengths, weaknesses, review, main_review, comment, recommendation. Extract all content keys, don't hardcode a single field. Some fields may also have note.content.{field}.readers controlling visibility. Caution: API responses may contain raw control characters (tabs, newlines in review text). Parse with json.loads(text, strict=False); never strip control chars from stored artifacts (only do so for disposable human-readable inspection).

Save reviews.json, rebuttal.json, meta_review.json, decision.json, discussion.json to cache/{paper_id}/openreview/.

Status handling:
- Login failure: status: "auth_failed", log warning
- Reviews exist but are not readable: status: "private". Do NOT freeze as permanent — reviews may become public after camera-ready
- No matching forum: status: "not_found"
- Any other failure: status: "error"

| Phase | Cache interaction |
|---|---|
| read Step 2 | Check cache before AlphaXiv/arXiv/publisher fetch. Save all fetched content to cache. Also attempt OpenReview fetch for the paper. |
| discover Step 6 (quick-read) | Check cache for overview.md. Save if fetched. Skip full OpenReview fetch (too heavy for batch quick-read). |
| discuss Phase 3 (knowledge gap) | Check cache before quick-read fetch. Save if fetched. |
| discuss Phase 5 (reviewer simulation) | If openreview/reviews.json exists in cache for any analyzed paper, incorporate real reviewer concerns into the simulation — real objections take priority over simulated ones. |
| write (related-work, intro) | Read cached content for positioning accuracy instead of relying on summaries alone. |
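The login step and the dual-signal note classification above can be sketched in Python with only the standard library, matching the skill's "urllib.request and json.loads, no external deps" rule. The endpoints come from the fetch flow above; the helper names and the exact key set are illustrative, and rebuttal/threading classification (which needs replyto resolution across notes) is deliberately left out:

```python
import json
import urllib.request
from typing import Optional

API = "https://api2.openreview.net"

def openreview_login(user: str, password: str) -> Optional[str]:
    """POST /login; return the Bearer token, or None if mfaPending
    (MFA accounts are unsupported -> status: "auth_failed")."""
    req = urllib.request.Request(
        f"{API}/login",
        data=json.dumps({"id": user, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.loads(resp.read().decode(), strict=False)
    if body.get("mfaPending"):
        return None
    return body["token"]  # valid 24h; keep in memory only

def classify_note(note: dict) -> str:
    """Dual-signal classification: invitations[] first, then fall back
    to signatures + content keys (invitation is often null)."""
    invs = note.get("invitations") or []
    sigs = " ".join(note.get("signatures") or [])
    keys = set(note.get("content") or {})
    review_keys = {"summary", "strengths", "weaknesses", "rating",
                   "confidence", "soundness", "review", "main_review"}
    if any(i.endswith("Official_Review") for i in invs) or (
            "Reviewer_" in sigs and keys & review_keys):
        return "review"
    if any(i.endswith("Meta_Review") for i in invs) or (
            "Area_Chair_" in sigs and "metareview" in keys):
        return "meta_review"
    if any(i.endswith("Decision") for i in invs) or (
            "Program_Chairs" in sigs and "decision" in keys):
        return "decision"
    return "other"  # rebuttals/threads need replyto resolution
```

Remember to read field values via note["content"][field]["value"] when extracting text, per the content-value rule above.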
Long research sessions (especially discover → discuss → write chains) risk losing progress to context compaction. Each phase writes a checkpoint on completion so work can be resumed.
Save to .research-workspace/sessions/{slug}/checkpoints/{phase}_{timestamp}.json:
{
"phase": "discover|discuss|read|cite|write",
"status": "completed|in_progress|failed",
"timestamp": "ISO 8601",
"completed_steps": ["step1", "step2"],
"pending_steps": ["step3"],
"key_artifacts": {
"discover_json": "relative path if exists",
"brief_json": "relative path if exists",
"read_analyses": ["paper_id1", "paper_id2"],
"cite_log": "relative path if exists"
},
"context_summary": "1-2 sentence summary of what was accomplished and what remains",
"skills_loaded": ["clip"],
"user_decisions": ["chose direction A over B", "skipped experiment design"]
}
Each phase writes its checkpoint after saving its primary artifact (discover.json, brief.json, etc.). The checkpoint is a lightweight pointer — the real data lives in the phase artifacts.
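A checkpoint writer following the schema above might look like this (the function name and signature are assumptions for illustration; the real skill writes these inline after saving each phase artifact):

```python
import json
import time
from pathlib import Path

def write_checkpoint(session_dir, phase, completed_steps, pending_steps,
                     key_artifacts=None, context_summary=""):
    """Write a lightweight checkpoint pointer after the phase's primary
    artifact is saved; the real data lives in the phase artifacts."""
    ckpt_dir = Path(session_dir) / "checkpoints"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    checkpoint = {
        "phase": phase,
        "status": "completed" if not pending_steps else "in_progress",
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "completed_steps": list(completed_steps),
        "pending_steps": list(pending_steps),
        "key_artifacts": key_artifacts or {},
        "context_summary": context_summary,
    }
    path = ckpt_dir / f"{phase}_{stamp}.json"
    path.write_text(json.dumps(checkpoint, indent=2))
    return path
```

Deriving status from whether pending_steps is empty keeps the checkpoint self-describing for the resume flow below.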
If a session resumes after context compaction (detected by: user continues a /research command but prior conversation context is unavailable):
1. Read .research-workspace/state.json → identify current session
2. Read checkpoints/ → reconstruct phase state
3. Re-load skills listed in skills_loaded
4. Resume from pending_steps — do not re-run completed steps

The discuss phase is uniquely long-running (multi-turn). It writes incremental checkpoints at two points:
These capture the evolving research brief so that even if compaction occurs mid-discussion, the accumulated findings, open problems, and proposed directions are preserved.
Phases that accept a paper identifier (discuss, read, cite) share this logic. Discover takes a topic description, not a paper identifier.
- arXiv ID (e.g., 2401.12345): Direct lookup via s2_match.py or S2 API
- DOI (e.g., 10.1109/...): Direct CrossRef/S2 lookup
- Title: s2_match.py "<text>" for exact title match
- Free text: s2_search.py "<text>" 5 + dblp_search.py "<text>" 5

When free-text search returns multiple candidates, present each with:

- Venue quality signals (venue_info.py, citation count from S2 metadata)

User selects one → proceed to the requested phase.
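The identifier routing described above can be sketched with two regexes. This is a hedged illustration — the function name is hypothetical, and the patterns cover common arXiv/DOI shapes, not every historical variant:

```python
import re

def classify_paper_input(text: str) -> str:
    """Route a <paper> argument to the right lookup path:
    arXiv ID -> direct S2 lookup, DOI -> CrossRef/S2,
    anything else -> title match then free-text search."""
    text = text.strip()
    if re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", text):
        return "arxiv_id"   # s2_match.py or S2 API direct lookup
    if re.match(r"10\.\d{4,9}/\S+", text):
        return "doi"        # CrossRef/S2 direct lookup
    return "title"          # s2_match.py, then s2_search.py + dblp_search.py
```

Pre-2007 arXiv IDs (e.g. cs/0301012) would need an extra pattern if they ever appear as input.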
All commands support:
- --domain <cat1,cat2>: Additive — merge with auto-detected categories
- --domain-only <cat1,cat2>: Exclusive — use only these categories

Category names match the skill-router mapping table (semantic match OK).
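The flag semantics above can be sketched as follows (a minimal sketch, assuming the arguments arrive as a single string; the helper name and return shape are illustrative, not part of the skill's scripts):

```python
import re

def parse_domain_flags(args: str):
    """Return (categories, exclusive). --domain-only wins and is
    exclusive; --domain is additive (merged with auto-detection)."""
    m = re.search(r"--domain-only\s+(\S+)", args)
    if m:
        return m.group(1).split(","), True   # exclusive: use only these
    m = re.search(r"--domain\s+(\S+)", args)
    if m:
        return m.group(1).split(","), False  # additive: merge later
    return [], False
```

Checking --domain-only before --domain matters, since a naive --domain pattern could otherwise shadow the longer flag.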
Each parallel search agent has a 60-second timeout. If an agent times out or errors, proceed with results from the remaining agents. Log the failure but do not block.
- superpowers:verification-before-completion before presenting final output in cite, write, and discover phases (see each phase for details)
- superpowers:systematic-debugging when repeated operational failures need diagnosis

When a retrieval task has no directly relevant results after the applicable primary searches, classify the failure before escalating:
| Failure type | Meaning | Action |
|---|---|---|
| Zero results | Query returned nothing from primary sources | Escalate through strategy ladder |
| Exact-match miss | Title/DOI/arXiv ID lookup failed | Check alternate titles (preprint vs. published), arXiv ID variants, DOI redirects |
| Metadata mismatch | Found paper but required fields missing | Try alternate source in priority order: DBLP → CrossRef → S2 (per Iron Rule #2) |
| Indexing lag | Paper too new for database | Check arXiv directly, AlphaXiv MCP retrieval, or accept "not yet indexed" |
| Query drift | Results returned but none directly relevant | Tighten query specificity, add field-specific keywords, filter by year |
| Version drift | Preprint title/content differs from published version | Search both titles, check DOI and arXiv ID separately |
| Timeout or rate limit | Operational failure | Retry once after delay, then proceed with remaining sources |
| Source outage | API down | Skip source, log warning, proceed with remaining sources |
Work through these strategies in order. Stop when directly relevant results are found and verified enough for the current phase. Earlier strategies are cheaper; later strategies cast a wider net.
1. Exact match: s2_match.py for precise titles; DOI/arXiv ID direct lookup
2. Switch mode: s2_search.py → s2_bulk_search.py → DBLP → CrossRef title search
3. Graph search: s2_citations.py, s2_references.py, s2_recommend.py
4. Body search: s2_snippet.py for method names or claims not in titles/abstracts

| Trigger condition | Typical phase |
|---|---|
| All applicable primary searches return 0 results for a non-trivial query (S2 + AlphaXiv in discover; DBLP + CrossRef + S2 in cite) | discover Step 3-4, cite Step 2 |
| 2+ consecutive API timeouts or HTTP errors across any scripts | any phase |
| Knowledge-gap search in discuss cannot find the referenced method/baseline | discuss Phase 3 |
| >30% of \cite{} references fail verification during write | write Step 4 |
After exhausting the ladder, or when a strategy succeeds, produce an attempt ledger:
{
"query_original": "...",
"failure_type": "zero_results|exact_match_miss|metadata_mismatch|indexing_lag|query_drift|version_drift|timeout_or_rate_limit|source_outage",
"attempts": [
{
"strategy": "normalize|exact_match|adjust_year|decompose|switch_mode|graph_search|body_search|adjacent_fields",
"query": "expanded query text",
"source": "s2_search|s2_bulk|s2_match|s2_citations|s2_references|s2_recommend|s2_snippet|dblp_search|dblp_bibtex|arxiv_bibtex|crossref_search|doi2bibtex|alphaxiv_agentic|alphaxiv_full_text|alphaxiv_embedding",
"result_count": 0,
"status": "matched|zero_results|irrelevant|timeout|rate_limited|source_down|error",
"notes": "optional: error details, HTTP status, why results were irrelevant"
}
],
"final_outcome": "found|not_found_after_exhaustive_search|genuine_null|blocked_by_operational_failure",
"resolved_by": "normalize|exact_match|adjust_year|decompose|switch_mode|graph_search|body_search|adjacent_fields|null"
}
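A helper for building the ledger above might look like this (a sketch under the assumption that the ledger is a plain dict matching the schema; the function name is illustrative):

```python
def record_attempt(ledger, strategy, query, source, result_count, status, notes=""):
    """Append one strategy attempt to the ledger and mark resolution
    when a strategy succeeds ("matched")."""
    ledger.setdefault("attempts", []).append({
        "strategy": strategy,
        "query": query,
        "source": source,
        "result_count": result_count,
        "status": status,
        "notes": notes,
    })
    if status == "matched":
        ledger["final_outcome"] = "found"
        ledger["resolved_by"] = strategy
    return ledger
```

If the ladder is exhausted without a match, the caller sets final_outcome to one of the null-result values by hand, which keeps the honest "not found" path explicit.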
"Not found after exhaustive search" is a valid, honest outcome. Some papers are not indexed, some methods have no prior work, and some queries target genuinely unexplored territory. Never fabricate papers or cite from model memory to avoid a null result.
Use a second LLM (via Codex MCP: mcp__codex__codex) throughout the lifecycle. The purpose is to surface blind spots that a single model misses and to create a three-way discussion (user + Claude + Codex) for richer exploration.
Codex participates in three modes:
| # | Mode | Phase | What to send | What to ask |
|---|---|---|---|---|
| 1 | Co-thinker | discuss Phase 2 (Assumption Surfacing) | Papers analyzed + field context from discover | "What assumptions does this field take for granted? What would break if each assumption were violated?" |
| 2 | Co-thinker | discuss Phase 3 (Discussion Loop) | Current discussion state (latest findings, open questions, proposed angles) | Phase 3-specific: varies per turn. See discuss.md for integration details. |
| 3 | Adversarial | discuss Phase 4 (Adversarial Novelty Check) | Proposed direction + closest existing work | "As a skeptical reviewer at {target venue}: (1) Is the claimed novelty real or superficial? (2) What existing work was missed? (3) What's the strongest argument against this direction?" |
| 4 | Adversarial | discuss Phase 5 (Reviewer Simulation) | Proposed direction + research brief so far | "Generate 3-4 specific reviewer objections. For each: the weakest claim, the missing baseline, the essential ablation, and severity (High/Medium/Low)." |
| 5 | Adversarial | discuss Phase 6 (Significance Test) | Proposed direction + significance analysis from Claude | "Evaluate this direction on three tiers: (1) real-world impact with concrete failure modes, (2) would the community think differently if this succeeds, (3) expected improvement magnitude vs. SOTA. Flag any tier that is weak." |
| 6 | Cold reader | discuss Phase 7 (Simplicity Test) | User's 2-sentence explanation ONLY — no research brief, no context | "Based only on these 2 sentences, explain back what the research idea is and what makes it novel. What is unclear or ambiguous?" |
| 7 | Adversarial | discuss Phase 8 (Experiment Design) | Completed experiment plan draft + research brief | "What baselines are missing? What essential ablation is not listed? Are there better-suited datasets? Is the expected results table realistic?" |
| 8 | Adversarial | discuss Phase 9 (Convergence Decision) | Complete research brief | "Given everything in this brief, would you recommend this direction for {target venue}? What is the single biggest risk? What would make you abandon this direction?" |
| 9 | Adversarial | write Step 5.5 (abstract + intro only) | Draft text + research brief | "As an AC at {target venue}: (1) Does the motivation hold up? (2) Is the contribution clearly distinguished from prior work? (3) What would make you desk-reject this?" |
| 10 | Adversarial | write related-work | Draft related-work section + discover results + read analyses | "As a reviewer: (1) What important related work is missing? (2) Is any prior work mischaracterized or unfairly compared? (3) Is the positioning of our contribution honest and precise?" |
- Invoke mcp__codex__codex with the prompt.
- Present Codex's response labeled [Codex]. The user synthesizes both.
- In write phases, label cross-model feedback [Cross-model review].

In the Discussion Loop, Codex participates as an ongoing third voice. The interaction model:
Not every turn requires Codex. Use judgment.
Every delegated task — spawned agent, Codex review call, or subsearch dispatch — must follow a 6-element brief. This prevents vague tasking, bloated context, inconsistent outputs, and scope drift.
GOAL — What decision this task informs; why it matters to the current phase
DELIVERABLE — Exact artifact to produce (format, fields, structure)
EVIDENCE — Allowed evidence sources (which papers, which APIs, what context to send)
CONSTRAINTS — Result caps, token budget, timeout
DONE WHEN — Acceptance criteria (how to verify the output is correct and complete)
EXCLUSIONS — Forbidden behavior (what to exclude, what NOT to do)
Spawned search subagents (discover Step 3, dispatched via superpowers:dispatching-parallel-agents):
GOAL: Find papers relevant to "{topic}" for landscape analysis
DELIVERABLE: JSON objects, each with: paper_id, title, year, venue, citations, doi, arxiv_id, authors, source
EVIDENCE: Agent 1 (S2 subagent) — S2 API only (s2_search.py, s2_bulk_search.py)
Agent 2 (AlphaXiv subagent) — three claude.ai-bound AlphaXiv MCP tools (agentic_paper_retrieval, full_text_papers_search, embedding_similarity_search), issued as three parallel tool calls in a single subagent message. Subagents inherit the top-level session's MCP bindings (empirically verified).
CONSTRAINTS: Top 20 results for S2; ~10 per AlphaXiv tool; 60-second timeout per subagent
DONE WHEN: ≥1 result returned with all required fields populated; exit 0
EXCLUSIONS: Don't analyze, rank, or summarize papers — just retrieve and return raw results. Don't cross sources.
Codex review calls (all 10 integration points):
GOAL: {varies — e.g., "Surface assumptions this field takes for granted" or "Find weaknesses in this research direction for {venue}"}
DELIVERABLE: Structured response: numbered findings, each with a concrete claim and evidence
EVIDENCE: Only the artifacts listed in the "What to send" column of the invocation table — never the full conversation history
CONSTRAINTS: 3-5 actionable findings; no filler
DONE WHEN: Each finding is specific enough to act on (names a paper, identifies a gap, flags a weakness)
EXCLUSIONS: Don't restate what was sent; don't make stylistic suggestions; don't hedge with "this could be strengthened" — say what's wrong and why
Knowledge-gap subsearches (discuss Phase 3):
GOAL: Fill knowledge gap identified during discussion: "{specific method/baseline/claim}"
DELIVERABLE: 1-3 relevant papers with title, year, venue, and one-sentence summary of relevance
EVIDENCE: S2 search + DBLP search; use the exact method/baseline name as query
CONSTRAINTS: Top 3 results; 30-second timeout per source
DONE WHEN: At least one paper found that addresses the gap, or explicit "not found after exhaustive search"
EXCLUSIONS: Don't fabricate papers; don't use model memory; don't return tangentially related work
Write-phase review gates (Triple Review Gate + Codex):
GOAL: {varies — e.g., "As an AC at {venue}, evaluate whether this abstract/intro would survive desk review"}
DELIVERABLE: 2-3 specific revision suggestions, each with: the problematic text, what's wrong, and a concrete fix direction
EVIDENCE: Only the draft section text + research brief — not the full paper or conversation
CONSTRAINTS: Focus on substance (motivation, contribution clarity, positioning) not prose style
DONE WHEN: Each suggestion identifies a specific passage and explains why it's a problem
EXCLUSIONS: Don't praise what works; don't suggest word-level edits; don't repeat the Iron Rules back
Not every delegation needs full formality. Skip the brief for:
- Single-script metadata lookups (venue_info.py, ccf_lookup.py) — these have fixed I/O

All output in English. For uncommon vocabulary (GRE-level), add Chinese translation in parentheses.
All scripts are in skills/research/scripts/. Key scripts:
| Script | Purpose |
|---|---|
| s2_search.py | S2 relevance-ranked semantic search |
| s2_bulk_search.py | S2 boolean bulk search with year filtering |
| s2_batch.py | S2 batch metadata by paper IDs (NOT a search) |
| s2_citations.py | Papers that cited a given paper |
| s2_references.py | Papers cited by a given paper |
| s2_recommend.py | Paper recommendations from positive/negative examples |
| s2_snippet.py | Search within paper bodies for specific passages |
| s2_match.py | Exact title match (single result) |
| dblp_search.py | DBLP publication search |
| dblp_bibtex.py | Fetch condensed BibTeX via DBLP search API (title + author + year) |
| arxiv_bibtex.py | Fetch @misc BibTeX from arxiv.org (arXiv ID) |
| crossref_search.py | CrossRef search (fallback) |
| doi2bibtex.py | DOI → BibTeX via content negotiation |
| Script | Purpose |
|---|---|
| venue_info.py | Venue quality summary (CCF + IF + quartile) |
| ccf_lookup.py | CCF ranking lookup |
| if_lookup.py | Impact factor lookup |
| author_info.py | Author h-index and stats |
| Script | Purpose |
|---|---|
| init.py | Rate limit helpers, DBLP host fallback |
- Orchestra-Research/AI-Research-SKILLs — bundled in vendor/ai-research-skills/ and not registered as standalone skills (only /research is exposed). The skill router loads them via Read on demand. Includes ml-paper-writing, brainstorming-research-ideas, creative-thinking-for-research, and all 21 domain categories. See phases/skill-router.md § "Loading Vendor Skills" for the load mechanism.
- humanizer skill — style review for write phase
- superpowers:dispatching-parallel-agents — parallel search in discover phase
- superpowers:verification-before-completion — output verification in cite/write/discover phases
- Codex MCP (mcp__codex__codex) — cross-model collaboration throughout the lifecycle (discuss Phases 2-9, write Step 5.5 + 5.6). See Cross-Model Collaboration section for all 10 invocation points. If unavailable, all phases proceed with Claude-only analysis (log warning). Recommended but not blocking.
- AlphaXiv MCP (mcp__claude_ai_alphaXiv__*, via the claude.ai connection) — powers the second parallel search source in discover Step 3. The three tools (agentic_paper_retrieval, full_text_papers_search, embedding_similarity_search) are issued as three parallel tool calls inside the AlphaXiv subagent (dispatched alongside the S2 subagent via superpowers:dispatching-parallel-agents). If the MCP server is unavailable, discover falls back to S2 results only (log warning). Recommended for broader coverage but not blocking.
- S2_API_KEY in .claude/settings.json under "env". Claude Code automatically exports it as $S2_API_KEY. Get from: https://www.semanticscholar.org/product/api/api-key
- OPENREVIEW_USER and OPENREVIEW_PASS in .claude/settings.json under "env". Register at: https://openreview.net/profile. Without these, OpenReview review/rebuttal data will not be fetched (all other features work normally).

If dependencies are missing on first use:
Before using /research, please ensure:
1. Install required skills/plugins:
- Orchestra-Research AI-Research-SKILLs (provides ml-paper-writing, brainstorming-research-ideas, creative-thinking-for-research, and 21 domain skill categories)
- humanizer skill
2. Set up API keys in your .claude/settings.json:
{ "env": { "S2_API_KEY": "your-key-here", "OPENREVIEW_USER": "optional", "OPENREVIEW_PASS": "optional" } }
Claude Code automatically exports env entries as environment variables.
- Semantic Scholar: https://www.semanticscholar.org/product/api/api-key
- OpenReview (optional): https://openreview.net/profile
| Service | Limit | Strategy |
|---|---|---|
| S2 | 1 req/sec (with key) | Sequential within agent, use batch/bulk |
| DBLP | ~1 req/sec | Sequential, 1s delay |
| CrossRef | No strict limit | Polite usage |
| AlphaXiv MCP | No strict limit | Three retrieval tools run in parallel inside Agent 2 |
| AlphaXiv fetch (raw markdown) | No strict limit | Respect 404s, no retry loop |
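The sequential 1 req/sec strategy for S2 and DBLP can be enforced with a minimal throttle like the sketch below. This is illustrative only — the bundled init.py rate-limit helpers are the real implementation, and the class name here is an assumption:

```python
import time

class SequentialThrottle:
    """Block before each request so calls within one agent are spaced
    at least min_interval seconds apart (default ~1 req/sec)."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self):
        """Sleep just long enough to honor the interval, then record now."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

One throttle instance per service keeps S2 and DBLP pacing independent; batch/bulk endpoints remain the cheaper option when many IDs are needed at once.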