Name: Vibe Research
Author: XY-Showing

- Research question (specific, testable)
- Hypotheses (what you expect and why)
- Evaluation metrics (defined before seeing results)
- Baselines (what you're comparing against)
- Scope boundaries (what this study does NOT claim)
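
One way to enforce the "lock it first" rule is to refuse to write research_plan.md until every item above is filled in. The sketch below is an illustration only: the `ResearchPlan` class, the `render_plan` function, and the markdown layout are assumptions, not part of the skill.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResearchPlan:
    """Hypothetical container for the five items the plan must state."""
    research_question: str
    hypotheses: str
    evaluation_metrics: str
    baselines: str
    scope_boundaries: str


def render_plan(plan: ResearchPlan) -> str:
    """Render the plan as markdown; raise if any field is blank."""
    missing = [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]
    if missing:
        raise ValueError(f"research plan is incomplete: {', '.join(missing)}")
    lines = ["# Research Plan", ""]
    for f in fields(plan):
        # "evaluation_metrics" becomes "Evaluation metrics"
        title = f.name.replace("_", " ").capitalize()
        lines += [f"## {title}", getattr(plan, f.name).strip(), ""]
    return "\n".join(lines)
```

Writing the returned string to research_plan.md before any experiment runs gives a fixed reference to check later decisions against. The frozen dataclass is a small nudge in the same direction: changing the plan means creating a new one on purpose.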

Agent 1: Keyword cluster A (e.g., "gender bias medical QA LLM")
Agent 2: Keyword cluster B (e.g., "fairness healthcare NLP benchmark")
Agent 3: Citation network of seed paper
Agent 4: Related datasets / benchmarks
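
The fan-out above can be approximated outside an agent framework. The sketch below swaps the agents for a thread pool: it runs one search per keyword cluster in parallel, then merges the results and drops duplicate titles. `parallel_search` and the `search_fn` callable are hypothetical; plug in whatever search client you actually use.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def parallel_search(
    clusters: Iterable[str],
    search_fn: Callable[[str], list[dict]],
    max_workers: int = 4,
) -> list[dict]:
    """Run search_fn once per keyword cluster in parallel and merge results.

    Each paper is a dict with at least a "title" key. Duplicates are
    detected by case- and whitespace-normalized title; the first hit wins.
    """
    clusters = list(clusters)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input clusters
        result_lists = list(pool.map(search_fn, clusters))

    seen: set[str] = set()
    merged: list[dict] = []
    for cluster, papers in zip(clusters, result_lists):
        for paper in papers:
            key = " ".join(paper["title"].lower().split())
            if key in seen:
                continue
            seen.add(key)
            # record which cluster surfaced the paper first
            merged.append({**paper, "found_by": cluster})
    return merged
```

Keeping `found_by` on each record shows which cluster is pulling its weight and which one returns nothing new.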

Claude summarizes paper → read the actual numbers in the paper
Claude reports experiment result → check the metric in W&B / log file
Claude says "significant" → check p-value and CI yourself
Claude lists citations → verify each one exists and says what Claude claims
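
For the experiment-result check, a small script can compare a reported number against the log instead of relying on memory. This is a minimal sketch assuming the log is JSON lines with one record per step; that format, and the names `logged_metric` and `verify_claim`, are illustrative and not tied to W&B or any specific tracker.

```python
import json
import math


def logged_metric(log_lines, name):
    """Return the last logged value of `name` from JSON-lines records, or None."""
    value = None
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if name in record:
            value = record[name]
    return value


def verify_claim(claimed, log_lines, name, rel_tol=1e-3):
    """Compare a claimed metric value against the log.

    Returns (ok, message). A metric missing from the log counts as a failure.
    """
    actual = logged_metric(log_lines, name)
    if actual is None:
        return False, f"{name}: not found in log"
    ok = math.isclose(claimed, actual, rel_tol=rel_tol)
    status = "matches" if ok else "MISMATCH"
    return ok, f"{name}: claimed {claimed}, logged {actual} ({status})"
```

The tolerance is relative, so rounding in a summary ("81%" for 0.8104) needs a looser `rel_tol` or should be treated as a prompt to quote the exact figure.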

1. Plot raw data distributions
2. Look at failure cases before aggregate metrics
3. Check for data leakage, label imbalance, demographic skew
4. Then run statistical tests
5. Then interpret
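
Two of the checks in step 3 are cheap to script before any statistical test runs. The sketch below uses only the standard library; `label_shares` and `leaked_examples` are hypothetical helpers, and exact-match overlap after normalization is only the simplest form of leakage detection (it will not catch paraphrases or near-duplicates).

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the dataset, to expose imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def leaked_examples(train_texts, test_texts):
    """Return test examples that also appear in the training set.

    Texts are compared after lowercasing and collapsing whitespace.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    train_set = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in train_set]
```

The same `label_shares` call works for demographic skew: pass the group attribute (for example, patient gender) instead of the class label.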

## Dead Ends (do not repeat)
- GPT-2 on MedQA: too small, results not publishable (tried 2026-01)
- WinoBias for medical domain: domain mismatch, reviewers will flag it

## Confirmed Findings
- Llama-3 shows 12% gap on female pronouns in clinical notes (our result)

## Key Papers (verified)
- [Author, Year, Venue] — one-line contribution summary
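
Keeping this file current is easier if adding an entry is a one-liner. A minimal sketch, assuming the `##` section layout shown above; `add_entry` is a hypothetical helper that works on the file's text and leaves reading and writing the file to the caller.

```python
def add_entry(markdown: str, heading: str, entry: str) -> str:
    """Append `- entry` to the list under `heading`.

    If the heading is absent, a new section is added at the end of the text.
    """
    lines = markdown.splitlines()
    bullet = f"- {entry}"
    try:
        start = lines.index(heading)
    except ValueError:
        # Heading not present yet: start a new section at the end.
        if lines and lines[-1].strip():
            lines.append("")
        return "\n".join(lines + [heading, bullet]) + "\n"

    # The section ends at the next "## " heading or at end of file.
    end = start + 1
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1
    # Step back over trailing blank lines so the bullet joins the list.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, bullet)
    return "\n".join(lines) + "\n"
```

Calling it right after a failed run, with the heading `## Dead Ends (do not repeat)`, keeps the record from depending on anyone remembering to write it down later.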

| Decision | Model |
| --- | --- |
| Research question formulation | Opus + thinking |
| Experimental design | Opus + thinking |
| Interpreting ambiguous results | Opus + thinking |
| Keyword generation | Sonnet is fine |
| Formatting bibliography | Sonnet is fine |
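
If the routing is scripted, the table reduces to a lookup. The sketch below is an assumption-laden illustration: `MODEL_BY_TASK` and `pick_model` are hypothetical names, the values are the table's labels rather than real model identifiers, and falling back to the stronger tier for unlisted tasks is a choice made here, not something the table states.

```python
# Labels copied from the table; they are not API model identifiers.
MODEL_BY_TASK = {
    "research question formulation": "Opus + thinking",
    "experimental design": "Opus + thinking",
    "interpreting ambiguous results": "Opus + thinking",
    "keyword generation": "Sonnet",
    "formatting bibliography": "Sonnet",
}


def pick_model(task: str) -> str:
    """Return the model tier for a task; unlisted tasks get the stronger tier."""
    return MODEL_BY_TASK.get(task.strip().lower(), "Opus + thinking")
```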

| Situation | Action |
| --- | --- |
| Starting research session | Write research_plan.md first |
| Literature search | Parallel agents, multiple keyword clusters |
| Claude names a paper | Verify it exists + verify the claim to page level |
| Claude says "results show X" | Check the actual metric / log / plot |
| Experiment failed | Document in RESEARCH.md immediately |
| Long session (>1hr) | Check research_plan.md: are you still on track? |
| About to run stats | EDA first: plot distributions, check failures |
| Choosing baselines | Lock baselines in research_plan.md before running your method |

| Stage | Tools |
| --- | --- |
| Literature search | arXiv API, Semantic Scholar API, pyalex (OpenAlex), ACL Anthology |
| Paper reading + RAG | PaperQA2 (precise citations, low hallucination) |
| Related work drafting | Storm (Stanford): structure first, then fill |
| Experiment tracking | W&B or MLflow: ground truth for all metrics |
| Experiment search | AIDE: metric-guided tree search for ML experiments |
| Writing | `/20-ml-paper-writing` skill for venue-specific structure |
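
As an example for the literature search stage, the arXiv API takes a plain HTTP GET with a `search_query` parameter. The sketch below only builds the query URL and makes no network call; the `arxiv_query_url` helper and the choice to AND every keyword across all fields are assumptions made for illustration.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(keywords: str, max_results: int = 20) -> str:
    """Build an arXiv API URL that requires every keyword in any field."""
    terms = " AND ".join(f"all:{word}" for word in keywords.split())
    params = {"search_query": terms, "start": 0, "max_results": max_results}
    # urlencode percent-encodes the colons and turns spaces into "+"
    return f"{ARXIV_API}?{urlencode(params)}"
```

The response is an Atom feed, so each returned title and identifier can be checked against the paper itself, in line with the verification rules above.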

Vibe Research

Overview

When to Use

Workflow

1. Lock the Research Question First

2. Parallel Literature Search

3. Primary Source Verification

4. Research Knowledge Base (RESEARCH.md)

5. Model Selection

Quick Reference

Tool Stack

Red Flags — Stop and Verify
