Investigates a question systematically before answering — verifying claims through sources, distinguishing known facts from inferences, and calibrating expressed confidence to actual certainty. Use when asked to research a topic, verify a claim, evaluate a technology, or answer a question where accuracy matters more than speed.
The default failure mode for AI agents is overconfidence: stating a plausible-sounding answer without checking it. This skill enforces a discipline: gather before claiming, distinguish facts from inferences, and express uncertainty honestly. The goal is not hesitancy — it is calibration. Confident where warranted; explicit about gaps where not.
Apply this sequence before answering any factual question:
1. **Clarify the question** — What exactly is being asked? What does "right answer" look like? What would make the answer useful rather than technically correct but unhelpful?
2. **Inventory what you already know** — Before searching, write down what you know and how confident you are. Mark uncertain items as hypotheses, not facts.
3. **Identify what you don't know** — List the gaps. Be specific: "I don't know the current version number" is better than "I need to check this."
4. **Gather from authoritative sources** — Prefer primary sources (official docs, original research, direct measurements) over secondary ones (articles, summaries, Stack Overflow). Note the source for each claim.
5. **Synthesize with uncertainty labels** — Present findings with explicit confidence levels (see below). State what is confirmed, what is inferred, and what remains unknown.
6. **Surface the residual uncertainty** — End with a clear statement of what you verified, what you couldn't verify, and what the user should validate themselves if the stakes are high.
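The sequence above can be sketched as a small bookkeeping structure. This is an illustrative sketch, not a real library; the names (`ResearchLog`, `ready_to_answer`) are invented:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchLog:
    """Tracks one question through the gather-before-claiming sequence."""
    question: str
    known: dict = field(default_factory=dict)    # claim -> confidence label (step 2)
    gaps: list = field(default_factory=list)     # specific unknowns (step 3)
    sources: dict = field(default_factory=dict)  # claim -> where it was verified (step 4)

    def ready_to_answer(self) -> bool:
        # Guard for steps 5-6: every claim labeled "confirmed"
        # must name the source it was checked against.
        return all(label != "confirmed" or claim in self.sources
                   for claim, label in self.known.items())

log = ResearchLog("When was Python 3.12 released?")
log.known["Python 3.12 released 2023-10-02"] = "confirmed"
log.ready_to_answer()  # False until a source is recorded
log.sources["Python 3.12 released 2023-10-02"] = "python.org release page"
log.ready_to_answer()  # True
```

The point of the guard is that "confirmed without a source" is not a valid state; the structure makes the discipline checkable rather than aspirational.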
Label claims explicitly when the stakes matter:
| Label | Meaning | Example |
|---|---|---|
| Confirmed | Verified against a primary source in this session | "Confirmed: Python 3.12 was released October 2, 2023 (python.org)" |
| Likely | Consistent with multiple secondary sources or strong inference from first principles | "Likely: this API is rate-limited to 100 req/min based on the docs" |
| Possible | One source, or plausible inference, not verified | "Possible: the library uses lazy evaluation here — I haven't confirmed in source" |
| Unknown | Not established — do not present as fact | "Unknown: whether this behavior applies to the v2 API endpoint" |
Never present a "Possible" or "Unknown" as "Confirmed". Resist the urge to fill gaps with confident-sounding guesses.
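One way to enforce that rule mechanically is an ordered enum plus a cap that never upgrades an unverified claim. A sketch under stated assumptions; `Confidence` and `cap_level` are invented names, not part of any real library:

```python
from enum import IntEnum

class Confidence(IntEnum):
    UNKNOWN = 0
    POSSIBLE = 1
    LIKELY = 2
    CONFIRMED = 3

def cap_level(level: Confidence, has_primary_source: bool) -> Confidence:
    # A claim may keep "Confirmed" only if it was verified against a
    # primary source in this session; otherwise it is capped at "Likely".
    if has_primary_source:
        return level
    return min(level, Confidence.LIKELY)

def render(claim: str, level: Confidence) -> str:
    # Present the claim at exactly its verified level, never above it.
    return f"{level.name.capitalize()}: {claim}"

render("this API is rate-limited to 100 req/min", Confidence.LIKELY)
# -> "Likely: this API is rate-limited to 100 req/min"
```

Note that `cap_level` only ever lowers a level; verification can preserve "Confirmed" but nothing in the rendering path can manufacture it.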
Different claims need different verification approaches:
- **Factual / current state** (library version, API behavior, pricing): check the primary source directly and note when you checked; these claims go stale.
- **Conceptual / structural** (how a system works, design patterns): read the official docs or the source itself, and test your mental model against at least one concrete example.
- **Comparative / evaluative** (A vs B, which is better): fix the evaluation criteria first, then gather evidence per criterion; discount sources with a stake in the outcome.
- **Predictive** (what will happen, what will work): present these as inference, not fact; state the assumptions they rest on and what evidence would change the prediction.
Stop and check before asserting when:
- The claim involves a version number, date, price, or anything else that changes over time
- You notice you are inferring a specific rather than recalling a verified fact
- The user will act on the answer and being wrong is costly
- Sources already in your context disagree with each other
This skill also provides the `/research-with-confidence-investigate` slash command for direct invocation — see `commands/research-with-confidence-investigate.md`.
Output template:

## Findings: [Question]
**Summary:** [1-2 sentence answer at the right confidence level]
**Confirmed:**
- [Claim] (source: [where you verified])
- [Claim] (source: [where you verified])
**Likely / inferred:**
- [Claim] (reasoning: [why you think this])
**Unknown / not verified:**
- [Gap] — [what the user would need to check to fill it]
**Note:** This research was conducted on [date]. [X] may have changed
since then — verify against [source] if currency matters.
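The template can also be filled mechanically from structured findings. A minimal sketch; the function and field names are illustrative, not part of the skill:

```python
def render_findings(question, summary, confirmed, likely, unknown, conducted_on):
    """Render the findings layout above.

    confirmed: list of (claim, source) pairs
    likely:    list of (claim, reasoning) pairs
    unknown:   list of (gap, how_to_check) pairs
    """
    lines = [f"## Findings: {question}", f"**Summary:** {summary}", "**Confirmed:**"]
    lines += [f"- {c} (source: {s})" for c, s in confirmed]
    lines.append("**Likely / inferred:**")
    lines += [f"- {c} (reasoning: {r})" for c, r in likely]
    lines.append("**Unknown / not verified:**")
    lines += [f"- {g} — {h}" for g, h in unknown]
    lines.append(f"**Note:** This research was conducted on {conducted_on}.")
    return "\n".join(lines)
```

Keeping the three claim lists as separate inputs makes it impossible to emit a claim without deciding which confidence bucket it belongs in.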
Agent-specific failure modes to pause and self-check for, regardless of provider:
- Answering from training data when the question calls for a current check
- Citing a source you did not actually open in this session
- Treating a single search result, or the first plausible answer, as confirmation
- Letting fluent, confident phrasing stand in for verification

Match the source to the kind of claim:
| Tool | Best for |
|---|---|
| Official documentation | Version-specific facts, API behavior, configuration |
| Primary research papers | Scientific claims, benchmark data |
| GitHub repo source code | Actual behavior vs. documented behavior |
| GitHub issues / PRs | Known bugs, recent changes not yet in docs |
| Changelog / CHANGELOG.md | What changed in which version |
| Stack Overflow | Common usage patterns, community-known gotchas |
| Web search | Current events, recent releases, pricing |
Before presenting research findings, confirm that:
- Every "Confirmed" claim names a source checked in this session
- Nothing labeled "Possible" or "Unknown" is phrased as fact
- The residual uncertainty is stated, not buried
- The summary's confidence matches the weakest claim it depends on
When two authoritative sources disagree:
- Say so explicitly rather than silently picking one
- Prefer the more primary and more recent source, and note the publication dates of both
- Downgrade the claim to "Likely" or "Possible" until the conflict is resolved
- If the user must decide, present both positions with their evidence