Multi-model deep research with comparative assessment (OpenAI + Perplexity + Gemini). Queries 3 deep research providers in parallel and produces a comparative synthesis.
Query up to 3 deep research models (OpenAI o3-deep-research, Perplexity sonar-deep-research, Gemini deep-research-pro) in parallel, then produce a comparative assessment highlighting agreements, disagreements, and unique insights.
The skill requires at least one API key. Check ~/.claude/.env:
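If it helps, here is a minimal sketch of that check in Python, assuming plain `KEY=value` lines in the file (adjust if your `.env` uses quoting or `export`):

```python
# Sketch: report which deep-research API keys are configured in ~/.claude/.env.
from pathlib import Path

env_path = Path.home() / ".claude" / ".env"
keys = {"OPENAI_API_KEY": False, "PERPLEXITY_API_KEY": False, "GEMINI_API_KEY": False}

if env_path.exists():
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() in keys and value.strip():
            keys[name.strip()] = True

configured = [k for k, v in keys.items() if v]
print(f"Configured keys: {configured or 'none'}")
```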
If no config exists, create it:
```bash
cat > ~/.claude/.env << 'ENVEOF'
# Deep Research API Configuration
# All keys are optional — skill works with any subset
# OpenAI (o3-deep-research via Responses API)
OPENAI_API_KEY=
# Perplexity (sonar-deep-research via Chat Completions API)
PERPLEXITY_API_KEY=
# Gemini (deep-research-pro via Interactions API)
GEMINI_API_KEY=
ENVEOF
chmod 600 ~/.claude/.env
echo "Config created at ~/.claude/.env"
echo "Add at least one API key for deep research."
```
DO NOT stop if the config doesn't exist. Create it and tell the user to add keys.
Step 1: Run the deep research script
CRITICAL: This script takes 2-10 minutes and blocks until it completes — do NOT use run_in_background. This skill runs in a forked context, so blocking is correct.
Create the output directory and run the script:
```bash
# timeout must exceed the script's internal 1800s timeout
# Output stays project-local at .claude/research/ so the implementer can reference it
RESEARCH_DIR=".claude/research/DeepResearch_[SafeTopic]_[YYYY-MM-DD]"
mkdir -p "$RESEARCH_DIR" && \
python3 ~/.claude/skills/deep-research/scripts/deep_research.py "$ARGUMENTS" \
  --output-dir "$RESEARCH_DIR" --validate=2 2>&1
```
Set timeout: 1920000 on the Bash tool call (script's 1800s timeout + 120s buffer = 1920s = 32 min).
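If you want to derive [SafeTopic] in the directory name programmatically, a sketch like the following works; the sanitization rule (keep alphanumerics, collapse everything else to hyphens) is an illustrative assumption, not something the script requires:

```python
# Sketch: build the output directory name from the raw topic.
import re
from datetime import date
from pathlib import Path

topic = "Example research topic"  # hypothetical input
safe_topic = re.sub(r"[^A-Za-z0-9]+", "-", topic).strip("-")  # assumed sanitization rule
research_dir = Path(".claude/research") / f"DeepResearch_{safe_topic}_{date.today():%Y-%m-%d}"
research_dir.mkdir(parents=True, exist_ok=True)
print(research_dir)
```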
The script will:
- Query each configured provider in parallel
- Build the comparison matrix deterministically from the reports and citations
- Write raw_results.json (and a standalone comparison_matrix.json) to the output directory

IMPORTANT: Deep research models take 2-10 minutes per provider. The script handles all polling internally. Do NOT interrupt it.
Step 2: Read and parse the results
Use the Read tool to read raw_results.json from the output directory. The file is a JSON object:
```json
{
"topic": "the research topic",
"provider_count": 3,
"success_count": 3,
"warnings": [],
"citation_validation": {
"depth": 1,
"total": 47,
"valid": 42,
"invalid": 2,
"unreachable": 3,
"skipped": 0
},
"comparison_matrix": {
"providers": ["openai", "perplexity", "gemini"],
"topics": [
{
"name": "company overview",
"openai": "detailed",
"perplexity": "mentioned",
"gemini": "detailed",
"agreement": "consensus"
}
],
"citation_overlap": {
"https://example.com/paper": ["openai", "gemini"]
},
"stats": {
"total_topics": 18,
"consensus": 5,
"majority": 7,
"unique": 6
}
},
"results": [
{
"provider": "openai",
"success": true,
"report": "full report text...",
"citations": [
{
"url": "...",
"title": "...",
"validation": {
"status": "valid",
"depth": 1,
"details": "HTTP 200"
}
}
],
"model": "o3-deep-research-2025-06-26",
"elapsed_seconds": 145.3,
"error": null
},
...
]
}
```
Note: citation_validation and validation fields within citations are only present if --validate=1 or higher was used. The comparison_matrix key is always present — it is computed deterministically by the script from markdown headings and citation URLs.
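A minimal parsing sketch against the schema above; the directory path is a hypothetical example:

```python
# Sketch: load raw_results.json and pull out the fields used in later steps.
import json
from pathlib import Path

research_dir = Path(".claude/research/DeepResearch_Example_2025-01-01")  # hypothetical path
data = json.loads((research_dir / "raw_results.json").read_text())

print(f"{data['success_count']}/{data['provider_count']} providers succeeded")
for result in data["results"]:
    status = "OK" if result["success"] else f"FAILED: {result['error']}"
    citations = result.get("citations") or []
    print(f"- {result['provider']} ({result.get('model')}): {status}, "
          f"{result.get('elapsed_seconds', 0):.0f}s, {len(citations)} citations")
```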
Step 3: Check for provider failures
MANDATORY: Before synthesis, check whether success_count < provider_count (or check the warnings array in the JSON). If ANY providers failed, report them to the user with a WARNING: prefix — which providers failed, their error messages, and elapsed time.

Do NOT silently skip failed providers. The user must know about failures before reading the report.
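A sketch of that check, using the same hypothetical path as above:

```python
# Sketch: surface any provider failures before writing the synthesis.
import json
from pathlib import Path

data = json.loads(Path(".claude/research/DeepResearch_Example_2025-01-01/raw_results.json").read_text())

if data["success_count"] < data["provider_count"] or data.get("warnings"):
    for r in data["results"]:
        if not r["success"]:
            print(f"WARNING: {r['provider']} failed after {r.get('elapsed_seconds', 0):.0f}s: {r['error']}")
    for w in data.get("warnings", []):
        print(f"WARNING: {w}")
```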
Step 4: Synthesize the comparative report
Read ALL provider reports carefully. Then produce a report in this structure:
# Deep Research Report: [Topic]
## Provider Status
| Provider | Status | Time | Notes |
|----------|--------|------|-------|
| OpenAI | OK | 145s | |
| Perplexity | OK | 89s | |
| Gemini | FAILED | 600s | HTTPError: timed out after 600s |
*(Always include this table. Green path: all OK. Failure path: it makes problems immediately visible.)*
## Executive Summary
[3-5 sentence overview of the key findings across all models]
## Individual Model Reports
### OpenAI (o3-deep-research) — [elapsed]s
[Condensed key findings from OpenAI's report — preserve the important facts,
remove redundant prose. 200-400 words.]
### Perplexity (sonar-deep-research) — [elapsed]s
[Condensed key findings from Perplexity's report. 200-400 words.]
### Gemini (deep-research-pro) — [elapsed]s
[Condensed key findings from Gemini's report. 200-400 words.]
## Comparative Assessment
Tag each finding with its agreement level:
- `[consensus]` — All providers agree
- `[majority]` — 2+ providers agree
- `[contested]` — Providers disagree
- `[unique-<provider>]` — Single provider finding (e.g., `[unique-openai]`)
### Points of Agreement
[`[consensus]` findings — claims made by all providers. Highest confidence.]
### Points of Majority Agreement
[`[majority]` findings — claims made by 2+ but not all providers.]
### Points of Disagreement
[`[contested]` findings — claims where providers contradict each other. Note which model says what.]
### Unique Insights
[`[unique-<provider>]` findings — single-provider findings. Interesting but lower confidence.]
### Confidence Assessment
| Finding | OpenAI | Perplexity | Gemini | Confidence |
|---------|--------|------------|--------|------------|
| [key claim 1] | ✓ | ✓ | ✓ | High |
| [key claim 2] | ✓ | ✓ | — | Medium |
| [key claim 3] | — | — | ✓ | Low |
### Source Quality Comparison
| Provider | Citations | Report Length | Depth |
|----------|-----------|-------------|-------|
| OpenAI | [n] sources | [n] words | [assessment] |
| Perplexity | [n] sources | [n] words | [assessment] |
| Gemini | [n] sources | [n] words | [assessment] |
## References
**CRITICAL: The report must be verifiable.** Include a numbered references section at the end using citations from all providers. Every factual claim in the report should be traceable to a source.
Build the references list by:
1. Collecting all citation URLs from the `citations` arrays in the JSON results
2. Deduplicating by URL (multiple providers may cite the same source)
3. Numbering them sequentially
4. Using inline reference numbers `[1]`, `[2]` etc. throughout the report body to link claims to sources
Format:
[1] Title or description — URL
[2] Title or description — URL
...
If a provider (like Gemini) returns no structured citations, note that its claims are unsourced and lower confidence. Prefer citing claims that have URLs backing them.
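A sketch of the dedupe-and-number pass described above, using the `citations` arrays from the JSON; it falls back to the URL when a title is missing:

```python
# Sketch: build a deduplicated, numbered reference list from all providers' citations.
import json
from pathlib import Path

data = json.loads(Path(".claude/research/DeepResearch_Example_2025-01-01/raw_results.json").read_text())

references = {}  # url -> (number, title), insertion order gives sequential numbering
for result in data["results"]:
    for citation in result.get("citations") or []:
        url = citation.get("url")
        if url and url not in references:
            references[url] = (len(references) + 1, citation.get("title") or url)

for url, (number, title) in references.items():
    print(f"[{number}] {title} — {url}")
```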
Adaptation rules:
The output directory was already created in Step 1. The script already wrote raw_results.json there.
Write these additional files to the same directory:
| File | Contents |
|---|---|
| report.md | Your comparative synthesis (the report above) |
| comparison-matrix.md | Topic coverage matrix (see below) |
| openai.md | OpenAI's full report text (from results[].report where provider=openai) |
| perplexity.md | Perplexity's full report text |
| gemini.md | Gemini's full report text |
Only write provider files for providers that succeeded. The raw individual reports are often 5-40K chars — preserve them in full, don't truncate.
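A sketch of writing the per-provider files, successes only, with no truncation (the path is a hypothetical example):

```python
# Sketch: dump each successful provider's full report to <provider>.md.
import json
from pathlib import Path

research_dir = Path(".claude/research/DeepResearch_Example_2025-01-01")  # hypothetical path
data = json.loads((research_dir / "raw_results.json").read_text())

for result in data["results"]:
    if result["success"] and result.get("report"):
        (research_dir / f"{result['provider']}.md").write_text(result["report"])
```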
The script produced a comparison_matrix object inside raw_results.json and a standalone comparison_matrix.json. Start from the pre-built matrix. The script matched topics using heading text similarity. Review the results:
- `match_method`: "exact" or "heading-fuzzy" — these are high-confidence matches.
- `unmatched_hints` — these are unique topics the code couldn't match, along with their body keywords. If any cover the same subject across providers despite different headings, merge them in your synthesis and note the merge with `[llm-merged]`.

How to render the matrix:
- Use `comparison_matrix.topics` from the JSON — each entry has `name`, one key per provider ("detailed", "mentioned", or "absent"), and `agreement`.
- Use `comparison_matrix.citation_overlap` to call out URLs cited by multiple providers (these are the highest-confidence sources).
- Use `comparison_matrix.stats` for the summary line (consensus/majority/unique counts).

Example output format:
## Topic Coverage Matrix
| Topic | OpenAI | Perplexity | Gemini | Agreement |
|-------|--------|------------|--------|-----------|
| company overview | Detailed | Mentioned | Detailed | consensus |
| apt group connections | Detailed | Detailed | Absent | majority |
| post-leak developments | Absent | Detailed | Absent | unique-perplexity |
**Stats:** 18 topics — 5 consensus, 7 majority, 6 unique
### Shared Citations (cited by 2+ providers)
- https://example.com/paper — openai, gemini
Your job in synthesis: Read the structured matrix and explain the interesting patterns — which consensus findings are most significant, which unique-provider findings deserve follow-up, which majority findings have the strongest evidence. The matrix gives you the structure; you provide the interpretation.
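If useful, here is a small rendering sketch for that matrix format, reading directly from `comparison_matrix`; the file path and the display-name mapping are illustrative assumptions:

```python
# Sketch: render comparison_matrix.topics and citation_overlap as markdown.
import json
from pathlib import Path

data = json.loads(Path(".claude/research/DeepResearch_Example_2025-01-01/raw_results.json").read_text())
matrix = data["comparison_matrix"]
providers = matrix["providers"]
display = {"openai": "OpenAI", "perplexity": "Perplexity", "gemini": "Gemini"}

lines = ["| Topic | " + " | ".join(display.get(p, p) for p in providers) + " | Agreement |",
         "|-------|" + "--------|" * len(providers) + "-----------|"]
for topic in matrix["topics"]:
    cells = " | ".join(topic.get(p, "absent").capitalize() for p in providers)
    lines.append(f"| {topic['name']} | {cells} | {topic['agreement']} |")

stats = matrix["stats"]
lines.append("")
lines.append(f"**Stats:** {stats['total_topics']} topics — {stats['consensus']} consensus, "
             f"{stats['majority']} majority, {stats['unique']} unique")
lines.append("")
lines.append("### Shared Citations (cited by 2+ providers)")
for url, cited_by in matrix["citation_overlap"].items():
    if len(cited_by) >= 2:
        lines.append(f"- {url} — {', '.join(cited_by)}")

print("\n".join(lines))
```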
Coverage levels (detailed, mentioned, absent) are assigned by the script based on word count in each topic's section body; report them as given.
Tell the user where the reports were saved and list the files.
Adapt the comparison to the number of API keys configured:

| Keys Available | Behavior |
|---|---|
| 0 | Error with setup instructions |
| 1 | Single provider report, note limited comparison |
| 2 | Pairwise comparison |
| 3 | Full tri-model comparison |
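A tiny sketch of that lookup, if you prefer to compute the mode rather than eyeball the table; it assumes the same `KEY=value` format in ~/.claude/.env as above:

```python
# Sketch: choose the comparison mode from how many API keys are set in ~/.claude/.env.
from pathlib import Path

env = Path.home() / ".claude" / ".env"
names = ("OPENAI_API_KEY", "PERPLEXITY_API_KEY", "GEMINI_API_KEY")
values = dict(line.strip().split("=", 1) for line in env.read_text().splitlines()
              if "=" in line and not line.lstrip().startswith("#")) if env.exists() else {}
key_count = sum(1 for n in names if values.get(n, "").strip())

modes = {0: "error: show setup instructions",
         1: "single provider report (note limited comparison)",
         2: "pairwise comparison",
         3: "full tri-model comparison"}
print(modes[key_count])
```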
Write a compact result summary so the parent session receives key findings:
```bash
cat > .claude/.skill-result.md << 'SKILLEOF'
## Deep Research Result: [Topic]
**Status:** [n]/3 providers succeeded | [list any failures]
**Time:** [total elapsed]s
**Output:** .claude/research/DeepResearch_[Topic]_[Date]/report.md
### Key Findings (highest confidence)
1. [Finding supported by 2+ providers]
2. [Finding supported by 2+ providers]
3. [Additional key finding]
### Needs Attention
- [Any provider failures or gaps worth noting]
SKILLEOF
```
Keep under 2000 characters. This is consumed by a hook — the parent session will see it automatically.
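A sketch of writing that summary with the size cap enforced; the body shown is placeholder content to be replaced by your synthesis:

```python
# Sketch: write .claude/.skill-result.md and keep it under the 2000-character cap.
from pathlib import Path

summary = """## Deep Research Result: Example Topic
**Status:** 2/3 providers succeeded | gemini failed
**Time:** 834s
**Output:** .claude/research/DeepResearch_Example_2025-01-01/report.md

### Key Findings (highest confidence)
1. ...

### Needs Attention
- ...
"""  # placeholder content: fill in from your synthesis

Path(".claude/.skill-result.md").write_text(summary[:2000])  # enforce the 2000-character cap
```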
End with:
---
Deep Research complete — [n]/3 providers succeeded.
WARNING: [provider names] failed — [brief error reasons] (only include this line if any failed)
- Total research time: [sum of elapsed]s
- Report saved to: .claude/research/DeepResearch_[Topic]_[Date]/report.md
Want me to dig deeper into any specific finding?