Calibrated forecasting for real-world future events, plus prediction market checks. Use when the user wants the probability of a future outcome in politics, economics, technology, policy, or world events. Triggers: "what's the probability of X?", "will X happen?", "how likely is Y?", "what are the odds of X?", "check prediction markets", "what do Metaculus/Manifold/Polymarket say about", "give me calibrated odds". Best for binary or date-bounded questions about external events. Also searches Metaculus, Manifold, PredictIt, Polymarket, Kalshi, Betfair, and Smarkets for current market-implied probabilities. Do NOT use for sports betting odds, current asset prices, numeric time-series forecasting from CSV data (use forecast skill), or internal predictions like whether code, tests, or builds will pass.
Generate calibrated probabilistic forecasts on binary questions. Based on Halawi et al. (NeurIPS 2024) and the Metaculus forecasting-tools framework.
| Need | Action |
|---|---|
| Full calibrated forecast on a binary question | Follow the Pipeline below |
| Just check current prediction market odds | See check-odds.md |
## Pipeline

For each question, execute these steps in order:
### Step 1: Frame the question

Restate the question as a precise binary (Yes/No) with explicit resolution criteria and a resolution date. If the user's question is vague, clarify before forecasting.
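To make the framing concrete, here is a minimal sketch of what a well-framed question pins down. The `FramedQuestion` name and the example question are purely illustrative, not part of this skill:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FramedQuestion:
    """Illustrative container for a well-framed binary question."""
    question: str             # precise Yes/No wording
    resolution_criteria: str  # the observable fact that decides Yes vs. No
    resolution_date: date     # when the question resolves if nothing happens first

q = FramedQuestion(
    question="Will the ECB deposit facility rate be below 2.00% before 2026-07-01?",
    resolution_criteria="Resolves Yes if the ECB's published deposit facility "
                        "rate is below 2.00% on any date before 2026-07-01.",
    resolution_date=date(2026, 7, 1),
)
```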
### Step 2: Gather evidence

Run in parallel:
a) Market prior — See check-odds.md to get prediction market consensus. This is your initial anchor. If no markets exist, note that explicitly.
b) Web research — Use WebSearch to run 3-5 targeted queries on the question's key drivers. For each result, assess relevance (skip paywalled pages, error pages, and stale content). Summarize each source as bullet points, preserving dates, numbers, and quotes.
c) Base rate search — Search for the historical frequency of similar events: "How often has [category of event] happened in [relevant timeframe]?" A horizon-adjustment sketch follows this list.
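Raw base rates rarely match the question's horizon (a per-year frequency for a six-month question, say). A minimal sketch of the conversion, under the assumption that years are independent and identically distributed (an assumption worth stating in the forecast):

```python
def base_rate_over_horizon(annual_rate: float, years: float) -> float:
    """P(event occurs at least once within `years`), assuming each year is an
    independent trial with per-year probability `annual_rate`."""
    return 1 - (1 - annual_rate) ** years

# Example: an event observed 7 times in the last 50 years, with the
# question resolving in six months.
annual = 7 / 50                                      # 14% per year
print(f"{base_rate_over_horizon(annual, 0.5):.1%}")  # ~7.3% over six months
```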
### Step 3: Structured reasoning

Write out your reasoning following this exact structure:
TIME REMAINING: [duration until resolution]
STATUS QUO: [what happens if nothing changes — usually the single most likely outcome]
SCENARIO FOR NO:
- [concrete pathway to No]
- Strength: [weak/moderate/strong]
SCENARIO FOR YES:
- [concrete pathway to Yes]
- Strength: [weak/moderate/strong]
BASE RATE: [X% — historical frequency of similar events]
MARKET CONSENSUS: [X% from check-odds.md, or "no markets"]
KEY EVIDENCE:
- [evidence point 1 — for/against, strength]
- [evidence point 2 — for/against, strength]
- [evidence point 3 — for/against, strength]
INITIAL PROBABILITY: X%
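One defensible way to get from the anchors above to INITIAL PROBABILITY is a weighted average in log-odds space, which resists the pull toward 50% that averaging raw probabilities produces. The weights below are illustrative assumptions, not values prescribed by Halawi et al.:

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def combine_anchors(anchors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of probability anchors in log-odds space."""
    total = sum(weights[k] for k in anchors)
    z = sum(weights[k] * logit(p) for k, p in anchors.items()) / total
    return sigmoid(z)

p0 = combine_anchors(
    anchors={"market": 0.30, "base_rate": 0.15, "evidence": 0.45},
    weights={"market": 2.0, "base_rate": 1.0, "evidence": 1.0},  # trust markets most
)
print(f"INITIAL PROBABILITY: {p0:.0%}")  # ~29%
```

If no markets exist, drop that anchor and let the remaining weights carry the estimate.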
### Step 4: Debias

After your initial probability, challenge it against the critical debiasing rules (the calibration principles at the end of this document). Adjust if warranted, and state what changed and why.
### Step 5: Multi-model ensemble

For important forecasts, get independent predictions from other models using discussion-partners.

Frame the question with ALL context (the partner has zero context): the exact question, resolution criteria and date, and your evidence summary from Step 2. Do not include your own probability; the partner predictions must stay independent.
Ask: "You are a professional superforecaster interviewing for a job. Given this evidence, what is the probability of [question]? Walk through your reasoning, then output your final answer as 'Probability: ZZ%'."
Query 2-3 models. Aggregate via median (robust to outliers).
If your solo forecast and the multi-model median diverge by >15 percentage points, investigate why. The disagreement is informative — don't just average it away.
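A minimal sketch of the aggregation and the divergence check described above (function names are illustrative; the 15-point threshold is the one from this step):

```python
from statistics import median

def aggregate(solo: float, partner_probs: list[float], threshold: float = 0.15):
    """Median-aggregate partner forecasts; flag a large solo-vs-ensemble gap."""
    med = median(partner_probs)
    return med, abs(solo - med) > threshold

med, diverges = aggregate(solo=0.62, partner_probs=[0.40, 0.45, 0.55])
if diverges:  # 62% vs. 45% is a 17-point gap, so investigate before settling
    print(f"Solo forecast diverges from ensemble median {med:.0%}: investigate why.")
```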
### Step 6: Output

Present the final forecast in this format:

## Forecast: [question]
**Probability: X%**
### Evidence Summary
- [key evidence, 3-5 bullets]
### Reasoning
[2-3 sentences on the key drivers]
### Confidence Notes
- Market consensus: [X% or "no markets found"]
- Base rate: [X%]
- Multi-model range: [X%-Y%] (if Step 5 was used)
- Key uncertainty: [what could most change this forecast]
### Sources
- [source 1]
- [source 2]
## Calibration principles

These come from Tetlock's superforecasting research and Metaculus tournament winners: