Adjust aggregated probability forecasts away from 50% when diverse forecasters converge, compensating for crowd conservatism
Extremizing is a statistical technique from Philip Tetlock's superforecasting research that compensates for the conservatism bias in aggregated probability forecasts. When multiple independent forecasters converge on similar probabilities, their collective wisdom is typically underconfident - if everyone says 66%, the real probability is likely higher. Extremizing pushes aggregate forecasts away from 50% toward the extremes (0% or 100%) based on forecaster diversity and agreement levels.
Developed through the Good Judgment Project (2011-2015), extremizing applies a log-odds transformation to systematically adjust crowd predictions. The technique doesn't apply equally to all forecasts - it's most powerful for diverse crowds whose members hold different pieces of information, and least necessary for expert "superforecaster" teams whose members already share the same information.
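To see why convergence among independent forecasters points to a more extreme truth, here is a stylized sketch (the three forecasters, the repeated 66%, and the independence assumption are illustrative, not Good Judgment Project data): pooling independent evidence in log-odds space lands well above the simple 66% average.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

# Three forecasters, each holding independent private evidence that on its own
# justifies 66% (starting from a 50% prior). Numbers are illustrative only.
forecasts = [0.66, 0.66, 0.66]
prior = 0.50

# Averaging just reports the shared conclusion: still 66%.
average = sum(forecasts) / len(forecasts)

# Pooling independent evidence adds each forecaster's log-odds shift over the prior.
pooled_log_odds = logit(prior) + sum(logit(p) - logit(prior) for p in forecasts)
pooled = sigmoid(pooled_log_odds)

print(f"Simple average: {average:.0%}")    # 66%
print(f"Pooled evidence: {pooled:.0%}")    # ~88%
```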
Gather probability estimates from multiple forecasters on the same question. Ensure independence - forecasters shouldn't coordinate before submitting.
Example question: "Will Company X's stock price exceed $150 by Dec 31?"
Raw forecasts from 50 forecasters:
Average: 67%
Compute the baseline aggregate using mean, median, or weighted average (weight by past accuracy).
Weighted by Brier score performance:
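A minimal sketch of this aggregation step, assuming a handful of made-up forecasts and past Brier scores, and using inverse-Brier weights as one possible accuracy weighting (the names, numbers, and weighting rule are illustrative assumptions):

```python
import statistics

# Hypothetical current forecasts and each forecaster's past Brier score
# (lower = historically more accurate). Names and numbers are made up.
forecasts = {"ana": 0.70, "ben": 0.65, "chi": 0.72, "dee": 0.60, "eli": 0.68}
past_brier = {"ana": 0.12, "ben": 0.25, "chi": 0.15, "dee": 0.30, "eli": 0.18}

probs = list(forecasts.values())
mean_agg = statistics.mean(probs)
median_agg = statistics.median(probs)

# One simple accuracy weighting: weight each forecaster by 1 / past Brier score,
# so historically sharper forecasters count for more.
weights = {name: 1.0 / past_brier[name] for name in forecasts}
total_weight = sum(weights.values())
weighted_agg = sum(forecasts[n] * weights[n] for n in forecasts) / total_weight

print(f"Mean: {mean_agg:.2f}  Median: {median_agg:.2f}  Brier-weighted: {weighted_agg:.2f}")
```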
Evaluate information diversity using two signals:
Diversity score:
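One way such a score could be computed, shown as a rough sketch: combine the spread of the forecasts with the variety of reported information sources. Both the inputs and this particular scoring rule are assumptions for illustration, not a definition from the Good Judgment Project.

```python
import math
import statistics

def logit(p):
    return math.log(p / (1 - p))

# Hypothetical forecasts and self-reported information sources.
forecasts = [0.70, 0.65, 0.72, 0.60, 0.68]
sources = ["news", "company filings", "industry contacts", "news", "own model"]

# Signal 1: spread of the forecasts in log-odds space
# (more spread suggests more distinct private information).
spread = statistics.pstdev([logit(p) for p in forecasts])

# Signal 2: variety of information sources (unique sources / forecasters).
variety = len(set(sources)) / len(sources)

# Combine into a rough 0-1 score; equal weights are an arbitrary choice.
diversity_score = min(1.0, 0.5 * spread + 0.5 * variety)
print(f"Diversity score: {diversity_score:.2f}")
```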
Use a log-odds transformation to push the aggregate away from 50%:
Log-odds formula: convert the aggregate p to log-odds, log-odds(p) = ln(p / (1 - p)), multiply by the extremizing factor a, then convert back: extremized p = 1 / (1 + e^(-a × log-odds(p))). Equivalently, extremized p = p^a / (p^a + (1 - p)^a).
Example with 68% aggregate, extremizing factor 1.3: log-odds(0.68) = ln(0.68 / 0.32) ≈ 0.754; 0.754 × 1.3 ≈ 0.980; converting back gives 1 / (1 + e^(-0.980)) ≈ 0.727.
Result: Original 68% becomes 73% after extremizing.
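A minimal sketch of the transformation; the function name is mine, but the math is the log-odds formula above and it reproduces the 68% → 73% example:

```python
import math

def extremize(p, factor):
    """Scale the log-odds of p by the extremizing factor and convert back."""
    log_odds = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-factor * log_odds))

# Reproduces the worked example: a 68% aggregate, extremizing factor 1.3.
print(f"{extremize(0.68, 1.3):.0%}")  # ~73%
```

A factor above 1 pushes the forecast away from 50%, a factor of 1 leaves it unchanged, and a factor below 1 would shrink it toward 50%.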
Adjust the extremizing factor based on:
Good Judgment Project findings:
Track extremized forecasts vs. raw aggregates using Brier scores (lower is better).
Brier score formula: Average of (forecast - outcome)², where the outcome is 1 if the event happened and 0 if it did not.
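A minimal sketch of the comparison, using made-up forecasts, extremized values, and outcomes purely to show the calculation:

```python
def brier(forecasts, outcomes):
    """Mean of (forecast - outcome)^2, with outcomes coded 1 or 0."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Made-up raw aggregates, their extremized counterparts (factor 1.3),
# and what actually happened (1 = event occurred, 0 = it did not).
raw = [0.68, 0.30, 0.75, 0.55]
extremized = [0.73, 0.25, 0.81, 0.56]
outcomes = [1, 0, 1, 0]

print(f"Raw Brier:        {brier(raw, outcomes):.3f}")         # 0.139
print(f"Extremized Brier: {brier(extremized, outcomes):.3f}")  # 0.121 (lower = better)
```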
Example results:
Over-extremizing superforecasters - Elite teams with shared knowledge don't benefit from extremizing. They're already at optimal confidence levels.
Extremizing small samples - Need 20+ forecasters for statistical validity. With 5 forecasters, extremizing adds noise.
Ignoring herding - If forecasters see each other's predictions, they're not independent. Extremizing amplifies groupthink.
Fixed extremizing factor - Optimal factor varies by question type, forecaster pool, time horizon. Test and calibrate.
Extremizing outliers - Remove statistical outliers (>3 standard deviations) before extremizing, or they'll distort the adjustment.
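A rough sketch of that outlier screen combined with the extremizing step; applying the 3-standard-deviation rule directly to the raw probabilities, and the pool of made-up forecasts, are my assumptions about how to operationalize it:

```python
import math
import statistics

def extremize(p, factor=1.3):
    """Scale the log-odds of p by the extremizing factor and convert back."""
    log_odds = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-factor * log_odds))

def screen_outliers(forecasts, z_cutoff=3.0):
    """Drop forecasts more than z_cutoff standard deviations from the mean."""
    mean = statistics.mean(forecasts)
    sd = statistics.pstdev(forecasts)
    if sd == 0:
        return list(forecasts)
    return [p for p in forecasts if abs(p - mean) / sd <= z_cutoff]

# Hypothetical pool: 24 clustered forecasts plus one extreme outlier at 99%.
pool = [0.62, 0.64, 0.66, 0.68, 0.70, 0.72] * 4 + [0.99]

cleaned = screen_outliers(pool)            # the 0.99 forecast is dropped
aggregate = statistics.mean(cleaned)       # 0.67
print(f"{extremize(aggregate, 1.3):.0%}")  # ~72%
```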
Good Judgment Project (2011-2015): Extremizing regular forecaster teams boosted them past some superforecaster teams in IARPA tournament accuracy rankings.
Prediction markets alternative: Tetlock's team showed extremized prediction polls outperformed prediction markets when using temporal decay, differential weighting, and recalibration.
Intelligence community: IARPA adopted extremizing for aggregating analyst forecasts on geopolitical events.
Financial markets: Hedge funds apply extremizing to analyst consensus estimates when dispersion is low but conviction is high.
Extremizing works because crowds are systematically underconfident. When diverse forecasters independently arrive at 70%, each is hedging against the gaps in their own partial information by staying closer to 50%. But their convergence is itself a signal - if people with different information reach similar conclusions, the truth is likely more extreme than any individual estimate.
The technique only applies when forecasters hold different pieces of information. A team with zero diversity (forecasters who all know exactly what the others know) should never be extremized - they're already as confident as their shared information justifies. Superforecaster teams approach this ideal, which is why extremizing doesn't help them much.
When extremizing works best: Apply it to large pools of forecasts from diverse crowds. In many cases, extremizing brings regular crowds close to parity with superforecaster accuracy.
When to skip extremizing: Superforecaster teams, small samples, herding/coordination, purely random forecasts.