Track and evaluate predictions made by AI researchers and critics to assess their accuracy over time. Use this when reviewing past predictions to determine whether they came true, failed, or remain uncertain.
When recording a new prediction, capture the claim itself, who made it, when, and what topic it concerns (see the sketch below).
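A minimal record shape as a TypeScript sketch; `predictionId`, `author`, and the topic keys mirror fields used in the JSON outputs later in this document, while the remaining field names are assumptions for illustration.

```typescript
// Hypothetical shape for a recorded prediction. `predictionId`, `author`,
// and `topic` mirror fields used in the JSON outputs below; `claim`,
// `madeAt`, and `resolveBy` are assumed fields for illustration.
interface Prediction {
  predictionId: string; // stable id referenced by later evaluations
  author: string;       // who made the prediction
  claim: string;        // the exact wording of what was claimed
  topic: string;        // e.g. "reasoning", "agents"
  madeAt: string;       // ISO-8601 timestamp of when it was made
  resolveBy?: string;   // optional date by which it should be checkable
}
```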
When evaluating predictions, assign exactly one of the following statuses (see the type sketch after this list):
- **verified**: Clearly came true as stated.
- **falsified**: Clearly did not come true.
- **partially-verified**: Partially accurate.
- **too-early**: Not enough time has passed.
- **unfalsifiable**: Cannot be objectively assessed.
- **ambiguous**: Prediction was too vague to evaluate.
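The same six statuses as a TypeScript string-literal union, matching the list above; the type name is an assumption.

```typescript
// The six evaluation statuses listed above, as a string-literal union.
type EvaluationStatus =
  | "verified"
  | "falsified"
  | "partially-verified"
  | "too-early"
  | "unfalsifiable"
  | "ambiguous";
```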
For each prediction being evaluated, ask:
- What exactly was claimed?
- Has enough time passed to evaluate it?
- What has happened since?
- Which evaluation status applies?
If the claim is verifiable, assign an accuracy score from 0.0 to 1.0.
Then consider what the outcome tells us about the predictor's calibration and their accuracy by topic (a sketch of this step follows).
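A minimal sketch of the evaluation record and scoring guard in TypeScript, assuming the `EvaluationStatus` type above; the field names mirror the JSON output below, and the guard logic is an illustration rather than a prescribed algorithm.

```typescript
// Hypothetical evaluation record; field names mirror the JSON output below.
interface Evaluation {
  predictionId: string;
  status: EvaluationStatus;
  accuracyScore?: number; // 0.0-1.0, only when the claim is verifiable
  evidence: string;
  notes?: string;
  evaluatedAt: string;    // ISO-8601 timestamp
}

// Illustrative guard: only statuses that reflect a checkable outcome
// should carry an accuracy score.
function isScorable(status: EvaluationStatus): boolean {
  return status === "verified" ||
         status === "falsified" ||
         status === "partially-verified";
}
```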
For an evaluation, output JSON of this shape:
```json
{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}
```
For per-author accuracy statistics, output:
```json
{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
```
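A sketch of how these statistics could be derived from a list of evaluations, assuming the `EvaluationStatus` type above; the aggregation logic is an illustration, not a fixed specification.

```typescript
// Aggregate per-author statistics from evaluations, each paired with
// its prediction's topic. Mirrors the JSON shape above.
function accuracyStats(
  author: string,
  evals: { topic: string; status: EvaluationStatus; accuracyScore?: number }[]
) {
  const mean = (xs: number[]) =>
    xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
  const count = (s: EvaluationStatus) =>
    evals.filter(e => e.status === s).length;
  const scored = evals.filter(e => e.accuracyScore !== undefined);

  // Per-topic counts and mean accuracy, mirroring "topicBreakdown".
  const topicBreakdown: Record<string, { predictions: number; accuracy: number }> = {};
  for (const topic of Array.from(new Set(evals.map(e => e.topic)))) {
    const inTopic = evals.filter(e => e.topic === topic);
    topicBreakdown[topic] = {
      predictions: inTopic.length,
      accuracy: mean(inTopic
        .filter(e => e.accuracyScore !== undefined)
        .map(e => e.accuracyScore!)),
    };
  }

  return {
    author,
    totalPredictions: evals.length,
    verified: count("verified"),
    falsified: count("falsified"),
    partiallyVerified: count("partially-verified"),
    pending: count("too-early"),
    unfalsifiable: count("unfalsifiable"),
    averageAccuracy: mean(scored.map(e => e.accuracyScore!)),
    topicBreakdown,
  };
}
```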
Evaluate whether predictors are well-calibrated, and keep running assessments of key voices:
| Predictor | Total | Accuracy | Calibration | Notes |
|---|---|---|---|---|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly overconfident | Safety-focused |
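One way to sketch the calibration label, assuming each prediction also records a stated confidence (a hypothetical field not defined elsewhere in this document); the thresholds are arbitrary.

```typescript
// Illustrative calibration check: compare a predictor's average stated
// confidence against their realized accuracy. Thresholds are arbitrary.
function calibrationLabel(avgStatedConfidence: number, realizedAccuracy: number): string {
  const gap = avgStatedConfidence - realizedAccuracy;
  if (gap > 0.1) return "Overconfident";
  if (gap < -0.1) return "Underconfident";
  return "Well-calibrated";
}

// Example: 80% average confidence but 55% accuracy reads as overconfident.
calibrationLabel(0.8, 0.55); // "Overconfident"
```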
Watch for prediction patterns that suggest bias, such as consistent timeline optimism, systematic overconfidence, or reflexive conservatism (a simple flagging sketch follows).
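A minimal sketch of one such pattern check, assuming the aggregated statistics above; both the heuristic and its cutoff are assumptions for illustration.

```typescript
// Heuristic flag: a predictor whose evaluated claims skew heavily toward
// "falsified" and "too-early" may be showing timeline optimism.
function flagsTimelineOptimism(stats: {
  totalPredictions: number;
  falsified: number;
  pending: number; // "too-early" count
}): boolean {
  const unresolvedOrWrong = stats.falsified + stats.pending;
  return stats.totalPredictions > 0 &&
         unresolvedOrWrong / stats.totalPredictions > 0.5; // arbitrary cutoff
}
```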