Evaluating the quality, provenance, and relevance of evidence that supports or undermines a claim. Covers source credibility, sampling quality, study design, levels of evidence (anecdote to meta-analysis), base rate integration, distinguishing primary from secondary sources, and calibrating belief to evidence strength. Use when the question is not whether an argument is valid but whether its premises are actually supported by the available data.
Valid reasoning from false premises proves nothing. Evidence assessment is the discipline of asking whether the premises of an argument are actually supported by the data, what kind of data would be decisive, and how much confidence the available evidence warrants. This skill covers the evaluation of sources, study designs, sampling, levels of evidence, and the integration of new evidence with prior knowledge.
Agent affinity: elder (structural reconstruction of evidence claims), tversky (base rates, inductive strength), kahneman-ct (evidence evaluation under System 1/2)
Concept IDs: crit-evidence-quality, crit-sourcing, crit-burden-of-proof, crit-scientific-literacy
| # | Operation | Question answered |
|---|---|---|
| 1 | Source identification | Where did this claim come from? |
| 2 | Primary vs. secondary | Is this the original source or a report of it? |
| 3 | Source credibility | Does the source have relevant expertise and a track record? |
| 4 | Funding and conflicts | Who paid for this? What do they stand to gain? |
| 5 | Sample quality | How were the data collected and from whom? |
| 6 | Study design | Observational? Experimental? Randomized? |
| 7 | Levels of evidence | Where does this study sit in the evidence hierarchy? |
| 8 | Reproducibility | Has this result been replicated? |
| 9 | Scope check | What does the evidence actually show vs. what is being claimed? |
| 10 | Base rate integration | How does this update against prior probability? |
Not all evidence is created equal. Roughly from weakest to strongest:
| Level | Type | Strength | Examples |
|---|---|---|---|
| 1 | Anecdote / testimonial | Very weak | "My cousin took X and felt better" |
| 2 | Expert opinion | Weak (unless grounded) | An authority says it's true |
| 3 | Case report | Weak | Single clinical observation |
| 4 | Case series | Weak | A handful of similar cases reported together |
| 5 | Cross-sectional / correlational study | Moderate | Survey data showing X and Y co-occur |
| 6 | Case-control study | Moderate | Retrospective comparison of cases to controls |
| 7 | Cohort study (prospective) | Moderate-strong | Follow a population forward over time |
| 8 | Randomized controlled trial (single) | Strong | Random assignment, controlled conditions |
| 9 | Meta-analysis / systematic review | Strongest | Pooled data from multiple RCTs with appropriate weighting |
Key discipline. The confidence of a claim should be calibrated to the level of evidence behind it. A single anecdote may be worth noting but cannot support "X causes Y." A meta-analysis of well-conducted RCTs can support much stronger claims.
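To make "appropriate weighting" concrete, here is a minimal sketch of fixed-effect inverse-variance pooling, one common way study estimates are combined in a meta-analysis. The function name and the example numbers are illustrative, not taken from any real study; real meta-analyses typically also use random-effects models and assess heterogeneity.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling: each study's effect
    estimate is weighted by 1 / SE^2, so more precise studies count more."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Illustrative numbers: three hypothetical RCT effect estimates.
effect, se = pool_fixed_effect([0.30, 0.10, 0.22], [0.10, 0.05, 0.08])
print(f"pooled effect = {effect:.3f} ± {1.96 * se:.3f} (95% CI half-width)")
```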
Pattern: Trace the claim to its origin. Who first said it? Where was it published? When?
Worked example. A tweet says, "Studies show that 8 glasses of water a day improves cognitive performance by 23%." The 23% is suspiciously precise. Trace it: the tweet cites a blog post, which cites a news article, which mentions "a study" without linking. Go find the actual study. Often, it does not exist, or the actual finding is much weaker ("effects on a specific subtest of working memory in a 15-person study").
Discipline. Refuse to cite or propagate a claim whose original source you have not seen.
Primary source. The original document — the research paper, the dataset, the eyewitness account, the legal ruling, the historical artifact.
Secondary source. A report, summary, interpretation, or citation of the primary source. Textbooks, news articles, review papers, and blog posts are usually secondary.
Why it matters. Secondary sources introduce errors. A claim filtered through three secondary sources typically loses nuance, gains or drops hedges, and drifts from the original. For high-stakes claims, go to the primary source.
Assess credibility along multiple dimensions: relevant expertise in the specific field of the claim, track record of accuracy, transparency about methods and data, and independence from the outcome.
Common mistake. Treating credentials as a substitute for evaluation. An expert in one field is not automatically credible in another. A Nobel laureate in physics speaking about nutrition should be evaluated on the same terms as anyone else speaking about nutrition.
A source's funding or affiliations do not automatically invalidate its claims, but they do raise the bar for independent verification. Studies funded by industries with a stake in the outcome show, on average, systematically different results from independently funded studies.
Worked example. A study funded by a food industry group finds no evidence linking its product to a health outcome. This does not prove the study wrong, but it raises the question of whether an independently funded replication would find the same result.
Discipline. Always note funding sources. Treat results from conflicted sources as weaker evidence than equivalent results from independent sources. Ask for replication in independent labs before accepting policy-relevant claims.
For claims based on samples (surveys, studies, polls):
| Question | What to check |
|---|---|
| Sample size | Larger is better, with diminishing returns; 1,500 for population surveys is typical |
| Randomness | Was the sample drawn randomly from the target population? |
| Representativeness | Does the sample match the population in age, gender, region, etc.? |
| Self-selection | Did respondents choose to participate? (Major bias source) |
| Response rate | What fraction of those contacted actually responded? |
| Attrition | How many dropped out by the end of a longitudinal study? |
Worked example. "In our online poll, 82% of respondents support policy X." Online polls are not random samples. Respondents self-select. This result is close to uninformative about the general population and should never be cited as "public opinion."
Not all studies answer the same kind of question with the same strength.
Key insight. The phrase "studies show" does most of its rhetorical work by hiding the study design. A news story claiming "studies show X" may be reporting a single observational study with severe confounds. Always ask: what was the study design?
Compare what the evidence shows to what is being claimed. The most common overreach patterns: correlation reported as causation, results in cells or animals reported as effects in humans, a narrow subgroup finding generalized to the whole population, short-term effects extrapolated to long-term outcomes, and a single study presented as settled consensus.
Worked example. A study finds that a chemical kills cancer cells in a petri dish. The headline reads "New cancer cure discovered." Nothing in the study shows the chemical works in a living human being, at a safe dose, delivered through a feasible route. The scope of the claim has overreached the evidence by several levels.
A single study is weak evidence. A finding that has been independently replicated in different labs, with different samples, using different methods, is much stronger.
Any new evidence must be integrated with prior probability. The same piece of evidence can mean very different things depending on the base rate.
Worked example (medical testing). A test for a rare disease has 99% sensitivity and 95% specificity. A patient tests positive. What's the probability they have the disease?
If the disease affects 1 in 1,000 people: imagine 100,000 people tested. About 100 have the disease and ~99 of them test positive; of the 99,900 who do not, ~4,995 also test positive (5% false-positive rate). P(disease | positive) ≈ 99 / (99 + 4,995) ≈ 2%.
If the disease affects 1 in 10 people: of 1,000 people tested, 100 have the disease and ~99 test positive; of the 900 who do not, ~45 test positive. P(disease | positive) ≈ 99 / (99 + 45) ≈ 69%.
Same test, wildly different interpretations. Base rate matters.
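A minimal sketch of the calculation behind those two numbers, using Bayes' rule for the positive predictive value (the function name is illustrative):

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence            # diseased and detected
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but flagged
    return true_pos / (true_pos + false_pos)

for prevalence in (1 / 1000, 1 / 10):
    p = prob_disease_given_positive(prevalence, sensitivity=0.99, specificity=0.95)
    print(f"prevalence {prevalence:.3f}: P(disease | +) = {p:.1%}")
```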
The burden of proof lies with whoever is making a claim, not with whoever is questioning it. Extraordinary claims require extraordinary evidence (the Sagan standard).
Common mistake. Shifting the burden. "Prove that it doesn't work" is not a valid response to "it has not been shown to work." The burden remains on the proponent.
When evaluating a claim in the wild: trace it to its original source, identify the study design behind it, check how the sample was drawn, note who funded the work, compare the scope of the claim to the scope of the evidence, and integrate the result with the relevant base rate.
| Mistake | Why it fails | Fix |
|---|---|---|
| "Studies show" without citation | No verifiable source | Refuse to rely on claims you cannot trace |
| Treating a single study as final | Single studies often do not replicate | Wait for or look for replications |
| Ignoring conflicts of interest | Funding biases results on average | Note the conflict; weight the evidence accordingly |
| Overreaching scope | Premise supports less than the conclusion claims | Restrict the claim to what the evidence actually shows |
| Neglecting base rates | Evidence value depends on prior probability | Always ask: how common is this really? |
| Cherry-picking studies | One favorable study does not reflect the literature | Seek systematic reviews or meta-analyses |