A scientific thinking framework that treats every piece of information (user requests, agent responses, feedback, experiment interpretations) as a hypothesis to be verified through falsification-first experimentation. Use when asked to investigate, verify, debug, validate, test claims, test assumptions, analyze root causes, or when the task involves uncertainty, competing explanations, or non-obvious causality.
A framework that treats every piece of exchanged information as a hypothesis and verifies it through scientific experimental design.
Nothing is ground truth. Every piece of information — regardless of its source — is a hypothesis until supported by converging evidence that has survived falsification attempts.
This principle applies to the framework itself. The claim that "falsification-first is the best approach for this task" is an L2 hypothesis. If evidence suggests a different epistemology (e.g., Bayesian updating, heuristic satisficing) would serve the current task better, revise the approach. See Limitations below.
| Layer | Source | Example |
|---|---|---|
| L1 | User request | "This bug is a cache issue" -> H: cache is the root cause |
| L2 | Agent response | "Logs suggest a DB timeout" -> H: DB timeout is the cause |
| L3 | User feedback | "No, it's a network problem" -> H: network is the cause |
| L4 | Experiment interpretation | "Ping test passed -> network is fine" -> H: ping test is the right verification method for this problem |
| L5 | External knowledge | "Docs say this library has a known race condition" -> H: this known issue applies to our specific usage and version |
Confidence comes from accumulated evidence, not authority. A statement is not true because the user said it, the agent concluded it, or a test passed. Evidence converges when 2+ independent sources or methods point to the same conclusion.
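The layer table and convergence rule above can be sketched in code. This is a minimal illustration, not part of the framework's specification: the `Evidence`/`Hypothesis` classes, the `method` field, and the example labels ("log analysis", "query tracing") are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str    # who produced it (user, agent, external doc)
    method: str    # the independent method behind it (hypothetical label)
    supports: bool # True if it supports the claim, False if it refutes it

@dataclass
class Hypothesis:
    claim: str
    layer: str                      # L1..L5 from the table above
    evidence: list = field(default_factory=list)

    def converged(self) -> bool:
        # Evidence converges when 2+ independent methods support
        # the claim and no collected evidence refutes it.
        supporting_methods = {e.method for e in self.evidence if e.supports}
        refuting = [e for e in self.evidence if not e.supports]
        return len(supporting_methods) >= 2 and not refuting

h = Hypothesis("DB timeout is the cause", layer="L2")
h.evidence.append(Evidence("agent", "log analysis", supports=True))
print(h.converged())   # one method only -> False
h.evidence.append(Evidence("user", "query tracing", supports=True))
print(h.converged())   # two independent methods, no refutation -> True
```

Note the design choice: a single passing test never produces convergence, which mirrors the rule that no single source (user, agent, or test) is ground truth.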
Apply this framework when:
Use proportional rigor (lighter application) when:
Collect the current state before forming any hypothesis.
Extract testable hypotheses from all inputs — user requests, your own reasoning, user feedback, and prior experiment results.
For each hypothesis, define:
A hypothesis without falsification criteria is not a hypothesis. "Something might be wrong" is not testable. "The cache returns stale data before TTL expiry" is testable because you can define what observation would disprove it.
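The cache example above can be made concrete as a data structure that refuses hypotheses lacking a refutation condition. This is a sketch under assumed names: `FalsifiableHypothesis`, `is_testable`, and the observation keys `within_ttl` / `value_fresh` are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FalsifiableHypothesis:
    claim: str
    prediction: str  # what we expect to observe if the claim holds
    # Predicate over an observation: True means the observation disproves the claim.
    refuted_by: Optional[Callable[[dict], bool]] = None

def is_testable(h: FalsifiableHypothesis) -> bool:
    # "Something might be wrong" has no prediction and no refutation
    # condition, so it is rejected as untestable.
    return bool(h.prediction) and h.refuted_by is not None

stale_cache = FalsifiableHypothesis(
    claim="The cache returns stale data before TTL expiry",
    prediction="a read within TTL returns a stale value",
    # Hypothetical observation dict: a fresh read within TTL refutes the claim.
    refuted_by=lambda obs: obs["within_ttl"] and obs["value_fresh"],
)
vague = FalsifiableHypothesis(claim="Something might be wrong", prediction="")

print(is_testable(stale_cache))  # True
print(is_testable(vague))        # False
```

Forcing every hypothesis through a check like `is_testable` before experimentation keeps untestable claims from consuming verification effort.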
When possible, formulate rival hypotheses — multiple competing explanations for the same observation. A failed prediction may refute the hypothesis or an unstated auxiliary assumption (the Duhem-Quine