Name: Proof Engine
Author: yaniv-golan

Proof Engine

Create formal, verifiable proofs of claims with machine-checkable reasoning. Use when asked to prove, verify, fact-check, or rigorously establish whether a claim is true or false — mathematical, empirical, or mixed. Trigger phrases: "is it really true", "can you prove", "verify this", "fact-check this", "prove it", "show me the logic". Do NOT use for opinions, essays, or questions with no verifiable answer.

yaniv-golan0 estrellas7 abr 2026

Ocupación
Categorías: Educación

LLMs hallucinate facts and make reasoning errors. This skill overcomes both by offloading all verification to code and citations. Every fact is either computed by Python code anyone can re-run (Type A) or backed by a specific source, URL, and exact quote (Type B).

Produces three outputs: a re-runnable proof.py script, a reader-facing proof.md, and a proof_audit.md with full verification details.

Gotchas

These are the highest-value lessons from field testing. Read before writing any proof code.

Don't inline verification logic: Import the bundled scripts. Rewriting normalize_text() inline risks garbling the HTML-stripping logic.
Don't use int() truncation as a cross-check: int(days / 365.2425) == calendar_years is not independent — both are functions of the same input.
Don't restate the proof as an adversarial check: "70 years after 1948 is 2018, and 2026 > 2018" catches nothing. Search for counter-evidence.
Handle Unicode in citations: Real web pages use en-dashes, curly quotes, ring-above vs degree, non-breaking spaces. handles this automatically.

Proof Engine

yaniv-golan0 estrellas7 abr 2026

Ocupación
Categorías: Educación

Gotchas

These are the highest-value lessons from field testing. Read before writing any proof code.

Don't inline verification logic: Import the bundled scripts. Rewriting normalize_text() inline risks garbling the HTML-stripping logic.

Don't use int() truncation as a cross-check: int(days / 365.2425) == calendar_years is not independent — both are functions of the same input.

Don't restate the proof as an adversarial check: "70 years after 1948 is 2018, and 2026 > 2018" catches nothing. Search for counter-evidence.

Handle Unicode in citations: Real web pages use en-dashes, curly quotes, ring-above vs degree, non-breaking spaces. handles this automatically.

File	Read when
hardening-rules.md	Step 3 — the 7 rules with bad/good examples
proof-templates.md	Step 3 — read this index to choose a template, then read the specific template file it directs you to
output-specs.md	Step 5 — proof.md and proof_audit.md structure
self-critique-checklist.md	Step 6 — before presenting results
advanced-patterns.md	When encountering complex quotes or table-sourced data
environment-and-sources.md	When facing fetch failures, paywalls, or .gov 403s

Script	Purpose	Key functions
`scripts/extract_values.py`	Parse values FROM quote strings (Rule 1)	`parse_date_from_quote()`, `parse_number_from_quote()`, `parse_percentage_from_quote()`
`scripts/smart_extract.py`	Unicode normalization + extraction utilities	`normalize_unicode()`, `verify_extraction()`, `diagnose_mismatch()`
`scripts/verify_citations.py`	Fetch URLs, verify quotes (Rule 2)	`verify_citation()`, `verify_all_citations()`, `build_citation_detail()`, `verify_data_values()`
`scripts/computations.py`	Verified constants, formulas, self-documenting output (Rule 7)	`compute_age()`, `compare()`, `explain_calc()`, `cross_check()`, `compute_percentage_change()`
`scripts/source_credibility.py`	Domain credibility from URL (offline). Called automatically by `verify_all_citations()`.	`assess_credibility(url)`
`scripts/validate_proof.py`	Static analysis for rule compliance	`ProofValidator(filepath).validate()`

Rule	Closes failure mode	Enforced by
1. Never hand-type values	LLM misreads dates/numbers from quotes	`scripts/extract_values.py`
2. Verify citations by fetching	Fabricated quotes/URLs	`scripts/verify_citations.py`
3. Anchor to system time	LLM wrong about today's date	`date.today()`
4. Explicit claim interpretation	Silent ambiguity	`CLAIM_FORMAL` dict with `operator_note`
5. Independent adversarial check	Confirmation bias	Counter-evidence web searches
6. Independent cross-checks	Shared-variable bugs	Multiple sources parsed separately
7. Never hard-code constants/formulas	LLM misremembers values	`scripts/computations.py`

Verdict	Meaning
PROVED	All facts verified, logic valid, conclusion follows
PROVED (with unverified citations)	Logic valid but some citation URLs couldn't be fetched
SUPPORTED	Absence-of-evidence threshold met, no counter-evidence found
SUPPORTED (with unverified citations)	Absence threshold met but corroborating citations couldn't be fetched
DISPROVED	Verified counterexample or contradiction found
DISPROVED (with unverified citations)	Counterexample found but some citations couldn't be fetched
PARTIALLY VERIFIED	Some sub-claims met threshold, others did not — Conclusion states whether each failing SC lacked evidence or was contradicted
UNDETERMINED	Insufficient evidence either way

Proof Engine

Gotchas

Proof Engine

Gotchas

Reference Files

Bundled Scripts

Environment

Core Concepts

The 7 Hardening Rules

Workflow

Step 1: Analyze the Claim

Step 2: Gather Facts (Both Directions)

Step 3: Write the Proof Code

Step 4: Validate

Step 5: Execute and Report

Step 6: Self-Critique

Verdicts

Limitations

Edge Cases

Update Skills

Eval Harness

Ecc Tools Cost Audit

Code Tour

Rules Distill

Design System