Verify academic paper references — automated ground truth check + live arXiv verification + multi-agent cross-review protocol.
Verify all references in an academic paper for correctness. Three layers of defense.
Compare references.bib against paper/references_ground_truth.json — a human-verified, git-tracked file with correct author names, titles, years, and arXiv IDs.
uv run python scripts/verify_references.py # fast, no network
Runs in CI on every push. Exit code 1 on any mismatch. This catches:
Verify the ground truth itself against arXiv API. Run after adding new references or periodically.
uv run python scripts/verify_references.py --live # ~3s per reference
This catches errors in the ground truth file itself.
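The live check can be illustrated with a minimal sketch. It assumes the script queries the public arXiv API Atom feed by ID and compares the returned author names with the ground truth entry; the function names below (`fetch_arxiv_atom`, `parse_arxiv_entry`, `live_mismatches`) are hypothetical and not taken from `scripts/verify_references.py`. The feed gives names as "First Last", so the "Last, First" form used in the ground truth is converted before comparing.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API feed
ARXIV_API = "http://export.arxiv.org/api/query?id_list={arxiv_id}"


def fetch_arxiv_atom(arxiv_id):
    """Download the raw Atom feed for one arXiv ID (network call)."""
    url = ARXIV_API.format(arxiv_id=arxiv_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_arxiv_entry(atom_xml):
    """Extract the title and author names from an Atom feed (bytes or str)."""
    root = ET.fromstring(atom_xml)
    entry = root.find(f"{ATOM}entry")
    if entry is None:
        raise ValueError("arXiv response contains no entry")
    # Collapse line breaks and repeated spaces inside the title and names.
    title = " ".join(entry.findtext(f"{ATOM}title", default="").split())
    authors = [
        " ".join(a.findtext(f"{ATOM}name", default="").split())
        for a in entry.findall(f"{ATOM}author")
    ]
    return {"title": title, "authors": authors}


def to_first_last(name):
    """Convert 'Last, First' (ground truth style) to 'First Last' (arXiv style)."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return name.strip()


def live_mismatches(truth_entry, live_entry):
    """Return the names of the fields where ground truth and arXiv disagree."""
    problems = []
    expected = [to_first_last(a).casefold() for a in truth_entry["authors"]]
    actual = [a.casefold() for a in live_entry["authors"]]
    if expected != actual:
        problems.append("authors")
    if truth_entry["title"].strip().casefold() != live_entry["title"].casefold():
        problems.append("title")
    return problems
```

A caller would run `live_mismatches(truth, parse_arxiv_entry(fetch_arxiv_atom(truth["arxiv_id"])))` for each entry, pausing between requests (the ~3s per reference quoted above). The year is left out of this sketch because the arXiv submission date can differ from the publication year recorded in the bibliography.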
Spawn two independent agents that each verify all references from scratch. Neither sees the other's work.
Agent 1: "Verify all references in paper/references.bib against arXiv.
For each entry, search arXiv by title and report exact author names."
Agent 2: "Independently verify all references in paper/references.bib.
For each entry, fetch the arXiv page and compare authors."
Compare results. If agents disagree on any reference → flag for manual verification.
Why two agents: A single LLM can hallucinate that a wrong name is correct. This happened: "Yundong Wang" was accepted as correct when the real author is "Yibin Wang". Two independent verifications with different prompts make a correlated hallucination less likely.
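The "compare results" step of the cross-review can be sketched as follows. This is a minimal sketch that assumes each agent's report has already been reduced to a mapping from BibTeX key to a list of author names; that report format and the `disagreements` helper are hypothetical, since the protocol above does not prescribe an output format for the agents.

```python
def normalize(name):
    """Lowercase a name and drop commas so trivial formatting differences are ignored."""
    return " ".join(name.casefold().replace(",", " ").split())


def disagreements(report_a, report_b):
    """Return {bibtex_key: reason} for every reference the two agents disagree on."""
    flagged = {}
    for key in sorted(set(report_a) | set(report_b)):
        authors_a = report_a.get(key)
        authors_b = report_b.get(key)
        if authors_a is None or authors_b is None:
            # One agent skipped the reference entirely: treat as a disagreement.
            flagged[key] = "missing from one report"
        elif [normalize(n) for n in authors_a] != [normalize(n) for n in authors_b]:
            flagged[key] = f"authors differ: {authors_a} vs {authors_b}"
    return flagged
```

Any key in the returned mapping goes to manual verification. Agreement between the agents is not proof of correctness; it only means no conflict was detected.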
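When adding a new reference:
1. Add the entry to paper/references.bib
2. Add the matching entry to paper/references_ground_truth.json with arxiv_id
3. uv run python scripts/verify_references.py (must pass)
4. uv run python scripts/verify_references.py --live (must pass)
Ground truth file format:
{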
"entries": {
"bibtex_key": {
"authors": ["Last, First", "Last, First"],
"title": "Full Paper Title",
"year": "2024",
"arxiv_id": "2406.11675"
}
}
}
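Given the format above, the Layer 1 comparison might look like the following minimal sketch. It assumes the BibTeX file has already been parsed into a mapping from key to field dictionary (by any BibTeX parser) and that authors are written "Last, First" and joined with "and"; the helper names and the `MISMATCH:` message format are hypothetical and not taken from `scripts/verify_references.py`.

```python
import json


def split_bibtex_authors(author_field):
    """Split a BibTeX author field ('A and B and C') into a list of names."""
    flat = " ".join(author_field.split())  # collapse line breaks and extra spaces
    return [name for name in flat.split(" and ") if name]


def check_entry(key, bib_fields, truth_entries):
    """Return a list of human-readable mismatches for one BibTeX entry."""
    if key not in truth_entries:
        return [f"{key}: not present in ground truth"]
    expected = truth_entries[key]
    errors = []
    if split_bibtex_authors(bib_fields.get("author", "")) != expected["authors"]:
        errors.append(f"{key}: author mismatch")
    # BibTeX titles often carry braces for case protection; ignore them.
    title = bib_fields.get("title", "").replace("{", "").replace("}", "")
    if " ".join(title.split()) != expected["title"]:
        errors.append(f"{key}: title mismatch")
    if bib_fields.get("year") != expected["year"]:
        errors.append(f"{key}: year mismatch")
    return errors


def verify(bib_entries, ground_truth_json):
    """Compare parsed BibTeX entries to the ground truth; return a process exit code."""
    truth_entries = json.loads(ground_truth_json)["entries"]
    errors = []
    for key, fields in bib_entries.items():
        errors.extend(check_entry(key, fields, truth_entries))
    for message in errors:
        print("MISMATCH:", message)
    return 1 if errors else 0
```

Returning 1 on any mismatch mirrors the exit code behavior described for the CI run. Exact string comparison is deliberate in this sketch: a near match on an author name is precisely the kind of error the check exists to catch.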
Layer 1 output (automated):
Checking 11 references against ground truth...
OK: All references verified.
Layer 2 output (live):
Live-verifying 10 references against arXiv API (~30s)...
[blundell2015weight] arXiv:1505.05424 ... OK
[wang2024blob] arXiv:2406.11675 ... MISMATCH