Systematic audit of academic manuscript references: authenticity verification, bibliographic accuracy, citation appropriateness, and software/data version consistency. Triggers on: citation audit, reference check, bibliography verification, fabricated/fake/hallucinated reference detection, DOI verification, pre-submission check, manuscript review, R/Python package version consistency, data source citation, checking if a paper is real, 文献审查, 引用审查, 参考文献检查, 伪造文献, 投稿前检查, DOI核对, 软件版本核对, 数据源引用. Applicable file types: .docx, .tex, .bib, .ris, .enl, .nbib, manuscript files.
Systematic audit of all references in an academic manuscript before submission.
| Level | Scope | Severity |
|---|---|---|
| L1 | Authenticity — does the paper exist? Is the DOI correct? | 🔴 Fatal |
| L2 | Bibliographic accuracy — authors, year, volume, pages, journal | 🔴 Critical |
| L3 | Text–list consistency — every in-text citation has a matching entry and vice versa | 🟡 Important |
| L4 | Citation appropriateness — each citation supports the claim it is attached to | 🟡 Improvement |
| L5 | Formatting & version consistency — style uniformity; software/data versions match actual usage | ⚪ Housekeeping |
Extract all text with paragraph indices for cross-referencing. See scripts/extract_docx.py.
For .tex files, parse directly. For .docx, use the python-docx library. Separate the reference list from the body text and index each entry.
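The .docx branch of this step can be sketched as follows. This is a minimal illustration, not the actual scripts/extract_docx.py: it assumes the reference list begins at a paragraph whose text is exactly "References" (adjust the marker for your manuscript), and it keeps each paragraph's index for later cross-referencing.

```python
def split_body_and_refs(paragraphs):
    """Split indexed paragraphs into body text and reference entries.

    Assumes a single heading paragraph reading "References" (case-insensitive)
    separates the two — an illustrative convention, not a universal one.
    """
    body, refs, in_refs = [], [], False
    for i, text in enumerate(paragraphs):
        if text.strip().lower() == "references":
            in_refs = True
            continue
        (refs if in_refs else body).append((i, text))
    return body, refs

def extract_docx(path):
    """Return (body, refs) as lists of (paragraph_index, text) pairs."""
    from docx import Document  # pip install python-docx
    doc = Document(path)
    return split_body_and_refs(p.text for p in doc.paragraphs)
```

The paragraph indices make it easy to report, e.g., "entry 12 in the list is never cited in paragraphs 0–87 of the body".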
[!CAUTION] AI-assisted writing frequently introduces "hallucinated" references — DOIs that resolve to unrelated papers, or entirely fabricated entries. This is the most severe error class: once such an error is published, the consequences are serious.
Method: CrossRef API + web search dual verification.
Run scripts/crossref_batch_check.py to batch-query CrossRef API metadata. Red flags for fabricated references:
Verification chain for suspicious entries:
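The CrossRef half of the dual verification can be sketched as below. This is an illustration, not the actual scripts/crossref_batch_check.py: the title-only comparison and the 0.8 word-overlap threshold are assumptions you should tune; anything below the threshold goes to manual web-search review, per the dual-verification rule.

```python
import json
import urllib.request

def fetch_crossref(doi):
    """Return CrossRef metadata for a DOI, or None if the lookup fails."""
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as r:
            return json.load(r)["message"]
    except Exception:
        return None  # non-resolving DOI: treat as a fatal L1 flag

def title_matches(cited_title, crossref_record, threshold=0.8):
    """Crude check that the DOI actually points at the cited paper.

    Word-overlap on the title only; an assumed heuristic, not a guarantee —
    low scores mean "flag for manual web-search verification".
    """
    titles = crossref_record.get("title") or [""]
    cited = set(cited_title.lower().split())
    found = set(titles[0].lower().split())
    overlap = len(cited & found) / max(len(cited), 1)
    return overlap >= threshold
```

Remember that even a high overlap score is not proof: the web-search leg of the chain is still required for books, chapters, and non-English literature.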
Check every entry against its verified source for:
| Field | Common errors |
|---|---|
| Authors | Missing co-authors (especially the 4th and beyond), wrong initials (G.H. vs C.H.) |
| Year | Early Online vs. official publication date confusion |
| Journal | Abbreviated vs. full name inconsistency |
| Volume/Pages | Mismatch with the DOI record |
| DOI | Placeholder not replaced (e.g. zenodo.XXXXXXX), or points to the wrong article |
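The two DOI failure modes in the table can be partially automated; a minimal sketch, in which the placeholder patterns are assumptions to extend for your field:

```python
import re
import urllib.request

# Patterns suggesting an unreplaced placeholder DOI (an assumed, partial list).
PLACEHOLDER = re.compile(r"XXXX|example\.|10\.0000", re.IGNORECASE)

def looks_like_placeholder(doi):
    """Flag DOIs that were never swapped for the real identifier."""
    return bool(PLACEHOLDER.search(doi))

def doi_resolves(doi, timeout=10):
    """True if https://doi.org/<doi> redirects to a landing page (HTTP < 400)."""
    req = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as r:
            return r.status < 400
    except Exception:
        return False
```

Resolution alone only proves the DOI exists; whether it points at the *cited* article still requires the metadata comparison from the authenticity check.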
Extract (Author, Year) and (Author et al., Year) citations from the body text. Evaluate each citation:
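The extraction step can be sketched with a regular expression. This pattern is an illustration covering only parenthetical forms — narrative citations like "Smith (2020)" and multi-citation groups like "(A, 2019; B, 2020)" would need additional patterns:

```python
import re

# Matches "(Smith, 2020)", "(Smith et al., 2021)", and "(Smith & Jones, 2019)",
# including year suffixes such as "2019a".
CITATION = re.compile(
    r"\(([A-Z][\w\-]+(?:\s+(?:et al\.|&\s*[A-Z][\w\-]+))?),\s*(\d{4}[a-z]?)\)"
)

def extract_citations(text):
    """Return (author_part, year) pairs for every parenthetical citation."""
    return [(m.group(1), m.group(2)) for m in CITATION.finditer(text)]
```

Each extracted pair is then matched against the indexed reference list for the L3 consistency check, and its surrounding sentence is read for the L4 appropriateness check.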
[!IMPORTANT] The manuscript MUST report the actual software versions used for the analysis, not the latest CRAN/PyPI versions.
R environment:
```r
pkgs <- c('ecospat', 'biomod2', 'terra', 'sf')
for (p in pkgs) cat(sprintf("%-12s %s\n", p, packageVersion(p)))
cat(sprintf("%-12s %s\n", "R", R.version.string))
```
Python environment:
```python
import sys
from importlib.metadata import version, PackageNotFoundError

for p in ['numpy', 'pandas', 'scikit-learn', 'tensorflow']:
    try:
        print(f"{p:20s} {version(p)}")
    except PackageNotFoundError:
        print(f"{p:20s} NOT INSTALLED")
print(f"{'Python':20s} {sys.version.split()[0]}")
```
Other environments (Julia, MATLAB, etc.): adapt the pattern to query installed package versions.
Cross-check steps: compare the package names and versions reported in the Methods section against the scripts' actual library() / import / using calls.
Verify that every external data source used in the analysis is properly cited:
| Data type | Examples | What to check |
|---|---|---|
| Remote sensing | MODIS, Landsat, Sentinel | Product name, version, DOI or data center URL |
| Climate data | WorldClim, CHELSA, ERA5 | Version number, resolution, temporal coverage |
| Biodiversity records | GBIF, iNaturalist, VertNet | Download DOI, access date, query parameters |
| Geospatial layers | Natural Earth, GADM, OpenStreetMap | Version, access date |
| Genomic data | GenBank, SRA, ENA | Accession numbers |
| Statistical databases | World Bank, UN, national bureaus | Dataset name, access date, URL |
Common issues:
Generate a citation_audit.md report structured as:
# Citation Audit Report / 参考文献审查报告
## 🔴 Must-fix errors / 必须修正
(Ordered: fabricated > missing > bibliographic)
## 🟡 Recommended improvements / 建议改进
(Appropriateness, formatting)
## ✅ Verified entries / 已验证通过
(Full checklist with per-entry status)
Never trust CrossRef alone — its "best match" is frequently wrong for books, chapters, datasets, non-English literature, and same-surname authors. Always verify with a web search as well.
Year discrepancies need judgment — "Early Online" and print dates can differ by 1–2 years, and both are acceptable. A difference of more than 2 years likely indicates a real error.
Methods must match code — if the manuscript claims package X was used but the scripts call package Y, this is a reviewable error. Cross-check the Methods section against the actual scripts line by line.
Data sources need citations too — remote sensing products, climate databases, and biodiversity data portals all require proper citation with DOI, version, and access date.
| Don't | Do instead |
|---|---|
| Trust CrossRef blindly | CrossRef + web search dual verification |
| Ignore recent publications | Extra scrutiny for papers < 2 years old |
| Assume all DOIs are correct | Resolve every DOI and verify the target |
| Only check the reference list | Also cross-check body citations and code |
| Report everything at once | Triage by severity: fatal → critical → improvement |
| Skip data source citations | Verify every dataset, layer, and product is cited |