MANDATORY for any code or document that makes medical, scientific, or literature-based claims. Enforces verification discipline for PubMed queries, gene naming, gap claims, and cross-database limitations. Use when: writing/editing PubMed scan code, updating papers/briefs with quantitative claims, adding new pathogens or genes, or reviewing gap analysis output.
This skill exists because sloppy query construction in March 2026 produced false medical claims that nearly destroyed project credibility. Every rule below was earned through a real bug that generated a false claim in a medical context.
Mpox OR bug (2026-03-04): "Mpox OR Monkeypox" was treated as a literal phrase in PubMed, not boolean OR. Result: 0 papers found when 69 existed. 17 false gap claims published. Root cause: _name_clause() didn't exist; raw string was wrapped in quotes.
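The fix for this class of bug can be sketched as follows. This is a minimal illustration, not the project's actual _name_clause(): the function name `name_clause`, its signature, and the choice of the `[Title/Abstract]` field tag are assumptions made for the example.

```python
import re


def name_clause(names: str, field: str = "Title/Abstract") -> str:
    """Build a boolean PubMed clause from a string such as 'A OR B'.

    Hypothetical helper for illustration; the real _name_clause() may differ.
    """
    # Split on the word OR surrounded by whitespace, so that OR stays a
    # boolean operator instead of ending up inside a quoted phrase.
    parts = [p.strip() for p in re.split(r"\s+OR\s+", names) if p.strip()]
    if not parts:
        raise ValueError("empty name string")
    # Quote each name separately: multi-word names remain phrases.
    terms = [f'"{p}"[{field}]' for p in parts]
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


print(name_clause("Mpox OR Monkeypox"))
```

The key design choice is that quoting happens per name, after splitting, so the raw input string is never wrapped in quotes as a whole.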
Gene synonym bug (2026-03-04): Gene queries used only primary names (e.g., "inhA"). PubMed papers use expanded names ("enoyl-ACP reductase"). Result: TB showed 6 gene gaps when only 2 were real. Root cause: No synonym expansion; narrow queries = false negatives = false gap claims.
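Synonym expansion along these lines might look like the sketch below. The shape of the synonym table (a dict from primary name to a list of expanded names), the function name `gene_clause`, and the `[Title/Abstract]` field tag are assumptions; only the inhA pairing comes from the bug described above.

```python
# Illustrative table; the project's GENE_SYNONYMS may be structured differently.
EXAMPLE_SYNONYMS = {"inhA": ["enoyl-ACP reductase"]}


def gene_clause(gene, synonyms=EXAMPLE_SYNONYMS, field="Title/Abstract"):
    """Return a boolean clause covering a gene's primary name and synonyms."""
    # dict.fromkeys drops repeated names while keeping the primary name first.
    names = list(dict.fromkeys([gene, *synonyms.get(gene, [])]))
    return "(" + " OR ".join(f'"{n}"[{field}]' for n in names) + ")"


print(gene_clause("inhA"))
```

A gene with no entry in the table still produces a valid single-term clause, which makes missing synonyms easy to spot when the query string is logged.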
Duplicate gene inflation (2026-03-04): Ebola listed both "NP" and "nucleoprotein" as separate genes (same gene). RSV listed "N protein" + "nucleoprotein", "F protein" + "fusion protein", "G protein" + "attachment protein". MERS listed "spike" + "S protein". Result: Gap counts inflated by 7 (53→46 when fixed). Root cause: No deduplication check; copy-paste gene lists from different sources without reconciliation.
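A plain `len(genes) == len(set(genes))` check would not have caught these cases, because the duplicates are spelled differently. One possible approach, sketched here under the assumption that a hand-maintained alias table exists, is to map every entry to a canonical name before comparing. The `ALIASES` table and `find_duplicate_genes` are hypothetical names; the alias pairs are the ones listed in the bug description, and "VP35" is only an illustrative extra entry.

```python
# Hypothetical alias table: lowercase alias -> canonical lowercase name.
ALIASES = {
    "np": "nucleoprotein",
    "n protein": "nucleoprotein",
    "f protein": "fusion protein",
    "g protein": "attachment protein",
    "s protein": "spike",
}


def find_duplicate_genes(genes, aliases=ALIASES):
    """Group entries that resolve to the same canonical gene name."""
    groups = {}
    for gene in genes:
        key = aliases.get(gene.lower(), gene.lower())
        groups.setdefault(key, []).append(gene)
    # Keep only canonical names that more than one entry resolved to.
    return {key: names for key, names in groups.items() if len(names) > 1}


print(find_duplicate_genes(["NP", "nucleoprotein", "VP35"]))
```

An empty result means the list is free of known duplicates; it says nothing about aliases that are missing from the table.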
Before any PubMed scan code is merged, the query builder MUST have tests covering:
- _name_clause() correctly handles single names, multi-word names, and OR-separated synonyms
- No gene list contains duplicates (len(genes) == len(set(genes)))
Test file: tests/test_pubmed_queries.py. Run with: python3 -m pytest tests/test_pubmed_queries.py -v
The correct phrasing is always:
"Zero results returned by our PubMed query [exact query string] as of [date]"
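NEVER say "no research exists", or use any wording that presents a zero-result query as proof that the research is absent.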
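Why: PubMed is one database. It misses work indexed only elsewhere, such as research published in Chinese (CNKI, Wanfang), Russian (eLIBRARY.ru), Arabic, and other non-English databases.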
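To keep the wording consistent, the required phrasing can be generated rather than typed by hand. The helper below is a hypothetical sketch (the name `zero_result_claim` and its signature are not from the project); it refuses to produce a claim when the query actually returned results.

```python
from datetime import date


def zero_result_claim(query: str, count: int, queried_on: date) -> str:
    """Format a zero-result claim that names the exact query and date."""
    if count != 0:
        # A gap claim is only valid for a query that returned nothing.
        raise ValueError(f"query returned {count} results; not a gap")
    return (
        f"Zero results returned by our PubMed query {query} "
        f"as of {queried_on.isoformat()}"
    )


print(zero_result_claim('"Mpox"[Title/Abstract]', 0, date(2026, 3, 4)))
```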
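When adding a gene to PATHOGEN_GENES, add its synonyms to GENE_SYNONYMS.
Synonym trap examples: "inhA" vs. "enoyl-ACP reductase" (TB); "NP" vs. "nucleoprotein" (Ebola); "spike" vs. "S protein" (MERS).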
After running pubmed_scan.py, manually verify at least 3 zero-result claims.
Use scripts/spot_check_claims.py or manual PubMed queries.
The single source of truth for gap counts is web/data/pubmed-scan-results.json.
When the scan runs, all documents MUST be updated from this file:
pubmed_scan.py → pubmed-scan-results.json → all papers/briefs
NEVER manually edit a number in a paper. Always re-run the scan and propagate.
After updating, grep for stale numbers:
grep -rn "OLD_NUMBER" publications/ docs/ --include="*.md"
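One caveat with the grep above is that a bare number also matches inside longer numbers (searching for 53 also hits 153). A small Python equivalent that only matches the number when it is not surrounded by other digits might look like this; `find_stale_numbers` is a hypothetical helper, not an existing project script.

```python
import re
from pathlib import Path


def find_stale_numbers(roots, old_number, pattern="*.md"):
    """Return (path, line number, line) for each standalone occurrence."""
    # Lookarounds reject matches embedded in a longer run of digits.
    rx = re.compile(rf"(?<!\d){re.escape(str(old_number))}(?!\d)")
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob(pattern)):
            text = path.read_text(encoding="utf-8")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if rx.search(line):
                    hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_stale_numbers(["publications", "docs"], 53):
        print(f"{path}:{lineno}: {line}")
```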
Human GRCh38 is in the CRISPR target database for off-target screening. It is NOT a pathogen. It MUST be excluded from gap analysis. The scan script MUST filter out non-pathogen entries before generating claims.
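The exclusion can be made explicit with a small filter applied before any claims are generated. This is a sketch under assumptions: the target database is modelled as a dict from entry name to gene list, and the key "Human GRCh38" is a guess at how the entry is named. Neither is confirmed by the project code.

```python
# Assumed entry name; adjust to match the real key in the target database.
NON_PATHOGEN_ENTRIES = {"Human GRCh38"}


def pathogen_entries(targets, excluded=NON_PATHOGEN_ENTRIES):
    """Drop non-pathogen entries so they never reach gap analysis."""
    return {
        name: genes for name, genes in targets.items() if name not in excluded
    }


targets = {"Human GRCh38": [], "Ebola": ["NP"]}
print(pathogen_entries(targets))
```

Keeping the exclusion list in one named constant makes it easy to test and to extend if other reference genomes are added for off-target screening.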
Every paper MUST contain a limitation statement about PubMed's English-language bias. Template:
Limitation: English-language database bias. Our literature analysis queries PubMed exclusively, which has limited coverage of research published in Chinese (CNKI, Wanfang), Russian (eLIBRARY.ru), Arabic, and other non-English databases. Gap claims reflect PubMed coverage as of [date], not the totality of global CRISPR research. Researchers in non-English-speaking countries may have published relevant work not captured by our scan.
Use this checklist before publishing any claim:
□ Query string logged with exact PubMed syntax
□ Date of query recorded
□ OR-separated names produce boolean, not phrases
□ Gene synonyms expanded (checked against UniProt)
□ No duplicate genes in gene list
□ Non-pathogen entries excluded
□ At least 3 zero-result claims manually spot-checked
□ Phrasing says "our query returned zero" not "no research exists"
□ Non-English database limitation documented
□ All documents updated from single source (JSON)
□ grep confirms no stale numbers remain
□ Tests pass: pytest tests/test_pubmed_queries.py
If a false claim is discovered:
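1. Run scripts/spot_check_claims.py on the specific claim
2. Fix the root cause in pubmed_scan.py
3. Add a regression test to tests/test_pubmed_queries.py
4. Re-run the scan: python3 scripts/pubmed_scan.py
5. Rebuild the papers: bash scripts/build-preprints.sh
6. Redeploy: bash scripts/deploy-azure.sh --site-only
7. Record the lesson in docs/lessons/
8. Commit with a fix: prefix explaining the false claim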