Guard rails for data pipeline safety, schema contracts, and atomic writes.
Atomic writes: write to a .tmp file first, then os.rename() to the final path. Never write directly to the final file — a crash mid-write corrupts the parquet.

Done markers: after a successful write, create filename.parquet.done alongside it. This enables resume without reprocessing.

Fail ledger: failures are recorded in fail_ledger.json. Check it before retrying — the error may be deterministic (pathological symbol, OOM).

Schema contracts: a stage output missing a required column fails fast with ColumnNotFoundError.

Stage 1 output (base L1):
Required: symbol, date, time_end, close, volume, bid_v1, ask_v1, ...
Stage 2 output (features L2):
Required: all Stage 1 columns + srl_resid, implied_y, topo_micro, topo_classic,
epiplexity, effective_depth, spoof_ratio, ...
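A minimal sketch of the schema-contract check, in plain Python. The helper name `check_contract` is hypothetical, the column sets abbreviate the contract above (the trailing "..." columns are omitted), and `ColumnNotFoundError` is redefined locally so the sketch is self-contained:

```python
# Hypothetical contract checker; column sets abbreviated from the contract above.
STAGE1_REQUIRED = {"symbol", "date", "time_end", "close", "volume", "bid_v1", "ask_v1"}
STAGE2_REQUIRED = STAGE1_REQUIRED | {
    "srl_resid", "implied_y", "topo_micro", "topo_classic",
    "epiplexity", "effective_depth", "spoof_ratio",
}


class ColumnNotFoundError(Exception):
    """Raised when a stage output is missing required columns."""


def check_contract(columns, required):
    # Fail fast with the full list of missing columns, not just the first.
    missing = required - set(columns)
    if missing:
        raise ColumnNotFoundError(f"missing columns: {sorted(missing)}")
```

Run it against a file's column list right after each stage writes, before any downstream stage reads the output.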
# Quick check: how many files processed?
python3 tools/cluster_health.py --quick
# Manual check on Linux:
ls /omega_pool/parquet_data/v62_feature_l2/host=linux1/*.parquet.done | wc -l
ls /omega_pool/parquet_data/v62_base_l1/host=linux1/*.parquet | wc -l
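The atomic-write, done-marker, and fail-ledger rules above can be sketched as follows. Function names, the ledger layout, and the `write_fn` callback are illustrative assumptions, not the pipeline's actual API:

```python
import json
import os


def atomic_write_parquet(write_fn, final_path):
    """Write via a .tmp file, then rename into place; a crash mid-write
    leaves only the .tmp behind, never a corrupt final parquet."""
    tmp_path = final_path + ".tmp"
    write_fn(tmp_path)               # e.g. df.write_parquet(tmp_path)
    os.rename(tmp_path, final_path)  # atomic on POSIX within one filesystem
    # Done marker: an empty sibling file signals a completed write.
    with open(final_path + ".done", "w"):
        pass


def should_skip(final_path, fail_ledger_path="fail_ledger.json"):
    """Resume logic: skip files already marked done, and consult the fail
    ledger before retrying (the error may be deterministic)."""
    if os.path.exists(final_path + ".done"):
        return True
    if os.path.exists(fail_ledger_path):
        with open(fail_ledger_path) as f:
            ledger = json.load(f)  # assumed: a mapping keyed by file path
        return final_path in ledger
    return False
```

`os.rename()` is only atomic when source and destination are on the same filesystem, which is why the .tmp sits next to the final path rather than in /tmp.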
lob_flux (bid/ask volume deltas): always compute deltas within each symbol group. A naive row-to-row diff over a multi-symbol frame produces phantom values at stock boundaries, where the last row of one symbol is subtracted from the first row of the next.
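The boundary hazard can be shown in a few lines, assuming a pandas frame sorted by symbol and time (the frame here is toy data): a plain `diff()` leaks the previous symbol's last row into the next symbol's first delta, while `groupby(...).diff()` leaves that first delta NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB", "BBB"],
    "bid_v1": [100, 120, 5000, 5100],
})

# Naive: the first BBB delta is 5000 - 120 = 4880, a phantom value
# that crosses the AAA/BBB boundary.
df["lob_flux_naive"] = df["bid_v1"].diff()

# Correct: isolate by symbol; each symbol's first delta stays NaN.
df["lob_flux"] = df.groupby("symbol")["bid_v1"].diff()
```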