Guard rails for data pipeline safety, schema contracts, and atomic writes.
Atomic writes: write to a .tmp file first, then os.rename() to the final path. Never write directly to the final file — a crash mid-write corrupts the parquet.

Done markers: after a successful write, create filename.parquet.done alongside it. This enables resume without reprocessing.

Fail ledger: failures are recorded in fail_ledger.json. Check it before retrying — the error may be deterministic (pathological symbol, OOM).

Schema contracts: a stage output missing a required column fails fast with ColumnNotFoundError.

Stage 1 output (base L1):
Required: symbol, date, time_end, close, volume, bid_v1, ask_v1, ...
Stage 2 output (features L2):
Required: all Stage 1 columns + srl_resid, implied_y, topo_micro, topo_classic,
epiplexity, effective_depth, spoof_ratio, ...
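A minimal sketch of the schema-contract check, in plain Python. The helper name `check_contract` is hypothetical, the column sets abbreviate the contract above (the trailing "..." columns are omitted), and `ColumnNotFoundError` is redefined locally so the sketch is self-contained:

```python
# Hypothetical contract checker; column sets abbreviated from the contract above.
STAGE1_REQUIRED = {"symbol", "date", "time_end", "close", "volume", "bid_v1", "ask_v1"}
STAGE2_REQUIRED = STAGE1_REQUIRED | {
    "srl_resid", "implied_y", "topo_micro", "topo_classic",
    "epiplexity", "effective_depth", "spoof_ratio",
}


class ColumnNotFoundError(Exception):
    """Raised when a stage output is missing required columns."""


def check_contract(columns, required):
    # Fail fast with the full list of missing columns, not just the first.
    missing = required - set(columns)
    if missing:
        raise ColumnNotFoundError(f"missing columns: {sorted(missing)}")
```

Run it against a file's column list right after each stage writes, before any downstream stage reads the output.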
# Quick check: how many files processed?
python3 tools/cluster_health.py --quick
# Manual check on Linux:
ls /omega_pool/parquet_data/v62_feature_l2/host=linux1/*.parquet.done | wc -l
ls /omega_pool/parquet_data/v62_base_l1/host=linux1/*.parquet | wc -l
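The atomic-write, done-marker, and fail-ledger rules above can be sketched as follows. Function names, the ledger layout, and the `write_fn` callback are illustrative assumptions, not the pipeline's actual API:

```python
import json
import os


def atomic_write_parquet(write_fn, final_path):
    """Write via a .tmp file, then rename into place; a crash mid-write
    leaves only the .tmp behind, never a corrupt final parquet."""
    tmp_path = final_path + ".tmp"
    write_fn(tmp_path)               # e.g. df.write_parquet(tmp_path)
    os.rename(tmp_path, final_path)  # atomic on POSIX within one filesystem
    # Done marker: an empty sibling file signals a completed write.
    with open(final_path + ".done", "w"):
        pass


def should_skip(final_path, fail_ledger_path="fail_ledger.json"):
    """Resume logic: skip files already marked done, and consult the fail
    ledger before retrying (the error may be deterministic)."""
    if os.path.exists(final_path + ".done"):
        return True
    if os.path.exists(fail_ledger_path):
        with open(fail_ledger_path) as f:
            ledger = json.load(f)  # assumed: a mapping keyed by file path
        return final_path in ledger
    return False
```

`os.rename()` is only atomic when source and destination are on the same filesystem, which is why the .tmp sits next to the final path rather than in /tmp.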
lob_flux (bid/ask volume deltas): always compute deltas within each symbol group. A naive row-to-row diff over a multi-symbol frame produces phantom values at stock boundaries, where the last row of one symbol is subtracted from the first row of the next.
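The boundary hazard can be shown in a few lines, assuming a pandas frame sorted by symbol and time (the frame here is toy data): a plain `diff()` leaks the previous symbol's last row into the next symbol's first delta, while `groupby(...).diff()` leaves that first delta NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB", "BBB"],
    "bid_v1": [100, 120, 5000, 5100],
})

# Naive: the first BBB delta is 5000 - 120 = 4880, a phantom value
# that crosses the AAA/BBB boundary.
df["lob_flux_naive"] = df["bid_v1"].diff()

# Correct: isolate by symbol; each symbol's first delta stays NaN.
df["lob_flux"] = df.groupby("symbol")["bid_v1"].diff()
```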