Validate the integrity of a Delta Lake table — check for missing files, broken log entries, schema inconsistencies, and common issues. Use when the user suspects something is wrong with a Delta table or wants a health check. Triggers on phrases like "validate table", "check delta table", "is this table healthy", "something wrong with my table", "debug delta table", "pipeline failed".
Validate the integrity of a Delta Lake table by checking its transaction log and physical files. Surfaces common issues without requiring any Delta library.
/delta-skills:validate path/to/table
TABLE_PATH="$1"
# Check _delta_log exists
[ -d "$TABLE_PATH/_delta_log" ] || { echo "FAIL: No _delta_log directory"; exit 1; }
# Check there are commit files
COMMIT_COUNT=$(ls "$TABLE_PATH/_delta_log/"*.json 2>/dev/null | wc -l)
[ "$COMMIT_COUNT" -gt 0 ] || { echo "FAIL: No commit files in _delta_log"; exit 1; }
echo "✓ _delta_log exists with $COMMIT_COUNT commit files"
List all JSON commit files and check that version numbers are sequential with no gaps:
ls "$TABLE_PATH/_delta_log/"*.json | sort -V
Expected: 00000000000000000000.json, 00000000000000000001.json, etc.
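The sequence check can be scripted; a minimal bash sketch (the fixture table below is illustrative, with v2 deliberately missing to show the failure mode — point TABLE_PATH at a real table instead):

```shell
# Sketch: detect gaps in the commit version sequence.
TABLE_PATH=$(mktemp -d)
mkdir -p "$TABLE_PATH/_delta_log"
touch "$TABLE_PATH/_delta_log/00000000000000000000.json" \
      "$TABLE_PATH/_delta_log/00000000000000000001.json" \
      "$TABLE_PATH/_delta_log/00000000000000000003.json"  # v2 deliberately missing

expected=0
for f in "$TABLE_PATH/_delta_log/"*.json; do
  v=$((10#$(basename "$f" .json)))  # force base 10 to strip leading zeros
  [ "$v" -eq "$expected" ] || echo "GAP: expected v$expected, found v$v"
  expected=$((v + 1))
done
# → GAP: expected v2, found v3
```

The `10#` prefix matters: without it, bash would treat the zero-padded version numbers as octal.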
If _last_checkpoint exists, verify:
- The referenced .checkpoint.parquet file exists

if [ -f "$TABLE_PATH/_delta_log/_last_checkpoint" ]; then
cat "$TABLE_PATH/_delta_log/_last_checkpoint"
# Verify the referenced checkpoint file exists
fi
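The body of that if block can be sketched as follows. This is a rough sed-based parse of _last_checkpoint and assumes a single-part checkpoint (multi-part checkpoints use a different file-naming scheme); the fixture below deliberately omits the checkpoint file:

```shell
# Sketch: verify the checkpoint file named by _last_checkpoint exists.
# Fixture table for illustration; point TABLE_PATH at a real table instead.
TABLE_PATH=$(mktemp -d)
mkdir -p "$TABLE_PATH/_delta_log"
echo '{"version":40,"size":12}' > "$TABLE_PATH/_delta_log/_last_checkpoint"

# Crude JSON parse; a real check should use a proper JSON parser.
CP_VERSION=$(sed -n 's/.*"version" *: *\([0-9][0-9]*\).*/\1/p' \
  "$TABLE_PATH/_delta_log/_last_checkpoint")
CP_FILE=$(printf '%020d.checkpoint.parquet' "$CP_VERSION")
if [ -f "$TABLE_PATH/_delta_log/$CP_FILE" ]; then
  echo "OK: checkpoint $CP_FILE present"
else
  echo "MISSING: $CP_FILE (readers will replay the log from v0)"
fi
```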
Scan all metaData actions across the log:
- Extract the schema from each metaData occurrence
- Flag:
  - Additive changes (new columns) — safe
  - Type changes (e.g. int → string) — breaking change

Read the last 5 commit files and check:
- Each is valid, non-empty JSON
- commitInfo as the first line

For the 5 most recently added Parquet files (from recent add actions), check they exist on disk:
# For each recent add action path
ls "$TABLE_PATH/path/to/part-file.parquet" 2>/dev/null || echo "MISSING: ..."
If files are missing → the table has lost physical data. This is serious.
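That spot check can be sketched with grep (a crude parse; a real implementation should JSON-decode each add action). The fixture commit below references one file that exists and one that does not:

```shell
# Sketch: spot-check that recently added files exist on disk.
TABLE_PATH=$(mktemp -d)
mkdir -p "$TABLE_PATH/_delta_log"
touch "$TABLE_PATH/part-00000.parquet"   # present on disk
cat > "$TABLE_PATH/_delta_log/00000000000000000000.json" <<'EOF'
{"add":{"path":"part-00000.parquet","size":1024}}
{"add":{"path":"part-00001.parquet","size":2048}}
EOF

MISSING=0
# Crude extraction of add paths; assumes paths contain no whitespace.
for p in $(grep -ho '"add":{"path":"[^"]*"' "$TABLE_PATH/_delta_log/"*.json \
             | sed 's/.*"path":"//; s/"$//' | tail -5); do
  [ -f "$TABLE_PATH/$p" ] || { echo "MISSING: $p"; MISSING=$((MISSING + 1)); }
done
echo "$MISSING file(s) missing"
```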
If any add actions reference deletionVector, note that deletion vectors are in use. Check that the referenced .bin files exist alongside their Parquet files.
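A quick grep can at least flag whether deletion vectors appear anywhere in the log (a sketch over a fixture table; the .bin existence check itself is not implemented here):

```shell
# Sketch: flag deletion-vector usage anywhere in the log.
TABLE_PATH=$(mktemp -d)
mkdir -p "$TABLE_PATH/_delta_log"
echo '{"add":{"path":"part-00000.parquet"}}' \
  > "$TABLE_PATH/_delta_log/00000000000000000000.json"

if grep -q '"deletionVector"' "$TABLE_PATH/_delta_log/"*.json 2>/dev/null; then
  DV_STATUS="in use (check the referenced .bin files exist)"
else
  DV_STATUS="not in use"
fi
echo "Deletion vectors: $DV_STATUS"
```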
Check the last 20 commits for:
- When OPTIMIZE last ran
- When VACUUM last ran, relative to the retention period
- Whether deletion vectors are in use
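One way to scan recent commits, again with a crude grep over commitInfo (operation names such as WRITE and OPTIMIZE are written by the engine; the two fixture commits below are illustrative):

```shell
# Sketch: surface the operations recorded in the last 20 commits.
TABLE_PATH=$(mktemp -d)
mkdir -p "$TABLE_PATH/_delta_log"
echo '{"commitInfo":{"operation":"WRITE","timestamp":1710500000000}}' \
  > "$TABLE_PATH/_delta_log/00000000000000000000.json"
echo '{"commitInfo":{"operation":"OPTIMIZE","timestamp":1710600000000}}' \
  > "$TABLE_PATH/_delta_log/00000000000000000001.json"

OPS=$(ls "$TABLE_PATH/_delta_log/"*.json | sort -V | tail -20 \
  | xargs grep -ho '"operation":"[^"]*"' | sed 's/.*:"//; s/"$//')
echo "$OPS"
echo "$OPS" | grep -q '^OPTIMIZE$' && echo "OPTIMIZE seen in last 20 commits"
```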
Delta table validation: tables/customers
─────────────────────────────────────────────────────
Structure
✓ _delta_log exists
✓ 42 commit files found (v0 → v41)
✓ No gaps in version sequence
✓ Checkpoint at v40, file present
Schema
✓ Schema consistent across all versions
ℹ Schema changes detected:
v0: original schema (4 columns)
v33: added column loyalty_tier (string) — safe, additive
Recent commits (last 5)
✓ All commit files valid and non-empty
✓ Timestamps increasing
✓ No duplicate versions
Physical files (spot check, last 5 added files)
✓ All 5 sampled files present on disk
Operational health
ℹ Last OPTIMIZE: v40 (2025-03-16) — 2 days ago, looks healthy
⚠ Last VACUUM: v36 (2025-03-12) — 6 days ago, approaching 7-day default retention
ℹ Deletion vectors: not in use
─────────────────────────────────────────────────────
Summary: 1 warning, 0 errors
Recommendation: Run VACUUM soon to clean up deleted files before they accumulate.
✓ — healthy
ℹ — informational, no action needed
⚠ — warning, action recommended
✗ — error, table may be broken or data may be at risk

Missing commit files (gaps in sequence)
The transaction log is corrupted. Some versions are unreadable. Do not write to this table until investigated.

Missing physical Parquet files
Data has been lost. Files referenced in the log no longer exist. This usually means VACUUM ran with too short a retention period, or files were deleted manually.

Timestamps going backwards
Multiple writers with clock skew, or a commit was manually edited. Usually harmless but worth noting.

Very large number of small files
Table has never been OPTIMIZE'd, or OPTIMIZE runs infrequently. Query performance will degrade.
Checkpoint file missing
_last_checkpoint points to a checkpoint that doesn't exist. Delta readers will fall back to replaying from v0, which will be slow for large tables.