Data Pipeline Quality Check

Use this template when performing a data pipeline quality check. It assesses pipeline quality, reliability, and data integrity, covering schema validation, data freshness monitoring, completeness checks, anomaly detection, lineage tracking, and SLA compliance to ensure trustworthy data flows from source to consumption.

Profession
Categories: Data Engineering

Data Pipeline Quality Check Skill

Assess data quality for pipeline {{ pipeline_name }} ({{ source_system }} -> {{ target_system }}).

Workflow

Phase 1 — Pipeline Overview

PIPELINE INVENTORY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[ ] Pipeline: {{ pipeline_name }}
[ ] Source: {{ source_system }}
[ ] Target: {{ target_system }}
[ ] Schedule: [ ] Real-time  [ ] Micro-batch (___ min)  [ ] Batch (___ daily)
[ ] Daily data volume: ___ GB / ___ records
[ ] Pipeline technology: ___
[ ] Last successful run: ___
[ ] SLA: data available within ___ of source event
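
The SLA line in the inventory can be turned into a simple check once the blank is filled in. The sketch below is only an illustration: the function name `sla_lag` and the one-hour SLA are hypothetical, and it assumes both timestamps are timezone-aware and available from your pipeline metadata.

```python
from datetime import datetime, timedelta, timezone


def sla_lag(source_event_time, available_time, sla):
    """Return (lag, met): the delay from source event to availability
    in the target, and whether that delay is within the SLA."""
    lag = available_time - source_event_time
    return lag, lag <= sla


# Hypothetical values: an event at 12:00 UTC that landed at 12:45 UTC,
# checked against an assumed SLA of one hour.
event = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
landed = datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc)
lag, met = sla_lag(event, landed, timedelta(hours=1))
```

Recording the measured lag next to the SLA value in the inventory makes it easier to compare runs over time.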

Phase 2 — Data Completeness

COMPLETENESS CHECKS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[ ] Record count reconciliation:
    - Source records (24h): ___
    - Target records (24h): ___
    - Delta: ___ (___ %)
    - Acceptable threshold: ___ %
[ ] Partition completeness:
    - All expected partitions present: [ ] YES  [ ] NO
    - Missing partitions: ___
[ ] Late-arriving data handling:
    - Strategy: [ ] Reprocess  [ ] Append  [ ] Ignore
    - Late data window: ___
[ ] Null analysis:
    - Required fields with nulls: ___
    - Null rate per field within threshold: [ ] YES  [ ] NO
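
The completeness checks above can be sketched in a few small functions. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, partitions are assumed to be daily, and rows are assumed to be plain dictionaries already loaded into memory. In practice the counts would usually come from queries against the source and target systems.

```python
from datetime import date, timedelta


def reconcile_counts(source_count, target_count, threshold_pct):
    """Return (delta, delta_pct, within_threshold) for a source/target pair."""
    delta = source_count - target_count
    if source_count == 0:
        # An empty source only reconciles with an empty target.
        delta_pct = 0.0 if target_count == 0 else float("inf")
    else:
        delta_pct = abs(delta) / source_count * 100
    return delta, delta_pct, delta_pct <= threshold_pct


def missing_partitions(start, end, present):
    """List daily partitions in [start, end] that are absent from `present`."""
    days = (end - start).days + 1
    expected = [start + timedelta(days=i) for i in range(days)]
    return [d for d in expected if d not in present]


def null_rates(rows, required_fields):
    """Fraction of rows in which each required field is None or missing."""
    total = len(rows)
    rates = {}
    for field in required_fields:
        nulls = sum(1 for row in rows if row.get(field) is None)
        rates[field] = nulls / total if total else 0.0
    return rates


def fields_over_threshold(rates, max_null_rate):
    """Names of fields whose null rate exceeds the accepted maximum."""
    return sorted(f for f, rate in rates.items() if rate > max_null_rate)
```

The outputs map directly onto the checklist: `reconcile_counts` fills in the delta and percentage, `missing_partitions` fills in the missing partition list, and `fields_over_threshold` answers the YES/NO question on null rates.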

Counter-Rationalizations

Shortcut	Counter	Why
"We can skip some steps for this case"	Adapt the workflow steps, don't skip them	Skipped steps are where incidents and oversights originate
"The user seems to already know what to do"	Complete all workflow phases with the user	The workflow catches blind spots that experience alone misses
"This is a minor case, full process is overkill"	Scale the process down, don't turn it off	Minor cases become major when handled without structure; the process scales down, it does not disappear
"I'll fill in the details later"	Complete each section before moving on	Deferred details are forgotten; real-time capture is more accurate
"The template output isn't necessary"	Always produce the structured output format	Structured output enables comparison, audit trails, and handoff to other teams

Data Pipeline Quality Check

Data Pipeline Quality Check Skill

Workflow

Phase 1 — Pipeline Overview

Phase 2 — Data Completeness

Phase 3 — Data Accuracy

Phase 4 — Data Freshness

Phase 5 — Anomaly Detection

Phase 6 — Lineage and Documentation

Counter-Rationalizations

Output Format
