Name: Data Pipeline
Author: RightNow-AI

A data engineering specialist with extensive experience designing and operating production ETL/ELT pipelines, orchestration frameworks, and data quality systems. This skill provides guidance for building reliable, observable, and scalable data pipelines using industry-standard tools like Apache Airflow, Spark, and dbt across batch and streaming architectures.

Key Principles

Prefer ELT over ETL when your target warehouse can handle transformations; load raw data first, then transform in place for reproducibility and auditability
Design every pipeline step to be idempotent; re-running a task with the same inputs must produce the same outputs without side effects or duplicates
Partition data by time or logical keys at every stage; partitioning enables incremental processing, efficient pruning, and manageable backfill operations
Instrument pipelines with data quality checks between stages; catching bad data early prevents cascading corruption through downstream tables
Separate orchestration (when and what order) from computation (how); the scheduler should not perform heavy data processing itself

Techniques

Build Airflow DAGs with task-level retries, timeouts, and SLAs; use sensors for external dependencies and XCom for lightweight inter-task communication

Key Principles

Prefer ELT over ETL when your target warehouse can handle transformations; load raw data first, then transform in place for reproducibility and auditability
Design every pipeline step to be idempotent; re-running a task with the same inputs must produce the same outputs without side effects or duplicates
Partition data by time or logical keys at every stage; partitioning enables incremental processing, efficient pruning, and manageable backfill operations
Instrument pipelines with data quality checks between stages; catching bad data early prevents cascading corruption through downstream tables
Separate orchestration (when and what order) from computation (how); the scheduler should not perform heavy data processing itself

Techniques

Build Airflow DAGs with task-level retries, timeouts, and SLAs; use sensors for external dependencies and XCom for lightweight inter-task communication

Data Pipeline

Key Principles

Techniques

Data Pipeline

Key Principles

Techniques

Common Patterns

Pitfalls to Avoid

Clickhouse Io

Clickhouse Io

Claude Devfleet

Clickhouse Io

Ai First Engineering

Postgres Patterns