Design, build, debug, and harden ETL and ELT pipelines so data moves correctly, repeatably, and observably between systems. Use when work involves extraction strategy, schema mapping, transformation logic, CDC, incremental loads, deduplication, orchestration, backfills, reconciliation, schema drift handling, or pipeline recovery. Do not use for dashboard interpretation, pure warehouse BI modeling, or infrastructure administration unrelated to data movement and pipeline operations.
Move data without creating silent debt.
This skill is for operational data movement: extracting from source systems, transforming records into reliable structures, loading them into destinations safely, and making the whole path debuggable when sources change, jobs fail, or reprocessing is required.
Clarify:
If source semantics are ambiguous, stop pretending the mapping is obvious. Define the contract first.
Useful outputs include:
Document what each source record represents, what counts as an update, how deletes appear, which fields are stable, and what timestamps or versions can actually be trusted.
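One way to keep that contract from living only in someone's head is to write it down as a small typed record next to the pipeline code. The sketch below is illustrative only: `SourceContract`, its field names, and the `orders` example values are all hypothetical, not part of any library or of this skill's files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceContract:
    """Hypothetical record of what a source actually promises."""
    entity: str             # what one source record represents
    grain: str              # the key that identifies one record
    update_signal: str      # what counts as an update
    delete_signal: str      # how deletes appear (tombstone, flag, absence)
    stable_fields: tuple    # fields that never change after creation
    trusted_clock: str      # the timestamp or version safe to use for ordering
    untrusted_fields: tuple = ()  # fields known to be unreliable


# Assumed example values for an imaginary "orders" source.
orders_contract = SourceContract(
    entity="one customer order",
    grain="order_id",
    update_signal="row_version increases",
    delete_signal="is_deleted flag set to true",
    stable_fields=("order_id", "created_at"),
    trusted_clock="row_version",
    untrusted_fields=("updated_at",),
)
```

Freezing the dataclass keeps the contract from being mutated at runtime; a change to it then has to go through review like any other code change.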
A good pipeline tolerates reruns, retries, and partial failure. State how reprocessing works, how duplicates are prevented, and how failed batches are resumed or replaced.
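A minimal sketch of a rerun-safe load, assuming a SQLite destination (3.24 or later, for upsert support) and a hypothetical `orders` table keyed on `order_id`. The table, columns, and `load_batch` helper are invented for illustration. Each batch is written in one transaction, and the upsert replaces an existing row only when the incoming version is at least as new.

```python
import sqlite3


def load_batch(conn, rows):
    """Upsert rows keyed on order_id so a retry replaces instead of duplicating."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.executemany(
            """
            INSERT INTO orders (order_id, amount, updated_at)
            VALUES (?, ?, ?)
            ON CONFLICT(order_id) DO UPDATE SET
                amount = excluded.amount,
                updated_at = excluded.updated_at
            WHERE excluded.updated_at >= orders.updated_at
            """,
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL, updated_at TEXT)"
)

batch = [
    ("o-1", 10.0, "2024-01-01T00:00:00"),
    ("o-2", 25.5, "2024-01-01T00:05:00"),
]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated retry of the same batch

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The `WHERE` guard on the upsert is a design choice, not a requirement: it stops a late replay of an old batch from overwriting newer data, at the cost of trusting `updated_at` for ordering.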
Keep extraction, staging, transformation, and serving responsibilities understandable. Clean boundaries make schema drift, bad inputs, and load failures easier to isolate.
Specify:
Incremental pipelines fail quietly when these rules are hand-waved.
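To make one such rule concrete, here is a sketch of watermark-based extraction with an overlap window for late-arriving rows. It assumes the source exposes a trustworthy `updated_at` value; the `extract_increment` function, the row shape, and the five-minute overlap are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def extract_increment(source_rows, watermark, overlap=timedelta(minutes=5)):
    """Return rows changed after (watermark - overlap) and the next watermark.

    The overlap deliberately re-reads a short window so rows committed late
    are not skipped; the loader must therefore be idempotent.
    """
    cutoff = watermark - overlap
    changed = [r for r in source_rows if r["updated_at"] > cutoff]
    newest = max((r["updated_at"] for r in changed), default=watermark)
    # Never move the watermark backwards, even if only old rows were re-read.
    return changed, max(newest, watermark)


source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 11, 50)},  # already loaded
    {"id": 2, "updated_at": datetime(2024, 1, 1, 11, 57)},  # late arrival
    {"id": 3, "updated_at": datetime(2024, 1, 1, 12, 10)},  # new change
]
watermark = datetime(2024, 1, 1, 12, 0)

changed, next_watermark = extract_increment(source_rows, watermark)
```

With a strict `updated_at > watermark` filter and no overlap, row 2 would be lost without any error. That is the quiet failure mode this sketch is meant to illustrate.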
At minimum, consider row-count shifts, null spikes, duplicate rates, freshness, referential breaks, reconciliation totals, schema drift, and, where possible, source-to-target parity checks.
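A few of these checks can be sketched in plain Python. The function names, the row shape, and the zero tolerance are assumptions for illustration; a real pipeline would compare the resulting metrics against thresholds agreed with the data owners.

```python
from collections import Counter


def batch_metrics(rows, key, required):
    """Compute row count, duplicate rate on `key`, and null rate per required field."""
    total = len(rows)
    key_counts = Counter(r[key] for r in rows)
    # Every occurrence beyond the first for a given key counts as a duplicate.
    duplicates = sum(count - 1 for count in key_counts.values())
    null_rates = {
        field: (sum(1 for r in rows if r.get(field) is None) / total) if total else 0.0
        for field in required
    }
    return {
        "row_count": total,
        "duplicate_rate": duplicates / total if total else 0.0,
        "null_rates": null_rates,
    }


def reconcile(source_total, target_total, tolerance=0.0):
    """Return True when source and target totals agree within the tolerance."""
    return abs(source_total - target_total) <= tolerance


rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": 5.0},
    {"id": 3, "amount": 5.0},
]
metrics = batch_metrics(rows, key="id", required=["amount"])
target_sum = sum(r["amount"] for r in rows if r["amount"] is not None)
totals_match = reconcile(source_total=20.0, target_total=target_sum)
```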
State the blast radius, cost, partitioning strategy, throttling approach, verification method, and rollback or replacement plan. Backfills are where fragile pipelines get exposed.
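The partitioning part of a backfill plan can be sketched as below, assuming the data can be sliced by date. `plan_backfill` and the three-day partition size are hypothetical. Half-open ranges keep partitions from overlapping, so each one can be verified, retried, or replaced on its own.

```python
from datetime import date, timedelta


def plan_backfill(start, end, days_per_partition=1):
    """Split the half-open range [start, end) into contiguous date partitions."""
    if days_per_partition < 1:
        raise ValueError("days_per_partition must be at least 1")
    step = timedelta(days=days_per_partition)
    partitions = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # the last partition may be shorter
        partitions.append((cursor, upper))
        cursor = upper
    return partitions


partitions = plan_backfill(date(2024, 1, 1), date(2024, 1, 8), days_per_partition=3)
```

Smaller partitions limit the blast radius of one bad run and make throttling easier, at the cost of more job invocations.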
Name transformations clearly, keep assumptions visible, reduce hidden state, and leave an operator enough information to debug the job at 3 a.m.
Prefer:
Avoid:
Check:
A strong result should:
prompt.md: engineering stance and response pattern
examples/README.md: deliverable shapes for design, diagnosis, and recovery
guides/qa-checklist.md: final operational review checklist
meta/skill.json: catalog metadata and adjacent-skill map