Data Import Parsers
Data Import & Validation (Streaming + Error CSV + Metrics + Idempotency)
Implement or refactor data import/parsing so it streams files sequentially (memory-safe), validates and coerces types explicitly, skips irreparable records while logging them to an error CSV (full original columns + timestamp/file/line/error), emits rows_ok/rows_skipped/parse_errors metrics, and guarantees idempotent DB writes.
When to use this skill
Use when working on:
ETL / import pipelines (CSV/XLSX/JSON) into a DB
Parsers that currently load whole files into memory
Data validation/coercion, error isolation, auditability, and reproducibility
Any import job that must be safe to re-run (idempotent)
Non-negotiables (contract)
Process input files sequentially (no “load everything then insert”) to control memory.
Validate and coerce types explicitly (define allowed coercions; reject ambiguous cases).
Irreparable records must be skipped and logged to an error CSV containing:
all original columns exactly as seen in input
extra columns: timestamp, file, line, error
Emit metrics at minimum: rows_ok, rows_skipped, parse_errors.
DB writes must be idempotent: re-running the import must not duplicate or corrupt data.
Standard workflow
1) Plan-first (before coding)
Identify input formats, volume, and “row identity” rules (keys/dedup strategy).
Locate current import entrypoints + DB write layer.
Define:
schema mapping (source -> target columns)
validation rules per field
coercion rules per field (what is allowed, what is not)
error taxonomy (what counts as “irreparable”)
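The plan itself can be captured as data so later steps stay declarative. A minimal sketch, assuming a hypothetical orders.csv source; the field names, the to_decimal helper, and the IMPORT_PLAN layout are illustrative assumptions, not a prescribed format.

```python
# Hypothetical import plan for an "orders.csv" source; all names are illustrative.
from decimal import Decimal, InvalidOperation

def to_decimal(raw: str) -> Decimal:
    # Explicit coercion: only plain "1234.56" style values are accepted.
    try:
        return Decimal(raw.strip())
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal: {raw!r}") from exc

IMPORT_PLAN = {
    "target_table": "orders",
    "natural_key": ["order_id"],          # row identity / dedup strategy
    "columns": {                          # source -> target mapping + coercion per field
        "OrderID": {"target": "order_id", "coerce": str.strip, "required": True},
        "Amount":  {"target": "amount",   "coerce": to_decimal, "required": True},
        "Comment": {"target": "comment",  "coerce": str.strip, "required": False},
    },
    # Error taxonomy: any ValueError during coercion or a missing required
    # field is treated as irreparable.
}
```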
2) Streaming architecture
Read one file at a time, row by row (or chunked), and write in bounded batches.
Never accumulate full datasets in memory.
Ensure progress logging is monotonic (e.g., file + row counters).
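A minimal streaming sketch under those rules, assuming UTF-8 CSV input; the insert_batch callback and the batch size of 1000 are placeholders for the project's own DB layer and tuning.

```python
# Streaming sketch: one file at a time, bounded batches, no full-file loads.
import csv
from typing import Callable, Iterable, Iterator

def stream_rows(path: str) -> Iterator[tuple[int, dict]]:
    """Yield (line_number, raw_row) one row at a time."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            yield line_no, row

def write_in_batches(rows: Iterable[dict],
                     insert_batch: Callable[[list[dict]], None],
                     batch_size: int = 1000) -> None:
    """Hold at most batch_size rows in memory before flushing to the DB."""
    batch: list[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            insert_batch(batch)
            batch.clear()
    if batch:
        insert_batch(batch)
```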
3) Validation & coercion
Treat raw row values as immutable “source of truth”.
Perform coercions in a controlled layer:
return (ok, parsed_record) or (error, reason) per row
Separate concerns:
parsing (raw -> typed)
business validation (typed -> acceptable)
persistence (acceptable -> DB)
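A sketch of that separation for the parsing layer, reusing the hypothetical IMPORT_PLAN shape from the planning step; returning a (bool, payload) tuple is just one way to encode (ok, parsed_record) / (error, reason).

```python
# Coercion layer sketch: raw row in, (True, typed_record) or (False, reason) out.
# The IMPORT_PLAN structure from the planning sketch above is assumed.

def parse_row(raw: dict, plan: dict) -> tuple[bool, dict | str]:
    """The raw dict is never mutated; coerced values go into a new record."""
    record: dict = {}
    for source, spec in plan["columns"].items():
        value = raw.get(source)
        if value is None or value == "":
            if spec["required"]:
                return False, f"missing required field {source}"
            record[spec["target"]] = None
            continue
        try:
            record[spec["target"]] = spec["coerce"](value)
        except (ValueError, TypeError) as exc:
            return False, f"{source}: {exc}"
    return True, record
```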
4) Error CSV logging
On any irreparable row:
write original row values (unmodified)
add: timestamp, file, line, error
Avoid partial writes that drop context; every skipped row must be explainable from the CSV alone.
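A minimal sketch of such an error log, assuming the original header is known when the file is opened; flushing after every skipped row is a deliberate trade of speed for crash-safe auditability.

```python
# Error CSV sketch: original columns first, then the four audit columns.
import csv
from datetime import datetime, timezone

class ErrorLog:
    def __init__(self, path: str, original_fieldnames: list[str]):
        self._fieldnames = original_fieldnames + ["timestamp", "file", "line", "error"]
        self._fh = open(path, "w", newline="", encoding="utf-8")
        self._writer = csv.DictWriter(self._fh, fieldnames=self._fieldnames)
        self._writer.writeheader()

    def record(self, raw: dict, file: str, line: int, error: str) -> None:
        # Original values are written unmodified; audit columns are appended.
        row = dict(raw)
        row.update(
            timestamp=datetime.now(timezone.utc).isoformat(),
            file=file,
            line=line,
            error=error,
        )
        self._writer.writerow(row)
        self._fh.flush()  # skipped rows should survive a crash

    def close(self) -> None:
        self._fh.close()
```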
5) Metrics
rows_ok: successfully persisted rows
rows_skipped: irreparable rows skipped
parse_errors: count of parse/validation errors (can equal rows_skipped or be a superset if you track recoverable warnings separately)
end-of-file summary
end-of-run summary (aggregate over files)
Optionally persist metrics to a JSON or a DB table for observability.
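A minimal counter sketch; the example values and the import_metrics.json output path are assumptions, and the same dataclass can back both the per-file and per-run summaries.

```python
# Metrics sketch: the three required counters plus a run-level aggregate.
import json
from dataclasses import dataclass, asdict

@dataclass
class ImportMetrics:
    rows_ok: int = 0
    rows_skipped: int = 0
    parse_errors: int = 0

    def merge(self, other: "ImportMetrics") -> None:
        self.rows_ok += other.rows_ok
        self.rows_skipped += other.rows_skipped
        self.parse_errors += other.parse_errors

run_totals = ImportMetrics()
file_metrics = ImportMetrics(rows_ok=980, rows_skipped=20, parse_errors=20)  # example values
run_totals.merge(file_metrics)

# End-of-file and end-of-run summaries; persist to JSON (or a DB table) for observability.
print("file summary:", asdict(file_metrics))
with open("import_metrics.json", "w", encoding="utf-8") as fh:
    json.dump(asdict(run_totals), fh, indent=2)
```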
6) Idempotent persistence
Pick and implement one clear strategy:
Upsert by natural key / business key
Insert with unique constraint + conflict handling
Staging table + merge
Per-file “ingestion ledger” (hash/checkpoint) + skip already-ingested files
Idempotency must hold across:
reruns after crashes
partial completion
duplicate input files
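A minimal sketch of the unique-constraint-plus-conflict-handling variant, using SQLite's ON CONFLICT upsert; the table, columns, and imports.db path are illustrative, and PostgreSQL accepts the same clause.

```python
# Idempotent write sketch: conflicts on the natural key become updates, not duplicates.
import sqlite3

conn = sqlite3.connect("imports.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS orders (
           order_id TEXT PRIMARY KEY,   -- natural / business key
           amount   TEXT NOT NULL,
           comment  TEXT
       )"""
)

def insert_batch(batch: list[dict]) -> None:
    # Values are assumed to be DB-ready (e.g. Decimals already serialized to str).
    with conn:  # one transaction per batch
        conn.executemany(
            """INSERT INTO orders (order_id, amount, comment)
               VALUES (:order_id, :amount, :comment)
               ON CONFLICT(order_id) DO UPDATE SET
                   amount  = excluded.amount,
                   comment = excluded.comment""",
            batch,
        )
```

Because the natural key decides the outcome, replaying the same file after a crash, a partial run, or an accidental duplicate converges to the same table state.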
Review checklist (definition of done)