Pointline data lake architecture and design guidance. Use when: (1) designing new tables, modules, or subsystems for the pointline data lake, (2) evaluating design trade-offs (encoding, partitioning, storage, schema evolution), (3) writing or reviewing ExecPlans or research proposals, (4) assessing change risk levels (L0/L1/L2) for PRs, (5) reviewing code for PIT correctness, determinism, or invariant violations, (6) planning new vendor integrations or data source onboarding, (7) making architectural decisions about schema, storage, or pipeline extensions, (8) understanding why existing design choices were made (function-first, fixed-point, quarantine-over-drop, no backward compatibility, SCD2), (9) extending the module dependency graph without introducing cycles.
Architecture guidance for extending pointline's PIT-accurate offline data lake. Use this skill when making design decisions, not when following existing patterns (use pointline-infra for execution).
These are non-negotiable. Every design decision must satisfy all five:
- ts_event_us. Forward-only alignment.
- (vendor, data_type, bronze_path, file_hash).
- file_id + file_seq. No data appears without provenance.

| Decision | Rationale | Alternative Rejected |
|---|---|---|
| Function-first over classes | Testable (pure in/out), composable, debuggable, no hidden state | OOP with stateful pipeline objects |
| Fixed-point Int64 over float/Decimal | Exact equality, O(1) comparison, no precision drift mid-pipeline, 8 bytes vs 16+ | Float64 (rounding), Decimal128 (slow, large) |
| Quarantine over drop | Auditability, coverage reporting, debug-ability. Every row accounted for | Silent filter (invisible data loss) |
| No backward compat | Bounded re-ingestion cost vs unbounded compatibility complexity. Bronze is immutable → rebuild is always possible | Migration scripts (compound over time) |
| Delta Lake | Partition pruning, ACID, time travel, schema enforcement, compaction. Single library, no external service | Raw Parquet (no ACID), DuckDB (no partitioned writes), PostgreSQL (no columnar) |
| SCD Type 2 for symbols | PIT-correct metadata at any historical timestamp. Validity windows enable as-of joins | SCD1 (lose history), snapshot tables (expensive joins) |
| Protocols over ABCs | Structural typing, no inheritance hierarchy, test doubles without mocking frameworks | Abstract base classes (rigid hierarchy) |
Full rationale with trade-off analysis: references/design-rationale.md
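As an illustration of the fixed-point row in the table above, here is a minimal sketch of encoding decimal strings as scaled Int64 values. The names `PRICE_SCALE`, `encode_fixed`, and `decode_fixed` and the scale of 10^8 are assumptions made for this example; the actual per-column scales are defined in the table specs, which this document does not show.

```python
from decimal import Decimal

# Hypothetical scale: one integer step equals 1e-8. Real scales are set
# per column in the table specs and may differ.
PRICE_SCALE = 10**8


def encode_fixed(value: str, scale: int = PRICE_SCALE) -> int:
    """Parse a decimal string into a scaled integer without going through float."""
    scaled = Decimal(value) * scale
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value!r} has more precision than scale {scale} allows")
    result = int(scaled)
    if not -(2**63) <= result < 2**63:
        raise OverflowError(f"{value!r} does not fit in Int64 at scale {scale}")
    return result


def decode_fixed(raw: int, scale: int = PRICE_SCALE) -> Decimal:
    """Convert a scaled integer back to an exact decimal, e.g. for display."""
    return Decimal(raw) / scale


# Float addition drifts, while scaled integers compare exactly.
assert 0.1 + 0.2 != 0.3
assert encode_fixed("0.1") + encode_fixed("0.2") == encode_fixed("0.3")
```

Rejecting inputs that exceed the scale, rather than rounding them, keeps the encoding lossless; whether the real pipeline rejects or quarantines such rows is not stated here.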
schemas (leaf)
↑
protocols ← depends on schemas for type hints
↑
├── ingestion ← schemas, protocols, dim_symbol
├── storage ← schemas, protocols
└── research ← schemas, dim_symbol
vendors ← schemas only (no other internal deps)
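The graph above can be written down as data and checked for cycles before a new module edge is added. This is only a sketch: `ALLOWED_DEPS`, `import_order`, and `has_cycle` are hypothetical names, and the entry for `dim_symbol` is a placeholder because its own dependencies are not shown in the diagram.

```python
from graphlib import CycleError, TopologicalSorter

# Module -> internal modules it may import, transcribed from the graph above.
# dim_symbol's own dependencies are not shown there; the empty set is a placeholder.
ALLOWED_DEPS = {
    "schemas": set(),
    "protocols": {"schemas"},
    "dim_symbol": set(),
    "ingestion": {"schemas", "protocols", "dim_symbol"},
    "storage": {"schemas", "protocols"},
    "research": {"schemas", "dim_symbol"},
    "vendors": {"schemas"},
}


def import_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a dependency-first ordering; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps: dict[str, set[str]]) -> bool:
    """Report whether the proposed dependency graph contains a cycle."""
    try:
        import_order(deps)
    except CycleError:
        return True
    return False


# A proposed edge schemas -> research would close a loop, since research
# already depends on schemas.
proposed = {**ALLOWED_DEPS, "schemas": {"research"}}
assert not has_cycle(ALLOWED_DEPS)
assert has_cycle(proposed)
```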
Rules for extending:
- schemas/ has zero internal dependencies. Keep it leaf.
- vendors/ depends only on schemas/. Never import from ingestion/, storage/, or research/.
- research/ never imports from ingestion/ or storage/ (except via protocols).

Before any PR, classify the risk level:
| Level | Scope | Review Required | Examples |
|---|---|---|---|
| L0 | Formatting, typos, non-semantic | Self-merge OK | Ruff fixes, docstring typos |
| L1 | Code/test with clear requirements | Tests pass, standard review | New parser, new validation rule, new test |
| L2 | Schema, PIT semantics, storage/replay | Explicit approval needed | New table spec, change tie-break keys, modify dim_symbol upsert logic, change partitioning |
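
A path-based pre-check can flag likely L2 changes before a reviewer looks at the PR. The sketch below hard-codes the L2 trigger paths that this section lists; `L2_TRIGGER_FRAGMENTS`, `l2_hits`, and `needs_explicit_approval` are hypothetical names, and the substring match is an assumption made because the document gives some paths without a package prefix. It cannot tell L0 from L1, which still needs human judgment.

```python
# Path fragments copied from the L2 trigger list in this section.
L2_TRIGGER_FRAGMENTS = (
    "pointline/schemas/",
    "pointline/dim_symbol.py",
    "pointline/storage/contracts.py",
    "ingestion/pit.py",
    "research/primitives.py",
)


def l2_hits(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that touch an L2 trigger, preserving input order."""
    return [
        path
        for path in changed_paths
        if any(fragment in path for fragment in L2_TRIGGER_FRAGMENTS)
    ]


def needs_explicit_approval(changed_paths: list[str]) -> bool:
    """True when at least one changed path matches an L2 trigger."""
    return bool(l2_hits(changed_paths))
```

For example, a change set containing a file under `pointline/schemas/` would be flagged, while one touching only documentation would not.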
L2 triggers — any change to:
- pointline/schemas/ (table definitions)
- pointline/dim_symbol.py (SCD2 logic)
- pointline/storage/contracts.py (storage protocols)
- ingestion/pit.py or research/primitives.py

Apply when reviewing any L1+ change:
- ts_local_us preserved? Replay fidelity maintained?

ExecPlans (for features/refactors): follow .agent/PLANS.md.

Research Proposals: follow .agent/PROPOSALS.md.
Full planning reference: references/planning.md
- pointline/schemas/: set columns, tie-break keys, partition_by, scaled columns
- load_events() need special handling?
- pointline/vendors/<vendor>/: depends only on schemas
- EXCHANGE_TIMEZONE_MAP if new exchanges
- tests/fixtures/
- valid_from_ts_us <= event_ts < valid_until_ts_us?
- DimensionStore?

This is always L2. Schema changes = rebuild. Checklist:
Full extension patterns: references/extension-patterns.md
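The validity-window check `valid_from_ts_us <= event_ts < valid_until_ts_us` mentioned above can be sketched as a half-open as-of lookup. `SymbolVersion`, its `tick_size` field, and `as_of` are hypothetical names for illustration; the real dim_symbol schema and lookup API are not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolVersion:
    symbol_id: int
    tick_size: int           # hypothetical attribute, stored as a scaled integer
    valid_from_ts_us: int    # inclusive lower bound
    valid_until_ts_us: int   # exclusive upper bound


def as_of(
    versions: list[SymbolVersion], symbol_id: int, event_ts: int
) -> Optional[SymbolVersion]:
    """Return the version whose half-open window contains event_ts, if any."""
    matches = [
        v
        for v in versions
        if v.symbol_id == symbol_id
        and v.valid_from_ts_us <= event_ts < v.valid_until_ts_us
    ]
    if len(matches) > 1:
        # Overlapping windows would make the lookup ambiguous.
        raise ValueError(f"overlapping validity windows for symbol {symbol_id}")
    return matches[0] if matches else None
```

Because the window is half-open, an event stamped exactly at a version boundary resolves to the newer version and never to both.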
- trading_date derivation
- trading_date derived from exchange-local time (not UTC date) matches researcher mental model
- file_id + file_seq provide total ordering within ties
- book_seq)
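To illustrate why trading_date comes from exchange-local time rather than the UTC date, here is a minimal sketch. It assumes `ts_event_us` is microseconds since the Unix epoch in UTC, and `EXAMPLE_EXCHANGE_TZ` is a hypothetical stand-in for the project's `EXCHANGE_TIMEZONE_MAP`, whose real contents are not shown here. Fixed offsets keep the example dependency-free; a real map would more likely hold IANA zone names.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical stand-in for EXCHANGE_TIMEZONE_MAP. Venue names are invented.
EXAMPLE_EXCHANGE_TZ = {
    "example_tokyo_venue": timezone(timedelta(hours=9)),
    "example_utc_venue": timezone.utc,
}

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def trading_date(ts_event_us: int, exchange: str) -> date:
    """Derive the exchange-local calendar date from a UTC epoch-microsecond timestamp."""
    utc_dt = _EPOCH + timedelta(microseconds=ts_event_us)
    return utc_dt.astimezone(EXAMPLE_EXCHANGE_TZ[exchange]).date()


# 2024-01-01 16:00:00 UTC is already 2024-01-02 01:00 at UTC+9.
ts = 1_704_124_800 * 1_000_000
assert trading_date(ts, "example_utc_venue") == date(2024, 1, 1)
assert trading_date(ts, "example_tokyo_venue") == date(2024, 1, 2)
```

Partitioning by the UTC date would split one local trading session across two partitions for the second venue, which is the mismatch the exchange-local rule avoids.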