Guides database architecture decisions for PostgreSQL, DuckDB, Parquet, and PGVector. Use when designing schemas, choosing storage strategies, optimizing queries, tuning maintenance, configuring vector search, or diagnosing performance issues across OLTP, OLAP, and similarity search workloads.
Decision guidance for PostgreSQL, DuckDB, and Parquet in hybrid storage architectures.
| Workload | Use | Why |
|---|---|---|
| Transactional (CRUD, users, sessions) | PostgreSQL | ACID, row-level locking, indexes |
| Analytical (aggregations, scans) | DuckDB | Columnar, vectorized, parallel |
| Data storage/interchange | Parquet | Compressed, columnar, portable |
| Metadata + relationships |
| PostgreSQL |
| Foreign keys, constraints |
| Ad-hoc exploration | DuckDB | Fast on Parquet, no ETL needed |
| Time-series with point lookups | PostgreSQL + partitioning | Partition pruning + indexes |
| Time-series analytics | DuckDB on Parquet | Scan performance |
| Vector similarity search | PostgreSQL + PGVector | HNSW/IVFFlat indexes, hybrid search |
| RAG / semantic search | PostgreSQL + PGVector | Embeddings + metadata in same DB |
Hybrid pattern example:
Use for: Metadata, relationships, OLTP workloads, anything needing ACID.
Key decisions:
See references/postgres-architecture.md for maintenance patterns. See references/postgres-querying.md for advanced query techniques.
Use for: Analytics, aggregations, Parquet queries, data exploration.
Key decisions:
See references/duckdb-architecture.md for storage and parallelism. See references/duckdb-querying.md for DuckDB-specific SQL features.
Use for: Storing analytical data, data interchange, columnar compression.
Key decisions:
See references/parquet-architecture.md for file design. See references/parquet-querying.md for query optimization.
Use for: Similarity search, RAG applications, semantic search, recommendations.
Key decisions:
See references/pgvector-architecture.md for index configuration. See references/pgvector-querying.md for hybrid search and filtering.
| Convention | Example | Applies to |
|---|---|---|
| snake_case tables | dataset_jobs | All |
| snake_case columns | created_at | All |
| Singular table names | dataset not datasets | PostgreSQL |
| Plural for collections | datasets/ directory | Parquet files |
| Pattern | When to normalize | When to denormalize |
|---|---|---|
| Lookup tables | PostgreSQL, changes frequently | DuckDB/Parquet, static data |
| Repeated values | PostgreSQL, storage matters | Parquet, compression handles it |
| Joins at query time | PostgreSQL, complex relationships | Parquet, pre-join for analytics |
TIMESTAMPTZTIMESTAMP with isAdjustedToUTC=trueEXPLAIN (ANALYZE, BUFFERS) on the querypg_stat_user_tables for bloat (dead tuples)work_mem if seeing disk sortsSELECT * on remote dataef_search (HNSW) or probes (IVFFlat) settings