Database technology selection and architecture - choosing between SQL and NoSQL, PostgreSQL vs MySQL vs MongoDB, scaling strategies, and migration planning. Use for database DECISIONS. For PostgreSQL implementation details, use the postgresql skill.
Design the data layer right from the start to avoid costly rework. Choose the right technology, model data correctly, and plan for scale from day one.
| Scenario | Recommended | Why |
|---|---|---|
| General web app | PostgreSQL | Mature, feature-rich, JSONB support |
| Read-heavy, simple | MySQL | Fast reads, wide adoption |
| Embedded/local | SQLite | Zero-config, serverless |
| Real-time analytics | TimescaleDB | Time-series optimization |
| Document store | MongoDB | Flexible schema, JSON-native |
| Key-value cache | Redis | In-memory speed |
| Full-text search | Elasticsearch | Advanced search features |
| Graph relationships | Neo4j | Native graph queries |
1. Data structure → Relational vs Document vs Graph vs Key-Value
2. Scale expectations → Read-heavy vs Write-heavy vs Balanced
3. Consistency needs → Strong (ACID) vs Eventual (BASE)
4. Query patterns → Simple CRUD vs Complex joins vs Analytics
5. Team expertise → Familiar vs Learning curve acceptable
6. Operational budget → Managed vs Self-hosted
| Use Case | Primary | Alternative |
|---|---|---|
| SaaS multi-tenant | PostgreSQL | MySQL |
| E-commerce | PostgreSQL | MySQL |
| IoT sensor data | TimescaleDB | PostgreSQL + partitioning |
| Social network | PostgreSQL + Redis | Neo4j |
| Content management | PostgreSQL | MongoDB |
| Real-time dashboard | PostgreSQL + Redis | ClickHouse |
| Mobile app backend | PostgreSQL | SQLite (local) |
| Microservices | Per-service choice | PostgreSQL for most |
| Level | Eliminates | Trade-off |
|---|---|---|
| 1NF | Repeating groups | Baseline |
| 2NF | Partial dependencies | Still some redundancy |
| 3NF | Transitive dependencies | Good default for OLTP |
| Denormalize | N/A | Faster reads, slower writes |
-- ONLY denormalize when:
-- 1. There is a measured performance problem with JOINs
-- 2. Read-to-write ratio is very high (>100:1)
-- 3. Data is relatively static
-- Example: pre-computed aggregate kept current by a trigger
-- (PostgreSQL generated columns cannot contain subqueries,
--  so use a counter column + trigger, or a materialized view)
ALTER TABLE products ADD COLUMN order_count INT NOT NULL DEFAULT 0;

CREATE FUNCTION bump_order_count() RETURNS trigger AS $$
BEGIN
  UPDATE products SET order_count = order_count + 1
   WHERE id = NEW.product_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_count_trg AFTER INSERT ON orders
  FOR EACH ROW EXECUTE FUNCTION bump_order_count();
-- (insert path only; DELETE would need a matching decrement)
| Pattern | Use Case |
|---|---|
| Adjacency List | Hierarchical data (parent_id) |
| Closure Table | Deep hierarchies with ancestry queries |
| Materialized Path | File systems, categories |
| Junction Table | Many-to-many relationships |
| Soft Delete | Audit requirements |
| Event Sourcing | Audit trail, replay capability |
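Two of the patterns above can be sketched in a few lines of SQL (table and column names are illustrative; `users` and `roles` are assumed to already exist):

```sql
-- Adjacency list: hierarchy stored as a self-referencing parent_id
CREATE TABLE categories (
  id        INT PRIMARY KEY,
  parent_id INT REFERENCES categories(id),  -- NULL for root nodes
  name      TEXT NOT NULL
);

-- Junction table: many-to-many between users and roles
CREATE TABLE user_roles (
  user_id INT NOT NULL REFERENCES users(id),
  role_id INT NOT NULL REFERENCES roles(id),
  PRIMARY KEY (user_id, role_id)  -- composite key doubles as the lookup index
);
```

The adjacency list is simple to write but requires recursive queries (`WITH RECURSIVE`) for deep ancestry; that is where a closure table or materialized path pays off.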
| Type | Best For |
|---|---|
| B-tree | Equality, range, sorting |
| Hash | Equality only (rare) |
| GIN | JSONB, arrays, full-text |
| GiST | Ranges, geometry |
| BRIN | Large time-series |
Is it equality lookups only? → B-tree (default)
Is it a JSONB column? → GIN
Is it a range query? → B-tree or GiST
Is it full-text search? → GIN on tsvector
Is it >100M rows with time ordering? → BRIN
Is it a composite query? → Composite B-tree (order matters!)
-- ❌ Too many indexes (every index slows writes)
-- ❌ Indexing low-cardinality columns (e.g. a status column with 2 values)
-- ❌ Not indexing foreign keys (PostgreSQL does not index them automatically)
-- ❌ Wrong column order in a composite index
-- ❌ No partial indexes for hot subsets
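The decision tree and anti-patterns above can be illustrated with PostgreSQL DDL (table and index names are hypothetical):

```sql
-- Composite B-tree: column order matters — the leading column must appear in the filter
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
-- serves  WHERE user_id = ? AND created_at > ?
-- but not WHERE created_at > ? on its own

-- GIN for JSONB containment queries, e.g. attributes @> '{"color": "red"}'
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- Partial index covering only a hot subset instead of the whole table
CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';
```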
| Approach | When | Pros | Cons |
|---|---|---|---|
| Vertical | <10x growth | Simple | Hardware limits |
| Read Replicas | Read-heavy | Proven | Replication lag |
| Partitioning | Large tables | Query pruning | Complexity |
| Sharding | Massive scale | Linear growth | Cross-shard queries hard |
| Type | Best For | Key Selection |
|---|---|---|
| RANGE | Time-series | created_at, timestamp |
| LIST | Discrete values | region, category |
| HASH | Even distribution | user_id, order_id |
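Declarative RANGE partitioning (PostgreSQL 10+) for a hypothetical `events` table might look like:

```sql
CREATE TABLE events (
  created_at TIMESTAMPTZ NOT NULL,
  payload    JSONB
) PARTITION BY RANGE (created_at);

-- One partition per month; queries filtered on created_at prune to the right partition
CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```

New partitions must be created ahead of incoming data (often via a scheduled job); a row with no matching partition is rejected unless a DEFAULT partition exists.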
┌─────────────────┐
│ L1: App Cache │ ← In-process, fastest
├─────────────────┤
│ L2: Redis │ ← Distributed, shared
├─────────────────┤
│ L3: DB Buffer │ ← PostgreSQL shared_buffers
└─────────────────┘
| Strategy | Use Case |
|---|---|
| TTL | Default for most data |
| Event-driven | Critical consistency |
| Write-through | Never stale, slower writes |
| Write-behind | High write throughput |
1. Create new schema alongside old
2. Dual-write to both schemas
3. Backfill historical data
4. Verify data integrity
5. Switch reads to new schema
6. Stop writes to old schema
7. Remove old schema
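Steps 1-3 above can be sketched in SQL. This assumes a hypothetical rename of `name` to `full_name` on a `customers` table; the batch boundaries are illustrative:

```sql
-- 1. Expand: add the new column alongside the old one
ALTER TABLE customers ADD COLUMN full_name TEXT;

-- 2. Dual-write: a trigger keeps the new column in sync while the app still writes "name"
CREATE FUNCTION sync_full_name() RETURNS trigger AS $$
BEGIN
  NEW.full_name := NEW.name;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_sync BEFORE INSERT OR UPDATE ON customers
  FOR EACH ROW EXECUTE FUNCTION sync_full_name();

-- 3. Backfill historical rows in batches to keep locks and WAL bursts small
UPDATE customers SET full_name = name
 WHERE full_name IS NULL AND id BETWEEN 1 AND 10000;
```

Dual-writing can also live in application code; a trigger has the advantage of catching writes from every client, not just the one being migrated.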
| Tool | Language | Best For |
|---|---|---|
| Flyway | Java | Enterprise |
| Liquibase | Java | Complex migrations |
| Prisma Migrate | TypeScript | Type safety |
| Alembic | Python | SQLAlchemy |
| Drizzle Kit | TypeScript | Drizzle ORM |
When asked for database architecture advice:
| Request | Response |
|---|---|
| "DB for e-commerce" | PostgreSQL + SKILL → postgresql |
| "Mobile app with offline" | SQLite (local) + PostgreSQL (cloud) |
| "IoT 1M events/sec" | TimescaleDB + Kafka |
| "Social network feeds" | PostgreSQL + Redis cache |
| "Audit trail required" | PostgreSQL with event sourcing |
| Skill | Use When |
|---|---|
| postgresql | PostgreSQL implementation details |