Model data to match access patterns. Normalize for integrity, denormalize for performance.
Choosing a Database
Decision Matrix
| Factor | SQL (Relational) | NoSQL (Document) | Key-Value | Graph |
|---|---|---|---|---|
| Schema | Fixed, strict | Flexible | None | Nodes/edges |
| Transactions | ACID built-in | Eventually consistent (usually) | Limited | Varies |
| Joins | Native support | Expensive/manual | N/A | Native (relationships) |
| Scale | Vertical → Horizontal | Horizontal native | Horizontal native | Specialized |
| Best for | Structured data, complex queries | Rapid iteration, varied shape | Cache, sessions | Relationships |
Common Choices (2026)
| Database | Type | Best For |
|---|---|---|
| PostgreSQL | Relational | General purpose, full-featured |
| SQLite | Relational | Embedded, serverless |
| MongoDB | Document | Rapid prototyping, flexible schema |
| Redis | Key-Value | Caching, sessions, pub/sub |
| Cosmos DB | Multi-model | Azure-native, global distribution |
| DynamoDB | Key-Value/Document | AWS-native, massive scale |
| Neo4j | Graph | Social networks, recommendations |
Schema Design Principles
Normalization Levels
| Form | Rule | Trade-off |
|---|---|---|
| 1NF | No repeating groups | Basic structure |
| 2NF | No partial dependencies | Removes redundancy |
| 3NF | No transitive dependencies | Protects data integrity |
| BCNF | Every determinant is a key | Strictest integrity |
When to Denormalize
- Read-heavy workloads
- Complex queries hitting many tables
- When consistency can be eventual
- Reporting/analytics tables
Example: E-Commerce Schema
```sql
-- Normalized (3NF)
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    created_at TIMESTAMP DEFAULT NOW(),
    status VARCHAR(50) NOT NULL
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL
);

-- Denormalized for reporting
CREATE TABLE order_summary (
    order_id INTEGER PRIMARY KEY,
    customer_email VARCHAR(255),
    customer_name VARCHAR(255),
    total_amount DECIMAL(10,2),
    item_count INTEGER,
    created_at TIMESTAMP
);
```
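The denormalized summary row is just a projection of the normalized tables. A minimal TypeScript sketch of that derivation, using in-memory stand-ins for the tables (the camelCase interfaces and `buildOrderSummary` are illustrative, not part of any schema above):

```typescript
// In-memory stand-ins for the normalized tables; field names mirror the SQL columns.
interface Customer { id: number; email: string; name: string }
interface Order { id: number; customerId: number; createdAt: string }
interface OrderItem { orderId: number; quantity: number; unitPrice: number }

interface OrderSummary {
  orderId: number;
  customerEmail: string;
  customerName: string;
  totalAmount: number;
  itemCount: number;
  createdAt: string;
}

// Derive one denormalized summary row from the normalized rows.
function buildOrderSummary(
  order: Order,
  customer: Customer,
  items: OrderItem[],
): OrderSummary {
  const lines = items.filter(i => i.orderId === order.id);
  return {
    orderId: order.id,
    customerEmail: customer.email,
    customerName: customer.name,
    totalAmount: lines.reduce((sum, i) => sum + i.quantity * i.unitPrice, 0),
    itemCount: lines.reduce((sum, i) => sum + i.quantity, 0),
    createdAt: order.createdAt,
  };
}

const summary = buildOrderSummary(
  { id: 1, customerId: 7, createdAt: "2026-01-01T00:00:00Z" },
  { id: 7, email: "[email protected]", name: "Ada" },
  [
    { orderId: 1, quantity: 2, unitPrice: 9.99 },
    { orderId: 1, quantity: 1, unitPrice: 25.0 },
  ],
); // totalAmount ≈ 44.98, itemCount = 3
```

In production this computation would run in SQL (a materialized view or a trigger), not in application code; the sketch only shows what data the summary duplicates.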
Modern ORM Patterns
Prisma (TypeScript)
```prisma
// schema.prisma
model User {
  id      Int      @id @default(autoincrement())
  email   String   @unique
  name    String?
  posts   Post[]
  profile Profile?
}

model Post {
  id        Int     @id @default(autoincrement())
  title     String
  content   String?
  published Boolean @default(false)
  author    User    @relation(fields: [authorId], references: [id])
  authorId  Int
}
```
Entity Framework Core (C#)

```csharp
public class User
{
    public int Id { get; set; }
    public string Email { get; set; } = null!;
    public string? Name { get; set; }
    public ICollection<Post> Posts { get; } = new List<Post>();
}

// Usage: eager-load posts with the user in one query
var user = await context.Users
    .Include(u => u.Posts)
    .FirstOrDefaultAsync(u => u.Email == "[email protected]");
```
Query Optimization
The EXPLAIN Plan
```sql
-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.author_id
GROUP BY u.id
ORDER BY post_count DESC
LIMIT 10;
```
Reading Execution Plans
| Operation | Meaning | Concern |
|---|---|---|
| Seq Scan | Full table scan | Consider index |
| Index Scan | Using index | Good |
| Nested Loop | For each row... | OK for small sets |
| Hash Join | Build hash table | Good for large sets |
| Sort | In-memory sort | Check work_mem |
Index Strategy
```sql
-- Single column (most common)
CREATE INDEX idx_users_email ON users(email);

-- Composite (column order matters!)
CREATE INDEX idx_posts_author_date ON posts(author_id, created_at DESC);

-- Partial (when you query a subset)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

-- Covering (includes all needed columns)
CREATE INDEX idx_posts_covering ON posts(author_id) INCLUDE (title, created_at);
```
Query Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| `SELECT *` | Fetches unnecessary data | List specific columns |
| N+1 queries | Loop makes N queries | Use JOIN or eager loading |
| Missing index on WHERE | Full table scan | Add appropriate index |
| OR in WHERE | Can't use index efficiently | UNION ALL or restructure |
| Functions on columns, e.g. `WHERE YEAR(date) = 2026` | Prevents index use | Rewrite as a range: `WHERE date >= '2026-01-01' AND date < '2027-01-01'` |
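The N+1 pattern is easiest to see with a query counter. A TypeScript sketch where `fetchPostsByAuthor` and `fetchPostsByAuthors` are hypothetical stand-ins for a single-author query and a batched `WHERE author_id IN (...)` query:

```typescript
// Simulated posts table plus a query counter to make the round-trips visible.
interface Post { id: number; authorId: number }

const posts: Post[] = [
  { id: 1, authorId: 1 },
  { id: 2, authorId: 1 },
  { id: 3, authorId: 2 },
];

let queryCount = 0;

// One round-trip per author: the query hidden inside the loop.
function fetchPostsByAuthor(authorId: number): Post[] {
  queryCount++;
  return posts.filter(p => p.authorId === authorId);
}

// One round-trip for all authors: the WHERE author_id IN (...) equivalent.
function fetchPostsByAuthors(authorIds: number[]): Post[] {
  queryCount++;
  return posts.filter(p => authorIds.includes(p.authorId));
}

const userIds = [1, 2, 3];

// N+1: one query per user.
queryCount = 0;
for (const id of userIds) fetchPostsByAuthor(id);
const naiveQueries = queryCount; // 3 queries for 3 users

// Fix: a single batched query, grouped in memory.
queryCount = 0;
const byAuthor = new Map<number, Post[]>();
for (const p of fetchPostsByAuthors(userIds)) {
  const list = byAuthor.get(p.authorId) ?? [];
  list.push(p);
  byAuthor.set(p.authorId, list);
}
const batchedQueries = queryCount; // 1 query regardless of user count
```

ORMs do the same grouping for you: Prisma's `include` and EF Core's `Include` shown earlier both generate the batched form.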
Data Modeling Patterns
Soft Deletes
```sql
-- Instead of DELETE, update a flag
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;

-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;

-- Soft delete
UPDATE users SET deleted_at = NOW() WHERE id = 123;
```
Audit Trails
```sql
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(20) NOT NULL, -- INSERT, UPDATE, DELETE
    old_values JSONB,
    new_values JSONB,
    changed_by INTEGER REFERENCES users(id),
    changed_at TIMESTAMP DEFAULT NOW()
);

-- Trigger for automatic auditing (PostgreSQL)
CREATE OR REPLACE FUNCTION audit_trigger()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO audit_log (table_name, record_id, action, old_values, new_values)
    VALUES (TG_TABLE_NAME, COALESCE(NEW.id, OLD.id), TG_OP,
            to_jsonb(OLD), to_jsonb(NEW));
    -- NEW is NULL on DELETE, so return whichever row variable is populated
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Attach to a table
CREATE TRIGGER users_audit
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger();
```
Temporal Tables
```sql
-- PostgreSQL example
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),
    valid_from TIMESTAMP DEFAULT NOW(),
    valid_to TIMESTAMP DEFAULT 'infinity'
);

-- Query current data
SELECT * FROM products WHERE valid_to = 'infinity';

-- Query historical data (what was the price on Jan 1?)
SELECT * FROM products
WHERE valid_from <= '2026-01-01' AND valid_to > '2026-01-01';
```
Migration Best Practices

- Small, incremental changes — one concern per migration
- Backward compatible — add before remove
- Test migrations — run against a copy of production data
Zero-Downtime Schema Changes
```sql
-- Phase 1: Add new column (nullable)
ALTER TABLE users ADD COLUMN new_email VARCHAR(255);

-- Phase 2: Update application to write to both columns
-- Phase 3: Backfill existing rows in batches
-- (UPDATE has no LIMIT in PostgreSQL, so select ids in a subquery; repeat until 0 rows)
UPDATE users SET new_email = email
WHERE id IN (
    SELECT id FROM users WHERE new_email IS NULL LIMIT 1000
);

-- Phase 4: Make new column required
ALTER TABLE users ALTER COLUMN new_email SET NOT NULL;

-- Phase 5: Update application to read from new column
-- Phase 6: Drop old column
ALTER TABLE users DROP COLUMN email;
ALTER TABLE users RENAME COLUMN new_email TO email;
```
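The batched backfill is driven by a loop that repeats until an update touches zero rows. A TypeScript sketch with an in-memory stand-in for the table (`backfillBatch` is a hypothetical driver simulating one batched UPDATE):

```typescript
// In-memory stand-in for the users table during the backfill phase.
const users = Array.from({ length: 2500 }, (_, i) => ({
  id: i + 1,
  email: `user${i + 1}@example.com`,
  newEmail: null as string | null,
}));

// Simulates: UPDATE users SET new_email = email
//            WHERE id IN (SELECT id FROM users WHERE new_email IS NULL LIMIT n)
// Returns the affected-row count so the caller knows when to stop.
function backfillBatch(batchSize: number): number {
  const pending = users.filter(u => u.newEmail === null).slice(0, batchSize);
  for (const u of pending) u.newEmail = u.email;
  return pending.length;
}

// Keep issuing batches until one touches zero rows.
let batches = 0;
while (backfillBatch(1000) > 0) batches++;
// 2500 rows at 1000 per batch -> 3 non-empty batches
```

Batching keeps each transaction short, which limits lock duration and lets replication keep up; a real driver would also sleep briefly between batches.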
Connection Management
Connection Pooling
```prisma
// Prisma: pool settings live in the datasource URL
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
  // Pool settings in URL: ?connection_limit=10&pool_timeout=30
}
```

```typescript
// node-postgres pool
const pool = new Pool({
  max: 20,                       // max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
```
Connection Pool Sizing
Rule of thumb for PostgreSQL:

`connections = (core_count * 2) + spindle_count`

For cloud deployments: start with 10-20 connections, then monitor and adjust.
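The rule of thumb above as a tiny helper (`poolSize` is an illustrative name, not a library function):

```typescript
// Rule-of-thumb starting pool size for PostgreSQL: (cores * 2) + spindles.
// Treat the result as a starting point to tune, not a hard ceiling.
function poolSize(coreCount: number, spindleCount: number): number {
  return coreCount * 2 + spindleCount;
}

const size = poolSize(8, 1); // 8 cores, SSD counted as one "spindle" -> 17
```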
Implementation Checklist
New Database
- [ ] Choose appropriate database type
- [ ] Design initial schema (3NF or justified denormalization)
- [ ] Define primary keys and foreign keys
- [ ] Plan indexes for known query patterns
- [ ] Set up connection pooling
- [ ] Configure backups and point-in-time recovery
- [ ] Set up monitoring (query performance, connections)