DynamoDB data modeling and design patterns for AWS applications. Use when: DynamoDB table design, single-table design, DynamoDB GSI, partition key strategy, sort key design, DynamoDB query optimization, NoSQL data modeling, DynamoDB streams, DynamoDB TTL, DynamoDB transactions, batch operations, DAX caching, capacity planning, access pattern modeling, DynamoDB CDC, item collection, secondary index design, write sharding. Do NOT use when: relational database design, SQL queries, PostgreSQL schema, MySQL optimization, MongoDB queries, Redis caching, Elasticsearch indexing, general SQL joins, Oracle tuning, Cassandra ring design.
Store multiple entity types in one table using generic key names (PK, SK). Prefix values with entity type for disambiguation.
| PK | SK | Entity |
|---|---|---|
| USER#u123 | METADATA | User profile |
| USER#u123 | ORDER#2024-03-15#o456 | User's order |
| USER#u123 | ORDER#2024-03-15#o456#ITEM#i789 | Order item |
| ORG#org1 | USER#u123 | Org membership |
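To sketch why this layout pays off: a single Query on the partition key returns the profile row and every order row together. The request shape below uses the low-level API; the table name `app` is an assumption.

```python
# Build a low-level Query request that fetches a user's profile row plus
# all of their ORDER# rows in one round trip -- they share one partition.
def user_item_collection_request(user_id: str) -> dict:
    return {
        "TableName": "app",  # assumed table name
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"USER#{user_id}"}},
    }

req = user_item_collection_request("u123")
# With boto3: boto3.client("dynamodb").query(**req)
```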
Use single-table when:

- Access patterns are known up front and related entities are fetched together
- You want to retrieve multiple entity types in a single Query
- Operational simplicity matters (one table to provision, monitor, and back up)

Use multi-table when:

- Access patterns are still evolving or genuinely independent per entity
- Entities need very different capacity, TTL, Stream, or backup settings
- Schema discoverability for a broad team outweighs query efficiency
| Pattern | PK Example | Use Case |
|---|---|---|
| Entity ID | USER#u123 | Direct lookups |
| Composite | TENANT#t1#USER#u123 | Multi-tenant isolation |
| Write sharding | VOTES#item1#3 (append 0-N) | Hot partition mitigation |
| Time-bucketed | LOGS#2024-03-15 | Time-series with known ranges |
When a single key receives disproportionate traffic, append a random suffix:
```python
import random

SHARD_COUNT = 10
pk = f"COUNTER#{item_id}#{random.randint(0, SHARD_COUNT - 1)}"
# Read: query all shards and aggregate
```
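Reading the counter back means querying every shard and summing the results; a sketch, assuming the shard layout above and an attribute named `count`:

```python
# Enumerate all shard keys for a sharded counter, then sum the counts
# returned by querying each shard.
SHARD_COUNT = 10

def shard_pks(item_id: str) -> list:
    return [f"COUNTER#{item_id}#{n}" for n in range(SHARD_COUNT)]

def total(shard_items: list) -> int:
    # shard_items: the concatenated Items lists from each shard's Query
    return sum(int(item["count"]["N"]) for item in shard_items)
```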
Sort keys enable range queries, hierarchical data, and version tracking within a partition.
| Pattern | SK Example | Enables |
|---|---|---|
| Hierarchical | COUNTRY#US#STATE#CA#CITY#LA | begins_with at any level |
| Timestamp | ORDER#2024-03-15T10:30:00Z | Range queries on time |
| Version | v0 (current), v1, v2 | Version history |
| Composite | STATUS#active#DATE#2024-03-15 | Filter by status + time |
| Zero-padded | RANK#000042 | Numeric sort as strings |
Sort key condition operators:

- = — exact match
- <, <=, >, >= — range
- BETWEEN — inclusive range
- begins_with — prefix matching (most powerful for hierarchies)

GSIs project data into a separate partition structure with different PK/SK. They consume their own capacity and replicate asynchronously.
Reuse a single GSI for multiple access patterns by overloading its key semantics:
| GSI1PK | GSI1SK | Use |
|---|---|---|
| [email protected] | USER | Lookup user by email |
| ORG#org1 | USER#u123 | List users in org |
| STATUS#active | DATE#2024-03-15 | Active items by date |
Create a GSI with SK as its PK and PK as its SK to reverse query direction:
Table: PK=USER#u123, SK=ORDER#o456
GSI: PK=ORDER#o456, SK=USER#u123 → "which user placed this order?"
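A sketch of the reverse lookup as a low-level Query request; the table name `app` and index name `InvertedIndex` are assumptions:

```python
# Query the inverted index to find the user partition that owns an order.
# The GSI's partition key is the base table's SK attribute.
def order_owner_request(order_id: str) -> dict:
    return {
        "TableName": "app",            # assumed table name
        "IndexName": "InvertedIndex",  # assumed index name
        "KeyConditionExpression": "SK = :order",
        "ExpressionAttributeValues": {":order": {"S": f"ORDER#{order_id}"}},
    }
```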
| Access Pattern | Key Design |
|---|---|
| Get user profile | PK=USER#u123, SK=METADATA |
| List user orders (newest first) | PK=USER#u123, SK=begins_with("ORDER#"), ScanIndexForward=false |
| Get order details + items | PK=ORDER#o456, no SK condition (returns the whole item collection) |
| Orders by status | GSI1PK=STATUS#shipped, GSI1SK=DATE#2024-03-15 |
| Lookup by email | GSI2PK=[email protected], GSI2SK=USER |
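The newest-first order listing from the table above, as a request sketch (table name `app` assumed). ISO-8601 dates in the sort key sort lexicographically, so ScanIndexForward=False yields the most recent orders first:

```python
# Query a user's orders in descending sort-key (newest-first) order.
def recent_orders_request(user_id: str, limit: int = 10) -> dict:
    return {
        "TableName": "app",  # assumed table name
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{user_id}"},
            ":prefix": {"S": "ORDER#"},
        },
        "ScanIndexForward": False,  # descending sort-key order
        "Limit": limit,
    }
```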
Streams capture item-level changes in time order. Each record contains the key and optionally old/new images.
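A minimal consumer sketch (Lambda-style event shape, assuming the stream is configured with NEW_AND_OLD_IMAGES):

```python
# Collect the attribute names that changed in each MODIFY record.
def changed_attributes(event: dict) -> list:
    changes = []
    for record in event.get("Records", []):
        if record["eventName"] != "MODIFY":
            continue
        old = record["dynamodb"]["OldImage"]
        new = record["dynamodb"]["NewImage"]
        changes.append(sorted(k for k in new if new[k] != old.get(k)))
    return changes
```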
Stream view types:

- KEYS_ONLY — only key attributes (cheapest)
- NEW_IMAGE — full item after modification
- OLD_IMAGE — full item before modification
- NEW_AND_OLD_IMAGES — both (most expensive, needed for diffs)

Set a numeric attribute (epoch seconds) as the TTL attribute. DynamoDB deletes expired items automatically at no write cost.
```python
import time

# Session expiry (24 hours)
item["ttl"] = int(time.time()) + 86400

# Soft delete: set TTL to 30 days, archive via Stream before deletion
item["ttl"] = int(time.time()) + (30 * 86400)
item["deleted"] = True

# Rolling window: keep only last 90 days of events
event["ttl"] = int(time.time()) + (90 * 86400)
```
TTL deletion is lazy (expired items can linger for up to ~48 hours), so exclude them on read with FilterExpression: ttl > :now.

TransactWriteItems and TransactGetItems provide ACID across up to 100 items / 4 MB.
```python
client.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "app",
                # plus the order's remaining attributes
                "Item": {"PK": {"S": "ORDER#o789"}, "SK": {"S": "METADATA"}},
                "ConditionExpression": "attribute_not_exists(PK)",  # idempotency
            }
        },
        {
            "Update": {
                "TableName": "app",
                "Key": {"PK": {"S": "USER#u123"}, "SK": {"S": "METADATA"}},
                "UpdateExpression": "SET orderCount = orderCount + :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)
```
Batch operations return partial failures that must be retried:

- BatchWriteItem: retry UnprocessedItems in the response with exponential backoff
- BatchGetItem: UnprocessedKeys if throttled — retry with backoff

```python
import time

def batch_write_with_retry(client, request_items, max_retries=5):
    # request_items uses the BatchWriteItem shape:
    # {"TableName": [{"PutRequest": {"Item": {...}}}, ...]}
    unprocessed = request_items
    for attempt in range(max_retries):
        response = client.batch_write_item(RequestItems=unprocessed)
        unprocessed = response.get("UnprocessedItems", {})
        if not unprocessed:
            return
        time.sleep(2 ** attempt * 0.1)  # exponential backoff
    raise Exception("Failed to process all items")
```
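BatchGetItem needs the same treatment for UnprocessedKeys; a sketch against a boto3-style low-level client (the `batch_get_item` request/response shapes follow the DynamoDB API):

```python
import time

def batch_get_with_retry(client, request_items, max_retries=5):
    # request_items uses the BatchGetItem shape:
    # {"TableName": {"Keys": [{"PK": {...}, "SK": {...}}, ...]}}
    items, pending = [], request_items
    for attempt in range(max_retries):
        response = client.batch_get_item(RequestItems=pending)
        for table_items in response.get("Responses", {}).values():
            items.extend(table_items)
        pending = response.get("UnprocessedKeys", {})
        if not pending:
            return items
        time.sleep(2 ** attempt * 0.1)  # exponential backoff
    raise Exception("Failed to fetch all items")
```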
An in-memory, write-through cache that sits between your app and DynamoDB, serving eventually consistent reads with microsecond latency (strongly consistent reads pass through to the table).
```python
import amazondax

dax_client = amazondax.AmazonDaxClient(
    endpoints=["dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111"]
)
response = dax_client.get_item(
    TableName="app",
    Key={"PK": {"S": "USER#u123"}, "SK": {"S": "METADATA"}},
)
```
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Scan for queries | O(n) cost, reads entire table | Use Query with proper key design |
| Low-cardinality PK | Hot partitions, throttling | Use high-cardinality keys, add sharding |
| Read-before-write | 2x capacity, race conditions | Use ConditionExpression or UpdateExpression |
| One table per entity | Cannot fetch related data efficiently | Single-table design with shared PK |
| Missing projections | Wastes RCU on unneeded attributes | Always set ProjectionExpression |
| Large items (>100KB) | Slow reads, high RCU cost | Compress or move large data to S3 |
| Relational modeling | Normalized tables need multiple queries | Denormalize, duplicate data at write time |
| Ignoring GSI cost | Each GSI replicates all writes | Only create GSIs for real access patterns |
| No retry on unprocessed | Silent data loss in batch ops | Always handle UnprocessedItems/Keys |
| Filter instead of key design | Reads then discards data | Push filtering into key/index design |
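As an example of replacing read-before-write: a conditional Put claims a unique value atomically, with no prior Get and no race window (the key shape below is illustrative):

```python
# PutItem request that succeeds only if the username is unclaimed;
# a concurrent claim fails with ConditionalCheckFailedException.
def claim_username_request(name: str, user_id: str) -> dict:
    return {
        "TableName": "app",  # assumed table name
        "Item": {
            "PK": {"S": f"USERNAME#{name.lower()}"},
            "SK": {"S": "CLAIM"},
            "userId": {"S": user_id},
        },
        "ConditionExpression": "attribute_not_exists(PK)",
    }
```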
GetItem → exact PK + SK
Query → exact PK + SK condition (=, <, >, between, begins_with)
Scan → avoid; full table read
PutItem → write single item (upsert)
UpdateItem → partial update with expressions
DeleteItem → remove by PK + SK
BatchGetItem → up to 100 GetItem calls
BatchWriteItem → up to 25 Put/Delete calls
TransactGetItems → up to 100 items, ACID reads
TransactWriteItems → up to 100 items, ACID writes
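Query and Scan return at most 1 MB per call, so any listing operation above should be paged; a generator sketch over a boto3-style client:

```python
# Follow LastEvaluatedKey until the result set is exhausted.
def paginate_query(client, request: dict):
    while True:
        response = client.query(**request)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return
        request = {**request, "ExclusiveStartKey": last_key}
```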
| PK | SK | GSI1PK | GSI1SK | Attrs |
|---|---|---|---|---|
| TENANT#t1 | METADATA | — | — | name, plan, createdAt |
| TENANT#t1 | USER#u1 | USER#u1 | TENANT#t1 | email, role |
| TENANT#t1 | PROJECT#p1 | STATUS#active | DATE#2024-03-15 | title, owner |
| USER#u1 | METADATA | EMAIL#[email protected] | USER | name, avatar |
| USER#u1 | SESSION#s1 | — | — | token, ttl |
| PROJECT#p1 | METADATA | — | — | title, description |
| PROJECT#p1 | TASK#2024-03-15#tk1 | ASSIGNEE#u1 | DUE#2024-03-20 | title, status |
Access patterns served:
- Query PK=TENANT#t1, SK=METADATA
- Query PK=TENANT#t1, SK begins_with USER#
- Query GSI1 PK=EMAIL#[email protected]
- Query GSI1 PK=ASSIGNEE#u1, SK begins_with DUE#
- Query GSI1 PK=STATUS#active, SK begins_with DATE#

In-depth guides in the references/ directory:
references/advanced-patterns.md — Single-table design deep dive, adjacency list pattern, composite sort keys, sparse indexes, write sharding, hot partition mitigation, hierarchical data modeling, time-series patterns, graph-like queries, materialized aggregations, multi-tenant isolation, event sourcing.
references/troubleshooting.md — Throttling diagnosis, hot partition identification, GSI backpressure, scan performance optimization, large item issues, transaction conflicts, Stream processing lag, capacity estimation errors, cost optimization, common error codes, monitoring and alerting.
references/api-reference.md — Complete DynamoDB API patterns with code examples: GetItem, PutItem, UpdateItem, DeleteItem, Query, Scan, BatchGetItem, BatchWriteItem, TransactGetItems, TransactWriteItems, PartiQL, expression syntax (key conditions, filters, projections, conditions, update expressions), pagination patterns, error handling.
Executable helper scripts in the scripts/ directory:
scripts/table-design.sh — Interactive CLI to scaffold a DynamoDB table definition. Prompts for table name, keys, billing mode, GSIs, TTL, Streams, and PITR. Outputs CloudFormation YAML, CDK TypeScript, or Terraform HCL. Supports --non-interactive mode for automation.
```shell
./scripts/table-design.sh                     # interactive
./scripts/table-design.sh --output cdk        # CDK output
./scripts/table-design.sh --output terraform  # Terraform output
```
scripts/capacity-calculator.sh — Calculate RCU/WCU requirements and estimated monthly cost based on item size, read/write rates, consistency mode, and GSI count. Compares provisioned vs on-demand pricing.
```shell
./scripts/capacity-calculator.sh --item-size 2.5 --reads 500 --writes 200 --consistency eventual
./scripts/capacity-calculator.sh --item-size 4 --reads 1000 --writes 100 --gsi-count 3
```
scripts/scan-table.sh — Parallel scan a DynamoDB table with progress tracking. Supports filtering, projection, rate limiting, and JSON output. Requires AWS CLI and jq.
```shell
./scripts/scan-table.sh --table MyTable --segments 10 --output results.json
./scripts/scan-table.sh --table MyTable --filter "status = :s" --values '{":s":{"S":"active"}}'
```
Reusable templates in the assets/ directory:
assets/cloudformation-table.yaml — Production-ready CloudFormation template for a DynamoDB table with: two GSIs (GSI1, GSI2), auto-scaling on all indexes, Point-in-Time Recovery, DynamoDB Streams, Contributor Insights, TTL, CloudWatch alarms for throttling and system errors. Parameterized for billing mode, capacity ranges, and environment.
assets/single-table-schema.json — Complete single-table design schema document for a SaaS project management app. Documents 8 entity types (Tenant, User, TenantMembership, Project, Task, Comment, Notification, AuditLog), their key patterns, GSI mappings (including sparse indexes), 14 access patterns with query specifications, TTL strategy, and capacity estimates.