You are a specialist in Amazon ElastiCache and Amazon MemoryDB with deep knowledge of managed in-memory caching and database services. Your expertise covers ElastiCache for Redis/Valkey (cluster mode enabled/disabled, replication groups, Global Datastore), ElastiCache for Memcached (auto-discovery, multi-node), ElastiCache Serverless, MemoryDB for Redis/Valkey (durable in-memory database with Multi-AZ transaction log), Valkey engine support, caching architecture patterns, node sizing, security configuration, and operational tuning.
Comparison with other databases -- Route to parent
Related skills
../SKILL.md
Determine scope -- Identify the specific service (ElastiCache Redis/Valkey, ElastiCache Memcached, ElastiCache Serverless, MemoryDB) and whether the question concerns data modeling, infrastructure, performance, security, cost, or operations.
Analyze -- Apply service-specific reasoning. Reference the managed service constraints, engine compatibility, cluster topology, replication mechanics, failover behavior, and cost implications as relevant.
Recommend -- Provide actionable guidance with specific AWS CLI commands, parameter group settings, CloudWatch metrics, security configurations, or SDK patterns.
Following the Redis Ltd. license change from BSD to dual RSALv2/SSPLv1 in March 2024, AWS and the Linux Foundation launched Valkey as an open-source fork (BSD-3-Clause license). Key points:
Valkey 7.2 -- Initial release, wire-protocol compatible with Redis 7.2 OSS. Drop-in replacement for Redis OSS workloads.
Use when -- Dataset fits in a single node, simpler operational model, no need for data partitioning
Limitations -- No horizontal write scaling, single point of data capacity
ElastiCache for Redis/Valkey -- Cluster Mode Enabled
A replication group with multiple shards (1-500), each containing a primary and up to 5 replicas:
Data partitioning -- 16,384 hash slots distributed across shards. Keys are assigned to slots via CRC16(key) mod 16384.
Horizontal scaling -- Online resharding (add/remove shards) and online vertical scaling (change node type). Scale out for more write throughput and data capacity.
Maximum capacity -- Up to 500 shards x node memory. Theoretical maximum ~317 TB with cache.r7g.16xlarge nodes.
Endpoints -- Configuration endpoint (returns cluster topology to clients that support cluster mode). Clients must use a cluster-aware driver.
Multi-slot operations -- Commands operating on multiple keys (MGET, MSET, pipeline) require all keys in the same hash slot. Use hash tags {tag} to co-locate keys: user:{12345}:profile, user:{12345}:sessions.
Slot migration -- Online resharding moves slots between shards with minimal impact. MIGRATE command handles key transfer.
Use when -- Dataset exceeds single-node memory, need horizontal write scaling, high availability across many shards
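The slot mechanics above can be sketched in a few lines. The CRC16 variant (XMODEM, polynomial 0x1021) and the hash-tag extraction rule follow the Redis Cluster specification; the helper names here are our own:

```python
# Minimal sketch: compute a key's cluster hash slot (CRC16/XMODEM mod 16384)
# and show how hash tags co-locate related keys on one shard.

def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 (XMODEM), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Return the cluster slot for a key, honoring {hash tag} semantics."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing the tag {12345} land in the same slot, so MGET across them works.
print(hash_slot("user:{12345}:profile") == hash_slot("user:{12345}:sessions"))  # True
```

In practice a cluster-aware client library does this for you; the sketch is only to show why `user:{12345}:profile` and `user:{12345}:sessions` never trigger a CROSSSLOT error.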
ElastiCache for Memcached
A cluster of 1-40 Memcached nodes with no replication or persistence:
Auto-discovery -- Clients use the configuration endpoint to discover all nodes automatically. AWS provides the ElastiCache Cluster Client (Java, .NET, PHP) that handles auto-discovery.
Multi-threaded -- Memcached is multi-threaded, so each node can saturate multiple CPU cores (unlike Redis, whose command execution is single-threaded).
Simple data model -- Key-value only. Maximum key size 250 bytes, maximum value size 1 MB by default (configurable up to 128 MB via the max_item_size parameter).
No persistence -- Node failure means data loss for that node's portion. Application must handle cache misses gracefully.
Consistent hashing -- Clients distribute keys across nodes using consistent hashing. Adding or removing nodes only redistributes ~1/N of keys.
Use when -- Simple caching, no persistence needed, need multi-threaded per-node performance, Memcached protocol compatibility required
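The consistent-hashing behavior described above can be illustrated with a toy hash ring. This is a hypothetical sketch, not the actual algorithm of any particular client; real clients such as the ElastiCache Cluster Client handle node placement internally:

```python
# Toy consistent-hash ring, illustrating why adding a node only remaps
# roughly 1/N of keys instead of reshuffling everything.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` virtual points on the ring for even spread.
        self.ring = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # A key maps to the first ring point clockwise from its hash.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring3 = HashRing(["node1", "node2", "node3"])
ring4 = HashRing(["node1", "node2", "node3", "node4"])
keys = [f"key:{i}" for i in range(1000)]
moved = sum(ring3.get_node(k) != ring4.get_node(k) for k in keys)
print(f"{moved / 10:.1f}% of keys moved")  # roughly a quarter, not 100%
```

Contrast this with naive `hash(key) mod N`, where changing N remaps nearly every key and causes a near-total cache miss storm.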
ElastiCache Serverless
Fully managed serverless caching with automatic scaling and no capacity planning:
Engines -- Redis OSS and Valkey supported
Scaling -- Automatically scales compute and memory based on demand. No node selection or cluster management.
Pricing -- Pay for data stored (per GB-hour) and ElastiCache Processing Units (ECPUs) consumed. No upfront node costs.
Limits -- Maximum 5 TB data storage, 30,000 ECPUs/second sustained throughput per cache
Availability -- Multi-AZ by default, automatic failover
Endpoints -- Single endpoint. Supports cluster mode protocol transparently.
Use when -- Unpredictable or spiky workloads, want to avoid capacity planning, rapid prototyping, cost optimization for variable loads
Limitations -- Cannot tune individual node parameters, higher per-unit cost than provisioned at steady-state high utilization
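One way to reason about the serverless-vs-provisioned trade-off is a simple cost model. All rates below are placeholders, not real AWS prices (check current ElastiCache pricing for your region), and the 730-hour month is an approximation:

```python
# Toy break-even model: serverless (GB-hours + ECPUs) vs. provisioned nodes.
# Every rate here is a PLACEHOLDER assumption -- substitute current AWS pricing.
GB_HOUR_RATE = 0.125       # placeholder $/GB-hour of serverless data storage
ECPU_RATE = 0.0000034      # placeholder $ per ECPU consumed
NODE_HOURLY = 0.40         # placeholder $/hour for one provisioned node
HOURS_PER_MONTH = 730

def serverless_monthly(avg_gb: float, avg_ecpus_per_second: float) -> float:
    storage = avg_gb * GB_HOUR_RATE * HOURS_PER_MONTH
    compute = avg_ecpus_per_second * 3600 * HOURS_PER_MONTH * ECPU_RATE
    return storage + compute

def provisioned_monthly(node_count: int) -> float:
    return node_count * NODE_HOURLY * HOURS_PER_MONTH

# Compare a spiky workload's average consumption against a fixed fleet:
print(f"serverless:  ${serverless_monthly(2, 500):,.0f}/month")
print(f"provisioned: ${provisioned_monthly(2):,.0f}/month")
```

The crossover point depends heavily on duty cycle: serverless wins when average utilization is far below the peak you would otherwise provision for, and loses at steady high utilization.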
MemoryDB for Redis/Valkey
A durable in-memory database that can serve as a primary database:
Durability -- All writes are committed to a Multi-AZ transaction log before acknowledgment. Data survives node failures, process crashes, and full cluster restarts.
Consistency -- Strongly consistent reads from the primary node. Eventually consistent reads from replicas.
Performance -- Single-digit millisecond read latency, single-digit millisecond write latency (slightly higher than ElastiCache due to transaction log commit).
API compatibility -- Full Redis/Valkey API compatibility. Existing Redis clients work unmodified.
Cluster architecture -- Always uses cluster mode (sharded). 1-500 shards, each with 1 primary + up to 5 replicas.
Snapshots -- Point-in-time snapshots stored in S3. Can restore to a new cluster.
Use when -- Need Redis-compatible API as a primary database (not just a cache), need durability guarantees, microservices data store, session store that must survive failures
MemoryDB vs. ElastiCache -- MemoryDB is for durable database workloads; ElastiCache is for caching layers in front of another database. MemoryDB write latency is slightly higher (low single-digit milliseconds vs. sub-millisecond) due to the transaction log commit.
Node Types and Sizing
ElastiCache and MemoryDB use EC2-based node types:
| Family | Examples | CPU | Memory Range | Network | Use Case |
|---|---|---|---|---|---|
| r7g (Graviton3) | cache.r7g.large - 16xlarge | ARM64 | 13.07 - 635.61 GB | Up to 30 Gbps | Memory-optimized, best price/performance |
| r6g (Graviton2) | cache.r6g.large - 16xlarge | ARM64 | 13.07 - 635.61 GB | Up to 25 Gbps | Previous-gen memory-optimized |
| r7gd (Graviton3 + NVMe) | cache.r7gd.xlarge - 16xlarge | ARM64 | 26.32 - 635.61 GB | Up to 30 Gbps | Data tiering (hot data in memory, warm data on SSD) |
| m7g (Graviton3) | cache.m7g.large - 16xlarge | ARM64 | 6.38 - 507.09 GB | Up to 30 Gbps | General purpose, balanced compute/memory |
| m6g (Graviton2) | cache.m6g.large - 16xlarge | ARM64 | 6.38 - 507.09 GB | Up to 25 Gbps | Previous-gen general purpose |
| c7gn (Graviton3) | cache.c7gn.large - 16xlarge | ARM64 | 3.09 - 507.09 GB | Up to 200 Gbps | Network-intensive workloads |
| t4g (Graviton2) | cache.t4g.micro - medium | ARM64 | 0.5 - 3.09 GB | Up to 5 Gbps | Dev/test, burstable, low cost |
| t3 (Intel) | cache.t3.micro - medium | x86_64 | 0.5 - 3.09 GB | Up to 5 Gbps | Dev/test, burstable |
Data tiering (r7gd nodes): Automatically moves less-frequently-accessed data to local NVMe SSD while keeping hot data in DRAM. Extends effective memory capacity at lower cost. Supported for Redis 7.0+ and Valkey.
Sizing guidelines:
Reserved memory -- ElastiCache reserves 25% of node memory for Redis overhead (replication buffer, connection buffers, copy-on-write during BGSAVE). Usable memory is ~75% of advertised memory.
Target utilization -- Keep DatabaseMemoryUsagePercentage below 80% to allow for spikes and background operations.
Connection overhead -- Each client connection uses ~1 KB minimum. With thousands of connections, this adds up.
Key/value overhead -- Each key has ~70 bytes of overhead in Redis (dict entry, SDS header, robj). Factor this into capacity planning.
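The sizing guidelines above can be combined into a back-of-envelope calculator. The 25% reserve, 80% target utilization, and ~70-byte per-key overhead come from the guidelines; the 40-byte average key length is an assumed input you should replace with your own:

```python
# Back-of-envelope node capacity estimate using the rules of thumb above.

def usable_bytes(advertised_gb: float,
                 reserved_fraction: float = 0.25,   # ~25% engine reserve
                 target_util: float = 0.80) -> float:
    """Advertised memory minus the engine reserve, capped at the 80% target."""
    return advertised_gb * 1e9 * (1 - reserved_fraction) * target_util

def max_keys(advertised_gb: float,
             avg_value_bytes: int,
             avg_key_bytes: int = 40,        # assumed average key length
             per_key_overhead: int = 70) -> int:  # dict entry + SDS + robj
    per_entry = avg_value_bytes + avg_key_bytes + per_key_overhead
    return int(usable_bytes(advertised_gb) // per_entry)

# cache.r7g.large (~13.07 GB advertised) holding 1 KB values:
print(f"~{max_keys(13.07, 1024):,} keys")
```

The real per-key overhead varies by data type and encoding (e.g. hashes using listpack encoding are much denser), so treat this as a lower-bound sanity check, not a precise plan.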
Global Datastore
Cross-region replication for ElastiCache Redis/Valkey (cluster mode enabled):
Architecture -- One primary region (read/write) and up to two secondary regions (read-only). Asynchronous replication.
Replication lag -- Typically under 1 second cross-region, but can spike under heavy write load or network issues.
Failover -- Manual promotion of a secondary region to primary. Not automatic. RPO depends on replication lag at time of failure.
Limitations -- Only supported for cluster mode enabled with Redis 6.2+ or Valkey. Maximum 2 secondary regions. Certain commands restricted in secondary regions.
Security Model
Network isolation:
Deploy in a VPC with ElastiCache subnet groups spanning multiple AZs
Security groups control inbound/outbound traffic to cache nodes
No public internet access by default (and should stay that way)
Encryption:
In-transit encryption (TLS) -- Encrypts data between clients and cache nodes, and between nodes. Enabled at cluster creation, cannot be changed later. Adds ~25% CPU overhead.
At-rest encryption -- Encrypts data on disk (snapshots, swap, replication data). Uses AWS KMS (default AWS-managed key or customer-managed CMK).
Authentication:
Redis/Valkey AUTH -- Simple password (AUTH token). Up to 128 characters. Set via --auth-token at creation.
Redis/Valkey ACLs -- Fine-grained access control with users, passwords, and command/key permissions. Supported on Redis 6.0+ and Valkey.
IAM authentication -- ElastiCache supports IAM-based authentication for Redis 7.0+ and Valkey. Clients generate a short-lived IAM auth token instead of a static password. Integrates with IAM roles and policies.
MemoryDB ACLs -- Always uses ACLs (mandatory). Define users, access strings, and associate with clusters.
Memcached -- No built-in authentication. Rely on VPC security groups and network controls.
Compliance: ElastiCache and MemoryDB support HIPAA eligibility, PCI DSS, SOC 1/2/3, ISO 27001, FedRAMP.
Caching Strategies
Lazy loading (cache-aside):
1. Application checks cache for data
2. Cache hit -> return data
3. Cache miss -> query database, write result to cache, return data
Pros -- Only requested data is cached, cache naturally contains hot data
Cons -- Cache miss penalty (extra round trip to DB), stale data until TTL expires or explicit invalidation
Best for -- Read-heavy workloads with tolerance for brief staleness
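The three lazy-loading steps can be sketched as follows. A plain dict stands in for the ElastiCache client, and `fake_db` is a hypothetical loader you would replace with a real database query:

```python
# Cache-aside sketch: check cache, fall back to the database on a miss,
# then populate the cache with a TTL. A dict stands in for ElastiCache.
import time

def cache_aside_get(cache: dict, key, load_from_db, ttl_seconds=300, now=time.time):
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if now() < expires_at:
            return value            # cache hit
        del cache[key]              # entry expired; evict and reload
    value = load_from_db(key)       # cache miss -> query the database
    cache[key] = (value, now() + ttl_seconds)
    return value

calls = []
def fake_db(key):                   # hypothetical stand-in for a DB query
    calls.append(key)
    return f"row-for-{key}"

cache = {}
cache_aside_get(cache, "user:1", fake_db)   # miss: hits the "database"
cache_aside_get(cache, "user:1", fake_db)   # hit: served from cache
print(len(calls))  # 1
```

With a real client the dict operations become GET and SET with an EX argument; the control flow is identical.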
Write-through:
1. Application writes to cache AND database simultaneously
2. Reads always come from cache
Pros -- Cache is always current, no stale data
Cons -- Write penalty (two writes per operation), cache fills with data that may never be read
Best for -- Workloads where data freshness is critical
Write-behind (write-back):
1. Application writes to cache
2. Cache asynchronously writes to database (batched, delayed)
Pros -- Lowest write latency, can batch writes to database
Cons -- Risk of data loss if cache node fails before write-back, complex implementation
Best for -- Write-heavy workloads where temporary data loss is acceptable
TTL strategies:
Set TTL on all cached keys to prevent unbounded memory growth
Use different TTLs for different data types: user sessions (30 min), product catalog (1 hour), reference data (24 hours)
Add jitter to TTLs to prevent thundering herd: TTL = base_ttl + random(0, base_ttl * 0.1)
For write-through, set long TTLs (cache is always updated on write)
For lazy loading, set shorter TTLs (controls staleness window)
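The jitter formula above is one line of code; spreading expirations like this keeps a popular key's many copies from all expiring in the same instant:

```python
# Jittered TTL per the formula above: TTL = base + random(0, base * 0.1).
import random

def jittered_ttl(base_ttl_seconds: int, jitter_fraction: float = 0.1) -> int:
    jitter = random.randint(0, int(base_ttl_seconds * jitter_fraction))
    return base_ttl_seconds + jitter

# e.g. 1-hour product-catalog entries expire between 3600 and 3960 seconds:
ttls = [jittered_ttl(3600) for _ in range(5)]
```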
Cache stampede prevention:
Locking -- Use Redis SETNX to acquire a lock. Only one process refreshes the cache; others wait or return stale data.
Probabilistic early expiration -- Refresh the cache before TTL expires with probability that increases as TTL approaches 0.
Background refresh -- A background worker refreshes cache entries before they expire.
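Probabilistic early expiration can be sketched with the "XFetch" formula (refresh when `now - delta * beta * ln(rand) >= expiry`, where `delta` approximates the recompute cost). The function below is an illustrative implementation, not taken from any particular library:

```python
# Probabilistic early expiration ("XFetch") sketch: refresh early with a
# probability that rises as expiry approaches. `delta` is roughly how long
# the recompute takes; `beta` > 1 makes refreshes more eager.
import math
import random

def should_refresh_early(now, expires_at, delta, beta=1.0, rand=None):
    r = rand if rand is not None else random.random()  # r in (0, 1]
    # -ln(r) is exponentially distributed, so the refresh moment jitters
    # independently per caller -- only a few callers refresh early.
    return now - delta * beta * math.log(r) >= expires_at

# Far from expiry: almost never refresh; past expiry: always refresh.
print(should_refresh_early(now=0, expires_at=1000, delta=1, rand=0.5))     # False
print(should_refresh_early(now=1001, expires_at=1000, delta=1, rand=0.5))  # True
```

Unlike locking, this needs no coordination between clients, which makes it a good fit for high-fanout keys.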
Parameter Groups
Parameter groups control engine configuration. Default parameter groups are read-only; create custom groups for tuning:
maxconns_fast -- Close new connections immediately when max connections reached (default: 0 = disabled)
idle_timeout -- Close idle connections after N seconds (default: 0 = never)
Backup and Restore
ElastiCache Redis/Valkey:
Automatic backups -- Daily snapshots retained for 0-35 days. Taken during a preferred backup window.
Manual snapshots -- On-demand snapshots with no retention limit. Stored in S3 (managed by ElastiCache).
Export to S3 -- Copy snapshots to your own S3 bucket for cross-account or long-term retention.
Restore -- Create a new cluster or replication group from a snapshot. Cannot restore to an existing cluster.
BGSAVE impact -- Snapshot creation forks the Redis process. With large datasets, this can cause memory spikes (up to 2x due to copy-on-write) and temporary latency increase.
Cluster mode enabled -- Snapshots are taken per-shard in parallel.
MemoryDB:
Automatic snapshots -- Daily snapshots retained for 0-35 days.
Manual snapshots -- On-demand, no retention limit.
Transaction log -- Provides point-in-time durability beyond snapshots. Data persists through node restarts.
Memcached: No backup or persistence capability. Memcached is a pure volatile cache.
Scaling Operations
Vertical scaling (node type change):
ElastiCache Redis/Valkey -- Online scaling with minimal downtime. The service creates new nodes, replicates data, and switches endpoints.
Memcached -- Requires creating a new cluster with the desired node type. Data is lost.
Scale out -- Add shards and redistribute hash slots. Online operation.
Scale in -- Remove shards and consolidate hash slots. Requires sufficient memory on remaining shards.
Rebalance -- Redistribute slots evenly across shards after scaling.
Replica scaling:
Add or remove read replicas (0-5 per shard) without downtime.
More replicas increase read throughput and failover resilience.
Memcached horizontal scaling:
Add or remove nodes from the cluster. Auto-discovery updates clients automatically.
Adding nodes does not move existing data; keys whose hash now maps to a new node will miss until the application repopulates them.
Removing a node loses all data on that node. Expect increased cache miss rate temporarily.
Cost Optimization
Reserved nodes -- 1-year or 3-year reservations for 30-60% savings over on-demand pricing. Best for stable, predictable workloads. Available for ElastiCache and MemoryDB.
Right-sizing strategies:
Monitor DatabaseMemoryUsagePercentage -- if consistently below 50%, consider downsizing
Use CloudWatch metrics to identify over-provisioned replicas with low read traffic
Data tiering -- Use r7gd nodes to extend memory capacity with NVMe SSD. Up to 5x more data capacity at lower cost for workloads with skewed access patterns (hot/cold data).
ElastiCache Serverless -- Cost-effective for variable workloads. No idle node costs during low-traffic periods. Compare ECPU pricing against provisioned node costs for your workload pattern.
Memcached vs. Redis/Valkey -- Memcached nodes are less expensive for the same memory capacity when you only need simple caching (no persistence, replication, or advanced data structures).
Architecture optimizations:
Use read replicas for read-heavy workloads instead of scaling up the primary
Use connection pooling to reduce connection overhead
Compress large values before caching (gzip, LZ4) to reduce memory usage
Set appropriate TTLs to prevent unbounded memory growth
Use hash data structures instead of individual keys for related small values (more memory-efficient)
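The compress-before-caching optimization can be sketched with a small codec. The 1 KB threshold and the one-byte flag format are our own assumptions to tune, not a standard:

```python
# Sketch: compress large values before caching to cut memory usage.
# Small values often aren't worth compressing; the threshold is an
# assumption to tune against your real payloads.
import gzip
import json

COMPRESS_THRESHOLD = 1024  # bytes; below this, store uncompressed

def encode(value) -> bytes:
    raw = json.dumps(value).encode()
    if len(raw) >= COMPRESS_THRESHOLD:
        return b"\x01" + gzip.compress(raw)   # leading flag byte: compressed
    return b"\x00" + raw                      # flag byte: plain

def decode(blob: bytes):
    payload = blob[1:]
    if blob[0] == 1:
        payload = gzip.decompress(payload)
    return json.loads(payload)

big = {"items": ["widget"] * 2000}
blob = encode(big)
assert decode(blob) == big
print(len(blob) < len(json.dumps(big)))  # True: far smaller than the raw JSON
```

The flag byte makes the format self-describing, so readers never need out-of-band knowledge of whether a given key was compressed.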
Monitoring and Observability
Critical CloudWatch metrics for alerting:
| Metric | Threshold | Action |
|---|---|---|
| CPUUtilization | > 90% sustained | Scale up node type or scale out (more shards) |
| EngineCPUUtilization | > 80% sustained | Scale up or optimize hot commands |
| DatabaseMemoryUsagePercentage | > 80% | Scale up memory, add shards, enable data tiering, or optimize data |
| CurrConnections | > 60,000 | Implement connection pooling, check for connection leaks |
| NewConnections | Spikes > 1,000/min | Connection storm -- check application restart or pooling issues |
Common Use Cases
Session store -- Use Redis/Valkey with TTL-based expiration. Store session ID as key, session data as hash. Use MemoryDB if sessions must survive full cluster loss.
Rate limiting -- Use Redis INCR + EXPIRE or sorted sets with sliding window. Atomic operations ensure accuracy under concurrency.
Distributed locking -- Use SET key value NX EX seconds (Redlock pattern). For critical locks, use MemoryDB for durability.
Real-time leaderboards -- Use sorted sets (ZADD, ZREVRANGE). ElastiCache provides sub-millisecond leaderboard operations at scale.
Pub/sub messaging -- Use Redis Pub/Sub for real-time notifications. For persistent messaging, use Redis Streams with consumer groups.
Database query cache -- Place ElastiCache in front of RDS/Aurora. Use lazy loading with TTL. Invalidate on writes.
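The rate-limiting pattern above (INCR on a windowed key plus EXPIRE) can be mirrored in a self-contained sketch. A dict with an injected clock stands in for a live Redis endpoint; in production the counter increment would be a single atomic INCR:

```python
# Fixed-window rate limiter mirroring the INCR + EXPIRE pattern, using a
# dict with an injected clock instead of a live Redis endpoint.
class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int, clock):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # (key, window_start) -> count, like a windowed key

    def allow(self, key: str) -> bool:
        window_start = int(self.clock()) // self.window * self.window
        bucket = (key, window_start)   # plays the role of "rate:{key}:{window}"
        self.counters[bucket] = self.counters.get(bucket, 0) + 1  # INCR
        return self.counters[bucket] <= self.limit

t = [0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: t[0])
print([limiter.allow("client-a") for _ in range(3)])  # [True, True, False]
t[0] = 61   # next window: the counter effectively resets
print(limiter.allow("client-a"))  # True
```

Against real Redis, EXPIRE on the windowed key cleans up old buckets automatically; the fixed window also has a known burst-at-boundary weakness, which the sorted-set sliding-window variant avoids.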
Anti-Patterns
Using ElastiCache as a primary database -- ElastiCache is not durable. Use MemoryDB if you need durability with Redis API.
No TTL on keys -- Leads to unbounded memory growth and evictions of important data.
Storing large values (> 100 KB) -- Causes latency spikes, blocks the event loop, increases serialization cost. Break into smaller keys or compress.
Using KEYS command in production -- Blocks the event loop scanning all keys. Use SCAN with COUNT parameter instead.
Single massive cluster for unrelated workloads -- Isolate workloads with separate clusters for independent scaling and failure domains.
Ignoring connection management -- Not using connection pooling leads to connection storms during application restarts.
Skipping encryption -- Enabling TLS after cluster creation requires creating a new cluster and migrating data.
Troubleshooting Quick Reference
| Symptom | Likely Cause | Investigation | Resolution |
|---|---|---|---|
| High latency spikes | BGSAVE/BGREWRITEAOF, KEYS command, large value operations | Check SLOWLOG GET 25, INFO persistence, CloudWatch EngineCPUUtilization | Optimize commands, schedule BGSAVE in low-traffic window, avoid O(N) commands |
| Evictions increasing | Memory pressure | INFO memory, DatabaseMemoryUsagePercentage metric | Scale up, remove unused keys, tighten TTLs, enable data tiering |
| Connection refused | Max connections reached, security group misconfigured | | |