PostgreSQL operational health monitoring covering XID wraparound, bloat, autovacuum, and performance metrics.

Purpose

Operational health issues in PostgreSQL are silent killers: they degrade gradually until a threshold is crossed and the system fails catastrophically. XID wraparound can force emergency single-user-mode recovery. Unchecked bloat can exhaust disk space. Misconfigured autovacuum can leave dead tuples consuming disk space, cache memory, and I/O. This skill provides the queries, thresholds, and tuning guidance for proactive monitoring.

Use this skill when setting up database monitoring, investigating unexplained performance degradation, tuning autovacuum for high-write workloads, or conducting periodic operational health audits.

XID Wraparound Monitoring

PostgreSQL uses 32-bit transaction IDs (XIDs). Every transaction that modifies data consumes one XID. When the XID space (approximately 2 billion usable IDs) is nearly exhausted because vacuum has not frozen old tuples, PostgreSQL stops accepting commands that assign new XIDs in order to prevent data corruption.

How It Works

Each row stores the XID of the transaction that created it

XID Age	Status	Action
< 500M	OK	Normal operation
500M – 800M	Watch	Verify autovacuum is running; investigate blocked vacuums
800M – 1.2B	Warning	Manual VACUUM FREEZE on oldest tables; investigate locks
> 1.2B	Critical	Emergency vacuum required; risk of protective shutdown
> 2B	Shutdown	PostgreSQL refuses new writes to prevent data loss
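
The thresholds above can be checked with queries along the following lines. This is a minimal sketch rather than the skill's original monitoring query: it assumes the standard `age()` function together with `pg_database.datfrozenxid` and `pg_class.relfrozenxid`, and the status labels simply mirror the table.

```sql
-- Database-level XID age, labelled with the statuses from the table above.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       CASE
           WHEN age(datfrozenxid) < 500000000  THEN 'OK'
           WHEN age(datfrozenxid) < 800000000  THEN 'Watch'
           WHEN age(datfrozenxid) < 1200000000 THEN 'Warning'
           ELSE 'Critical'
       END AS status
FROM pg_database
ORDER BY xid_age DESC;

-- Per-table XID age in the current database. The oldest tables are the
-- ones a manual VACUUM FREEZE should target first.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')   -- tables, materialized views, TOAST
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```

The second query only sees the database it is run in, so it would need to be repeated in each database that the first query flags.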

Metric	Healthy	Investigate	Critical
Dead tuple percentage	< 5%	5-20%	> 20%
Table size vs live data	< 1.5x	1.5-3x	> 3x
Index size vs table size	< 2x	2-5x	> 5x
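
Two of the three metrics above can be approximated from the statistics views. The sketch below assumes that "dead tuple percentage" means dead tuples as a share of live plus dead tuples, which the table does not state explicitly. It does not estimate table size versus live data, which generally needs a dedicated bloat estimate or the `pgstattuple` extension.

```sql
-- Dead tuple percentage and index-to-table size ratio per user table.
-- n_live_tup and n_dead_tup are statistics estimates, not exact counts.
SELECT s.schemaname,
       s.relname,
       s.n_live_tup,
       s.n_dead_tup,
       round(100.0 * s.n_dead_tup
             / nullif(s.n_live_tup + s.n_dead_tup, 0), 2) AS dead_pct,
       pg_size_pretty(pg_table_size(s.relid))   AS table_size,
       pg_size_pretty(pg_indexes_size(s.relid)) AS index_size,
       round(pg_indexes_size(s.relid)::numeric
             / nullif(pg_table_size(s.relid), 0), 2) AS index_to_table_ratio
FROM pg_stat_user_tables s
ORDER BY s.n_dead_tup DESC
LIMIT 20;
```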

Approach	Downtime	Use When
Wait for autovacuum	None	Dead tuple % is moderate, autovacuum is running
Manual `VACUUM`	None	Autovacuum is behind; need immediate cleanup
`VACUUM FULL`	Yes (ACCESS EXCLUSIVE lock)	Extreme bloat; need to reclaim disk space
`pg_repack`	Minimal	Need to reclaim space without downtime
`REINDEX CONCURRENTLY`	None	Index-specific bloat
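
As an illustration of the SQL-level approaches above, the commands might look like the following. The object names `public.orders` and `orders_created_at_idx` are hypothetical placeholders. `pg_repack` is a separate command-line tool and is not shown.

```sql
-- Hypothetical object names: public.orders and orders_created_at_idx.

-- Manual vacuum: marks dead tuple space as reusable and refreshes planner
-- statistics without blocking reads or writes.
VACUUM (VERBOSE, ANALYZE) public.orders;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12 or later).
REINDEX INDEX CONCURRENTLY public.orders_created_at_idx;

-- Last resort: rewrites the table and returns space to the operating system,
-- but holds an ACCESS EXCLUSIVE lock for the duration.
VACUUM FULL public.orders;
```

Neither `VACUUM` nor `REINDEX ... CONCURRENTLY` can run inside a transaction block, so each statement has to be issued on its own.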

Parameter	Default	Purpose	Tune When
`autovacuum_vacuum_threshold`	50	Min dead tuples before vacuum triggers	Small tables vacuumed too rarely
`autovacuum_vacuum_scale_factor`	0.2 (20%)	Fraction of table size added to threshold	Large tables vacuumed too rarely
`autovacuum_analyze_threshold`	50	Min changed rows before analyze triggers	Query plans going stale
`autovacuum_analyze_scale_factor`	0.1 (10%)	Fraction of table size added to threshold	Large tables analyzed too rarely
`autovacuum_vacuum_cost_delay`	2ms	Delay between vacuum I/O operations	Vacuum too slow or too aggressive
`autovacuum_vacuum_cost_limit`	-1 (falls back to `vacuum_cost_limit`, 200)	Accumulated I/O cost at which vacuum pauses for the cost delay	Vacuum not keeping up with writes
`autovacuum_max_workers`	3	Concurrent vacuum workers	Many tables need simultaneous vacuuming
`autovacuum_freeze_max_age`	200M	XID age that triggers anti-wraparound vacuum	Never lower this without understanding implications
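
Autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. With the defaults, a table of 100 million rows is not vacuumed until about 50 + 0.2 × 100,000,000 = 20,000,050 tuples are dead, which is why large tables often need a lower scale factor. The sketch below estimates that trigger point from the global settings and then shows a per-table override. The table name `public.orders` and the override values are illustrative assumptions, not recommendations from the source.

```sql
-- Estimated dead-tuple count at which autovacuum triggers, using the
-- global settings only (per-table storage parameters are ignored here).
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * greatest(c.reltuples, 0) AS vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 20;

-- Per-table override for a large, write-heavy table (hypothetical name):
-- vacuum after roughly 1% of rows are dead instead of the default 20%.
ALTER TABLE public.orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);
```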

Blocker	Detection	Fix
Long-running transactions	`SELECT * FROM pg_stat_activity WHERE state != 'idle' AND xact_start < now() - interval '1 hour'`	Terminate or fix the long transaction
Abandoned prepared transactions	`SELECT * FROM pg_prepared_xacts`	`ROLLBACK PREPARED '<name>'`
Replication slot lag	`SELECT * FROM pg_replication_slots WHERE active = false`	Drop inactive slots that are no longer needed
Hot standby feedback	`hot_standby_feedback = on` on replica	Consider disabling or tuning `max_standby_streaming_delay`
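
The detection queries above list candidates, but it is often more useful to rank them by how far back each one holds the xmin horizon. The following sketch does that for sessions, prepared transactions, and replication slots. It uses only standard catalog views, and the choice of columns is an assumption about what is useful to display.

```sql
-- Sessions holding back the xmin horizon, oldest first.
SELECT pid, usename, state, xact_start,
       age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Prepared transactions, oldest first.
SELECT gid, prepared, owner, database,
       age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY prepared;

-- Replication slots that pin an xmin or catalog_xmin.
SELECT slot_name, slot_type, active,
       age(xmin) AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY greatest(age(xmin), age(catalog_xmin)) DESC NULLS LAST;
```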

Cache Hit %	Status	Action
> 99%	Excellent	Working set fits in memory
95-99%	Good	Acceptable for most workloads
90-95%	Watch	Consider increasing `shared_buffers` or optimizing queries
< 90%	Poor	Working set exceeds memory; increase `shared_buffers` or reduce working set
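
The percentages above can be derived from the block counters in `pg_stat_database` and `pg_statio_user_tables`, as in this minimal sketch. Note that `blks_read` counts blocks not found in `shared_buffers`. Some of those reads may still be served from the operating system page cache, so the figure understates total caching.

```sql
-- Database-level cache hit ratio since the last statistics reset.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
           AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY cache_hit_pct;

-- Table-level heap cache hit ratio for the most-read tables.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;
```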

Table Size	High Seq Scan %	Likely Cause
< 1MB	Normal	Small tables are faster to seq scan than index scan
1MB - 100MB	Investigate	May need an index or queries may not match existing indexes
> 100MB	Problem	Missing index, index not matching query pattern, or stale statistics
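
To apply the interpretation above, sequential scan share can be listed alongside table size. The sketch below assumes the 1MB cutoff from the table as the filter, and orders by `seq_tup_read` as one reasonable way to surface the most expensive scans first.

```sql
-- Share of scans that were sequential, for tables larger than 1MB.
SELECT relname,
       pg_size_pretty(pg_relation_size(relid)) AS table_size,
       seq_scan,
       coalesce(idx_scan, 0) AS idx_scan,   -- NULL when a table has no indexes
       round(100.0 * seq_scan
             / nullif(seq_scan + coalesce(idx_scan, 0), 0), 2) AS seq_scan_pct
FROM pg_stat_user_tables
WHERE pg_relation_size(relid) > 1024 * 1024   -- skip tables under 1MB
ORDER BY seq_tup_read DESC
LIMIT 20;
```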

Operational Health Auditing

Purpose

XID Wraparound Monitoring

How It Works

Monitoring Query

Per-Table XID Age

Thresholds

Common Causes of High XID Age

Table and Index Bloat

Table Bloat Detection

Index Bloat Detection

Bloat Thresholds

Remediation

Autovacuum Tuning

Key Parameters

Per-Table Overrides

Monitoring Autovacuum Activity

Dead Tuple Tracking

Monitoring Query

What Blocks Vacuum From Reclaiming Dead Tuples

Cache Hit Ratios

Database-Level Cache Hit Ratio

Table-Level Cache Hit Ratio

Thresholds

Sequential Scan Ratios

Monitoring Query

Interpretation

Anti-Patterns

1. Ignoring Autovacuum Warnings

2. VACUUM FULL as Routine Maintenance

3. Disabling Autovacuum

Examples

Example 1: Investigating Slow Query Performance

Example 2: XID Wraparound Alert

Example 3: Healthy Operational State
