Generate checklists to validate that monitoring, logging, alerting, and tracing are working correctly. Use when verifying production readiness of observability stack, testing that alerts fire correctly, checking log completeness, or validating distributed tracing across services.
Validate that monitoring, logging, alerting, and tracing are working correctly before and after deployment.
# Observability Validation Checklist: [System]
## Metrics
- [ ] Application health metrics are collected (uptime, error rate)
- [ ] Business metrics are tracked (transactions, revenue)
- [ ] Infrastructure metrics are visible (CPU, memory, disk)
- [ ] Dashboard shows real-time data
- [ ] Historical data is retained for [X] days
## Logging
- [ ] Logs are structured (JSON format)
- [ ] Log levels are appropriate (no excessive DEBUG in production)
- [ ] Sensitive data is not logged (passwords, tokens, PII)
- [ ] Request correlation IDs are present
- [ ] Error logs include stack traces and context
## Alerting
- [ ] Critical alerts configured: [list]
- [ ] Alert routing is correct (right team, right channel)
- [ ] Alert thresholds are validated (not too noisy, not too quiet)
- [ ] Runbooks are linked to alerts
- [ ] Escalation paths are defined
## Tracing
- [ ] Traces span across service boundaries
- [ ] Trace context is propagated correctly
- [ ] Slow traces are identifiable
- [ ] Error traces are flagged