Covers the three pillars of observability: Prometheus + Grafana for metrics (PromQL, alerting rules, recording rules), the ELK Stack or Loki for logs, and OpenTelemetry + Jaeger for traces, plus SLOs, SLIs and error budgets, alert-fatigue prevention, runbook automation, and on-call rotation best practices.
OBSERVABILITY = METRICS + LOGS + TRACES
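Metrics: Numeric time-series data, aggregated over time (e.g. CPU %, error rate, request latency)
Logs: Timestamped event records, ideally structured (e.g. JSON) and carrying context such as request or trace IDs
Traces: The path of a single request across services, recorded as a tree of timed spans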
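To make the three signal types concrete, here is a stdlib-only Python sketch of what one failed checkout request might leave behind in each pillar. The service names, IDs, timings and field names (`trace_id`, `span_id`, `parent_id`) are hypothetical; they loosely mirror OpenTelemetry concepts, but this is not the OpenTelemetry data model or SDK.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MetricSample:
    """Metric: one numeric point in a time series, identified by name + labels."""
    name: str
    labels: Dict[str, str]
    value: float
    timestamp_ms: int


@dataclass
class Span:
    """Trace building block: one timed operation. Spans sharing a trace_id
    describe a single request's journey; parent_id links them into a tree."""
    name: str
    service: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start_ms: int
    end_ms: int

    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def log_line(ts_ms: int, level: str, msg: str, **context: str) -> str:
    """Log: one structured JSON record; keyword args become context fields."""
    return json.dumps({"ts_ms": ts_ms, "level": level, "msg": msg, **context}, sort_keys=True)


def simulate_checkout() -> Tuple[List[MetricSample], List[str], List[Span]]:
    # Hypothetical services, IDs and timings, fixed so the output is reproducible.
    trace_id = "trace-0001"
    spans = [
        Span("POST /checkout", "api-gateway", trace_id, "span-a", None, 1000, 1250),
        Span("charge_card", "payments", trace_id, "span-b", "span-a", 1050, 1200),
    ]
    # The log record carries trace_id/span_id so it can be joined to its span.
    logs = [
        log_line(1190, "ERROR", "card declined", service="payments",
                 trace_id=trace_id, span_id="span-b"),
    ]
    # The metric is aggregate: labels and a count, no per-request IDs.
    metrics = [
        MetricSample("http_requests_total",
                     {"service": "api-gateway", "route": "/checkout", "status": "402"},
                     1.0, 1250),
    ]
    return metrics, logs, spans


if __name__ == "__main__":
    metrics, logs, spans = simulate_checkout()
    spans_by_id = {s.span_id: s for s in spans}
    for line in logs:
        record = json.loads(line)
        span = spans_by_id[record["span_id"]]
        print(f'{record["msg"]}: {span.service}/{span.name} took {span.duration_ms()} ms')
```

The key design point is the shared `trace_id`: the metric tells you *that* checkout errors rose, the trace shows *where* the time went, and the log line, joined via `trace_id`/`span_id`, says *why*. Run as a script, the sketch prints `card declined: payments/charge_card took 150 ms`.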
# prometheus.yml