Modern observability and monitoring patterns centered on OpenTelemetry (OTel). Covers the three pillars (traces, metrics, logs) with context propagation, OTel SDK architecture, OTLP protocol, distributed tracing with W3C Trace Context, metric instrument types (Counter, Histogram, Gauge, Timer, Exemplars), key metrics to monitor (application, business, infrastructure), metric naming conventions, log correlation, OTel Collector pipelines, Semantic Conventions, backend integration (Jaeger, Grafana Tempo, Loki, Prometheus), alerting rules, health check patterns (liveness, readiness, startup), SLO/SLI design, error budget management, and business metrics modeling. Use when implementing distributed tracing, setting up OTel instrumentation, configuring Collector pipelines, designing alerting strategies, implementing health checks, defining SLO/SLI targets, or integrating observability backends.
OpenTelemetry provides a unified, vendor-neutral standard (APIs, SDKs, and the OTLP wire protocol) for generating, collecting, and exporting telemetry data.
| Signal | Purpose | Role in Debugging |
|---|---|---|
| Traces | Request flow across services | Where it went wrong (which service/span) |
| Metrics | Aggregated measurements over time | Something is wrong (alert trigger) |
| Logs | Discrete event records | What went wrong (error details) |
| Context | Correlates all signals via trace ID, span ID | Connect all three for correlated debugging |
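The Context row can be made concrete with the W3C Trace Context `traceparent` header, which carries the trace ID and span ID between services. The following is a minimal stdlib-only sketch, not the OTel SDK's own propagator: the function names `new_traceparent` and `parse_traceparent` and the `TraceContextFilter` class are hypothetical, and the sketch assumes only the version `00` header layout.

```python
import logging
import re
import secrets

# Version-00 layout: 00-<32 hex trace ID>-<16 hex span ID>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Build a traceparent value with freshly generated random IDs."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Split a traceparent value into fields; raise ValueError if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero trace or span IDs are invalid in W3C Trace Context.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError(f"all-zero ID in traceparent: {header!r}")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }


class TraceContextFilter(logging.Filter):
    """Hypothetical log filter: stamps trace and span IDs onto every record."""

    def __init__(self, ctx: dict):
        super().__init__()
        self.ctx = ctx

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.ctx["trace_id"]
        record.span_id = self.ctx["span_id"]
        return True  # never drop the record, only enrich it
```

With the filter attached to a logger, a format string such as `%(trace_id)s %(span_id)s %(message)s` puts the same IDs on log lines that appear on the trace, which is what lets a backend jump from a log entry to its span.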

    [Application + OTel SDK]
      |-- API (instrumentation interface)
      |-- SDK (implementation: sampling, batching, export)
      |-- Auto-instrumentation (zero-code)
      |
    [OTel Collector] (optional but recommended)
      |-- Receivers → Processors → Exporters
      |
    [Backends: Jaeger, Tempo, Prometheus, Loki]

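The Receivers → Processors → Exporters flow can be illustrated with a toy in-process model. This is only a conceptual sketch under loud assumptions: it is not how the Collector is implemented or configured, and every name in it (`run_pipeline`, `drop_health_checks`, `add_team`, the `/healthz` route, and the `example.team` key) is hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]  # stand-in for a span, metric point, or log record
Processor = Callable[[List[Record]], List[Record]]


def drop_health_checks(records: List[Record]) -> List[Record]:
    """Processor: filter out records for a hypothetical /healthz route."""
    return [r for r in records if r.get("http.route") != "/healthz"]


def add_team(records: List[Record]) -> List[Record]:
    """Processor: enrich each record with a hypothetical ownership attribute."""
    return [{**r, "example.team": "payments"} for r in records]


def batch(records: List[Record], size: int) -> List[List[Record]]:
    """Split records into chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_pipeline(
    received: List[Record],
    processors: List[Processor],
    export: Callable[[List[Record]], None],
    batch_size: int = 2,
) -> None:
    """Apply processors in order, then hand batches to the exporter."""
    for process in processors:
        received = process(received)
    for chunk in batch(received, batch_size):
        export(chunk)
```

Passing `exported.append` as the exporter is enough to observe the batches in a test; a real deployment would send each batch to a backend instead.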
| Transport | Port | Use Case |
|---|---|---|
| gRPC | 4317 | Default in many SDKs; binary protobuf over HTTP/2 |
| HTTP | 4318 | Protobuf or JSON over HTTP; easier to route through firewalls and load balancers |

OTLP/HTTP endpoint paths: /v1/traces, /v1/metrics, /v1/logs
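As a small illustration of how the ports and paths above combine, the sketch below builds an exporter endpoint URL per transport and signal. The helper name `otlp_endpoint` and the host `collector.internal` are hypothetical; the sketch assumes default ports and that the gRPC endpoint carries no per-signal path.

```python
OTLP_DEFAULT_PORTS = {"grpc": 4317, "http": 4318}
OTLP_HTTP_PATHS = {
    "traces": "/v1/traces",
    "metrics": "/v1/metrics",
    "logs": "/v1/logs",
}


def otlp_endpoint(host: str, transport: str, signal: str, tls: bool = False) -> str:
    """Return the endpoint URL for a signal over the chosen OTLP transport."""
    if transport not in OTLP_DEFAULT_PORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    if signal not in OTLP_HTTP_PATHS:
        raise ValueError(f"unknown signal: {signal!r}")
    scheme = "https" if tls else "http"
    base = f"{scheme}://{host}:{OTLP_DEFAULT_PORTS[transport]}"
    if transport == "grpc":
        # gRPC selects the signal by service method, so no path is appended.
        return base
    return base + OTLP_HTTP_PATHS[signal]


if __name__ == "__main__":
    for signal in OTLP_HTTP_PATHS:
        print(otlp_endpoint("collector.internal", "http", signal))
```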
A span represents a unit of work. Among other fields, it carries attributes (for example http.request.method, db.system) and a kind:

| Kind | Direction | Example |
|---|---|---|
| Client | Outbound sync | HTTP client, DB client |
| Server | Inbound sync | HTTP server handler |
| Internal | In-process | Business logic |
| Producer | Outbound async | Queue publish |
| Consumer | Inbound async | Queue consume |
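To show how span kinds line up across a service boundary, here is a minimal stdlib-only model of spans. It is a sketch for illustration, not the OTel SDK's span type: the `Span` dataclass, the `start_child` helper, and the span names are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class SpanKind(Enum):
    CLIENT = "client"      # outbound synchronous call
    SERVER = "server"      # inbound synchronous request
    INTERNAL = "internal"  # in-process work
    PRODUCER = "producer"  # outbound asynchronous message
    CONSUMER = "consumer"  # inbound asynchronous message


@dataclass
class Span:
    name: str
    kind: SpanKind
    trace_id: str
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    parent_span_id: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)


def start_child(parent: Span, name: str, kind: SpanKind,
                attributes: Optional[Dict[str, str]] = None) -> Span:
    """Create a span in the parent's trace, linked to it by parent_span_id."""
    return Span(
        name=name,
        kind=kind,
        trace_id=parent.trace_id,
        parent_span_id=parent.span_id,
        attributes=dict(attributes or {}),
    )


# Service A handles a request, does internal work, then calls service B.
root = Span("POST /checkout", SpanKind.SERVER, trace_id=secrets.token_hex(16))
pricing = start_child(root, "calculate_total", SpanKind.INTERNAL)
outbound = start_child(root, "POST /charge", SpanKind.CLIENT,
                       {"http.request.method": "POST"})
# Service B's SERVER span is parented to service A's CLIENT span.
inbound = start_child(outbound, "POST /charge", SpanKind.SERVER)
```

The Client and Server spans for the same call share a trace ID but have different span IDs, which is how a backend can attribute network latency to the gap between them.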