Implement distributed tracing with OpenTelemetry (context propagation, sampling, semantic conventions, span design) and integrate with logs/metrics. Use when debugging microservices latency, dependency failures, or request flow across async boundaries.
In scope
Out of scope
Tracing answers: “Where did time go?” and “Which dependency caused failure?”
A trace is a tree/graph of spans. Each span represents a unit of work and captures timing, attributes, and events.
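To make the span tree concrete, here is a minimal sketch of how "Where did time go?" can be answered from span timings. The `Span` class and `slowest_by_self_time` helper are hypothetical illustrations, not OpenTelemetry SDK types, and the sketch assumes child spans do not overlap in time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical in-memory span; not the OpenTelemetry SDK type."""
    name: str
    start_ms: float
    end_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms

    @property
    def self_time_ms(self) -> float:
        # Time spent in this span itself, excluding its children.
        # Assumes children run sequentially (no overlap).
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def slowest_by_self_time(root: Span) -> Span:
    """Walk the tree and return the span with the largest self time."""
    best = root
    stack = [root]
    while stack:
        span = stack.pop()
        if span.self_time_ms > best.self_time_ms:
            best = span
        stack.extend(span.children)
    return best


# Illustrative timings only.
root = Span("HTTP GET /v1/orders/{id}", 0, 120, children=[
    Span("auth.verify", 5, 15),
    Span("DB SELECT orders", 20, 110),
])
print(slowest_by_self_time(root).name)  # prints "DB SELECT orders"
```

Here the root span takes 120 ms, but 90 ms of that sits in the database child, which is the kind of answer a trace view gives at a glance.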
Use the traceparent and tracestate headers for cross-service propagation.
Example span names:
- HTTP GET /v1/orders/{id}
- DB SELECT orders
- MQ consume orders.created
Use span status to represent error outcome.
Set service.name, service.version, deployment.environment.
These must be consistent across services to enable cross-service queries.
Inject traceId and spanId into logs:
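One way to do this in Python is a `logging.Filter` that stamps each record with the active IDs. This is a sketch using only the standard library: the `current_trace_id` and `current_span_id` context variables are hypothetical stand-ins for whatever the tracer in use exposes, and the log format is an assumption.

```python
import contextvars
import io
import logging

# Hypothetical holders for the active IDs; a real tracer would supply these.
current_trace_id = contextvars.ContextVar("trace_id", default="-")
current_span_id = contextvars.ContextVar("span_id", default="-")


class TraceContextFilter(logging.Filter):
    """Attach traceId and spanId to every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.traceId = current_trace_id.get()
        record.spanId = current_span_id.get()
        return True  # never drop the record, only enrich it


# Log to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s traceId=%(traceId)s spanId=%(spanId)s %(message)s"
))
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# Simulate being inside a span.
current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
current_span_id.set("00f067aa0ba902b7")
logger.info("order created")
print(stream.getvalue(), end="")
```

With the IDs in every log line, a log search by traceId returns the same request that the trace view shows.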
A minimal “span map” for a typical backend request:
- HTTP <method> <route>
- auth.verify
- DB <operation> <table>
- HTTP <method> <upstream>
- orders.create (manual)

- docs/tracing.md (propagation, sampling, span naming)
- observability/span-naming.md
- observability/attribute-policy.md

- traceparent extraction/injection on all hops.

- references/w3c-trace-context.md
- references/w3c-baggage-policy.md
- references/otel-java-agent.md
- references/otel-semconv-http.md
- references/sampling-playbook.md
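The traceparent extraction/injection check listed above can be exercised with a small helper. This is a sketch of the W3C Trace Context version-00 header layout (`version-traceid-parentid-flags`) only. The `TraceContext`, `parse_traceparent`, and `child_traceparent` names are hypothetical, and production services would normally rely on the OpenTelemetry propagators instead.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Extract the trace context from an incoming header, or None if invalid."""
    m = _TRACEPARENT_RE.fullmatch(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
print(ctx)
print(outgoing)
```

A hop passes the check when the outgoing header carries the same trace ID as the incoming one and a fresh span ID.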