OpenTelemetry, distributed tracing, structured logging, metrics (Prometheus, Grafana, Datadog). Use when implementing monitoring, tracing, or debugging production issues.
Instrumentation moderne avec OpenTelemetry pour métriques, traces et logs structurés.
| Pilier | Technologies | Métriques clés |
|---|---|---|
| Metrics | Prometheus, Grafana, Datadog | RED (Rate, Errors, Duration), USE (Utilization, Saturation, Errors) |
| Traces | OpenTelemetry, Jaeger, Tempo | P95 latency, span duration, error rate |
| Logs | Loki, ElasticSearch, Datadog | Structured JSON, correlation IDs |
// Node.js — Auto-instrumentation
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter(),
instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
| Signal | Description | Seuil typique |
|---|---|---|
| Latency | P50, P95, P99 response time | P95 < 200ms |
| Traffic | Requests per second | Baseline + alerting |
| Errors | Error rate (5xx, exceptions) | < 0.1% |
| Saturation | CPU, Memory, Disk | < 80% sustained |
{
"timestamp": "2026-04-17T10:30:00Z",
"level": "error",
"message": "Payment processing failed",
"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
"span_id": "00f067aa0ba902b7",
"service.name": "payment-api",
"error.type": "PaymentGatewayTimeout"
}
| Concept | Exemple |
|---|---|
| SLI (Indicator) | 99.5% requests < 200ms |
| SLO (Objective) | 99.9% uptime mensuel |
| SLA (Agreement) | 99.95% uptime + pénalités |
Pour instrumentation détaillée par stack : invoquer @observability-engineer