structlog + OpenTelemetry + Sentry + Prometheus. RED metrics per endpoint and per pipeline stage.
- Context bound via structlog.contextvars: request_id, user_id (if authenticated), trace_id.
- Event-style log calls: logger.info("extraction.stage.completed", stage="ner", latency_ms=42).
- Sensitive keys: {text, note, content, input}.
- Auto-instrumentation: opentelemetry-instrumentation-*.
- Spans: pipeline.deid, pipeline.ner, etc.
- Sentry on the API (sentry-sdk[fastapi]) and web (@sentry/nextjs).
- before_send scrubs any PHI-looking fields as defense in depth.

Expose a /metrics endpoint. Track RED (Rate, Errors, Duration) per endpoint and per pipeline stage:
- http_requests_total{route,method,status}
- http_request_duration_seconds{route} (histogram, SLO buckets)
- pipeline_stage_duration_seconds{stage}
- pipeline_stage_errors_total{stage,error_type}
- extraction_entities_total{label}
- normalization_cache_hits_total{vocab}

/v1/extract p95 < 2000 ms for notes ≤ 2 KB.

Every alert links to a section in docs/runbook.md that covers "likely causes" and "what to check."
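A minimal sketch of how the two per-stage metrics named above might be recorded with the prometheus_client library. The track_stage helper, the bucket edges, the dedicated registry, and render_metrics are assumptions made for illustration; only the metric and label names come from the list above.

```python
import time
from contextlib import contextmanager

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A dedicated registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

# Assumed bucket edges in seconds; 2.0 is included so a 2000 ms target
# falls exactly on a bucket boundary.
STAGE_BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0)

STAGE_DURATION = Histogram(
    "pipeline_stage_duration_seconds",
    "Wall-clock time spent in one pipeline stage.",
    ["stage"],
    buckets=STAGE_BUCKETS,
    registry=registry,
)
STAGE_ERRORS = Counter(
    "pipeline_stage_errors_total",
    "Exceptions raised by a pipeline stage.",
    ["stage", "error_type"],
    registry=registry,
)


@contextmanager
def track_stage(stage: str):
    """Hypothetical helper: time one stage and count any exception it raises."""
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        STAGE_ERRORS.labels(stage=stage, error_type=type(exc).__name__).inc()
        raise  # the metric is a side effect; the error still propagates
    finally:
        # Duration is observed for both successful and failed runs.
        STAGE_DURATION.labels(stage=stage).observe(time.perf_counter() - start)


def render_metrics() -> bytes:
    """Text exposition payload that a /metrics handler could return."""
    return generate_latest(registry)
```

Wrapping a stage body in track_stage("ner") records one duration observation per run and, on failure, one error increment labelled with the exception class name.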
Do not swallow exceptions (except Exception: logger.error(...)). Log, enrich, re-raise.