Sets up application monitoring, logging, error tracking, and alerting using tools like Sentry, Datadog, Grafana, Prometheus, OpenTelemetry, PagerDuty, or built-in cloud monitoring. Use when the user wants to add monitoring, set up error tracking, configure logging, add observability, or create alerts for their application.
You are an observability engineer. Set up comprehensive monitoring for the current project.
Detect:
$0 = Monitoring tool (optional, recommend based on project):
sentry — Error tracking + performance (best for most apps)
datadog — Full observability platform (metrics, logs, APM)
grafana — Grafana + Prometheus + Loki (self-hosted)
otel — OpenTelemetry (vendor-neutral)
cloudwatch — AWS CloudWatch (if already on AWS)

$1 = Scope (optional):
errors — Error tracking only
metrics — Application and system metrics
logs — Structured logging
all — Full observability stack (default)

Replace or enhance existing logging with structured JSON logs:
Node.js (pino):
import pino from 'pino';
export const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty' }
    : undefined,
});
Python (structlog):
import structlog
structlog.configure(
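    processors=[
        # Include the level in every entry; it is not added by default here.
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],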
)
logger = structlog.get_logger()
Go (slog):
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo}))
slog.SetDefault(logger)
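To make the target output concrete, here is a dependency-free sketch of level-filtered JSON logging. It is illustrative only and does not replace the libraries above; `createLogger` and the field names (`route`, `durationMs`, and so on) are hypothetical.

```javascript
// Dependency-free sketch of level-filtered JSON logging (illustrative, not pino).
const LEVELS = { debug: 20, info: 30, warn: 40, error: 50 };

function createLogger(minLevel = 'info', write = (line) => process.stdout.write(line + '\n')) {
  // Unknown level names fall back to "info".
  const threshold = LEVELS[minLevel] ?? LEVELS.info;
  const log = (level, msg, fields = {}) => {
    if (LEVELS[level] < threshold) return; // drop entries below the configured level
    // One JSON object per line, so log shippers can parse entries without regexes.
    write(JSON.stringify({ level, time: new Date().toISOString(), msg, ...fields }));
  };
  return {
    debug: (msg, fields) => log('debug', msg, fields),
    info: (msg, fields) => log('info', msg, fields),
    warn: (msg, fields) => log('warn', msg, fields),
    error: (msg, fields) => log('error', msg, fields),
  };
}

// Usage: collect lines in memory instead of writing to stdout.
const lines = [];
const logger = createLogger('info', (line) => lines.push(line));
logger.debug('cache miss', { key: 'user:42' }); // filtered out at level "info"
logger.info('request completed', { method: 'GET', route: '/health', status: 200, durationMs: 12 });
```

Context goes in fields rather than in the message string, which keeps entries searchable by key.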
Add logging to:
Sentry (most common):
// Node.js
import * as Sentry from '@sentry/node';
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: process.env.NODE_ENV === 'production' ? 0.1 : 1.0,
  integrations: [
    Sentry.httpIntegration(),
    Sentry.expressIntegration(),
    Sentry.prismaIntegration(),
  ],
});
Configure:
Application metrics to track:
System metrics:
Prometheus (self-hosted) setup:
import { Registry, Counter, Histogram } from 'prom-client';
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
});
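Prometheus histogram buckets are cumulative: each bucket counts the observations less than or equal to its upper bound, and a final +Inf bucket counts all of them. The dependency-free sketch below illustrates that behaviour using the bucket bounds above. It is not prom-client itself, and `createHistogram` is a hypothetical name.

```javascript
// Dependency-free sketch of cumulative histogram buckets (not prom-client itself).
const BUCKETS = [0.01, 0.05, 0.1, 0.5, 1, 5]; // upper bounds in seconds

function createHistogram(bounds) {
  // One counter per upper bound, plus a trailing +Inf bucket.
  const counts = new Array(bounds.length + 1).fill(0);
  let sum = 0;
  return {
    observe(seconds) {
      sum += seconds;
      bounds.forEach((upperBound, i) => {
        // Cumulative: increment every bucket whose bound is >= the observed value.
        if (seconds <= upperBound) counts[i] += 1;
      });
      counts[bounds.length] += 1; // +Inf bucket always equals the total count
    },
    snapshot() {
      return { counts: [...counts], sum, count: counts[bounds.length] };
    },
  };
}

const histogram = createHistogram(BUCKETS);
histogram.observe(0.03); // a 30 ms request
histogram.observe(0.7);  // a 700 ms request
const snap = histogram.snapshot();
console.log(snap.counts); // per-bucket counts, ordered by bound, +Inf last
```

Bounds are worth choosing around the latencies that matter for alerting, because percentiles are estimated from these bucket counts and any value between two bounds is only known to lie within that range.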
OpenTelemetry (vendor-neutral):
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter(),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
Create standardized health check endpoints:
// GET /health — basic liveness
app.get('/health', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
// GET /health/ready — readiness (check dependencies)
app.get('/health/ready', async (req, res) => {
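  // checkDatabase and checkRedis are project-specific helpers, not defined here.
  // Each should resolve to an object such as { status: 'ok' } and handle its own errors.
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
  };
  const healthy = Object.values(checks).every(c => c.status === 'ok');
  res.status(healthy ? 200 : 503).json({ status: healthy ? 'ok' : 'degraded', checks });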
});
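A readiness endpoint should not hang or throw when a dependency is slow or down. One way to get there is sketched below, assuming each dependency exposes some async probe. `runCheck` is a hypothetical helper, and the commented `db.query` and `redis.ping` calls stand in for whatever clients the project actually uses.

```javascript
// Hypothetical helper: wraps any async probe so a readiness check
// always resolves to a { status, ... } object within a time limit.
async function runCheck(probe, timeoutMs = 2000) {
  const started = Date.now();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the probe or the timeout.
    await Promise.race([probe(), timeout]);
    return { status: 'ok', latencyMs: Date.now() - started };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: 'error', error: message, latencyMs: Date.now() - started };
  } finally {
    clearTimeout(timer); // avoid keeping the event loop alive after the probe settles
  }
}

// Example wiring (assumed client objects, shown commented out):
// const checkDatabase = () => runCheck(() => db.query('SELECT 1'));
// const checkRedis = () => runCheck(() => redis.ping());
```

Because `runCheck` never rejects, the readiness handler can await every check and return 503 with per-dependency detail rather than failing the request.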
Define alert rules for critical conditions:
# alerts.yml (Prometheus/Grafana format)