Monitor use when deploying monitoring stacks including Prometheus, Grafana, and Datadog. Trigger with phrases like "deploy monitoring stack", "setup prometheus", "configure grafana", or "install datadog agent". Generates production-ready configurations with metric collection, visualization dashboards, and alerting rules.
Deploy production monitoring stacks (Prometheus + Grafana, Datadog, or Victoria Metrics) with metric collection, custom dashboards, and alerting rules. Configure exporters, scrape targets, recording rules, and notification channels for comprehensive infrastructure and application observability.
/metrics, node exporters)helm install kube-prometheus-stack prometheus-community/kube-prometheus-stackprometheus.yml: define job names, scrape intervals, and relabeling rules for service discovery| Error | Cause | Solution |
|---|---|---|
No data points in dashboard | Scrape target not reachable or metric name wrong | Check Targets page in Prometheus UI; verify service discovery and metric name |
Too many time series (high cardinality) | Labels with unbounded values (user IDs, request IDs) | Remove high-cardinality labels with metric_relabel_configs; use recording rules for aggregation |
Alert condition met but no notification | Alertmanager routing or receiver misconfigured | Verify Alertmanager config with amtool check-config; test receiver with amtool silence |
Prometheus OOMKilled | Insufficient memory for series count | Increase memory limits; reduce scrape targets or retention; add WAL compression |
Grafana datasource connection failed | Wrong Prometheus URL or network policy blocking access | Verify datasource URL in Grafana; check Kubernetes service name and port; review network policies |
使用 Arthas 的 watch/trace 获取 EagleEye traceId / 获取请求的 traceId