Name: Overview
Author: micsapp

Overview

Deploy production monitoring stacks (Prometheus + Grafana, Datadog, or Victoria Metrics) with metric collection, custom dashboards, and alerting rules. Configure exporters, scrape targets, recording rules, and notification channels for comprehensive infrastructure and application observability.

Prerequisites

Target infrastructure identified: Kubernetes cluster, Docker hosts, or bare-metal servers
Metric endpoints accessible from the monitoring platform (application /metrics, node exporters)
Storage backend capacity planned for time-series data (Prometheus TSDB for local storage; Thanos or Cortex for long-term retention)
Alert notification channels defined: Slack webhook, PagerDuty integration key, or email SMTP
Helm 3+ for Kubernetes deployments using kube-prometheus-stack or similar charts
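
To make the "metric endpoints accessible" prerequisite concrete, here is a minimal sketch of an application `/metrics` endpoint that serves the Prometheus text exposition format using only the Python standard library. The metric name `app_requests_total`, the `REQUEST_COUNTS` dictionary, and the `serve` helper are hypothetical names chosen for illustration; a real service would more likely use an official client library.

```python
from __future__ import annotations

from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would update these as it handles work.
REQUEST_COUNTS: dict[str, int] = {"GET": 0, "POST": 0}


def render_metrics(counts: dict[str, int]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by HTTP method.",
        "# TYPE app_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'app_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on /metrics and return 404 for anything else."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given port (assumed to be free)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/metrics` from the monitoring host is a quick way to confirm that the endpoint is reachable before adding it as a scrape target.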

Instructions

Select the monitoring platform: Prometheus + Grafana for open-source self-hosted, Datadog for managed SaaS, Victoria Metrics for high-cardinality workloads
Deploy the monitoring stack: a Helm chart such as kube-prometheus-stack for Kubernetes, or Docker Compose for non-Kubernetes hosts
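
For a deployment outside Kubernetes, Prometheus needs a configuration file that lists its scrape targets. The sketch below renders a minimal `prometheus.yml` with static targets. The function name, the job names, and the target addresses are assumptions for illustration; the 30-second interval is an arbitrary default, not a recommendation from the source.

```python
from __future__ import annotations

import json


def render_scrape_config(jobs: dict[str, list[str]], interval: str = "30s") -> str:
    """Render a minimal prometheus.yml with one static scrape job per entry.

    `jobs` maps a job name to a list of host:port targets.
    json.dumps is used only to produce double-quoted strings, which are valid YAML scalars.
    """
    lines = ["global:", f"  scrape_interval: {interval}", "scrape_configs:"]
    for job_name, targets in jobs.items():
        quoted = ", ".join(json.dumps(target) for target in targets)
        lines.append(f"  - job_name: {json.dumps(job_name)}")
        lines.append("    static_configs:")
        lines.append(f"      - targets: [{quoted}]")
    return "\n".join(lines) + "\n"


# Hypothetical targets: a node exporter and an application exposing /metrics.
EXAMPLE_CONFIG = render_scrape_config(
    {"node": ["node-exporter:9100"], "app": ["app:8000"]}
)
```

Writing `EXAMPLE_CONFIG` to a file and mounting it into the Prometheus container is one way to wire this into a Docker Compose setup; on Kubernetes, kube-prometheus-stack normally discovers targets through its own resources instead.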

Error Handling

Error	Cause	Solution
`No data points in dashboard`	Scrape target not reachable or metric name wrong	Check `Targets` page in Prometheus UI; verify service discovery and metric name
`Too many time series (high cardinality)`	Labels with unbounded values (user IDs, request IDs)	Remove high-cardinality labels with `metric_relabel_configs`; use recording rules for aggregation
`Alert condition met but no notification`	Alertmanager routing or receiver misconfigured	Verify Alertmanager config with `amtool check-config`; check which receiver a label set routes to with `amtool config routes test`; fire a test alert with `amtool alert add`
`Prometheus OOMKilled`	Insufficient memory for series count	Increase memory limits; reduce scrape targets or drop unneeded series to lower the active series count (retention and WAL compression mainly affect disk usage rather than memory)
`Grafana datasource connection failed`	Wrong Prometheus URL or network policy blocking access	Verify datasource URL in Grafana; check Kubernetes service name and port; review network policies
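
The first row's advice to check the `Targets` page can also be scripted against the Prometheus HTTP API endpoint `/api/v1/targets`. The following is a minimal sketch using only the standard library; the function names and the example base URL are assumptions, and the parser reads only the `activeTargets`, `health`, `labels`, `scrapeUrl`, and `lastError` fields of the response.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def unhealthy_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (job, scrape URL, last error) for every active target that is not up."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "")
            problems.append(
                (job, target.get("scrapeUrl", ""), target.get("lastError", ""))
            )
    return problems


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the targets listing from a Prometheus server, e.g. http://localhost:9090."""
    url = f"{base_url.rstrip('/')}/api/v1/targets"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `unhealthy_targets(fetch_targets("http://localhost:9090"))` would list every target that Prometheus currently cannot scrape, together with the error it last recorded, assuming the server is reachable at that address.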
