Grafana observability stack for self-hosted Kubernetes using kube-prometheus-stack. This skill should be used when deploying Grafana/Prometheus/Alertmanager via Helm, creating Grafana dashboards (JSON models, panels, variables, PromQL/LogQL queries), provisioning dashboards via ConfigMap sidecars or Git/ArgoCD, configuring datasources (Prometheus, Loki, Tempo, PostgreSQL), creating ServiceMonitors/PodMonitors for scraping app metrics, building cluster health dashboards, Ceph storage monitoring, Traefik ingress metrics, app-specific dashboards, integrating Grafana authentication with Authentik OAuth2/OIDC, designing multi-signal dashboards with collapsible rows (metrics/logs/traces), configuring bidirectional Loki-Tempo cross-signal correlation, setting up OTEL Collector spanmetrics connectors for trace-derived RED metrics, using datasource template variables for portable dashboards, or delivering dashboards via dedicated Helm charts with .Files.Get.
Full observability stack for self-hosted Kubernetes via the kube-prometheus-stack Helm chart. The chart bundles Grafana, Prometheus, Alertmanager, node-exporter, kube-state-metrics, and default recording/alerting rules.

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
      -f values.yaml -n monitoring --create-namespace

For ArgoCD app-of-apps pattern, Helm values structure, CRD management, and namespace configuration, see kube-prometheus-stack.md.
Loki (logs) and Tempo (traces) deploy as separate Helm charts in the same namespace:
- grafana/loki - log aggregation (Promtail/Alloy as agent)
- grafana/tempo - distributed tracing

Build Grafana dashboards: JSON model structure, panel types (timeseries, stat, gauge, table, logs, heatmap), template variables, PromQL/LogQL query patterns.
- ConfigMap sidecar provisioning (label grafana_dashboard: "1")

Configure Prometheus, Loki, Tempo, and PostgreSQL datasources.
- Helm values (additionalDataSources) or ConfigMap sidecar

Expose application metrics for Prometheus scraping.
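
As an illustration of exposing application metrics, a ServiceMonitor might look roughly like the sketch below. The application name, namespace, label, and port name are hypothetical placeholders, and the sketch assumes the app already has a Service with a named port serving metrics at /metrics.

```yaml
# Minimal ServiceMonitor sketch. All names below are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical
  namespace: my-app            # hypothetical; same namespace as the Service
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # must match the labels on the Service
  endpoints:
    - port: metrics            # name of the Service port, not its number
      path: /metrics
      interval: 30s
```

Because no namespaceSelector is set, only Services in the ServiceMonitor's own namespace are matched.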
SSO login via Authentik OAuth2/OIDC provider.
- auth.generic_oauth configuration via Helm values

Detailed panel layouts, PromQL/LogQL queries, and metric references per domain:
| Dashboard | Reference |
|---|---|
| Kubernetes cluster health | dashboards/cluster-health.md |
| Ceph storage monitoring | dashboards/ceph-storage.md |
| Traefik ingress/networking | dashboards/traefik-networking.md |
| CNPG, ArgoCD, Authentik | dashboards/applications.md |
| LogQL syntax reference | dashboards/logs-logql.md |
| Log dashboards, alerting, app queries | dashboards/logs-dashboards.md |
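
For orientation, a dashboard delivered through the ConfigMap sidecar could be sketched as follows. The ConfigMap name, dashboard title, uid, and the http_requests_total metric are hypothetical; only the grafana_dashboard: "1" label comes from this document. The datasource template variable keeps the dashboard portable across Prometheus instances.

```yaml
# Dashboard ConfigMap sketch. Names and the metric are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-dashboard       # hypothetical
  namespace: monitoring
  labels:
    grafana_dashboard: "1"     # label the sidecar watches for
data:
  my-app.json: |
    {
      "title": "My App",
      "uid": "my-app",
      "templating": {
        "list": [
          { "name": "datasource", "type": "datasource", "query": "prometheus" }
        ]
      },
      "panels": [
        {
          "type": "timeseries",
          "title": "Request rate",
          "datasource": { "type": "prometheus", "uid": "${datasource}" },
          "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
          "targets": [
            { "refId": "A", "expr": "sum(rate(http_requests_total[5m]))" }
          ]
        }
      ]
    }
```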
- ServerSideApply=true and separate CRD app or crds.enabled: true
- grafana_dashboard: "1" for sidecar pickup
- grafana_datasource: "1"
- prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues: false
- persistence.enabled: true with appropriate StorageClass for sqlite DB (dashboards provisioned from ConfigMaps survive restarts regardless)
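
Several of the settings above could be combined in values.yaml roughly as sketched here. This is a fragment under stated assumptions, not a complete configuration: the StorageClass name, hostnames, Loki service URL, and OAuth client ID are placeholders, and the Authentik endpoint paths should be checked against your provider's settings.

```yaml
# values.yaml fragment for kube-prometheus-stack (sketch; placeholders marked).
prometheus:
  prometheusSpec:
    # Pick up ServiceMonitors that do not carry the Helm release label.
    serviceMonitorSelectorNilUsesHelmValues: false

grafana:
  persistence:
    enabled: true
    storageClassName: my-storage-class   # assumption: use your own StorageClass
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"
    datasources:
      enabled: true
      label: grafana_datasource
      labelValue: "1"
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.monitoring.svc:3100   # assumption: depends on the Loki release
  grafana.ini:
    server:
      root_url: https://grafana.example.com  # placeholder hostname
    auth.generic_oauth:
      enabled: true
      name: Authentik
      client_id: grafana                     # placeholder; supply the secret from a Kubernetes Secret
      scopes: openid profile email
      auth_url: https://authentik.example.com/application/o/authorize/
      token_url: https://authentik.example.com/application/o/token/
      api_url: https://authentik.example.com/application/o/userinfo/
```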