Name: Building With Observability
Author: imsanghaar

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   App Pod   │     │ Prometheus  │     │  Grafana    │
│  /metrics   │◄────│   Scrape    │────►│  Dashboard  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                  │
       ▼                  ▼
  ServiceMonitor     PrometheusRule
  (what to scrape)   (alerting rules)
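
To make the `/metrics` box in the diagram above concrete, here is a minimal sketch of the text format Prometheus reads when it scrapes a pod. It uses only the standard library; the metric name `http_requests_total`, its labels, and the helper `render_counter` are hypothetical, and label-value escaping is left out. A real service would normally rely on a Prometheus client library rather than hand-rendering this.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to a value.
    Assumption: label values need no escaping (no quotes or backslashes).
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            # A sample without labels is written without braces.
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample: 42 successful GET requests.
body = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    {(("method", "GET"), ("status", "200")): 42},
)
print(body, end="")
```

The ServiceMonitor then tells Prometheus which Service port and path expose this text.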

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   FastAPI   │     │    OTel     │     │   Jaeger    │
│   + OTel    │────►│  Collector  │────►│    UI       │
│   SDK       │     │   (OTLP)    │     │             │
└─────────────┘     └─────────────┘     └─────────────┘

Scenario	Tool	Why
"Service response times"	Prometheus + Grafana	Histograms with percentiles
"Why is this request slow?"	Jaeger traces	See full request path
"What happened at 3am?"	Loki logs	Event-level detail
"Are we meeting SLOs?"	Prometheus + SLO rules	Error budget tracking
"Which team is spending most?"	OpenCost	Cost allocation by namespace

Is it customer-impacting?
├── Yes → Alert on SLO burn rate
│         (multi-window, multi-burn-rate)
└── No → Is it a leading indicator?
         ├── Yes → Warning alert, page if trend continues
         └── No → Dashboard only, no alert
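
The "multi-window, multi-burn-rate" branch of the tree above can be sketched as follows. Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target), and the page fires only when both a long and a short window exceed the threshold. The default threshold of 14.4 is an assumption taken from commonly cited SRE guidance (it corresponds to spending 2% of a 30-day budget in one hour); the function names are hypothetical and the window lengths are left to the caller.

```python
def burn_rate(error_ratio, slo_target):
    """Return how fast the error budget is being consumed.

    1.0 means the budget would be used up exactly at the end of the SLO window.
    `slo_target` is a fraction, e.g. 0.999 for 99.9%.
    """
    budget = 1.0 - slo_target
    return error_ratio / budget


def should_page(long_window_error_ratio, short_window_error_ratio,
                slo_target, threshold=14.4):
    """Page only if BOTH windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, which avoids paging on an already-recovered spike.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


# Hypothetical readings for a 99.9% SLO.
print(should_page(0.02, 0.02, 0.999))   # both windows burning fast
print(should_page(0.02, 0.001, 0.999))  # short window has recovered
```

In a real stack this condition would live in a PrometheusRule as a PromQL expression; the Python form only illustrates the logic.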

# Add Helm repos
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

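# Install kube-prometheus-stack (includes Grafana).
# serviceMonitorSelectorNilUsesHelmValues=false lets Prometheus pick up
# ServiceMonitors that do not carry this Helm release's labels.
# adminPassword=admin is suitable for a local demo only; change it elsewhere.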
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set grafana.adminPassword=admin

apiVersion: monitoring.coreos.com/v1

Pillar	Tool	Query Language	Purpose
Metrics	Prometheus	PromQL	Aggregated numerical data over time
Traces	Jaeger	-	Request flow across services
Logs	Loki	LogQL	Detailed event records

Service Type	Typical SLO	Error Budget (30 days)
User-facing API	99.9%	43.2 minutes
Internal service	99.5%	3.6 hours
Batch jobs	99.0%	7.2 hours
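
The error budgets in the table above follow from simple arithmetic: the allowed downtime is the window length multiplied by (100% minus the SLO). A minimal sketch, with the hypothetical helper `error_budget_minutes` and a 30-day window assumed as in the table:

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Return the allowed downtime in minutes for an SLO over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


# The three SLO targets listed in the table.
for service, slo in [("User-facing API", 99.9),
                     ("Internal service", 99.5),
                     ("Batch jobs", 99.0)]:
    minutes = error_budget_minutes(slo)
    print(f"{service}: {minutes:.1f} minutes ({minutes / 60:.1f} hours)")
```

For 99.9% this gives about 43.2 minutes, and for 99.5% and 99.0% about 3.6 and 7.2 hours, matching the table.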

Building With Observability

Building Observability Stacks for Kubernetes

Persona

When to Use This Skill

Core Concepts

The Three Pillars of Observability

Prometheus Metrics Architecture

OpenTelemetry Tracing Architecture

Decision Logic

When to Use Each Tool

Alerting Strategy Decision Tree

SLO Target Selection

Workflow: Full Stack Setup

1. Install Prometheus + Grafana Stack

2. Create ServiceMonitor for Your App
