Manage the homelab monitoring stack at ~/docker/stacks/monitoring/. Services: Prometheus, Grafana, Alertmanager (Telegram via DeLoNETBot), cadvisor, node-exporter, process-exporter, Loki, OTEL collector, Dockge, Uptime Kuma, health-monitor. Use when: (1) adding, editing, or debugging Prometheus alert rules, (2) managing or restarting monitoring services, (3) checking alert delivery or Telegram bot status, (4) diagnosing system performance issues (CPU hogs, memory bloat, swap pressure), (5) adding new scrape targets to Prometheus, (6) configuring Grafana dashboards or datasources, (7) any task referencing "monitoring", "alerts", "prometheus", "grafana", "cadvisor", "process-exporter", "alertmanager", "telegram alerts", or "DeLoNETBot".
Stack root: ~/docker/stacks/monitoring/
See references/architecture.md for full service map, ports, URLs, and data flow.
cd ~/docker/stacks/monitoring && docker compose up -d
cd ~/docker/stacks/monitoring && docker compose up -d <service-name>
curl -s -X POST http://localhost:9472/-/reload
curl -s -X POST http://localhost:9784/-/reload
curl -s http://localhost:9256/metrics | grep namedprocess | head -5
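The grep above only confirms that process-exporter is emitting something. As a rough sketch, the same output can be reduced to a per-group process count in Python. This assumes the exporter exposes the standard `namedprocess_namegroup_num_procs` gauge with a `groupname` label; the function name is hypothetical.

```python
import re

# Matches exposition lines such as:
#   namedprocess_namegroup_num_procs{groupname="example"} 3
_NUM_PROCS = re.compile(r'^namedprocess_namegroup_num_procs\{([^}]*)\}\s+(\S+)$')
_GROUPNAME = re.compile(r'groupname="([^"]*)"')


def procs_per_group(metrics_text: str) -> dict:
    """Return {groupname: process count} parsed from process-exporter metrics text."""
    counts = {}
    for line in metrics_text.splitlines():
        match = _NUM_PROCS.match(line.strip())
        if match is None:
            continue  # comment line or a different metric
        group = _GROUPNAME.search(match.group(1))
        if group is not None:
            counts[group.group(1)] = float(match.group(2))
    return counts
```

Feed it the body returned by http://localhost:9256/metrics; an empty result suggests the exporter is up but matching no process groups.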
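import json
import urllib.request

url = "https://api.telegram.org/bot<TOKEN>/sendMessage"  # replace <TOKEN> with the bot token
data = json.dumps({"chat_id": 7564050286, "text": "Test alert", "parse_mode": "HTML"}).encode()
req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req, timeout=10) as resp:  # raises on HTTP or network errors
    print(resp.status, resp.read().decode())  # show the API reply for inspection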
Bot token is in ~/docker/stacks/monitoring/alertmanager/config.yml.
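To avoid pasting the token by hand, a small helper could pull it out of the Alertmanager config text. This is a minimal sketch and rests on an assumption: that the token is stored inline under a `bot_token:` key in a `telegram_configs` entry, not referenced through `bot_token_file`. Both function names are hypothetical.

```python
import re

# Assumption: the token appears inline as `bot_token: <value>`, optionally
# quoted and optionally prefixed by a YAML list dash.
_BOT_TOKEN = re.compile(
    r'^\s*(?:-\s*)?bot_token:\s*["\']?([^"\'\s#]+)["\']?',
    re.MULTILINE,
)


def extract_bot_token(config_text: str) -> str:
    """Return the first inline bot_token value found in the config text."""
    match = _BOT_TOKEN.search(config_text)
    if match is None:
        raise ValueError("no inline bot_token found in config text")
    return match.group(1)


def telegram_send_url(token: str) -> str:
    """Build the sendMessage URL used by the test snippet."""
    return f"https://api.telegram.org/bot{token}/sendMessage"
```

Read ~/docker/stacks/monitoring/alertmanager/config.yml into a string, pass it to `extract_bot_token`, and use the result of `telegram_send_url` as `url` in the test message. Avoid printing the token or committing it anywhere.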
See references/alert-rules-guide.md for rule file locations, PromQL patterns, severity conventions, and examples.
Quick path: Edit the appropriate rule file, then hot-reload Prometheus:
curl -s -X POST http://localhost:9472/-/reload
| File | Scope |
|---|---|
| prometheus/alert_rules.yml | Container CPU, memory, availability |
| prometheus/system_alerts.yml | Host system + per-process hogs |
| prometheus/rules/docker-health.yml | Container health checks, restart loops |
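The hot-reload step of the quick path can also be scripted. The sketch below assumes Prometheus listens on port 9472, as in the curl command, and that the lifecycle endpoint is enabled (Prometheus only serves `/-/reload` when started with `--web.enable-lifecycle`). The function names are hypothetical.

```python
import urllib.request

PROMETHEUS_URL = "http://localhost:9472"  # port taken from the curl reload command


def build_reload_request(base_url: str = PROMETHEUS_URL) -> urllib.request.Request:
    """Build, but do not send, the POST request that triggers a config reload."""
    return urllib.request.Request(f"{base_url}/-/reload", data=b"", method="POST")


def reload_prometheus(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises URLError or HTTPError if Prometheus is unreachable or
    rejects the reload, for example because a rule file fails to parse.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `reload_prometheus()` after editing one of the rule files above should return 200 when the new rules load cleanly.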
Alerts route to Telegram via @DeLoNETBot (chat_id: 7564050286).
Config: alertmanager/config.yml
Severity routing:
- critical: repeat every 1h
- warning: repeat every 4h

docker-health.yml only has health-check and restart-loop rules. CPU/memory rules live in alert_rules.yml.

When the system stutters, check in this order:
1. ps aux --sort=-%cpu | head -15 - find CPU hogs
2. free -h - check swap pressure
3. sensors - check thermals (Tctl)
4. docker stats --no-stream - find container hogs

The alert rules should catch most of these automatically now. If they don't fire, check:
curl http://localhost:9472/-/healthy
curl http://localhost:9784/-/healthy
curl http://localhost:9472/api/v1/rules | python3 -m json.tool | head -40
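Reading the first 40 lines of the rules JSON can miss the rule you care about. As a rough alternative, the response from `/api/v1/rules` can be flattened into one row per alerting rule. The sketch relies on the documented response shape of the Prometheus rules API (`data.groups[].rules[]` with `name`, `state`, and `type` fields); the function name is hypothetical.

```python
def summarise_rules(payload: dict) -> list:
    """Flatten a /api/v1/rules response into (file, group, rule, state) tuples.

    Recording rules are skipped; only alerting rules carry a state of
    "inactive", "pending", or "firing".
    """
    rows = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") != "alerting":
                continue
            rows.append((
                group.get("file", ""),
                group.get("name", ""),
                rule.get("name", ""),
                rule.get("state", ""),
            ))
    return rows
```

Parse the body of http://localhost:9472/api/v1/rules with `json.loads` and pass the result in. A rule missing from the output was never loaded, which points at the rule file or the reload step and not at Alertmanager.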