Use when Pi needs to troubleshoot logs/metrics/traces or verify LGTM (Grafana/Loki/Mimir/Tempo) access in local k3s, cloud Kubernetes, or any generic K8s setup.
Use this skill whenever the user asks to debug a deployment, inspect logs, metrics, or traces, or confirm that the observability stack is reachable.
This skill supports local k3s, cloud Kubernetes, and generic K8s setups.
In Nebari Pi environments, the common architecture is:
- Application workloads run in a user namespace (e.g. `data-science`).
- The LGTM stack runs in the `monitoring` namespace.
- `pi-observability-proxy` and the `pi-debug` CLI provide curated debugging: `pi-debug logs|events|rollout|doctor` routes through `pi-observability-proxy`.

Do not assume only one path exists. Detect and use what works.
Run:
kubectl config current-context
kubectl get ns
Then detect whether pi-debug exists:
command -v pi-debug >/dev/null && echo "pi-debug available" || echo "pi-debug not available"
If pi-debug is available, test quickly:
pi-debug doctor --app pi || true
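The detection above can be wrapped in a small helper that picks a debugging path up front. This is a sketch; `choose_path` is a hypothetical name introduced here, not part of pi-debug or any Nebari tooling.

```shell
# Sketch: pick a debugging path based on tool availability.
# choose_path is a hypothetical helper, not part of any Nebari tooling.
choose_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "curated"   # tool is present: prefer its curated subcommands
  else
    echo "direct"    # fall back to kubectl plus raw LGTM endpoints
  fi
}

choose_path pi-debug
```

The rest of the workflow can then branch once on `curated` vs `direct` instead of re-checking availability at every step.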
Preferred discovery:
kubectl get svc -A | grep -Ei 'grafana|loki|mimir|tempo|prometheus'
Export discovered endpoints (examples):
export GRAFANA_SVC="http://<grafana-svc>.<ns>.svc.cluster.local:80"
export LOKI_URL="http://<loki-svc>.<ns>.svc.cluster.local:3100"
export MIMIR_URL="http://<mimir-gateway-svc>.<ns>.svc.cluster.local"
export TEMPO_URL="http://<tempo-svc>.<ns>.svc.cluster.local:3200"
export PROM_URL="http://<prometheus-svc>.<ns>.svc.cluster.local:9090"
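These URLs all follow the same `<svc>.<ns>.svc.cluster.local` pattern, so a tiny helper can assemble them from the discovery output. A sketch; `svc_url` is a hypothetical name introduced here for illustration.

```shell
# Sketch: build an in-cluster service URL from name, namespace, and port.
# svc_url is a hypothetical helper, not part of any Nebari tooling.
svc_url() {  # usage: svc_url <service> <namespace> <port>
  echo "http://$1.$2.svc.cluster.local:$3"
}

export LOKI_URL="$(svc_url lgtm-pack-loki monitoring 3100)"
# → http://lgtm-pack-loki.monitoring.svc.cluster.local:3100
```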
For Nebari local defaults, the likely service hostnames are:

- `lgtm-pack-grafana.monitoring.svc.cluster.local`
- `lgtm-pack-loki.monitoring.svc.cluster.local`
- `lgtm-pack-mimir-gateway.monitoring.svc.cluster.local`
- `lgtm-pack-tempo.monitoring.svc.cluster.local`

Smoke-test the endpoints:

curl -fsS "${GRAFANA_SVC}/login" >/dev/null && echo grafana_ok
curl -fsS "${LOKI_URL}/ready" && echo
curl -fsS "${MIMIR_URL}/ready" && echo
curl -fsS "${PROM_URL}/-/ready" && echo
If any check fails, capture the exact HTTP status code or error message and continue with Kubernetes diagnostics.
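One way to capture those codes systematically is to probe every endpoint and record an HTTP status per service instead of stopping at the first failure. A sketch, assuming the URLs exported earlier; `probe` is a hypothetical helper name.

```shell
# Sketch: report an HTTP status code per endpoint instead of bailing early.
# probe is a hypothetical helper; curl prints 000 when the connection fails.
probe() {  # usage: probe <label> <url>
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$2" || true)
  echo "$1 $code"
}

# Only probe endpoints that were actually discovered:
if [ -n "${GRAFANA_SVC:-}" ]; then probe grafana "${GRAFANA_SVC}/login"; fi
if [ -n "${LOKI_URL:-}"    ]; then probe loki    "${LOKI_URL}/ready";    fi
if [ -n "${MIMIR_URL:-}"   ]; then probe mimir   "${MIMIR_URL}/ready";   fi
if [ -n "${PROM_URL:-}"    ]; then probe prom    "${PROM_URL}/-/ready";  fi
```

The `label code` pairs slot directly into the final report.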
If the deployment allows broad direct access, use Loki directly.
curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
--data-urlencode 'query={namespace="data-science"}' \
--data-urlencode 'limit=200' \
--data-urlencode 'direction=BACKWARD'
curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
--data-urlencode 'query={job=~".+"}' \
--data-urlencode 'limit=200' \
--data-urlencode 'direction=BACKWARD'
curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
--data-urlencode 'query={namespace="<ns>",pod=~"<pod-prefix>.*"}' \
--data-urlencode 'limit=200' \
--data-urlencode 'direction=BACKWARD'
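The queries above return whatever falls inside Loki's default lookback window; to bound them to a recent interval, pass `start`/`end` as unix-nanosecond timestamps, which Loki's query_range API accepts. A sketch, assuming `LOKI_URL` is set as above; `ns_window` and `loki_range` are hypothetical helper names.

```shell
# Sketch: compute a unix-nanosecond window covering the last N minutes.
# ns_window and loki_range are hypothetical helpers, not existing tools.
ns_window() {  # usage: ns_window <minutes>  ->  "<start_ns> <end_ns>"
  end=$(( $(date +%s) * 1000000000 ))
  echo "$(( end - $1 * 60 * 1000000000 )) $end"
}

loki_range() {  # usage: loki_range '<logql>' <minutes>
  set -- "$1" $(ns_window "$2")   # now: $1=query $2=start_ns $3=end_ns
  curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$2" \
    --data-urlencode "end=$3" \
    --data-urlencode 'limit=200' \
    --data-urlencode 'direction=BACKWARD'
}
```

Example: `loki_range '{namespace="data-science"}' 30` fetches the last 30 minutes for that namespace.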
If pi-debug exists, you can also run:
pi-debug logs --app pi --since 30m
pi-debug events --app pi
pi-debug rollout --app pi
pi-debug doctor --app pi
Always include these checks when the user says "deployment not working":
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -n 120
kubectl get deploy -A
Then correlate with Loki query by namespace/pod label.
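To make failing pods stand out before correlating with Loki, a small awk filter can drop healthy rows from the `kubectl get pods -A` output. A sketch; `not_ready` is a hypothetical name, and the sample input below stands in for live cluster output.

```shell
# Sketch: keep only pods whose STATUS column is not Running/Completed.
# not_ready is a hypothetical helper; pipe real "kubectl get pods -A" into it.
not_ready() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed"'
}

# Sample input standing in for "kubectl get pods -A" output:
printf '%s\n' \
  'NAMESPACE     NAME   READY   STATUS             RESTARTS   AGE' \
  'data-science  web-0  1/1     Running            0          2d' \
  'data-science  job-1  0/1     CrashLoopBackOff   7          1h' | not_ready
# prints only the CrashLoopBackOff row
```

The surviving namespace/pod names feed directly into the Loki label selectors above.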
Look specifically for:

- `ImagePullBackOff` / `ErrImagePull`
- `CrashLoopBackOff`
- `FailedScheduling`

If a public hostname/path is expected:
export GRAFANA_URL="https://<host>/monitoring"
curl -k -fsS "${GRAFANA_URL}/api/health"
If this fails but in-cluster service works, focus on ingress/gateway/route/auth layer.
kubectl get pods -A | grep -Ei 'grafana|loki|mimir|tempo|prometheus|otel'
kubectl get svc -A | grep -Ei 'grafana|loki|mimir|tempo|prometheus|otel'
kubectl logs -n monitoring ds/opentelemetry-collector-agent --tail=120 || true
kubectl get networkpolicy -A
Always report:

- which access path worked (`pi-debug`, direct in-cluster, or port-forward/public URL)

See references/lgtm-queries.md for reusable queries.