Name: Tenant-Aware Metrics
Author: marquesfelip

Tenant-Aware Metrics

Per-tenant metrics monitoring for SaaS systems. Use when: tenant-aware metrics, per-tenant metrics, tenant metric label, tenant Prometheus, tenant Grafana, tenant metric cardinality, tenant counter, tenant histogram, tenant gauge, tenant metric aggregation, tenant usage metric, tenant error rate, tenant latency, multi-tenant observability, tenant metric dashboard, tenant SLO metric, tenant billing metric, metric per tenant, tenant request rate, tenant quota metric, tenant seat metric, tenant storage metric, metric tenant isolation, cardinality explosion per tenant.

marquesfelip · 0 stars · March 29, 2026

Occupation
Category: Monitoring

When to Use

Invoke this skill when you need to:

Add tenant_id as a Prometheus label to track error rates, latencies, and request counts per tenant
Design label cardinality safely — avoid one label value per user (cardinality explosion)
Build Grafana dashboards that allow drilling down to a specific tenant
Define SLO metrics at both the platform level and the per-tenant level
Alert when a single tenant drives disproportionate resource usage (noisy neighbor)
Track quota consumption metrics, seat counts, and billing-relevant signals per tenant
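The noisy-neighbor case in the list above can be illustrated with a share-of-total check. This is a minimal sketch, assuming per-tenant request counts have already been collected into a dict; the function name `noisy_tenants`, the sample tenant names, and the 50% threshold are hypothetical and not part of the skill.

```python
from typing import Dict, List


def noisy_tenants(requests_by_tenant: Dict[str, int], max_share: float = 0.5) -> List[str]:
    """Return the tenant IDs whose share of all requests exceeds max_share."""
    total = sum(requests_by_tenant.values())
    if total == 0:
        # No traffic at all, so no tenant can be disproportionate.
        return []
    return sorted(
        tenant
        for tenant, count in requests_by_tenant.items()
        if count / total > max_share
    )


# Hypothetical sample data: one tenant produces 90% of the traffic.
sample = {"acme": 900, "globex": 60, "initech": 40}
flagged = noisy_tenants(sample)
```

In a Prometheus setup the same comparison would more likely be written as an alert rule over the per-tenant request counter; the Python form is only meant to make the threshold logic explicit.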

Cardinality Strategy

Tenant count    Prometheus label strategy
─────────────   ───────────────────────────────────────────────────────
< 100           Safe to use tenant_id as a label — full cardinality OK
100 – 1 000     Use tenant_id only on critical metrics; aggregate others
> 1 000         Do NOT use tenant_id as a Prometheus label
                → Write per-tenant aggregates to a separate TSDB or DB
                → Use Loki LogQL for per-tenant error queries instead
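The thresholds in the table above can be expressed as a small decision helper, together with a rough upper bound on the series a labelled metric can produce. This is a sketch under stated assumptions: the function names and the returned strategy strings are hypothetical, and the series estimate simply multiplies the number of distinct values of each label, which is a worst case rather than a measurement.

```python
def label_strategy(tenant_count: int) -> str:
    """Map a tenant count to the strategy rows of the table above."""
    if tenant_count < 100:
        return "label_all"            # tenant_id on every metric
    if tenant_count <= 1000:
        return "label_critical_only"  # tenant_id on critical metrics only
    return "no_tenant_label"          # aggregate outside Prometheus


def estimated_series(tenant_count: int, *other_label_sizes: int) -> int:
    """Worst-case series count: the product of all label value counts."""
    series = tenant_count
    for size in other_label_sizes:
        series *= size
    return series


# Hypothetical example: 500 tenants, 5 status classes, 20 endpoints.
strategy = label_strategy(500)
upper_bound = estimated_series(500, 5, 20)
```

With these hypothetical numbers a single request counter could reach 50 000 series, which is why the middle row restricts tenant_id to critical metrics.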

Signal               Metric type   Labels                              Cardinality note
──────────────────   ───────────   ─────────────────────────────────   ──────────────────────────────
Request count        Counter       tenant_id, status_class, endpoint   Safe if < 1 000 tenants
Error count          Counter       tenant_id, error_category           Safe if < 1 000 tenants
Latency              Histogram     endpoint, method, status_class      No tenant label; use exemplars
Quota usage ratio    Gauge         tenant_id, resource                 Safe if < 1 000 tenants
Seat count           Gauge         tenant_id                           Safe if < 1 000 tenants
Storage bytes used   Gauge         tenant_id                           Use DB snapshot for scale
Billing events       Counter       plan_id, event_type                 No tenant label
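One way to keep instrumentation in line with the table above is to encode the allowed label sets as data and check each label set before it is emitted. This is a minimal sketch, not the skill's own implementation: the metric keys, the `validate_labels` helper, and the `"2xx"`-style format for status_class are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Label sets copied from the table above; the metric keys are hypothetical names.
METRIC_LABELS: Dict[str, Tuple[str, ...]] = {
    "request_count": ("tenant_id", "status_class", "endpoint"),
    "error_count": ("tenant_id", "error_category"),
    "latency": ("endpoint", "method", "status_class"),
    "quota_usage_ratio": ("tenant_id", "resource"),
    "seat_count": ("tenant_id",),
    "storage_bytes_used": ("tenant_id",),
    "billing_events": ("plan_id", "event_type"),
}


def status_class(status_code: int) -> str:
    """Collapse an HTTP status code into a low-cardinality class such as '2xx'."""
    return f"{status_code // 100}xx"


def validate_labels(metric: str, labels: Dict[str, str]) -> None:
    """Raise ValueError if the label names differ from the reference table."""
    expected = set(METRIC_LABELS[metric])
    actual = set(labels)
    if actual != expected:
        raise ValueError(
            f"{metric}: expected labels {sorted(expected)}, got {sorted(actual)}"
        )


# A request counter sample that matches the table passes silently.
validate_labels(
    "request_count",
    {"tenant_id": "acme", "status_class": status_class(204), "endpoint": "/orders"},
)
```

A check like this would reject a latency observation that carries tenant_id, which is the case the table steers toward exemplars instead.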

Tenant-Aware Metrics

When to Use

Cardinality Strategy

Step 1 — Metric Registration with Tenant Label

Step 2 — HTTP Middleware: Emit Per-Tenant Metrics

Step 3 — Quota Usage Metrics (Periodic Export)

Step 4 — Large-Scale Alternative: Per-Tenant Aggregates in PostgreSQL

Step 5 — Grafana Dashboard Design

Metric Design Reference

Quality Checks

After Completion
