Use and configure the kpi-collector CLI for collecting Prometheus/Thanos metrics from OpenShift clusters. Generates kpis.json files with Telco-specific PromQL queries for RAN, Core, PTP, networking, and resource compliance. Triggered when the user mentions kpi-collector, KPI collection, kpis.json, Telco metrics, PromQL for OpenShift, or Grafana dashboards for cluster monitoring.
| Field | Required | Default | Description |
|---|---|---|---|
| `sample-frequency` | No | — | Override per-KPI (seconds or duration string like `"2m"`) |
| `run-once` | No | `false` | Collect once at start, skip repeated sampling |
| `query-type` | No | `"instant"` | `"instant"` or `"range"` |
| `step` | Range queries only | — | Resolution between points (e.g. `"30s"`) |
| `range` | Range queries only | — | Lookback window (e.g. `"1h"`) |
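These fields are set per KPI inside kpis.json. As a rough sketch, a range-query entry might look like the following — the entry name, metric, and surrounding schema are illustrative assumptions; only the field names come from the table above:

```json
{
  "name": "cpu-usage-trend",
  "query-type": "range",
  "range": "1h",
  "step": "30s",
  "sample-frequency": "30m",
  "promqueries": [
    "sum(rate(container_cpu_usage_seconds_total[5m]))"
  ]
}
```

Here a `1h` range at `30s` resolution yields roughly 120 points per execution, refreshed every 30 minutes — half the range, so the executions overlap without tripping the overlap warning described below.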
## Range query rules

- `sample-frequency` = how often the collector executes the query
- `range` = how far back each execution looks
- `step` = spacing between data points in the result
- PromQL windows like `rate(...[5m])` control the per-point lookback independently
- If frequency > range, you get data gaps (the tool blocks this with an error)
- If frequency < range/2, you get heavy overlap (the tool warns)
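Concretely, a configuration that satisfies both checks keeps the frequency between range/2 and range. A hedged kpis.json fragment (field names from this document; the surrounding entry shape is assumed):

```json
{
  "query-type": "range",
  "sample-frequency": "10m",
  "range": "10m",
  "step": "30s"
}
```

Each execution covers exactly the 10 minutes since the previous one (no gap, no overlap) and returns about 20 points at 30-second resolution. Raising `sample-frequency` above `10m` would hit the gap error; dropping it below `5m` would hit the overlap warning.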
## Dynamic CPU placeholders

Use `{{RESERVED_CPUS}}` and `{{ISOLATED_CPUS}}` in promqueries. They are auto-replaced with CPU IDs from PerformanceProfile CRs (e.g. `"0-1,32-33"` becomes `"0|1|32|33"`). Requires `--kubeconfig` authentication.
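The pipe-separated replacement (`"0|1|32|33"`) drops straight into a `=~` regex label matcher, which PromQL anchors on both ends. A hedged sketch of a promquery using the placeholder — `node_cpu_seconds_total` is a standard node_exporter metric, but its use here and the entry shape around it are illustrative assumptions:

```json
{
  "name": "isolated-cpu-busy",
  "promqueries": [
    "sum(rate(node_cpu_seconds_total{mode!=\"idle\",cpu=~\"{{ISOLATED_CPUS}}\"}[5m]))"
  ]
}
```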
## Before Running kpi-collector

Always gather details from the user before running any `kpi-collector run` command. Ask for:

- Cluster name — identifier for this cluster in the database
- Cluster type — `ran`, `core`, or `hub` (optional)
- Authentication method:
  - Kubeconfig — ask whether `~/.kube/config` is correct, or get a custom path. Remind the user their kubeconfig credentials must be valid (e.g. `oc login` first).
  - Token + Thanos URL — ask for both the bearer token and the Thanos querier URL (without the `https://` prefix). Use this when the kubeconfig is unavailable or expired.
- Run mode — single snapshot (`--once`) or continuous collection? If continuous, ask for:
  - Frequency — how often to sample (default: `60s`). Examples: `30s`, `2m`, `5m`
  - Duration — how long to run (default: `45m`). Examples: `1h`, `8h`, `24h`
- Database backend — SQLite (default, local) or PostgreSQL? If PostgreSQL, ask for the connection string.
- TLS — if targeting a lab or disconnected cluster, ask whether to skip TLS verification (`--insecure-tls`). The default is to verify.
If using the AskQuestion tool, structure it like:
- "How do you want to authenticate?" → ["Kubeconfig (auto-discovery)", "Bearer token + Thanos URL"]
- If kubeconfig: "Use default ~/.kube/config?" → ["Yes", "I'll provide a custom path"]
- If token: ask for token value and Thanos URL
- "Run mode?" → ["Collect once (--once)", "Continuous collection"]
- If continuous: "How often and for how long?" — let user specify or offer defaults
- "Database?" → ["SQLite (default, local file)", "PostgreSQL"]
- "Skip TLS verification?" → ["No (production)", "Yes (lab/self-signed certs)"]
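Once the answers are in, the assembled command might look like this sketch — only `--once`, `--insecure-tls`, and `--kubeconfig` are flags named in this document; confirm any other option spelling against the tool's own help output:

```shell
# Single snapshot from a lab cluster with self-signed certs,
# authenticating with a custom kubeconfig
kpi-collector run --kubeconfig ~/lab/kubeconfig --once --insecure-tls
```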
## Generating kpis.json for Telco Workloads

When the user asks to create KPIs for a Telco cluster, ask which cluster type they are