You have full native Grafana access — query data, create dashboards, set alerts, receive alert notifications, annotate events, explore datasources, push custom data, and deliver visualizations inline. Works with ANY data in Grafana, not just agent metrics.
Rules of thumb:
- grafana_explore_datasources first when you need a datasource UID — never guess UIDs
- grafana_search before creating a dashboard — avoid duplicates
- grafana_get_dashboard before grafana_share_dashboard — you need exact panel IDs
- grafana_get_dashboard before grafana_update_dashboard — you need panel IDs and current structure
- grafana_query for direct answers over creating dashboards — "what's my cost?" needs a number, not a URL
- grafana_query over grafana_create_dashboard + grafana_share_dashboard for simple data questions — a number is faster than a chart
- grafana_query_logs for log searches — LogQL for logs, PromQL for metrics, TraceQL for traces. Never use grafana_query for Loki datasources
- grafana_query_traces for trace searches — never use grafana_query or grafana_query_logs for Tempo datasources
- openclaw_lens_* prefix for built-in agent metrics
- grafana_check_alerts — use the suggestedInvestigation field to go directly to querying (it provides the tool, query, and datasource)
- grafana_check_alerts with action setup once before alert notifications can reach the agent — this creates the webhook contact point
- grafana_explain_metric for "what is this metric?" questions over manual grafana_query — it returns current value, trend, stats, and metadata in one call
- queryNames from the push response for PromQL queries — don't guess metric names (counters get a _total suffix)
- openclaw_ext_ prefix for custom metrics — grafana_push_metrics auto-prepends it if missing
- grafana_query_logs with metric-over-logs queries (count_over_time, rate, topk) before switching to raw log entries
- grafana_check_alerts with action silence to prevent repeat notifications while investigating
- list_rules for complete alert health — grafana_check_alerts with action list_rules returns all rules with live eval state (normal/firing/pending/nodata/error), health, and lastEvaluation — no need to cross-reference with the list action
- dashboardUid + panelId to re-run panel queries — don't manually extract PromQL/LogQL from get_dashboard output. Both grafana_query and grafana_query_logs accept these params to auto-resolve the panel's query expression and datasource. The tool handles template variable replacement and datasource routing automatically
- grafana_update_dashboard with operation delete and grafana_check_alerts with action delete_rule are permanent and cannot be undone
- alloy_pipeline action recipes first when unsure which pipeline recipe fits the user's request — recipes provide validation, credential handling, and sample queries that raw config does not
- alloy_pipeline action status after creating a pipeline — data takes 15-20s to flow through the pipeline, and components may fail silently after reload
- Raw config only when the user explicitly provides Alloy syntax
- Credentials never in raw config — when the user provides a connection string, DSN, password, or API key, ALWAYS use the matching recipe (which routes credentials through sys.env(), keeping secrets off disk). If you must use raw config, wrap sensitive values in sys.env("MY_VAR_NAME") and tell the user to set that env var where Alloy runs
- envVarsRequired from every pipeline create response — credential recipes may return pending_credentials status when env vars aren't set yet. Tell the user the exact var names and that they must set them where Alloy runs, then verify with action status
- grafana_list_metrics or grafana_query_logs to discover data, grafana_create_dashboard to visualize, grafana_create_alert to monitor
- alloy_pipeline action diagnose as first step when user reports pipeline issues — it checks Alloy connectivity, all pipeline health, config file drift, and limits in one call
- alloy_pipeline with action delete removes the config and data stops flowing
- Pass jsonExpressions, labelFields, structuredMetadata, tenantValue, matchRoutes, etc. directly to any log recipe (docker-logs, file-logs, syslog, etc.)
- samplingPolicies for multi-policy tail sampling — don't create raw config when application-traces can handle it. sampleRate is for simple probabilistic sampling; samplingPolicies is for intelligent multi-policy (keep errors, keep slow, sample rest)
- tenantValue/tenantSource/matchRoutes work on ALL log recipes — don't create separate "routing" pipelines
- references/alloy-components.md before composing raw config — it has copy-pasteable snippets for all common Alloy components

Tool chains (quick reference):
- grafana_explain_metric
- grafana_query
- grafana_query_logs
- grafana_query_traces
- grafana_query_traces (search error/slow) → grafana_query_traces (get → follow correlationHint) → grafana_query_logs → grafana_query → grafana_annotate
- grafana_search → grafana_get_dashboard → grafana_share_dashboard
- grafana_search (check duplicates) → grafana_create_dashboard
- grafana_get_dashboard → grafana_update_dashboard
- grafana_update_dashboard with operation delete (confirm with user first)
- grafana_check_alerts (setup) → grafana_create_alert
- grafana_check_alerts with action list_rules
- grafana_check_alerts with action list_rules → delete_rule with ruleUid
- grafana_push_metrics (with optional timestamp for historical data, auto-registers, returns queryNames) → grafana_query with queryNames
- grafana_explore_datasources
- grafana_list_metrics
- grafana_search (check existing) → grafana_create_dashboard with llm-command-center → follow suggestedNext chain through remaining templates
- grafana_create_dashboard with genai-observability template
- grafana_create_dashboard with session-explorer template → paste session ID
- grafana_create_dashboard with llm-command-center template (Loki + Tempo)
- grafana_create_dashboard with cost-intelligence template
- grafana_create_dashboard with tool-performance template
- grafana_create_dashboard with sre-operations template
- grafana_explore_datasources → grafana_check_alerts (list + list_rules) → grafana_search → grafana_get_dashboard (audit=true for each) → summarize
- grafana_get_dashboard (audit=true) → review auditSummary + per-panel health
- grafana_security_check
- grafana_check_alerts (setup) → grafana_create_dashboard (security-overview) → grafana_create_alert (webhook error burst, cost spike, tool loops, injection signals)
- grafana_security_check → grafana_query_logs (correlate) → grafana_annotate (mark investigation) → grafana_check_alerts (silence)
- grafana_investigate (multi-signal triage) → follow suggestedHypotheses.testWith for deep-dives
- grafana_explain_metric (returns anomaly z-score + seasonality vs 1d/7d ago for 24h period)
- grafana_check_alerts with action analyze — fatigue report
- grafana_investigate → 5-Phase methodology → postmortem template (see sre-investigation.md §9)
- grafana_annotate (list, tags: ["deploy"]) → grafana_explain_metric (compareWith: "previous")
- alloy_pipeline action recipes (filter by category) → select recipe → create → status → query → dashboard → alert
- alloy_pipeline with recipe scrape-endpoint + params { url }
- alloy_pipeline with recipe [db]-exporter + params { connectionString }
- alloy_pipeline (log recipe + processing params: jsonExpressions, labelFields, structuredMetadata)
- alloy_pipeline with recipe docker-logs
- alloy_pipeline with recipe file-logs + params { paths }
- alloy_pipeline with recipe loki-push-api
- alloy_pipeline with recipe kafka-logs + params { brokers, topics }
- alloy_pipeline with recipe syslog
- alloy_pipeline with recipe blackbox-exporter + params { targets }
- alloy_pipeline with recipe kubernetes-pods + kubernetes-services + kubernetes-logs (3 pipelines)
- alloy_pipeline with recipe otlp-receiver
- alloy_pipeline with recipe span-metrics
- alloy_pipeline with recipe service-graph
- alloy_pipeline with recipe self-monitoring
- alloy_pipeline with recipe secret-filter-logs + params { paths }
- alloy_pipeline with recipe elasticsearch-exporter / kafka-exporter
- alloy_pipeline with recipe node-exporter
- alloy_pipeline with recipe docker-metrics
- alloy_pipeline with recipe application-traces + samplingPolicies array (keep errors, keep slow, filter health checks, sample rest)
- log recipe + tenantValue or matchRoutes processing param
- alloy_pipeline with recipe continuous-profiling + targets
- alloy_pipeline with recipe faro-frontend
- alloy_pipeline with recipe gelf-logs
- references/alloy-components.md → alloy_pipeline with raw config + optional sampleQueries
- alloy_pipeline with action recipes
- alloy_pipeline with action list
- alloy_pipeline with action status + name
- alloy_pipeline with action diagnose → follow remediation
- alloy_pipeline with action delete + name (confirm with user first)

When several Grafana environments are configured (dev, staging, prod), every tool accepts an optional instance parameter. grafana_explore_datasources returns availableInstances — use the name values from that list.
Why this matters: Users often need to query production metrics, create dashboards in dev, or compare environments side by side. Each tool call targets one instance.
Smart defaults: Omitting instance always targets the configured default — safe and invisible for single-environment setups. Only specify instance when the user explicitly names a non-default environment.
Cross-environment workflows: Each call is independent. Query prod, create dashboard in dev — just set instance differently on each call. No context switching needed.
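A sketch of that pattern (instance names and UIDs illustrative; take real values from availableInstances and grafana_explore_datasources):
Example query in prod: grafana_query { "instance": "prod", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" }
Example create in dev: grafana_create_dashboard { "instance": "dev", "template": "llm-command-center", "title": "Prod Cost Review" }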
| Tool | What It Does |
|---|---|
| grafana_explore_datasources | Discover configured datasources (UIDs, types, query routing) — tells you which tool + query language to use for each datasource |
| grafana_list_metrics | Discover available metrics or label values from a datasource. Use compact: true with metadata: true for minimal fields in multi-tool chains |
| grafana_query | Run PromQL instant/range queries — get numbers directly |
| grafana_query_logs | Run LogQL queries against Loki — search and filter logs |
| grafana_query_traces | Run TraceQL queries against Tempo — search traces or get full trace by ID |
| grafana_create_dashboard | Create dashboards from templates or custom JSON |
| grafana_update_dashboard | Add/remove/update panels, change dashboard metadata, or delete dashboard |
| grafana_get_dashboard | Get dashboard summary (panels, queries). Use compact: true for overview scans, audit: true to health-check all panels in one call |
| grafana_search | Search existing dashboards by title, tags, or starred status |
| grafana_share_dashboard | Render panel as image and deliver inline via messaging |
| grafana_create_alert | Create Grafana-native alert rules on any metric |
| grafana_annotate | Create or list annotations (events) on dashboards |
| grafana_check_alerts | Check, acknowledge, list/delete rules, silence/unsilence, or set up Grafana alert webhook notifications. Use compact: true with list_rules for minimal fields |
| grafana_push_metrics | Push custom data (calendar, git, fitness, finance) via OTLP |
| grafana_explain_metric | Get metric context: current value, trend, stats, metadata, drill-down queries — agent interprets |
| grafana_security_check | Run 6 parallel security checks and return threat-level assessment (green/yellow/red) — "Am I being attacked?" |
| grafana_investigate | Multi-signal investigation triage — gathers metrics, logs, traces, and context in parallel, generates hypotheses with specific tool+params for follow-up |
| alloy_pipeline | Create and manage Alloy data collection pipelines — 29 recipes for metrics, logs, traces, profiles from any infrastructure (databases, K8s, Docker, apps, profiling, frontend RUM) |
### grafana_explore_datasources
When: First step when user mentions data, metrics, or monitoring. Gets datasource UIDs needed by grafana_query, grafana_query_logs, grafana_query_traces, grafana_list_metrics, grafana_create_alert, and grafana_explain_metric.
Params: instance (optional — target Grafana instance, omit for default).
Example: {}
Example (multi-instance): { "instance": "prod" }
Returns: List of datasources with uid, name, type, isDefault, plus routing hints: queryTool (which agent tool to use, e.g. "grafana_query", "grafana_query_logs", or "grafana_query_traces"), queryLanguage (e.g. "PromQL", "LogQL", "TraceQL"), and supported (boolean — whether an agent tool can query this datasource). Use queryTool to pick the right tool for each datasource. When multiple Grafana instances are configured, also returns instance (which instance was queried) and availableInstances (list of { name, url, isDefault } for all configured instances).
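An illustrative entry (values hypothetical): { "uid": "loki1", "name": "Loki", "type": "loki", "isDefault": false, "queryTool": "grafana_query_logs", "queryLanguage": "LogQL", "supported": true }. Here queryTool says to route this datasource's queries through grafana_query_logs.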
### grafana_list_metrics
When: User asks "what metrics are available?" or you need to discover metrics before querying or composing dashboards. Also when grouping metrics by function — metadata mode adds category to each openclaw_* metric. Use purpose when user asks about a specific concern (e.g., "performance metrics", "cost metrics").
Params: datasourceUid (required), prefix (filter by prefix), search (targeted discovery — server-side regex, only matching metrics returned), purpose ("performance" | "cost" | "reliability" | "capacity" — pre-filter by intent, composable with prefix and search), label (list label values instead), metadata (boolean — enriched results with type/help/category), compact (boolean — with metadata, returns only name/type/category, ~60% smaller).
Example names: { "datasourceUid": "prom1", "prefix": "openclaw_lens_" }
Example search: { "datasourceUid": "prom1", "search": "steps" }
Example purpose: { "datasourceUid": "prom1", "purpose": "performance", "metadata": true }
Example combined: { "datasourceUid": "prom1", "prefix": "openclaw_ext_", "search": "fitness" }
Example metadata: { "datasourceUid": "prom1", "metadata": true, "prefix": "openclaw_" }
Example compact: { "datasourceUid": "prom1", "metadata": true, "compact": true }
Returns names: { metrics: ["metric1", "metric2", ...] }. Truncated at 200.
Returns metadata: { metadataSource, categorySummary: { cost: 3, usage: 4, session: 5, ... }, metrics: [{ name, type, help, category?, source? }, ...] }. Use this before composing custom dashboards — type tells you counter vs gauge vs histogram, category groups openclaw_* metrics by function. Search also matches help text. Categories: cost, usage, session, queue, messaging, webhook, tools, agent, custom. categorySummary gives counts per category for quick overview (omitted when no openclaw_* metrics). Purpose maps: performance → session + tools, cost → cost + usage, reliability → webhook + messaging + agent, capacity → queue + session. metadataSource: "prometheus" when Prometheus metadata endpoint has data, "synthetic" when OTLP-only (metadata synthesized from known metric registry — histogram sub-metrics deduplicated, type/help from Grafana Lens definitions). On OTLP stacks, includes hint explaining why metadata is synthetic. source: "synthetic" on individual entries from the registry; source: "custom" on entries from the custom metrics store.
Returns compact: { metadataSource, categorySummary: {...}, metrics: [{ name, type, category? }, ...] }. Same as metadata but drops help, source, labelNames — use in multi-tool chains where you need metric names and types but not full descriptions.
Example label: { "datasourceUid": "prom1", "label": "job" }
Returns label: { label, count, totalCount, values: ["value1", "value2", ...] }. Truncated at 200.
### grafana_query
When: User asks a data question that needs a direct answer, not a dashboard. Also for re-running an existing dashboard panel's query with different time ranges.
Params: datasourceUid, expr (PromQL), queryType (instant/range), start (range only, required), end (range only, default "now"), step (range only, optional — auto-calculated from time range if omitted, targeting ~300 datapoints), dashboardUid (optional — resolve query from panel), panelId (optional — use with dashboardUid).
Example instant: { "datasourceUid": "prom1", "expr": "sum(increase(openclaw_lens_cost_by_model_total[1d])) or vector(0)" }
Example range (auto-step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-30d" }
Example range (explicit step): { "datasourceUid": "prom1", "expr": "rate(openclaw_tokens_total[5m])", "queryType": "range", "start": "now-1h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 10, "queryType": "range", "start": "now-7d" }
Tip: start/end accept Unix seconds or relative expressions like "now-1h", "now-7d". For range queries, just set start — end defaults to "now" and step is auto-calculated. Override step only when you need specific resolution.
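Worked example: with start: "now-1h" the 3,600 s window auto-resolves to roughly 3600 / 300 = 12 s per point; "now-30d" (2,592,000 s) gives roughly 8,640 s (about 2.4 h) per point.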
Tip (panel re-run): Set dashboardUid + panelId to re-run a panel's query without manually extracting PromQL. The tool auto-resolves expr and datasourceUid from the panel definition. Template variables are replaced with wildcards. You can still override expr or datasourceUid explicitly if needed. Get panel IDs from grafana_get_dashboard.
Returns instant: { metrics: [{ metric: {...}, value: "1.23", timestamp: "...", healthContext?: { status, thresholds, description, direction } }], datasourceUid, resultCount, warnings?, hint? } — healthContext is included for well-known openclaw_lens_* gauge metrics, providing SRE-grade health assessment: status ("healthy"/"warning"/"critical"), thresholds (warning/critical values), description (what the metric means), direction ("higher_is_worse"/"lower_is_worse"). Omitted for unknown metrics. Capped at 50 results; when exceeded includes truncated: true, totalResults, and truncationHint advising to narrow the query.
Returns range: { series: [{ metric: {...}, values: [{ time, value }...] }], datasourceUid, resultCount, warnings?, hint? } — truncated to 20 points per series and 50 series max. When series are truncated includes truncated: true, totalSeries, and truncationHint. When step is auto-calculated, includes step: { value: "288s", display: "5m", auto: true }.
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal query results. If the panel uses a Loki datasource, returns an error directing you to use grafana_query_logs instead.
Returns (warnings): When Prometheus flags a non-fatal issue (e.g., rate() on a gauge), warnings: [{ cause, suggestion, example? }] is included. Example: rate() on a gauge → cause says "rate() applied to 'metric' which appears to be a gauge", suggestion says "use delta() or deriv() instead", example shows the corrected query.
Returns (hint): When the query returns zero results, hint: { cause, suggestion } explains why (metric may not exist, label filters may not match) and suggests using grafana_list_metrics to verify.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common PromQL mistakes: unclosed parenthesis, missing range selector, timeout, auth failure, rate on gauge, etc. Omitted when the error is unrecognized.
Tip (chaining): Both instant and range responses include datasourceUid — pass it directly to grafana_create_alert or other tools without re-calling grafana_explore_datasources. This enables zero-friction query→alert chains.
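A minimal sketch of that chain (threshold illustrative): grafana_query { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" } → response echoes "datasourceUid": "prom1" → grafana_create_alert { "title": "Daily Cost High", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "threshold": 5 }.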
### grafana_query_logs
When: User asks about logs, errors, or needs to investigate issues by searching log data. Also for session debugging, OTel log investigation, and re-running existing log panel queries.
Params: datasourceUid, expr (LogQL), queryType (instant/range, default range), start/end (default now-1h/now), step (metric queries only), limit (default 100), direction (backward/forward), lineLimit (max chars per log line, default 500, max 2000), extractFields (boolean, default false — extract structured OTel attributes into a clean fields object), dashboardUid (optional — resolve query from panel), panelId (optional — use with dashboardUid).
Example log search: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"error\"" }
Example with filters: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |~ \"timeout|refused\"", "limit": 50, "direction": "forward" }
Example full stack traces: { "datasourceUid": "loki1", "expr": "{job=\"api\"} |= \"Exception\"", "lineLimit": 2000 }
Example session debugging: { "datasourceUid": "loki1", "expr": "{service_name=\"openclaw\"} | json | component=\"lifecycle\"", "extractFields": true }
Example metric query: { "datasourceUid": "loki1", "expr": "rate({job=\"api\"}[5m])", "queryType": "range", "start": "now-6h", "end": "now", "step": "60" }
Example panel re-run: { "dashboardUid": "openclaw-command-center", "panelId": 18, "start": "now-24h", "extractFields": true }
Returns streams: { entries: [{ labels: {...}, timestamp: "...", line: "..." }], datasourceUid, totalEntries, truncated } — capped at 100 entries, lines at 500 chars (set lineLimit: 2000 for full stack traces).
Returns streams (extractFields): { entries: [{ labels: {...cleaned...}, timestamp: "...", line: "...", fields: { component, event_name, session_id, trace_id, model, duration_s, ... } }], datasourceUid } — infrastructure noise labels removed, openclaw_ prefix stripped from field keys, numeric values auto-converted. Also parses JSON log bodies if present.
Returns streams (traceCorrelation): When extractFields: true and entries contain trace_id, includes traceCorrelation: { traceIds: [...], tool: "grafana_query_traces", tip } — up to 5 unique trace IDs ready for grafana_query_traces with queryType: "get".
Returns metric: Same shape as grafana_query range/instant results (matrix capped at 50 series, vector capped at 50 results — includes datasourceUid, truncated, totalSeries/totalResults, and truncationHint when exceeded).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal results. If the panel uses a Prometheus datasource, returns an error directing you to use grafana_query instead.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common LogQL mistakes: bare text without stream selector, empty {}, unclosed braces, missing label matchers, auth failure, timeout. Omitted when the error is unrecognized.
Tip: LogQL: {label="value"} selects streams, |= substring filter, |~ regex, != exclude. Metric wrappers: rate(), count_over_time(), bytes_rate(). Use extractFields: true when investigating OTel/lifecycle logs — it surfaces trace_id, session_id, event_name, model, and other attributes as first-class fields instead of buried in raw labels.
Tip (panel re-run): Same as grafana_query — set dashboardUid + panelId to auto-resolve LogQL and datasource. The tool routes Prometheus panels to grafana_query with a helpful error.
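Sketch of the log-to-trace hop (trace ID illustrative): grafana_query_logs { "datasourceUid": "loki1", "expr": "{service_name=\"openclaw\"} | json", "extractFields": true } → response includes traceCorrelation: { "traceIds": ["abc123def456"], "tool": "grafana_query_traces" } → grafana_query_traces { "datasourceUid": "tempo1", "query": "abc123def456", "queryType": "get" }.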
### grafana_query_traces
When: User asks about traces, distributed tracing, slow spans, session trace hierarchies, or needs to debug request flows across services.
Params: datasourceUid, query (TraceQL expression or trace ID), queryType (search/get, default search), start/end (default now-1h/now), limit (default 20, max 50), minDuration/maxDuration (e.g., "1s", "10s"), dashboardUid (optional — resolve query from panel), panelId (optional — use with dashboardUid).
Example search: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }" }
Example search slow: { "datasourceUid": "tempo1", "query": "{ resource.service.name = \"openclaw\" }", "minDuration": "5s" }
Example search with time: { "datasourceUid": "tempo1", "query": "{ span.gen_ai.system = \"anthropic\" }", "start": "now-24h", "limit": 50 }
Example get: { "datasourceUid": "tempo1", "query": "abc123def456789...", "queryType": "get" }
Example panel re-run: { "dashboardUid": "openclaw-session-explorer", "panelId": 12, "start": "now-24h" }
Returns search: { traces: [{ traceId, rootServiceName, rootTraceName, startTime, durationMs, spanCount? }], datasourceUid, totalTraces, truncated?, correlationHint? } — capped at 50 traces. When exceeded includes truncated: true and truncationHint. When traces are found, includes correlationHint: { logQuery, tool, tip } with a ready-to-use LogQL expression for grafana_query_logs.
Returns get: { traceId, spans: [{ traceId, spanId, parentSpanId?, operationName, serviceName, startTime, durationMs, status, kind?, attributes: {...} }], datasourceUid, totalSpans, truncated? } — flattened OTLP spans with resolved attributes (string/number/boolean). Capped at 200 spans. Sorted by start time (earliest first).
Returns (panel re-run): Includes resolvedFrom: "panel", panelTitle, panelType, templateVarsReplaced alongside normal results. If the panel uses a Prometheus or Loki datasource, returns an error directing you to use the correct tool.
Returns (error with guidance): On query failure, includes guidance: { cause, suggestion, example? } alongside the raw error. Pattern-matched for common TraceQL mistakes: syntax errors, invalid attributes, auth failure, timeout, not-found, invalid trace ID. Omitted when the error is unrecognized.
Returns (no results): When search returns zero traces, includes hint: { cause, suggestion } suggesting to broaden the query or check the datasource.
Tip: TraceQL: { } matches all traces, resource.service.name for service filter, span.http.status_code for HTTP spans, name for operation name, duration for span duration, status for error/ok filtering. Use minDuration/maxDuration to find performance outliers. Trace-to-Log: search and get results include correlationHint.logQuery — pass it directly to grafana_query_logs to find correlated logs. Log-to-Trace: grafana_query_logs results (with extractFields: true) include traceCorrelation.traceIds — pass any ID to grafana_query_traces with queryType: "get".
Tip (panel re-run): Same as grafana_query — set dashboardUid + panelId to auto-resolve TraceQL and datasource. The tool routes Prometheus/Loki panels to the correct tool with a helpful error.
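Sketch of the trace-to-log hop (the tool generates logQuery; its content is illustrative here): a search returns correlationHint: { "logQuery": "...", "tool": "grafana_query_logs" } → pass it through unchanged as grafana_query_logs { "datasourceUid": "loki1", "expr": <correlationHint.logQuery> }.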
### grafana_create_dashboard
When: User wants a persistent dashboard for ongoing monitoring.
Params: template or dashboard (custom JSON) — one required. Optional: title (overrides template default), folderUid (target folder), overwrite (default true).
Returns: { uid, url, status, message, suggestedNext?: [{ template, reason }], validation?: DashboardValidation }. For template-based dashboards, suggestedNext lists complementary templates to deploy next. For custom JSON dashboards, validation dry-runs each panel's PromQL and reports per-panel health — check validation.panelsError for broken queries.
Choose the right template (3-tier SRE drill-down hierarchy):
Tier 1 → System: Start here for overall health. Tier 2 → Session: Click a session from Tier 1 to investigate. Tier 3 → Deep Dive: Cost, tool, or SRE details.
| Template | Tier | Domain | Variables | Use When |
|---|---|---|---|---|
| llm-command-center | Tier 1 | System overview | $prometheus, $loki, $tempo, $provider, $model, $channel | Golden signals, session table with click-to-drill-down, cost, cache, live feeds |
| session-explorer | Tier 2 | Session debug | $prometheus, $loki, $tempo, $session (textbox) | Per-session trace hierarchy, LLM calls, tool calls, conversation flow |
| cost-intelligence | Tier 3a | Cost analysis | $prometheus, $loki, $provider, $model | Spending trends, model attribution, cache savings, per-session cost table |
| tool-performance | Tier 3b | Tool analytics | $prometheus, $loki, $tempo, $tool | Tool leaderboard, latency ranking, error rates, tool traces |
| sre-operations | Tier 3c | SRE operations | $prometheus, $loki | Queue health, webhooks, stuck sessions, tool loops |
| genai-observability | — | OTel gen_ai standard | $prometheus, $loki, $tempo, $model, $provider | Industry-standard AI monitoring: token analytics, LLM performance, traces, logs, cache efficiency. Works with any gen_ai data. |
| node-exporter | — | System/DevOps | $datasource, $instance | Server CPU, memory, disk, network |
| http-service | — | Web/DevOps | $datasource, $job | HTTP request rate, errors, latency (RED signals) |
| metric-explorer | — | Any domain | $datasource, $metric | Deep-dive into any single metric from a dropdown |
| multi-kpi | — | Any domain | $datasource, $metric1..$metric4 | 4-metric KPI overview (business, fitness, finance, IoT) |
| weekly-review | — | Any domain | $datasource, $metric1, $metric2 | Weekly overview of 2 external metrics with trends + all openclaw_ext_* table |
All AI templates have Loki log-to-trace correlation via Tempo + stable UIDs for cross-dashboard navigation.
Example AI health: { "template": "llm-command-center", "title": "My AI Dashboard" }
Example session debug: { "template": "session-explorer", "title": "Session Debug" }
Example cost analysis: { "template": "cost-intelligence", "title": "My AI Costs" }
Example tool analytics: { "template": "tool-performance", "title": "Tool Health" }
Example SRE ops: { "template": "sre-operations", "title": "SRE Health" }
Example GenAI observability: { "template": "genai-observability", "title": "GenAI Observability" }
Example system: { "template": "node-exporter", "title": "Server Health" }
Example generic: { "template": "metric-explorer", "title": "Explore My Data" }
Example multi-KPI: { "template": "multi-kpi", "title": "Business KPIs" }
Example weekly review: { "template": "weekly-review", "title": "My Weekly Review" }
Example custom with validation: { "dashboard": { "title": "Model Comparison", "panels": [{ "id": 1, "title": "Cost by Model", "type": "timeseries", "targets": [{ "refId": "A", "expr": "sum by (model) (rate(openclaw_lens_cost_by_token_type[1h]))", "datasource": { "uid": "prometheus" } }] }] } }
Custom dashboard validation (returned only for dashboard param, not templates):
validation: { panelsTotal: 3, panelsValid: 1, panelsNoData: 1, panelsError: 1, panelsSkipped: 0, details: [{ panelId: 1, title: "Cost by Model", status: "ok", queries: [{ refId: "A", expr: "...", valid: true, sampleValue: 0.42 }] }, { panelId: 2, title: "Latency", status: "nodata" }, { panelId: 3, title: "Bad Query", status: "error", error: "parse error at char 5" }] }
Panel statuses: ok (query returned data), nodata (valid query, no results — metric may not exist yet), error (PromQL syntax error or datasource issue), skipped (no datasource UID found). Dashboard is always created regardless — validation is informational.
### grafana_update_dashboard
When: User wants to add a panel, remove a panel, change a query, update dashboard settings, or delete a dashboard.
Params: uid (required), operation (required: add_panel, remove_panel, update_panel, update_metadata, delete).
add_panel params: panel (object with title, type, targets). Auto-layouts below existing panels.
remove_panel / update_panel params: panelId (preferred) or panelTitle (case-insensitive substring fallback). updates (object) for update_panel.
update_metadata params: title, description, tags, time (e.g., { "from": "now-7d", "to": "now" }), refresh (e.g., "1m").
delete params: None besides uid — permanently removes the dashboard. Always confirm with user first.
Example add: { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Error Rate", "type": "timeseries", "targets": [{ "refId": "A", "expr": "rate(errors_total[5m])", "datasource": { "uid": "prom1" } }] } }
Example add (no datasource): { "uid": "abc123", "operation": "add_panel", "panel": { "title": "Latency", "type": "timeseries", "targets": [{ "refId": "A", "expr": "histogram_quantile(0.99, rate(http_duration_bucket[5m]))" }] } } — validation skipped if no datasource UID found, panel still saved.
Example remove: { "uid": "abc123", "operation": "remove_panel", "panelId": 3 }
Example update panel: { "uid": "abc123", "operation": "update_panel", "panelId": 1, "updates": { "title": "New Title", "targets": [{ "refId": "A", "expr": "new_query" }] } }
Example update metadata: { "uid": "abc123", "operation": "update_metadata", "title": "My Dashboard v2", "time": { "from": "now-7d", "to": "now" }, "refresh": "5m" }
Example delete: { "uid": "abc123", "operation": "delete" }
Returns update: { status: "updated", uid, url, version, operation, panelCount, affectedPanel?: { id, title }, changedFields?: [...], queryValidation?: { validated, results, datasourceUid?, skippedReason? } }.
Returns queryValidation: For add_panel and update_panel (when targets change), PromQL queries are dry-run against Grafana. Each result: { refId, expr, valid: boolean, error?: string, sampleValue?: number }. Panel is always saved — validation is informational. If valid: false, check the error field for PromQL syntax issues. If skippedReason is set, no datasource UID was found — include datasource: { uid: "..." } on targets to enable validation.
Returns delete: { status: "deleted", uid, title, message }.
Tip: targets in update_panel replaces entirely — include all targets, not just changed ones. Include datasource.uid on targets for query validation feedback.
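Concretely (queries illustrative): if a panel has targets A and B and only B changes, send both: { "uid": "abc123", "operation": "update_panel", "panelId": 1, "updates": { "targets": [{ "refId": "A", "expr": "unchanged_query", "datasource": { "uid": "prom1" } }, { "refId": "B", "expr": "new_query", "datasource": { "uid": "prom1" } }] } }. Sending only B would silently drop A.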
### grafana_get_dashboard
When: Need to inspect a dashboard's panels — find panel IDs for sharing, verify structure, scan multiple dashboards for an overview, or audit which panels are returning data.
Params: uid (required). Optional: compact (boolean, default false) — return panel titles and types only, no queries or metadata (~70% smaller). audit (boolean, default false) — dry-run each panel's query and add health status.
Example (full): { "uid": "abc123" }
Example (compact overview): { "uid": "abc123", "compact": true }
Example (audit): { "uid": "abc123", "audit": true }
Returns (full): { uid, title, description?, url, tags, time?, refresh?, panelCount, panels: [{ id, title, type, queries: [{ refId, expr }] }], folderUid, created?, updated? }.
Returns (compact): { uid, title, url, tags, panelCount, panels: [{ id, title, type }] }.
Returns (audit): Same as full, plus each panel gets health: { status: "ok"|"nodata"|"error"|"skipped", error?, sampleValue? } and the response includes auditSummary: { ok, nodata, error, skipped }. Resolves template variable datasources ($prometheus, $loki) and replaces expression template vars with wildcards.
Tip: Use audit: true when the user asks "which panels are broken?" or "audit my dashboard" — it replaces N separate grafana_query calls with one tool call. Use compact: true for lightweight overview scans. Omit both when you need query details (before update or share).
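Illustrative audit output (values hypothetical): a panel entry like { "id": 3, "title": "Latency", "type": "timeseries", "queries": [...], "health": { "status": "nodata" } } plus auditSummary: { "ok": 5, "nodata": 1, "error": 0, "skipped": 0 }; the nodata panel is the one to investigate.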
### grafana_search
When: User mentions a dashboard by name, before creating one (check duplicates), or for reporting/audit workflows.
Params: query (required). Optional: tags (array — filter by tags), starred (boolean — only starred), sort ("alpha-asc"/"alpha-desc"), limit (number, default 100), enrich (boolean — add updatedAt + panelCount per result, default false).
Example: { "query": "cost" }
Example with tags: { "query": "", "tags": ["production"] }
Example starred: { "query": "", "starred": true, "limit": 10 }
Example enriched: { "query": "", "enrich": true }
Returns: { count, enriched, dashboards: [{ uid, title, url, tags, folderTitle?, folderUid?, updatedAt?, panelCount? }] }. folderTitle/folderUid always included when dashboard is in a folder. updatedAt (ISO 8601) and panelCount only present when enrich: true — enables staleness detection and reporting without per-dashboard get_dashboard calls.
Tip: Use enrich: true for reporting workflows ("which dashboards are stale?", "give me a summary of all dashboards"). Skip enrichment for simple lookups. After finding a dashboard, use grafana_get_dashboard to inspect panels, grafana_share_dashboard to render a chart, or grafana_update_dashboard to modify it.
### grafana_share_dashboard
When: User says "show me" or "send me" a chart/dashboard.
Params: dashboardUid, panelId (required). Optional: from (default "now-6h"), to (default "now"), width (default 1000), height (default 500), theme ("light"/"dark", default "dark").
Example: { "dashboardUid": "abc123", "panelId": 2, "from": "now-6h", "to": "now" }
Returns: Image rendered inline (tier 1), or snapshot URL (tier 2), or deep link (tier 3). Always delivers something. Includes deliveryTier ("image" | "snapshot" | "link"), rendererAvailable (boolean — false when Image Renderer plugin is missing), renderFailureReason (why image rendering failed), and remediation (how to fix it). Tier 3 also includes snapshotFailureReason.
Tip: Use grafana_get_dashboard first to find panel IDs. If rendererAvailable is false, tell the user to install the grafana-image-renderer plugin.
### grafana_create_alert
When: User wants notifications when a metric crosses a threshold.
Params: title, datasourceUid, expr (PromQL), threshold (all required). Optional: evaluation ("instant"/"rate"/"increase", default "instant"), evaluationWindow (default "5m", used with rate/increase), condition (gt/lt/gte/lte, default gt), for (duration, default 5m), folderUid, labels (e.g., { "severity": "warning" }), annotations (e.g., { "summary": "Cost too high" }), noDataState (NoData/Alerting/OK, default NoData).
IMPORTANT: For counter metrics (*_total), always use evaluation: "rate" (per-second rate) or evaluation: "increase" (total change over window). Raw counter values always increase and will immediately breach any threshold. Use "instant" (default) only for gauges.
Example gauge alert: { "title": "High Cost Alert", "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "threshold": 5, "condition": "gt" }
Example rate alert: { "title": "High Error Rate", "datasourceUid": "prom1", "expr": "openclaw_lens_webhook_error_total", "threshold": 0.1, "evaluation": "rate" }
Example increase alert: { "title": "Token Burst", "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total", "threshold": 10000, "evaluation": "increase", "evaluationWindow": "1h" }
Returns: { uid, title, status: "created", datasourceUid, url, evaluation?: { mode, window, evaluatedExpr }, metricValidation: { valid, error?, sampleValue? }, message }. The datasourceUid echoes back which datasource the rule targets (verify correctness). metricValidation dry-runs the expression before creation — valid: true + sampleValue confirms data exists; valid: false + error warns of typos/missing metrics. Alert is always created regardless (metric may not have data yet). When evaluation is "rate" or "increase", validation runs the wrapped expression.
Note: Auto-creates a "Grafana Lens Alerts" folder if no folderUid is specified.
### grafana_annotate
When: User deploys, changes config, or wants to mark an event for correlation.
Params: action ("create" default, or "list").
Create params: text (required), tags, dashboardUid, panelId, time (epoch ms or relative like "now-2h", default now), timeEnd (epoch ms or relative).
List params: from, to (epoch ms or relative like "now-7d", "now-24h", "now"), tags, limit (default 20).
Time formats: All time params accept epoch ms (e.g., 1700000000000) OR Grafana-style relative strings ("now", "now-1h", "now-7d", "now-30m"). Prefer relative strings — they're simpler and avoid arithmetic errors.
Example create: { "text": "Deployed v2.1.0", "tags": ["deploy", "production"] }
Example create past: { "text": "Incident started", "time": "now-2h", "timeEnd": "now-30m", "tags": ["incident"] }
Example list recent: { "action": "list", "from": "now-7d", "to": "now", "tags": ["deploy"] }
Example list: { "action": "list", "tags": ["deploy"], "limit": 10 }
Returns create: { status: "created", id, message, time, comparisonHint: { beforeWindow: { from, to }, afterWindow: { from, to }, suggestion } }. The comparisonHint provides ready-to-use ISO 8601 time ranges (30-min windows) for before/after comparison via grafana_query — no manual time math needed. For region annotations (with timeEnd), afterWindow starts at timeEnd.
Returns list: { annotations: [{ id, text, tags, time, timeEnd?, dashboardUID?, panelId? }] }.
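A sketch of the before/after pattern (timestamps illustrative, assuming the ISO 8601 windows from comparisonHint are passed as start/end per its description above): create returns comparisonHint: { "beforeWindow": { "from": "2025-01-15T11:30:00Z", "to": "2025-01-15T12:00:00Z" }, "afterWindow": { "from": "2025-01-15T12:00:00Z", "to": "2025-01-15T12:30:00Z" } } → run the same grafana_query range over each window and compare the averages.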
### grafana_check_alerts
When: Prompt context shows "GRAFANA ALERTS", need to manage alert rules (list/delete), set up the alert webhook, silence alerts during investigation, or acknowledge an investigated alert.
Params: action ("list" default, "acknowledge", "list_rules", "delete_rule", "silence", "unsilence", "setup").
List params: None — returns all pending (unacknowledged) alerts. Instances capped at 5 per alert.
Acknowledge params: alertId (required) — marks an alert as investigated.
List rules params: compact (boolean, default false — returns only uid/title/state/condition). Full mode returns all configured alert rules from Grafana with UID, title, condition (PromQL), folder, labels, annotations, AND live evaluation state (normal/firing/pending/nodata/error), health, and lastEvaluation. One call gives the complete alert health picture.
Delete rule params: ruleUid (required) — permanently deletes an alert rule. Get UIDs from list_rules.
Silence params: matchers (required — array of { name, value, isRegex? } from alert's commonLabels), duration (default "2h"), comment (optional).
Unsilence params: silenceId (required) — removes a silence so alerts resume notifying.
Setup params: webhookUrl (optional, auto-detected) — creates webhook contact point and notification policy route in Grafana.
Example list: {}
Example acknowledge: { "action": "acknowledge", "alertId": "alert-1" }
Example list rules: { "action": "list_rules" }
Example list rules compact: { "action": "list_rules", "compact": true }
Example delete rule: { "action": "delete_rule", "ruleUid": "abc123-def456" }
Example silence: { "action": "silence", "matchers": [{ "name": "alertname", "value": "HighCost" }], "duration": "2h", "comment": "Investigating cost spike" }
Example unsilence: { "action": "unsilence", "silenceId": "silence-uuid-123" }
Example setup: { "action": "setup" }
Returns list: { status: "success", alertCount, alerts: [{ id, status, title, message, receivedAt, commonLabels, totalInstances, truncated?, suggestedInvestigation?: { datasourceUid, condition, tool, queryLanguage, hint }, instances: [{ status, labels, annotations, startsAt, values }] }] }. suggestedInvestigation is auto-enriched by matching the alert to its rule — provides the PromQL/LogQL expression, datasource, and tool to use for immediate investigation (eliminates the need for separate list_rules + explore_datasources calls).
Returns acknowledge: { status: "acknowledged", alertId }.
Returns list_rules: { status: "success", ruleCount, rules: [{ uid, title, folder, ruleGroup, state, health, lastEvaluation, for, labels, annotations, condition, updated }] }. state is the live evaluation state: "normal" (not firing), "firing", "pending" (within for duration), "nodata", or "error". Falls back to "unknown" if the eval state API is unavailable. health is "ok", "nodata", "error", or "unknown". condition is the extracted PromQL expression from the rule's data queries.
Returns list_rules (compact): { status: "success", ruleCount, rules: [{ uid, title, state, condition }] }. Minimal fields for multi-tool chains — use when you need a quick overview of all rules without details.
Returns delete_rule: { status: "deleted", ruleUid, message }.
Returns silence: { status: "silenced", silenceId, duration, matchers, message }.
Returns unsilence: { status: "unsilenced", silenceId, message }.
Returns setup: { status: "created", contactPointUid, webhookUrl } or { status: "already_exists", contactPointUid }.
Note: Setup is idempotent — safe to call multiple times. Only alerts with managed_by=openclaw label route to the webhook (auto-added by grafana_create_alert). Use list_rules → delete_rule for full alert lifecycle management (create via grafana_create_alert, list/delete via grafana_check_alerts).
### grafana_push_metrics
When: User wants to track custom data (calendar events, git commits, fitness stats, financial data) in Grafana.
Params: action ("push" default, "register", "list", "delete").
Push params: metrics (required array) — each: { name, value, labels?, type?, help?, timestamp? }. Names auto-get openclaw_ext_ prefix. timestamp is optional ISO 8601 for historical data (gauge only).
Register params: name (required), type ("gauge"/"counter", default "gauge"), help, labelNames (array), ttlDays.
List params: None — returns all custom metric definitions.
Delete params: name (required) — removes a custom metric.
Example push: { "metrics": [{ "name": "steps_today", "value": 8000 }, { "name": "meetings", "value": 3, "labels": { "type": "standup" } }] }
Example backfill: { "metrics": [{ "name": "steps", "value": 8000, "timestamp": "2025-01-15" }, { "name": "steps", "value": 10500, "timestamp": "2025-01-16" }] }
Example mixed: { "metrics": [{ "name": "steps", "value": 9000, "timestamp": "2025-01-17" }, { "name": "heart_rate", "value": 72 }] }
Example register: { "action": "register", "name": "weight_kg", "type": "gauge", "help": "Body weight", "labelNames": ["person"], "ttlDays": 90 }
Example list: { "action": "list" }
Example delete: { "action": "delete", "name": "old_metric" }
Returns push: { status: "ok", accepted: 2, queryNames: { "openclaw_ext_steps": "openclaw_ext_steps", "openclaw_ext_events": "openclaw_ext_events_total" }, suggestedWorkflow: [{ tool, action, example }], message: "..." }. suggestedWorkflow contains concrete next-step examples using the actual pushed metric names — verify (grafana_query), visualize (grafana_create_dashboard with metric-explorer template), and alert (grafana_create_alert, single-metric only). Partial success supported. Timestamped and real-time points in the same batch are both accepted.
Returns register: { status: "registered", metric: { name, type, help, labelNames, ttlMs }, queryName: "openclaw_ext_events_total", suggestedWorkflow: [{ tool, action, example }] }. suggestedWorkflow shows how to push data and query the registered metric (with rate() wrapping for counters).
Returns list: { count, metrics: [{ name, type, queryName, help, labelNames, createdAt, updatedAt }] }.
Returns delete: { status: "deleted", name }.
Note: Push auto-registers unknown metrics. Response includes queryNames with exact PromQL names and suggestedWorkflow with concrete next steps. Follow suggestedWorkflow to complete the push→visualize pipeline. Timestamped pushes are gauge-only — counters with timestamps are rejected. See external-data.md for naming conventions and backfill patterns.
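Sketch of the counter flow (names follow the examples above): register { "action": "register", "name": "events", "type": "counter" } → response queryName: "openclaw_ext_events_total" → push { "metrics": [{ "name": "events", "value": 42 }] } → grafana_query { "datasourceUid": "prom1", "expr": "rate(openclaw_ext_events_total[5m])" }, with the rate() wrapping the suggestedWorkflow shows.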
### grafana_explain_metric
When: User asks "what does this metric mean?", "why did it spike?", "is this normal?", or "show me the trend".
Params: datasourceUid (required), expr (PromQL or plain metric name, required), period (24h/7d/30d, default 24h), compareWith ("previous" — compare current period with the same-length window immediately before it).
Example: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd" }
Example counter: { "datasourceUid": "prom1", "expr": "openclaw_lens_tokens_total" }
Example 7d: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d" }
Example comparison: { "datasourceUid": "prom1", "expr": "openclaw_lens_daily_cost_usd", "period": "7d", "compareWith": "previous" }
Example PromQL: { "datasourceUid": "prom1", "expr": "rate(http_requests_total[5m])", "period": "24h" }
Returns: { metricType?, trendQuery?, current: { value, timestamp }, healthContext?: { status, thresholds, description, direction }, trend: { changePercent, direction, first, last }, stats: { min, max, avg, samples }, comparison?: { previousPeriod: { from, to, avg, min, max, samples }, change: { absolute, percentage, direction } }, metadata: { type, help, unit }, suggestedQueries?: [{ query, description }], suggestedBreakdowns?: string[] }. Sections omitted when data unavailable. changePercent is null when first value is zero. healthContext is included for well-known openclaw_lens_* gauge metrics — same as grafana_query.
Counter-aware: Auto-detects counter metrics (via metadata type or _total suffix) and wraps the trend query in rate(expr[5m]). The current value stays raw (cumulative total), but trend and stats show rate of change. metricType field tells you the detected type (counter/gauge/histogram). trendQuery shows the actual PromQL used for trend (only present when different from expr).
Drill-down: For multi-dimensional metrics (metrics with labels like model, token_type, provider), the response includes suggestedQueries — ready-to-use PromQL queries for grafana_query that break down the metric by each label. Counter metrics get rate() wrapping automatically. Use these to investigate cost attribution, identify top contributors, or decompose aggregates.
Breakdowns: suggestedBreakdowns provides label names for decomposition — always available for known OpenClaw metrics (cost, session, queue, webhook families) even when the metric has no data yet. For unknown metrics, falls back to labels discovered from the instant query. Use these labels with grafana_query to build sum by (label) (...) queries for root-cause analysis.
Period comparison: Use compareWith: "previous" for period-over-period analysis (e.g., this week vs. last week). Returns a comparison object with the previous period's stats and the change (absolute, percentage, direction). Works with counters too (compares rates). Eliminates the need for manual multi-query workflows.
Tip: For simple trend context, call with just period. For "did things improve?" questions, add compareWith: "previous". Metadata only available for plain metric names (not complex PromQL). No need to manually wrap counters in rate() — the tool does it automatically.
### grafana_security_check
When: User asks "am I being attacked?", "security status", "security audit", "security check", or wants a comprehensive threat assessment.
Params: lookback (time window, default "1h". Use "24h" for daily review, "7d" for weekly).
Example: {}
Example weekly review: { "lookback": "7d" }
Returns: { overallThreatLevel: "green"|"yellow"|"red", summary, checks: [{ name, status, value, detail }], securityEventLogs, limitations: [...], suggestedActions: [...], dashboardTemplate: "security-overview" }.
Checks (6 parallel, via Promise.allSettled — partial failures return partial results):
- webhook_error_ratio — webhook error rate vs received rate. Warning 20%, critical 50%.
- cost_anomaly — daily cost. Warning $10, critical $50.
- tool_loops — active tool loops. Warning 1, critical 3.
- injection_signals — prompt injection patterns detected. Warning 1, critical 5.
- session_enumeration — unique sessions in 1h. Warning 50, critical 200.
- stuck_sessions — stuck sessions. Warning 1, critical 3.
Limitations: Auth failures are NOT observable — openclaw's auth middleware emits no diagnostic events. The tool monitors observable signals (webhook errors, cost spikes, prompt injection patterns, session anomalies) but cannot detect silent auth-layer attacks (bad tokens, brute-force, rate-limiter lockouts). Always includes limitations in response.
Auto-discovery: No datasource UID needed — auto-discovers first Prometheus datasource. Loki is optional (skips log check if absent).
Follow-up: Use dashboardTemplate to create a persistent security dashboard. Use suggestedActions for investigation steps.
### grafana_investigate
When: First step for "investigate this", "what's wrong?", "triage", "root cause analysis", "debug this issue", or any multi-signal investigation.
Params: focus (required — alert UID, metric name, or symptom text), timeWindow (optional, default "1h", options: "1h", "6h", "24h"), service (optional, default "openclaw").
Example: { "focus": "high error rate" }
Example with alert: { "focus": "alert-abc", "timeWindow": "6h" }
Example with metric: { "focus": "openclaw_lens_daily_cost_usd", "timeWindow": "24h" }
Returns: { timeWindow, focus, metricSignals: { focus: { current, trend, anomalyScore, anomalySeverity }, red: { rate, errorRate, p95Latency } }, logSignals: { totalVolume, errorCount, bySeverity, topPatterns, sampleErrors }, traceSignals: { errorTraces, slowTraces }, contextSignals: { recentAnnotations, alertsActive }, suggestedHypotheses: [{ hypothesis, evidence, confidence, testWith: { tool, params } }], limitations }.
Auto-discovery: No datasource UID needed — auto-discovers Prometheus, Loki, and Tempo datasources. Loki and Tempo are optional (graceful degradation with limitations noted).
Follow-up: Use suggestedHypotheses[].testWith for deep-dives with specific tools. Use grafana_annotate to mark findings. Use grafana_check_alerts to acknowledge investigated alerts.
### alloy_pipeline
When: Trigger whenever the user mentions data collection, metrics pipeline, log forwarding, trace collection, infrastructure monitoring, database monitoring, observability setup, scraping endpoints, collecting Docker/container logs, Kubernetes monitoring, syslog, or any mention of getting data into Grafana. Also trigger for managing existing pipelines (status, health, diagnostics). 26 recipes covering metrics (11), logs (8), traces (4), profiles (1), and infrastructure (3). This is the bridge between "I have a data source" and "I can see it in Grafana" — if the user has data somewhere and wants it in Grafana, this tool is the answer.
Actions: create (deploy pipeline from recipe), list (show managed pipelines), update (change params), delete (remove pipeline), recipes (browse catalog), status (check health), diagnose (Alloy connectivity + all pipelines).
Params (create with recipe): recipe (required), params (recipe-specific), name (optional — auto-generated from recipe name with numeric suffix for uniqueness, e.g., syslog, syslog-2, syslog-3; always read the name field from the response to get the actual assigned name). Log recipes also accept optional processing params: jsonExpressions (object — JSON path extractions), labelFields (object — promote fields to labels), structuredMetadata (object — Loki structured metadata), staticLabels (object), timestampSource (string), outputSource (string), regexExpression (string).
Params (create with raw config): config (raw Alloy River syntax), name (required), signal (optional — "metrics", "logs", "traces", or "profiles", auto-detected from component IDs if omitted), sampleQueries (optional — object with metrics/logs/traces keys containing ready-to-use queries). Response includes exportTargets showing where data is being sent.
Params (update): name (required), params (fields to change — merges with existing). Raw-config pipelines support update via full config replacement — pass config param with the complete new config. Recipe-based pipelines use params for partial updates (merges with existing). Response includes sampleQueries and suggestedWorkflow for chaining into query/dashboard tools.
Params (status): name (required).
Params (recipes): category (optional filter).
Example create (scrape): { "recipe": "scrape-endpoint", "params": { "url": "http://myapp:8080/metrics" } }
Example create (database): { "recipe": "postgres-exporter", "params": { "connectionString": "postgres://user:pass@db:5432/mydb" }, "name": "analytics-db" }
Example create (logs): { "recipe": "docker-logs", "params": { "containerNames": ["myapp", "nginx"] } }
Example create (files): { "recipe": "file-logs", "params": { "paths": ["/var/log/app/*.log"] } }
Example create (K8s): { "recipe": "kubernetes-pods", "params": { "namespaces": ["production"] } }
Example create (OTLP): { "recipe": "otlp-receiver" }
Example create (traces with sampling): { "recipe": "application-traces", "params": { "environment": "staging", "sampleRate": 0.5 } }
Example create (syslog UDP): { "recipe": "syslog", "params": { "protocol": "udp", "listenAddress": "0.0.0.0:5514" } }
Example create (raw config): { "config": "prometheus.scrape ...", "name": "custom-pipeline", "signal": "metrics" }
Example recipes: { "action": "recipes", "category": "logs" }
Example status: { "action": "status", "name": "analytics-db" }
Example diagnose: { "action": "diagnose" }
Returns (create): { status, name, recipe, signal, configFile, reloaded, envVarsRequired, warnings, sampleQueries: { metrics|logs|traces: {...} }, suggestedWorkflow }. status is "created" (active) or "pending_credentials" (env vars not set yet). sampleQueries are ready-to-use PromQL/LogQL/TraceQL — pass directly to query tools. envVarsRequired lists env vars for credential-bearing recipes. warnings contains advisory notes about unrecognized params. Raw-config pipelines may also include credentialWarnings if plaintext credentials were detected.
Returns (recipes): { categories, recipes: [{ name, category, signal, summary, requiredParams, optionalParams, hasCredentials, dashboardTemplate }] }. Use summary to match user intent.
Returns (status): { name, status, components: [{ id, health, detail }], dataVerification: { verifyQuery, tool }, remediation }. Pending pipelines auto-promote to active when components become healthy.
Returns (diagnose): { alloyConnectivity, pipelineHealth, driftDetected, orphanFiles, limits }.
Tip (credential lifecycle): Credential recipes (postgres-exporter, mysql-exporter, mongodb-exporter, redis-exporter, scrape-endpoint with auth, kafka-logs) may return pending_credentials status when env vars aren't set. Config stays on disk — tell user the exact env var names from envVarsRequired, then verify with action status. Pipeline auto-activates once Alloy can connect.
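Illustrative lifecycle (env var name hypothetical): create returns { "status": "pending_credentials", "envVarsRequired": ["PG_CONNECTION_STRING"] } → user sets PG_CONNECTION_STRING where Alloy runs → { "action": "status", "name": "analytics-db" } → status auto-promotes to active once components report healthy.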
Tip (chaining): sampleQueries → grafana_query/grafana_query_logs. dashboardTemplate → grafana_create_dashboard. Zero-friction pipeline→query→dashboard→alert chains.
Tip (multi-pipeline): For K8s: kubernetes-pods + kubernetes-logs + otlp-receiver (3 separate creates).
Tip (raw config escape hatch): Use config param for patterns not covered by recipes. Signal is auto-detected from component IDs (or set signal explicitly). Add sampleQueries so chaining tools know what to query. Component IDs are auto-extracted for health checking. Supports update via full config replacement. See references/alloy-components.md for copy-pasteable component snippets.
Tip (port conflicts): Trace recipes (otlp-receiver, application-traces, span-metrics, service-graph) default to ports 4317/4318 — creating multiple on the same ports will fail. Set grpcPort/httpPort params to different values for each.
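For example, a second OTLP receiver alongside the default one (the port numbers are arbitrary; any free pair works):
Example create (second OTLP receiver): { "recipe": "otlp-receiver", "params": { "grpcPort": 5317, "httpPort": 5318 }, "name": "otlp-secondary" }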
Tip (cross-pipeline isolation): Pipelines are self-contained — each .alloy file has no cross-file references. For combined patterns (e.g., OTLP receiver + span-metrics + service-graph in one config), use raw config param with the full combined pipeline. You cannot wire one pipeline's output into another pipeline.
Honest limitations: OpenClaw's auth middleware does not emit diagnostic events for failed authentication attempts (bad tokens, brute-force, rate-limiter lockouts). Gateway-level auth failures are invisible to Grafana Lens. The security-check tool monitors observable signals (webhook errors, cost spikes, prompt injection patterns, session anomalies) but cannot detect silent auth-layer attacks.
Security PromQL queries:
rate(openclaw_lens_webhook_error_total[5m]) > 0.5
rate(openclaw_lens_webhook_error_total[5m]) / (rate(openclaw_lens_webhook_received_total[5m]) + 0.001) > 0.3
sum(openclaw_lens_tool_loops_active) > 0
sum(increase(openclaw_lens_prompt_injection_signals_total[1h])) > 0
openclaw_lens_daily_cost_usd > 10
openclaw_lens_unique_sessions_1h > 50
sum(increase(openclaw_lens_gateway_restarts_total[24h])) > 3
sum by (error_class) (rate(openclaw_lens_tool_error_classes_total[5m]))
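Any of these conditions can feed grafana_create_alert directly. A minimal sketch using the webhook error-ratio condition (same expr and threshold as the security-monitoring workflow below):
Example alert: { "expr": "rate(openclaw_lens_webhook_error_total[5m]) / (rate(openclaw_lens_webhook_received_total[5m]) + 0.001)", "threshold": 0.3, "condition": "gt", "name": "Webhook error burst" }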
Security LogQL queries:
{service_name="openclaw"} | json | event_name=~"prompt_injection.detected|gateway.start|gateway.stop"
{service_name="openclaw"} | json | component="diagnostic" | event_name="tool.loop"
{service_name="openclaw"} | json | event_name="session.reset"
"How's my agent doing?"
1. grafana_query with sum(increase(openclaw_lens_cost_by_model_total[1d])) or vector(0) and openclaw_lens_tokens_total for quick numbers
2. grafana_create_dashboard with llm-command-center template + grafana_share_dashboard for a visual
"Set up monitoring for my agent" / "What dashboards should I create?"
1. grafana_search query: "LLM Command Center" — check for existing agent dashboards
2. grafana_create_dashboard template: "llm-command-center" — single-pane health: cost, sessions, errors, latency, cache, live feeds
3. Follow suggestedNext from the response — it chains you through the remaining templates (session-explorer → cost-intelligence → tool-performance → sre-operations → genai-observability)
4. grafana_share_dashboard to show a key panel inline
The 6 AI observability templates provide comprehensive OpenClaw monitoring with Loki log-to-trace correlation.
"Find error logs in the last hour"
1. grafana_explore_datasources — find Loki datasource UID
2. grafana_query_logs with {job="api"} |= "error"
"Why did my service crash?"
1. grafana_explore_datasources — find Loki UID
2. grafana_query_logs with {job="myservice"} |~ "fatal|panic|crash"
3. grafana_query — check related metrics around the same time window
"Investigate alert: correlate metrics with logs"
1. grafana_check_alerts — get alert details with suggestedInvestigation (includes ready-to-use query, datasource, and tool)
2. grafana_query / grafana_query_logs — use suggestedInvestigation.tool with suggestedInvestigation.condition and suggestedInvestigation.datasourceUid
3. grafana_query_logs — search logs around alert time for root cause (if PromQL alert)
4. grafana_annotate — mark findings on dashboard
5. grafana_check_alerts with action: "acknowledge"
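A rough sketch of what suggestedInvestigation carries, based on the fields named in step 2 (the expr and UID are hypothetical, and the real object may include more fields):
{ "suggestedInvestigation": { "tool": "grafana_query", "condition": "rate(http_errors_total[5m]) > 0.05", "datasourceUid": "prom-abc123" } }
Call the named tool with condition as its expression and pass datasourceUid through unchanged.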
"What data do I have in Grafana?"
1. grafana_explore_datasources — find Prometheus datasources
2. grafana_list_metrics with datasourceUid and metadata: true — list available metrics grouped by category
3. Use categorySummary for a quick overview (e.g., "5 cost metrics, 8 session metrics, 3 queue metrics")
"My agent feels slow — what performance metrics can I look at?"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_list_metrics with purpose: "performance", metadata: true — returns only session + tool metrics (latency, duration, tool calls)
3. grafana_explain_metric on key metrics (e.g., openclaw_lens_session_latency_avg_ms, openclaw_lens_tool_duration_ms) — get current values and trends
"Alert me if costs exceed $10/day"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_create_alert with expr: "openclaw_lens_daily_cost_usd", threshold: 10, condition: "gt"
"Query error rate and alert if it's above 1%" (query→alert chaining)
1. grafana_query with expr: "rate(http_errors_total[5m])" — response includes datasourceUid
2. grafana_create_alert with datasourceUid from the query response, same expr, threshold: 0.01, condition: "gt" — no extra datasource discovery needed
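A minimal sketch of the two chained calls (the metric name and UID are placeholders):
grafana_query: { "expr": "rate(http_errors_total[5m])" } → response contains "datasourceUid": "prom-abc123"
grafana_create_alert: { "expr": "rate(http_errors_total[5m])", "datasourceUid": "prom-abc123", "threshold": 0.01, "condition": "gt" }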
"Show me my cost trends"
1. grafana_search with query: "cost" — check for existing dashboard
2. grafana_create_dashboard with cost-intelligence template
3. grafana_get_dashboard — find the right panel ID
4. grafana_share_dashboard — deliver the chart inline
"Re-run the latency panel from my Command Center for the last 7 days"
1. grafana_search with query: "Command Center" — get the dashboard UID
2. grafana_get_dashboard with uid — find the latency panel ID
3. grafana_query with dashboardUid, panelId, queryType: "range", start: "now-7d" — auto-resolves PromQL and datasource from the panel
"Mark that I deployed version 2"
1. grafana_annotate with text: "Deployed v2", tags: ["deploy"]
"I deployed v2.3.0 — compare error rates before and after"
1. grafana_annotate with text: "Deployed v2.3.0", tags: ["deploy"] — response includes comparisonHint with before/after time windows
2. grafana_query with comparisonHint.beforeWindow time range — get pre-deploy error rate
3. grafana_query with comparisonHint.afterWindow time range — get post-deploy error rate
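A rough sketch of the comparisonHint in the annotate response (window names from the steps above; the from/to keys inside each window are assumed):
{ "comparisonHint": { "beforeWindow": { "from": "now-2h", "to": "<deploy time>" }, "afterWindow": { "from": "<deploy time>", "to": "now" } } }
Run the same error-rate query once per window and compare the two results.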
"What happened around 3pm?"
1. grafana_annotate with action: "list", from/to relative times (e.g., "now-6h") or epoch ms, or specific tags
"Monitor my server's health"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_search with query: "node" — check for existing
3. grafana_create_dashboard with template: "node-exporter" — user picks instance from dropdown
"Show me my e-commerce metrics"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_list_metrics with prefix: "orders_" — discover available metrics
3. grafana_create_dashboard with template: "multi-kpi" — user picks 4 business metrics from dropdowns
"I want to explore my fitness data"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_create_dashboard with template: "metric-explorer" — user browses all fitness_* metrics from dropdown
"Set up HTTP API monitoring with alerts"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_create_dashboard with template: "http-service"
3. grafana_create_alert with expr: "http_requests_total{code=~\"5..\"}", threshold: 0.1, evaluation: "rate" — the tool wraps in rate() automatically
"Build a custom Redis dashboard"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_list_metrics with metadata: true, prefix: "redis_" — learn metric types
3. Compose dashboard JSON using panel snippets from the composition guide
4. grafana_create_dashboard with the custom JSON — check validation.panelsError for broken queries, fix any errors with grafana_update_dashboard
"Set up full alert monitoring loop"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_check_alerts with action: "setup" — create webhook contact point
3. grafana_create_alert with PromQL condition — rule auto-routes to webhook via managed_by=openclaw label
"List all my alert rules and delete one"
1. grafana_check_alerts with action: "list_rules" — get all configured rules with UIDs, titles, conditions, AND live eval state (normal/firing/nodata/error)
2. grafana_check_alerts with action: "delete_rule", ruleUid — permanently remove the rule
"Investigate a firing alert" (triggered by "GRAFANA ALERTS" in prompt context)
1. grafana_investigate with focus = alert UID or symptom, timeWindow: "1h" — parallel multi-signal triage (metrics, logs, traces, context)
2. grafana_check_alerts with action: "silence", matchers from alert's commonLabels — prevent repeat notifications
3. Follow suggestedHypotheses[].testWith from the investigate response — deep-dive with specific tools
4. grafana_query_logs — search logs around alert time for root cause (statistics first: count_over_time, then samples)
5. grafana_query_traces — inspect error/slow traces for span-level detail
6. grafana_annotate — mark findings on dashboard
7. grafana_check_alerts with action: "acknowledge" — mark as investigated
8. grafana_check_alerts with action: "unsilence", silenceId — restore notifications
"Is this metric anomalous?" / "Statistical assessment"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_explain_metric with period: "24h" — response includes anomaly (z-score, severity) and seasonality (vs 1d/7d ago)
3. If anomaly.severity >= "significant": grafana_query with z-score PromQL for time series view
4. grafana_query with predict_linear(METRIC[6h], 3600) — where is it heading?
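One common z-score formulation for step 3 (a sketch, not necessarily the tool's canonical query; METRIC is a placeholder):
(METRIC - avg_over_time(METRIC[24h])) / stddev_over_time(METRIC[24h])
Values beyond roughly ±3 mean the current level sits far outside the last day's distribution.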
"Alert fatigue analysis — which alerts are noisy?"
1. grafana_check_alerts with action: "analyze" — fatigue report with always-firing, flapping, and healthy classifications
2. Tune noisy rules by raising thresholds or increasing the for: duration on flapping rules
3. grafana_check_alerts with action: "silence" for rules being tuned
"Generate a postmortem for the last incident"
1. grafana_investigate with focus = incident symptom, timeWindow: "6h" — gather all evidence
2. grafana_check_alerts (list) — find resolved alerts with timeline
3. grafana_annotate (list, tags: ["investigation"]) — build event timeline
4. grafana_explain_metric for key metrics — trend/stats for incident period
5. grafana_query_logs — error patterns (statistics first)
6. grafana_query_traces — error traces for root cause evidence
"Add an error rate panel to my API dashboard"
1. grafana_search with query: "API" — find the dashboard
2. grafana_get_dashboard — get current panels and structure
3. grafana_update_dashboard with operation: "add_panel", panel: { title: "Error Rate", type: "timeseries", targets: [...] }
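A fuller sketch of the step-3 call (the expr is a placeholder; panel fields follow standard Grafana panel JSON):
{ "operation": "add_panel", "panel": { "title": "Error Rate", "type": "timeseries", "targets": [{ "expr": "sum(rate(http_errors_total[5m])) / sum(rate(http_requests_total[5m]))" }] } }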
"Remove the old latency panel and add a new one"
1. grafana_get_dashboard — find panel IDs
2. grafana_update_dashboard with operation: "remove_panel", panelId: <old_id>
3. grafana_update_dashboard with operation: "add_panel", panel: { ... }
"Change my dashboard time range to 7 days"
1. grafana_update_dashboard with operation: "update_metadata", time: { "from": "now-7d", "to": "now" }
"Send my team the weekly dashboard"
1. grafana_search with the dashboard name
2. grafana_get_dashboard — get panel IDs
3. grafana_share_dashboard for each key panel with from: "now-7d"
"Show me my weekly work review"
1. grafana_push_metrics — push work data (commits, hours, meetings)
2. grafana_create_dashboard with template: "weekly-review" — weekly trends + all custom metrics table
3. grafana_share_dashboard — deliver key panels inline
"Track my daily fitness data"
1. grafana_push_metrics — push fitness metrics (steps, weight, calories). Response has queryNames with exact PromQL names.
2. grafana_create_dashboard with template: "weekly-review" — weekly trends
3. grafana_create_alert using queryNames from step 1 for exact PromQL — e.g., threshold: 5000, condition: "lt"
"Backfill my step count for the past week"
1. grafana_push_metrics — push 7 data points with timestamp on each:
{ "metrics": [{ "name": "steps", "value": 8000, "timestamp": "2025-01-13" }, { "name": "steps", "value": 10500, "timestamp": "2025-01-14" }, ...] }grafana_create_dashboard with template: "metric-explorer" — visualize the trendgrafana_share_dashboard with from: "now-7d" — deliver the chart"What custom data have I pushed?"
"What custom data have I pushed?"
1. grafana_push_metrics with action: "list" — see all custom metric definitions with queryName per metric
2. grafana_query using queryName values — see current values
3. grafana_list_metrics with search: "ext" — server-side discovery from Prometheus
"System health check — give me a full status report"
1. grafana_explore_datasources — discover available datasources
2. grafana_check_alerts — check pending alerts + action: "list_rules" for configured rules
3. grafana_search with query: "" and enrich: true — find all dashboards with updatedAt and panelCount for staleness triage
4. grafana_get_dashboard with audit: true for stale or suspicious dashboards — check which panels have data vs broken
5. Summarize: alert status, panel health (auditSummary), stale dashboards, broken panels
"Audit my dashboard — which panels are broken?"
1. grafana_get_dashboard with audit: true — dry-runs every panel's query
2. Read auditSummary for counts (ok, nodata, error, skipped)
3. For error panels, explain the issue; for nodata panels, suggest fixes (missing metrics, wrong datasource)
"What is this metric doing?"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_explain_metric with metric name — get current value, trend, stats, metadata
"Compare costs this week vs. last week" / "Did the new model help?"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_explain_metric with period: "7d", compareWith: "previous" — one call gives this week's stats + last week's stats + percentage change
3. Read the comparison.change object and answer in plain language: "costs are down 25% week-over-week, the model switch is working"
"Compare metric over different time scales"
1. grafana_explain_metric with period: "24h" — recent trend
2. grafana_explain_metric with period: "7d" — weekly context
"Where are my costs going?" / "Why did my costs spike?"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_explain_metric with openclaw_lens_cost_by_token_type — get trend + suggestedBreakdowns (always: ["model", "token_type", "provider"]) + suggestedQueries
3. Use suggestedBreakdowns to know which dimensions matter, or pick from suggestedQueries for ready-to-use PromQL
4. grafana_query with sum by (model) (rate(openclaw_lens_cost_by_token_type[5m])) — get concrete breakdown numbers
"Set up AI observability" / "Show me LLM traces and logs"
1. grafana_explore_datasources — verify Prometheus, Loki, and Tempo datasources are available
2. grafana_search with query: "LLM Command Center" — check for existing
3. grafana_create_dashboard with template: "llm-command-center" — health overview with live error/session feeds
4. grafana_create_dashboard with template: "session-explorer" — per-session trace drill-down
5. grafana_share_dashboard — deliver an LLM latency or token usage panel inline
"Why is my LLM slow?" (multi-signal investigation)
1. grafana_explain_metric with gen_ai_client_operation_duration_seconds — check latency trend
2. grafana_query_logs with {service_name="openclaw"} | json | component="lifecycle" |= "llm.output", extractFields: true — find slow LLM calls with fields.duration_s and fields.model surfaced directly
3. grafana_query_traces with queryType: "get" and fields.trace_id from step 2 — inspect spans; response includes correlationHint.logQuery for correlated logs
4. grafana_query with model-specific duration breakdown — identify which model is slow
"Debug a slow or failing session"
1. grafana_query_traces with { resource.service.name = "openclaw" && status = error } or minDuration: "10s" — find problematic traces
2. grafana_query_traces with queryType: "get" and the trace ID — inspect span hierarchy (response includes correlationHint)
3. grafana_query_logs with the correlationHint.logQuery from step 2, extractFields: true — correlated logs
4. grafana_query with relevant PromQL — check metrics around the same time window
5. grafana_annotate with text: "Debug: [findings]", tags: ["debug"] — mark investigation on dashboards
"Which dashboards are stale?" / "Give me a dashboard summary report"
1. grafana_search with query: "" and enrich: true — gets updatedAt, panelCount, folderTitle for every dashboard
2. Sort by updatedAt to identify stale dashboards (e.g., not updated in 30+ days)
"Am I being attacked?" / "Security status" / "Security audit"
1. grafana_security_check — runs 6 parallel security checks, returns threat level (green/yellow/red)
2. Follow suggestedActions from the response
3. If dashboardTemplate is present: grafana_create_dashboard with template: "security-overview" — persistent monitoring
4. grafana_create_alert using suggested thresholds for ongoing monitoring
"Set up security monitoring"
1. grafana_check_alerts with action: "setup" — create webhook contact point (if not already done)
2. grafana_create_dashboard with template: "security-overview" — 15-panel security dashboard
3. grafana_create_alert with expr: "rate(openclaw_lens_webhook_error_total[5m]) / (rate(openclaw_lens_webhook_received_total[5m]) + 0.001)", threshold: 0.3, condition: "gt" — webhook error burst
4. grafana_create_alert with expr: "openclaw_lens_daily_cost_usd", threshold: 10, condition: "gt" — cost spike
5. grafana_create_alert with expr: "sum(increase(openclaw_lens_prompt_injection_signals_total[1h]))", threshold: 3, condition: "gt" — prompt injection signals
"Investigate security alert"
1. grafana_security_check — get current threat assessment
2. grafana_query_logs with {service_name="openclaw"} | json | component="lifecycle" | event_name=~"prompt_injection.detected|gateway.start|gateway.stop" — security event logs
3. grafana_query with specific PromQL for the affected check (e.g., rate(openclaw_lens_tool_error_classes_total[5m]))
4. grafana_annotate with text: "Security investigation", tags: ["security"] — mark investigation on dashboards
5. grafana_check_alerts with action: "acknowledge" — mark alerts as investigated
"Clean up old test dashboards"
1. grafana_search with query: "test" or relevant tags and enrich: true — includes updatedAt for staleness check
2. grafana_update_dashboard with operation: "delete" for each confirmed dashboard
"Find my production dashboards"
1. grafana_search with tags: ["production"] or starred: true
"Why is the queue backing up?"
1. grafana_explore_datasources — get Prometheus UID
2. grafana_query with openclaw_lens_queue_depth — current queue depth
3. grafana_query with sum(rate(openclaw_message_queued_total[5m])) by (openclaw_source) — message inflow by source
4. grafana_query with histogram_quantile(0.95, rate(openclaw_queue_wait_ms_milliseconds_bucket[5m])) — p95 queue wait time
5. grafana_query with sum(rate(openclaw_queue_lane_enqueue_total[5m])) vs sum(rate(openclaw_queue_lane_dequeue_total[5m])) — enqueue vs dequeue rate
6. grafana_query with openclaw_lens_sessions_stuck — check for stuck sessions blocking the queue
7. grafana_explain_metric on openclaw_lens_queue_depth for trend and stats over the incident window
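To answer "is the backlog still growing?" in one shot, the two step-5 rates can be combined into a single expression (a sketch; a sustained positive value means enqueues are outpacing dequeues):
sum(rate(openclaw_queue_lane_enqueue_total[5m])) - sum(rate(openclaw_queue_lane_dequeue_total[5m]))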
"Monitor my PostgreSQL database"
1. alloy_pipeline with recipe postgres-exporter, params { connectionString: "postgres://..." }, name analytics-db
2. If status is pending_credentials, have the user set the ALLOY_POSTGRES_EXPORTER_ANALYTICS_DB_CONNECTIONSTRING env var where Alloy runs
3. alloy_pipeline action status, name analytics-db — verify data flowing
4. grafana_list_metrics with search pg_ — discover available PostgreSQL metrics
5. grafana_create_dashboard with template metric-explorer, title Analytics DB Health
6. grafana_create_alert — pg_up{job="analytics-db"} == 0 (database down)
"Set up full-stack Kubernetes monitoring"
1. alloy_pipeline recipe kubernetes-pods (auto-discovers pod metrics)
2. alloy_pipeline recipe kubernetes-services (scrapes service endpoints)
3. alloy_pipeline recipe kubernetes-logs (collects pod logs to Loki)
4. alloy_pipeline action diagnose — verify all 3 healthy
5. grafana_create_dashboard with template multi-kpi — cluster health
6. grafana_create_alert — pod restart rate, OOMKills
"Collect Docker container logs and search errors"
1. alloy_pipeline recipe docker-logs
2. alloy_pipeline action status, name docker-logs
3. grafana_query_logs with {source="docker"} |= "error", start now-1h
"Parse JSON fields from Kafka logs"
1. alloy_pipeline recipe kafka-logs, params { brokers: ["kafka:9092"], topics: ["app-logs"], jsonExpressions: { level: "", message: "", request_id: "ctx.rid" }, labelFields: { level: "" }, structuredMetadata: { request_id: "" } }
2. alloy_pipeline action status, name kafka-logs
3. grafana_query_logs with { expr: '{source="kafka"} | level="error"' }
"Generate RED metrics from traces"
1. alloy_pipeline recipe span-metrics, params { dimensions: ["http.method", "http.status_code"] }
2. alloy_pipeline action status
3. grafana_query with { expr: 'sum(rate(traces_spanmetrics_calls_total[5m]))' }
4. grafana_create_dashboard template metric-explorer
"Monitor endpoint availability"
1. alloy_pipeline recipe blackbox-exporter, params { targets: [{ name: "api", address: "https://api.example.com", module: "http_2xx" }] }
2. alloy_pipeline action status
3. grafana_create_alert with { expr: 'probe_success == 0', name: 'Endpoint Down' }
"My app has /metrics at myapp:9090 — dashboard + alerts"
1. alloy_pipeline recipe scrape-endpoint, params { url: "http://myapp:9090/metrics" }
2. alloy_pipeline action status — verify scraping
3. grafana_list_metrics — discover what's exposed
4. grafana_create_dashboard using discovered metrics
5. grafana_create_alert for critical metrics
"Set up Linux system monitoring"
1. alloy_pipeline with recipe node-exporter, name linux-metrics — system CPU, memory, disk metrics
2. alloy_pipeline with recipe journal-logs, name linux-journal — systemd journal logs
3. alloy_pipeline with recipe file-logs, name linux-syslog, params { paths: ["/var/log/syslog", "/var/log/messages"] } — traditional syslog files
4. alloy_pipeline action diagnose — verify all 3 healthy
5. grafana_create_dashboard with template metric-explorer, title Linux System Health
"Are all my data collection pipelines healthy?"
1. alloy_pipeline action diagnose — connectivity, all pipeline health, drift, limits
2. If anything is unhealthy, run alloy_pipeline action status for details
3. Follow the remediation field for fixes
For metric names, types, labels, and common PromQL — covering both openclaw_* (from diagnostics-otel) and openclaw_lens_* (from grafana-lens) — see references/agent-metrics.md.
For external data naming conventions and integration patterns, see references/external-data.md.
For 5-phase investigation methodology, anomaly detection PromQL, RED/USE method queries, LogQL investigation discipline, TraceQL debugging, SLI/SLO burn rates, cross-signal correlation, and composed investigation workflows — see references/sre-investigation.md.
For the full 26-recipe catalog with detailed parameters, Alloy component mappings, troubleshooting, and advanced usage patterns — see references/alloy-pipelines.md.
For component snippets for raw-config (escape hatch) pipelines — see references/alloy-components.md.