Name: Monitoring Capture Service
Author: PostHog

Monitoring the capture service with Grafana MCP

The capture service (rust/capture/) is PostHog's Rust HTTP ingestion endpoint. It receives events from SDKs, applies quota/rate limits, and produces to Kafka. Three deployment roles run the same binary with different CAPTURE_MODE configs.

This skill teaches how to discover live metrics using the Grafana MCP tools rather than memorizing metric names that change as the code evolves.

Environment context

The Grafana MCP is connected to a single Grafana instance scoped to one environment. If the user hasn't specified, ask which environment they want to investigate:

prod-us — US production (us-east-1)
prod-eu — EU production (eu-central-1)

Most capture app metrics (e.g. capture_*, http_requests_*, envoy_cluster_*) carry no environment label: their environment is determined solely by which Grafana instance you are connected to. MSK and CloudWatch metrics do carry an environment label, but they are still scoped to the connected Grafana's AWS account.

Observability landscape

Domain	Datasource UID	Discovery tool	Scope filter
App metrics (VictoriaMetrics)	`victoriametrics`	`list_prometheus_metric_names`	`regex: "capture_.*"`
App metrics (realtime)	`victoriametrics-realtime`	`list_prometheus_metric_names`	`regex: "capture_.*"` (lower retention, higher resolution)
Logs	`P44D702D3E93867EC` (Loki-logs)	`list_loki_label_names`	`app=~"capture.*"`
Profiling	`pyroscope`	`list_pyroscope_profile_types`	`service_name="capture/capture"`
Dashboards	n/a	`search_dashboards`	query `"capture"` or `"ingestion"`
CloudWatch (ElastiCache, MSK)	`P034F075C744B399F`	`query_prometheus`	`environment="prod-us"`
CloudWatch Root (prod-us only)	`PAAE47F430CFD1449`	`query_prometheus`	root account AWS metrics (does NOT exist in prod-eu)
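
The table above can be carried as a small lookup so that a discovery call always targets the right datasource. The following is a minimal Python sketch, not the MCP's own API: the UIDs and the tool name come from the table, while the helper names, the domain keys, and the argument keys `datasourceUid` and `regex` are assumptions to check against the tool schema your MCP client exposes.

```python
# Datasource UIDs copied from the table above; the dictionary keys are
# illustrative labels, not names used by Grafana or the MCP.
DATASOURCE_UIDS = {
    "app-metrics": "victoriametrics",
    "app-metrics-realtime": "victoriametrics-realtime",
    "logs": "P44D702D3E93867EC",
    "profiling": "pyroscope",
    "cloudwatch": "P034F075C744B399F",
    "cloudwatch-root": "PAAE47F430CFD1449",
}


def datasource_uid(domain: str, environment: str) -> str:
    """Return the datasource UID for a domain, rejecting a known-missing combination."""
    if domain == "cloudwatch-root" and environment != "prod-us":
        raise ValueError("CloudWatch Root only exists in prod-us")
    return DATASOURCE_UIDS[domain]


def metric_name_discovery(environment: str, realtime: bool = False) -> dict:
    """Describe a list_prometheus_metric_names call scoped to capture metrics.

    The argument keys below are assumed, not confirmed against the tool schema.
    """
    domain = "app-metrics-realtime" if realtime else "app-metrics"
    return {
        "tool": "list_prometheus_metric_names",
        "arguments": {
            "datasourceUid": datasource_uid(domain, environment),
            "regex": "capture_.*",
        },
    }
```

Failing early on the CloudWatch Root and prod-eu combination avoids issuing a query against a datasource that the table says is absent there.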

Deployment roles

Role	Pipeline	Notes
`capture`	Main events	Highest volume
`capture-ai`	AI/LLM events	OTel ingestion on port 4318
`capture-replay`	Session recordings	`CAPTURE_MODE=recordings`

Metric prefixes

Prefix	Domain	Scope label
`capture_*`	App metrics (68 metrics)	`role`
`http_requests_*`	HTTP layer (shared)	`role=~"capture.*"`
`capture_kafka_*`	Kafka producer (17 metrics)	`role`
`capture_billing_*`	Billing/quota tokens loaded	`role`, `cache_key`
`capture_event_restrictions_*`	Event restrictions (6 metrics)	`role`, `restriction_type`
`capture_ai_otel_*`	AI/OTel capture (7 metrics)	`role="capture-ai"`
`envoy_cluster_*`	L7 proxy	`envoy_cluster_name=~"posthog_capture.*"`
`aws_msk_*`	MSK broker-side (JMX)	`environment="prod-us"`
`ratelimit_service_*`	Contour rate limit	`domain="posthog"`
`overflow_redirect_*`	Node.js ingestion overflow (downstream)	`ingestion_pipeline`
`kube_` / `container_`	K8s resources	`namespace="posthog"`, `pod=~"capture.*"`
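
Because metric names drift, a prefix plus its scope label is often enough to see what currently exists. The sketch below builds a PromQL selector that matches every metric under a prefix via the `__name__` label, then wraps it in `count by (__name__)` so the result lists each live metric name with its series count. The helper names are hypothetical; the resulting string is meant to be passed to `query_prometheus`.

```python
from typing import Sequence


def prefix_selector(prefix: str, matchers: Sequence[str] = ()) -> str:
    """Build a PromQL series selector for all metrics whose name starts with prefix.

    Each entry in matchers is a ready-made label matcher such as 'role="capture"'.
    """
    parts = ['__name__=~"' + prefix + '.*"'] + list(matchers)
    return "{" + ", ".join(parts) + "}"


def series_count_by_name(prefix: str, matchers: Sequence[str] = ()) -> str:
    """Wrap the selector so the query returns one row per live metric name."""
    return "count by (__name__) (" + prefix_selector(prefix, matchers) + ")"


# Kafka producer metrics for the main capture role.
kafka_query = series_count_by_name("capture_kafka_", ['role="capture"'])

# Container metrics for capture pods, using the scope labels from the table.
k8s_query = series_count_by_name(
    "container_", ['namespace="posthog"', 'pod=~"capture.*"']
)
```

Here `kafka_query` evaluates to `count by (__name__) ({__name__=~"capture_kafka_.*", role="capture"})`.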

Kafka topics

Topic	Pipeline
`ingestion-events-1024`	Main events
`ingestion-events-overflow-128`	Overflow (rate-limited / high-volume tokens)
`ingestion-events-historical-128`	Historical backfill events
`ingestion-session_replay-main-256`	Session replay
`ingestion-heatmaps-128`	Heatmaps
`ingestion-logs`	Log ingestion
`ingestion-general-turbo-1024`	General turbo
`ingestion-errortracking-main-128`	Error tracking
`client_iwarnings_ingestion`	Client warnings

Pyroscope services

Service name	Deployment
`capture/capture`	Main capture
`capture/capture-ai`	AI capture
`capture-replay/capture-replay`	Replay capture
`capture-logs/capture-logs`	Logs capture
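
The service names do not follow a single pattern (the AI deployment lives under `capture/`, while replay and logs use their own prefix), so a lookup is safer than string formatting. This is a small sketch that turns a deployment into a Pyroscope label selector using the `service_name` label; the dictionary keys and the function name are illustrative assumptions.

```python
# Service names copied from the table above, keyed by a short deployment label.
PYROSCOPE_SERVICES = {
    "capture": "capture/capture",
    "capture-ai": "capture/capture-ai",
    "capture-replay": "capture-replay/capture-replay",
    "capture-logs": "capture-logs/capture-logs",
}


def pyroscope_selector(deployment: str) -> str:
    """Return a label selector for one deployment's profiles.

    Raises KeyError for an unknown deployment rather than guessing a name.
    """
    return '{service_name="' + PYROSCOPE_SERVICES[deployment] + '"}'
```

For example, `pyroscope_selector("capture-ai")` yields `{service_name="capture/capture-ai"}`.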

Grafana dashboards

UID	Title
`capture`	Main capture dashboard
`ingestion-capture`	Capture-specific ingestion metrics
`ingestion-general`	Cross-service ingestion overview
`ingestion-reliability`	Error rates and reliability signals
`ingestion-pipeline-performance`	End-to-end pipeline latency
`b2348f37-f276-498e-b72e-7cc2b5ec1455`	New capture dashboard
`contour`	Envoy L7 proxy (set `envoy_cluster_name=posthog_capture_3000`)
`envoy-contour-debug`	Envoy/Contour debugging
`qZz6iq9Wx`	AWS MSK Kafka Cluster
`ingestion-session-recordings`	Session Replay ingestion
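
When handing a dashboard to a person instead of reading it through the MCP, the UID is enough to form a link. The sketch below relies on Grafana's `/d/<uid>` path and the `var-<name>` query parameter convention for template variables. The base URL is a placeholder, and the assumption that the `contour` dashboard's variable is literally named `envoy_cluster_name` should be verified on the dashboard itself.

```python
from urllib.parse import urlencode


def dashboard_url(base_url: str, uid: str, **variables: str) -> str:
    """Build a Grafana dashboard link from a UID and optional template variables."""
    url = base_url.rstrip("/") + "/d/" + uid
    if variables:
        # Grafana reads template variables from query parameters prefixed with "var-".
        query = urlencode({"var-" + name: value for name, value in variables.items()})
        url += "?" + query
    return url


# Hypothetical host; substitute the Grafana instance for the chosen environment.
contour_link = dashboard_url(
    "https://grafana.example.com",
    "contour",
    envoy_cluster_name="posthog_capture_3000",
)
```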