Set up Prometheus and Grafana monitoring for AQUA vLLM model deployments on OCI. Covers the signing proxy, container registry setup, OCI Container Instance deployment, and PromQL dashboards. Triggered when user wants to monitor LLM deployments, view TTFT/latency/throughput metrics, or set up observability for AQUA.
Monitor vLLM model deployments with Prometheus + Grafana hosted on an OCI Container Instance. The monitoring stack chains a signing proxy, Prometheus, and Grafana in front of the deployment's authenticated metrics endpoint.
## `/metrics` endpoint

All standard vLLM Prometheus metrics are available:
| Metric | Description |
|---|---|
| `vllm:time_to_first_token_seconds` | TTFT histogram |
| `vllm:inter_token_latency_seconds` | ITL histogram |
| `vllm:e2e_request_latency_seconds` | End-to-end request latency |
| `vllm:num_requests_running` | Concurrent requests in flight |
| `vllm:num_requests_waiting` | Requests queued |
| `vllm:gpu_cache_usage_perc` | KV cache utilization |
| `vllm:num_tokens_prompt` | Prompt token count |
| `vllm:num_tokens_generation` | Generation token count |
| `vllm:request_success_total` | Successful request count |
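The histogram metrics above expose standard Prometheus `_bucket` series, so percentiles can be computed with `histogram_quantile`. A few example dashboard queries (metric names as listed above; the 5-minute window is an assumption):

```promql
# p95 time-to-first-token over the last 5 minutes
histogram_quantile(0.95, sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le))

# p95 end-to-end request latency
histogram_quantile(0.95, sum(rate(vllm:e2e_request_latency_seconds_bucket[5m])) by (le))

# successful request throughput (requests per second)
rate(vllm:request_success_total[5m])
```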
```
AQUA Model Deployment
└── /predict/metrics endpoint (requires OCI IAM signature)
        ↑
Signing Proxy :8080
(resource_principal auth)
        ↑
Prometheus :9090
(scrapes localhost:8080 every 5s)
        ↑
Grafana :3000
(visualizes from localhost:9090)
        ↑
User browser (public IP of Container Instance)
```
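Once the Container Instance is running, each hop in the chain can be spot-checked from the instance itself. A sketch, assuming the ports shown above and the standard Prometheus and Grafana HTTP APIs:

```shell
# Proxy: should return raw vLLM metrics (the proxy signs the upstream request)
curl -s http://localhost:8080/metrics | head

# Prometheus: list scrape targets and confirm the proxy target is "up"
curl -s http://localhost:9090/api/v1/targets

# Grafana: health endpoint
curl -s http://localhost:3000/api/health
```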
Clone the samples repository:

```shell
git clone https://github.com/oracle-samples/oci-data-science-ai-samples.git
cd oci-data-science-ai-samples/ai-quick-actions/aqua_metrics
```
The directory contains:
- `signing_proxy/` — OCI-aware auth proxy (Dockerfile)
- `prometheus/` — Prometheus config + Dockerfile
- `grafana/` — Grafana Dockerfile

Replace `<registry-domain>` with your region's OCIR endpoint (e.g., `iad.ocir.io`) and `<tenancy-namespace>` with your tenancy namespace.
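If you don't know your tenancy namespace, the OCI CLI can print it (the Object Storage namespace is the same value used in OCIR paths):

```shell
# Prints the tenancy (Object Storage) namespace used in OCIR image paths
oci os ns get
```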
Build and push the signing proxy image:

```shell
cd signing_proxy
docker build --no-cache -t signing_proxy .
docker tag signing_proxy <registry-domain>/<tenancy-namespace>/signing_proxy:latest
docker push <registry-domain>/<tenancy-namespace>/signing_proxy:latest
```
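Pushing to OCIR requires an authenticated Docker session first. The standard OCIR login uses your OCI username prefixed by the tenancy namespace, with an OCI auth token (not your console password) as the password:

```shell
# Log in to OCIR before pushing; the password prompt expects an OCI auth token
docker login <registry-domain> -u '<tenancy-namespace>/<username>'
```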
The `prometheus/prometheus.yml` file is preconfigured to scrape `localhost:8080` (the proxy):
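The repo's actual file may differ in detail, but a minimal scrape config matching the behavior described above (5-second interval against the proxy) would look like this; the job name is an assumption:

```yaml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: aqua_vllm
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:8080']
```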