Build a prometheus-collector CCP test image, deploy it to an AKS standalone, and validate metric scraping and ingestion into Azure Monitor.
This skill verifies that a new CCP component's managed Prometheus addon functions correctly by deploying a prometheus-collector test image into an AKS standalone environment. It automates the full end-to-end flow, from building the test image to updating the ama-metrics-ccp deployment with it.

This skill requires the prometheus-collector repo. Ask the user for their local checkout path.
If they don't have it:
git clone https://github.com/Azure/prometheus-collector.git <destination>
The standalone creation steps reference the aks-rp repo for aksdev binary and azureconfig.yaml. Ask the user for their local aks-rp checkout path.
If they don't have it:
git clone https://msazure.visualstudio.com/DefaultCollection/CloudNativeCompute/_git/aks-rp <destination>
- Azure CLI (az) logged in with ADO permissions
- az devops authentication
- MSFT-AzVPN-Manual (for aksdev operations)
- aksdev binary (built or downloaded via the create-standalone skill)
- kubectl installed

| Input | Description | Example |
|---|---|---|
| PROM_COLLECTOR_BRANCH | The prometheus-collector branch to test | user/my-ccp-feature |
| USER_ALIAS | Your Microsoft alias | dakydd |
| STANDALONE_NAME | Name of an existing standalone (if reusing) | standalone-260216bm47nl |
The skill includes a helper script at tools/check_build.py that checks a prometheus-collector pipeline build for the CCP ORAS push stage status and extracts the image tag.
Usage:
python3 tools/check_build.py <build-id>
Build the CCP image using the Azure/prometheus-collector pipeline:
- Run the pipeline on $PROM_COLLECTOR_BRANCH.
- Wait for the ORAS Push Artifacts in /mnt/vss/_work/1/a/linuxccp/ stage inside Build: linux CCP prometheus-collector image to succeed.

Note: The pipeline can show as failed overall. You can continue as long as the ORAS push stage succeeded.
One-line check command (expects succeeded):
az devops invoke --organization https://github-private.visualstudio.com --area build --resource timeline \
--route-parameters project=azure buildId=<build-id> --api-version 7.1 \
--query "records[?name=='ORAS Push Artifacts in /mnt/vss/_work/1/a/linuxccp/'] | [0].result" -o tsv
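The ORAS push stage may still be running when the build is first checked, so the one-line check can be wrapped in a small polling loop. The following is a minimal sketch and not part of the skill: `wait_for_result` and `oras_push_result` are hypothetical helper names, and the attempt count and delay are arbitrary. It assumes the check prints `succeeded` on success and `failed` or `canceled` when the stage ends unsuccessfully.

```shell
# Poll a command until it prints "succeeded".
# Usage: wait_for_result <attempts> <delay-seconds> <command...>
wait_for_result() {
  local attempts="$1" delay="$2" i=1 result
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Treat a failing or empty check as "not finished yet".
    result="$("$@" 2>/dev/null || true)"
    case "$result" in
      succeeded) return 0 ;;
      failed|canceled) echo "stage finished with result: $result" >&2; return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts" >&2
  return 2
}

# Hypothetical wrapper around the one-line check; takes the build ID.
oras_push_result() {
  az devops invoke --organization https://github-private.visualstudio.com --area build --resource timeline \
    --route-parameters project=azure buildId="$1" --api-version 7.1 \
    --query "records[?name=='ORAS Push Artifacts in /mnt/vss/_work/1/a/linuxccp/'] | [0].result" -o tsv
}
```

For example, `wait_for_result 30 60 oras_push_result <build-id>` would check once a minute for up to 30 minutes.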
The pushed image has the form:
mcr.microsoft.com/azuremonitor/containerinsights/cidev/prometheus-collector/images:<TAG>-ccp
Important: Use cidev (not ciprod), and the tag must end with -ccp.
export CCP_IMAGE_TAG="<paste-the-tag-here>"
export TEST_IMAGE="mcr.microsoft.com/azuremonitor/containerinsights/cidev/prometheus-collector/images:${CCP_IMAGE_TAG}"
echo "CCP test image: $TEST_IMAGE"
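A quick guard can catch a wrong repository path or tag suffix before the image is deployed. This is a sketch only: `validate_ccp_image` is a hypothetical helper, and it checks nothing beyond the `cidev` path segment and the `-ccp` suffix.

```shell
# Hypothetical guard: succeeds only for a cidev image whose tag ends in -ccp.
validate_ccp_image() {
  local image="$1"
  case "$image" in
    */cidev/*) ;;
    *) echo "not a cidev image: $image" >&2; return 1 ;;
  esac
  case "$image" in
    *-ccp) ;;
    *) echo "tag does not end with -ccp: $image" >&2; return 1 ;;
  esac
}
```

Usage: `validate_ccp_image "$TEST_IMAGE" && echo "image reference looks valid"`.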
Create a standalone cluster (follow the create-standalone skill if needed), then download azureconfig.yaml and build or download the aksdev binary.
The standalone's cx-1 underlay is a real AKS cluster, which we leverage to get an MSI token for metric ingestion (see Why cx-1).
export AKS_RESOURCE_ID=/subscriptions/<subscription-id>/resourcegroups/<standalone-resource-group>/providers/Microsoft.ContainerService/managedClusters/<standalone-resource-group>-cx-1
export AKS_CLUSTER_NAME=<standalone-resource-group>-cx-1
export RESOURCE_GROUP=<standalone-resource-group>
export SUBSCRIPTION_ID=<subscription-id>
az account set -s $SUBSCRIPTION_ID
az aks update --enable-azure-monitor-metrics --enable-control-plane-metrics -n $AKS_CLUSTER_NAME -g $RESOURCE_GROUP
# Note: --enable-control-plane-metrics requires the aks-preview CLI extension
az aks get-credentials -n $AKS_CLUSTER_NAME -g $RESOURCE_GROUP -f $AKS_CLUSTER_NAME.kubeconfig
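The cx-1 values above all derive from the subscription ID and the standalone resource group, so the resource ID can be computed rather than typed by hand. A minimal sketch, assuming the `<standalone-resource-group>-cx-1` naming shown in the exports holds for your standalone; `cx1_resource_id` is a hypothetical helper.

```shell
# Hypothetical helper: prints the cx-1 resource ID for a subscription ID
# and a standalone resource group, following the pattern used above.
cx1_resource_id() {
  local subscription_id="$1" resource_group="$2"
  printf '/subscriptions/%s/resourcegroups/%s/providers/Microsoft.ContainerService/managedClusters/%s-cx-1\n' \
    "$subscription_id" "$resource_group" "$resource_group"
}
```

Usage: `export AKS_RESOURCE_ID="$(cx1_resource_id "$SUBSCRIPTION_ID" "$RESOURCE_GROUP")"`.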
export USER_ALIAS=<your-alias>
export WORKFLOW_NAME=buddybuild-standalone
export CX_CLUSTER_NAME=$USER_ALIAS-$WORKFLOW_NAME
export MC_SUB=82acd5bb-4206-47d4-9c12-a65db028483d
export LOCATION=<standalone-location> # Must match standalone location
./bin/aksdev cluster create $CX_CLUSTER_NAME --location $LOCATION \
--managedclustersubscription $MC_SUB --enableManagedIdentity \
--enable-azure-monitor-metrics \
--subscription-features AzureMonitorMetricsControlPlanePreview \
--node-provisioning-mode Auto
./bin/aksdev cluster kubeconfig $CX_CLUSTER_NAME --managedclustersubscription $MC_SUB > $CX_CLUSTER_NAME.kubeconfig
Scale all reconciler deployments to 0 replicas to prevent them from reverting the deployment patches:
kubectl scale deploy addonconfigreconciler -n addonconfigreconciler --replicas=0 --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
kubectl scale deploy overlaymgr-overlaymanager overlaymgr-overlaymanager-loop -n overlaymgr --replicas=0 --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
kubectl scale deploy eno-reconciler -n eno-system --replicas=0 --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
Note: All four must be scaled down. The eno-reconciler manages underlay deployment specs and will scale the other reconcilers back to 1 if left running.
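Since the same four deployments are scaled down here and back up during cleanup, the commands can be driven from one list. This is a sketch under assumptions: `RECONCILERS`, `scale_reconcilers`, and the `KUBECTL` override are hypothetical names introduced for illustration; the namespace and deployment names are the ones used in the commands above.

```shell
# namespace/deployment pairs, taken from the scale commands in this section
RECONCILERS="addonconfigreconciler/addonconfigreconciler
overlaymgr/overlaymgr-overlaymanager
overlaymgr/overlaymgr-overlaymanager-loop
eno-system/eno-reconciler"

# Usage: scale_reconcilers <replicas> <kubeconfig>
# Set KUBECTL=echo to print the commands instead of running them.
scale_reconcilers() {
  local replicas="$1" kubeconfig="$2" entry
  for entry in $RECONCILERS; do
    # ${entry%%/*} is the namespace, ${entry#*/} is the deployment name.
    "${KUBECTL:-kubectl}" scale deploy "${entry#*/}" -n "${entry%%/*}" \
      --replicas="$replicas" --kubeconfig "$kubeconfig" || return 1
  done
}
```

Usage: `scale_reconcilers 0 "$AKS_CLUSTER_NAME.kubeconfig"` before patching, and `scale_reconcilers 1 "$AKS_CLUSTER_NAME.kubeconfig"` to restore.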
Find the CCP namespace ID in the underlay's namespace list:
kubectl get ns --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
export CCP_NS=<ccp-namespace-id>
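# Note: -v+7d is BSD/macOS date syntax; with GNU date (Linux) use: date -u -d '+7 days' '+%Y-%m-%dT%H:%M:%SZ'
export SKIP_CCP_RECONCILE_UNTIL=$(date -u -v+7d '+%Y-%m-%dT%H:%M:%SZ')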
kubectl annotate namespace $CCP_NS skip-ccp-reconcile-until-this-time="$SKIP_CCP_RECONCILE_UNTIL" --overwrite --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
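Make three updates to the ama-metrics-ccp deployment: set the CLUSTER env var to the cx-1 resource ID, replace the msi-adapter container with addon-token-adapter (if present), and point the prometheus-collector container at the test image.
Note: kubectl set env does not work for env vars that use valueFrom (e.g., fieldRef). Use kubectl patch with strategic merge and "valueFrom": null.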
kubectl patch deployment ama-metrics-ccp -n $CCP_NS --type=strategic --kubeconfig $AKS_CLUSTER_NAME.kubeconfig \
-p "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"prometheus-collector\",\"env\":[{\"name\":\"CLUSTER\",\"value\":\"$AKS_RESOURCE_ID\",\"valueFrom\":null}]}]}}}}"
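The hand-escaped JSON in the patch is easy to break when editing. One way to keep it readable is to generate the patch body with `printf` and pass the result to `-p`. A minimal sketch: `cluster_env_patch` is a hypothetical helper, and it assumes the resource ID contains no characters that need JSON escaping (true for the resource ID format used here).

```shell
# Hypothetical helper: prints the strategic-merge patch that sets CLUSTER
# to the given resource ID and clears valueFrom.
cluster_env_patch() {
  local resource_id="$1"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"prometheus-collector","env":[{"name":"CLUSTER","value":"%s","valueFrom":null}]}]}}}}' \
    "$resource_id"
}
```

Usage: `kubectl patch deployment ama-metrics-ccp -n $CCP_NS --type=strategic --kubeconfig $AKS_CLUSTER_NAME.kubeconfig -p "$(cluster_env_patch "$AKS_RESOURCE_ID")"`.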
Check which container is present:
kubectl get deploy ama-metrics-ccp -n $CCP_NS -o jsonpath='{.spec.template.spec.containers[*].name}' --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
If it has msi-adapter, export the deployment, replace the container block, and apply:
kubectl get deployment ama-metrics-ccp -n $CCP_NS -o yaml --kubeconfig $AKS_CLUSTER_NAME.kubeconfig > /tmp/ama-metrics-ccp.yaml
# Edit /tmp/ama-metrics-ccp.yaml — replace the msi-adapter container with addon-token-adapter
kubectl apply -f /tmp/ama-metrics-ccp.yaml --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
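The container check can be scripted so the export-and-edit step only runs when it is needed. This is a sketch, not part of the skill: `has_container` is a hypothetical helper that takes the space-separated name list printed by the jsonpath query above.

```shell
# Hypothetical helper: succeeds if <wanted> appears in a space-separated list of names.
# Usage: has_container "<names>" <wanted>
has_container() {
  local names="$1" wanted="$2" name
  for name in $names; do
    [ "$name" = "$wanted" ] && return 0
  done
  return 1
}
```

Usage: capture the jsonpath output in a variable such as `containers`, then run `has_container "$containers" msi-adapter && echo "msi-adapter present: replace it with addon-token-adapter"`.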
The addon-token-adapter container block to use:
- name: addon-token-adapter
  command:
    - /addon-token-adapter
  args:
    - --secret-namespace=kube-system
    - --secret-name=aad-msi-auth-token
    - --token-server-listening-port=7777
    - --health-server-listening-port=9999
    - --restart-pod-waiting-minutes-on-broken-connection=240
  image: mcr.microsoft.com/aks/msi/addon-token-adapter:master.251201.2
  imagePullPolicy: IfNotPresent
  env:
    - name: AZMON_COLLECT_ENV
      value: "false"
  livenessProbe:
    httpGet:
      path: /healthz
      port: 9999
    initialDelaySeconds: 10
    periodSeconds: 60
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 20m
      memory: 30Mi
  securityContext:
    capabilities:
      drop:
        - ALL
      add:
        - NET_ADMIN
        - NET_RAW
kubectl set image deployment/ama-metrics-ccp -n $CCP_NS prometheus-collector=$TEST_IMAGE --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
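After the image is changed, it can help to wait for the new pod to roll out before reading logs. A minimal sketch using `kubectl rollout status`: `wait_for_rollout` and the `KUBECTL` override are hypothetical names, and the 5-minute default timeout is an arbitrary choice.

```shell
# Usage: wait_for_rollout <namespace> <kubeconfig> [timeout]
# Set KUBECTL=echo to print the command instead of running it.
wait_for_rollout() {
  local namespace="$1" kubeconfig="$2" timeout="${3:-5m}"
  "${KUBECTL:-kubectl}" rollout status deployment/ama-metrics-ccp -n "$namespace" \
    --timeout="$timeout" --kubeconfig "$kubeconfig"
}
```

Usage: `wait_for_rollout "$CCP_NS" "$AKS_CLUSTER_NAME.kubeconfig"`.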
Set your CCP component's setting to true in the ConfigMap. For example, for NAP:
controlplane-node-auto-provisioning: true
kubectl logs deploy/ama-metrics-ccp -c prometheus-collector -n $CCP_NS --kubeconfig $AKS_CLUSTER_NAME.kubeconfig | tail -50
Verify that metrics are appearing in the connected Azure Monitor workspace.
With minimalingestionprofile enabled (default), confirm only the metrics in your minimal list are ingested.
Add an additional metric to the keeplist in the ConfigMap and confirm ingestion.
export KUBECONFIG=$CX_CLUSTER_NAME.kubeconfig # overlay
export KUBECONFIG=$AKS_CLUSTER_NAME.kubeconfig # underlay
Example metric pattern: karpenter_(nodes_created_total|nodes_terminated_total).
Delete the test cluster:
./bin/aksdev cluster delete $CX_CLUSTER_NAME --managedclustersubscription $MC_SUB
Scale reconcilers back up (if the standalone is still needed):
kubectl scale deploy addonconfigreconciler -n addonconfigreconciler --replicas=1 --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
kubectl scale deploy overlaymgr-overlaymanager overlaymgr-overlaymanager-loop -n overlaymgr --replicas=1 --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
kubectl scale deploy eno-reconciler -n eno-system --replicas=1 --kubeconfig $AKS_CLUSTER_NAME.kubeconfig
Or delete the standalone entirely (auto-deletes after 3 days).
In standalone, we use a trick to get the ama-metrics-ccp pod an MSI token for ingestion to a real Azure Monitor workspace. The "cluster" created via aksdev isn't a real AKS cluster — it only exists within the standalone. There's no MSI token available for it.
However, the standalone underlay itself (the cx-1 cluster) is a real AKS cluster. We enable the AMA Metrics addon on cx-1, which gives it permission to ingest to an Azure Monitor workspace. We then configure the ama-metrics-ccp pod to use the cx-1 cluster's resource ID via the CLUSTER env var, which allows the addon-token-adapter to obtain the correct MSI token.
| Document | Location |
|---|---|
| Enabling Managed Prometheus for CCP | ADO Wiki |
| Minimal Prometheus ingestion profile | Microsoft Learn |
| Azure Monitor Metrics enable guide | Microsoft Learn |
| Skip CCP Reconcile | ADO Wiki |