Diagnose and fix Single Step Instrumentation (SSI) issues on Kubernetes. SSI automatically instruments applications for APM without code changes. Use this skill only when the Agent and SSI are already configured but traces are missing or instrumentation is not working.
Invoke this skill when the user expresses intent to verify that SSI is working (`verify-ssi`).

Do NOT invoke this skill if SSI has not been enabled yet — run `enable-ssi` first.

Confirm the target cluster context, then check that the `pup` CLI is installed:

kubectl config current-context
pup --version
If not found:
brew tap datadog-labs/pack
brew install pup
Check auth:
pup auth status
If not authenticated:
pup auth login
Confirm with pup auth status. If no browser available: export DD_APP_KEY=<your-app-key>.
| Variable | How to resolve |
|---|---|
| `AGENT_NAMESPACE` | Namespace where the Datadog Agent is installed |
| `APP_NAMESPACE` | Namespace of the application with missing traces |
| `CLUSTER_NAME` | `kubectl config current-context` or `spec.global.clusterName` in `datadog-agent.yaml` |
| `SERVICE_NAME` | `tags.datadoghq.com/service` label on the Deployment, or ask the user |
| `ENV` | `tags.datadoghq.com/env` label on the Deployment, or ask the user |
| `POD_NAME` | `kubectl get pods -n <APP_NAMESPACE>` — use the specific pod the user mentioned |
| `DEPLOYMENT_NAME` | `metadata.name` in the Deployment manifest, or ask the user |
| `APP_LABEL` | `spec.selector.matchLabels.app` in the Deployment manifest |
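Where the label-derived variables come from a `--show-labels` string, the extraction can be scripted. A minimal sketch — the label values (`web`, `checkout`, `prod`) are illustrative, not from a real cluster:

```shell
# Parse a "k=v,k=v" label string as printed by `kubectl get pods --show-labels`.
# The label values below are illustrative, not from a real cluster.
labels="app=web,tags.datadoghq.com/service=checkout,tags.datadoghq.com/env=prod"

SERVICE_NAME=$(printf '%s' "$labels" | tr ',' '\n' | grep '^tags.datadoghq.com/service=' | cut -d= -f2)
ENV=$(printf '%s' "$labels" | tr ',' '\n' | grep '^tags.datadoghq.com/env=' | cut -d= -f2)

echo "SERVICE_NAME=$SERVICE_NAME ENV=$ENV"
```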
Read this before investigating. It gives you the mental model to reason about novel failures, not just known ones.
Injection chain:

1. The admission webhook mutates the pod spec at creation time.
2. A `datadog-lib-<language>-init` init container copies the tracer library into the pod.
3. The `LD_PRELOAD` env var is set pointing to the library `.so` file.
4. At process start, `LD_PRELOAD` loads the tracer into the application.
What healthy looks like:
- `pup fleet instrumented-pods list` shows the pod with the correct language/version
- `pup fleet tracers list` shows the service as active
- `kubectl get pod -o jsonpath='{.spec.initContainers[*].name}'` includes `datadog-lib-<language>-init`

Known silent failures — SSI produces no error when these occur:

- `LD_PRELOAD` fails silently: SSI's `.so` is compiled against glibc; musl (Alpine Linux) is ABI-incompatible
- An `admission.datadoghq.com/enabled: "false"` annotation — the webhook skips the pod entirely
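Two of the healthy markers — the init container and `LD_PRELOAD` — can be checked mechanically from a pod spec dump. A sketch against a saved, illustrative spec; on a live cluster you would generate the file with `kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o json`:

```shell
# Check a saved pod spec for the two injection markers.
# On a live cluster, produce the file first (names are placeholders):
#   kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o json > /tmp/pod.json
# The excerpt below is illustrative, including the .so path.
cat > /tmp/pod.json <<'EOF'
{"spec":{"initContainers":[{"name":"datadog-lib-python-init"}],
 "containers":[{"env":[{"name":"LD_PRELOAD","value":"/datadog-lib/auto_inject.so"}]}]}}
EOF

injected=yes
grep -q 'datadog-lib-.*-init' /tmp/pod.json || injected=no   # init container present?
grep -q '"LD_PRELOAD"' /tmp/pod.json || injected=no          # preload env var set?
echo "injection markers present: $injected"
```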
**Step 1: Triage.** Run all four checks simultaneously. Everything after this is driven by what you find here.
pup traces search --query "service:<SERVICE_NAME>" --from 1h --limit 5
pup fleet instrumented-pods list <CLUSTER_NAME>
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> \
-o jsonpath='{.spec.initContainers[*].name}'
kubectl describe pod <POD_NAME> -n <APP_NAMESPACE> | grep -A 10 "Events:"
**Step 2: Form hypotheses.** Before investigating, explicitly state your ranked hypotheses based on the triage output. Do not skip this step.
| Triage signal | Strong hypothesis |
|---|---|
| Traces arriving + pod in instrumented list | Not a real problem — likely a UI filter or time window. Tell the user and stop |
| No traces + pod NOT in instrumented list + no init container | Injection never happened — investigate: namespace targeting, webhook, pod-selector, opt-out annotation, pod not restarted |
| No traces + pod NOT in instrumented list + init container present | Injection attempted but failed — check pup apm troubleshooting list for injection errors |
| No traces + pod in instrumented list + init container present | Tracer injected but not reporting — investigate: connectivity, DD_SITE, API key |
| Pod events show CrashLoopBackOff or init container errors | Init container failure — check libc (Alpine/musl), existing ddtrace, runtime version |
| Traces arriving but wrong service/env | UST labels missing or misconfigured on the Deployment |
State your top 1-3 hypotheses explicitly: "Based on triage, I think the most likely cause is X because Y."
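The signal-to-hypothesis table can be encoded as a small helper so triage output maps mechanically to a starting hypothesis. A sketch — the argument names and wording are mine; the mapping mirrors the table:

```shell
# Map the three boolean triage signals to the table's starting hypothesis.
# Arguments: traces_arriving in_instrumented_list init_container_present ("yes"/"no").
hypothesis() {
  case "$1/$2/$3" in
    yes/yes/*)  echo "not a real problem: check UI filter/time window" ;;
    no/no/no)   echo "injection never happened" ;;
    no/no/yes)  echo "injection attempted but failed" ;;
    no/yes/yes) echo "tracer injected but not reporting" ;;
    *)          echo "unmapped combination: investigate manually" ;;
  esac
}

hypothesis no no yes
```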
**Step 3: Investigate.** Use only the tools relevant to your hypotheses. Each observation informs your next action.
Is the pod in the Agent namespace? SSI never instruments pods in the same namespace as the Datadog Agent.
kubectl get pods -n <AGENT_NAMESPACE>
Were pods restarted after SSI was enabled?
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
kubectl wait --for=condition=Ready pod -l app=<APP_LABEL> -n <APP_NAMESPACE> --timeout=120s
pup fleet instrumented-pods list <CLUSTER_NAME>
Is namespace targeting filtering the pod out?
kubectl get datadogagent datadog -n <AGENT_NAMESPACE> -o yaml | grep -A 15 instrumentation
Fix: update enabledNamespaces in datadog-agent.yaml.
kubectl apply -f datadog-agent.yaml
Is a podSelector target filtering the pod out?
If a `targets` entry with a `podSelector` is configured, only pods whose labels match the selector are instrumented. Check whether the app pod's labels match any target:
kubectl get datadogagent datadog -n <AGENT_NAMESPACE> -o yaml | grep -A 20 targets
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> --show-labels
Fix: add a matching label to the pod template, or broaden the podSelector, then apply and restart.
Is a pod annotation opting it out?
admission.datadoghq.com/enabled: "false" tells the webhook to skip this pod.
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o yaml | grep -A 5 annotations
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> --show-labels
Fix: remove the annotation from the Deployment pod template, then apply and restart.
kubectl apply -f <your-app-deployment.yaml>
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
Does the app have existing custom instrumentation? SSI silently disables itself when it detects existing tracer code. Scan source files for:
- Python: `import ddtrace`, `ddtrace.patch_all()`
- Node.js: `require('dd-trace')`, `DD.init()`
- Java: `GlobalTracer.register(`, `dd-java-agent`
- .NET: `Tracer.Instance`, `DD.Trace`
- Ruby: `require 'ddtrace'`, `Datadog.configure`
- PHP: `DDTrace\`

Also check dependency manifests: `requirements.txt`, `package.json`, `Gemfile`, `pom.xml`.
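The signature scan can be scripted with grep. A sketch over a throwaway demo tree — the demo file and the pattern list are illustrative and not exhaustive; point the grep at the real repository instead:

```shell
# Scan a source tree for existing Datadog tracer signatures.
# Demo tree and file are illustrative; replace /tmp/ssi-scan-demo with the repo path.
mkdir -p /tmp/ssi-scan-demo
printf 'import ddtrace\nddtrace.patch_all()\n' > /tmp/ssi-scan-demo/app.py

if grep -rqE "import ddtrace|ddtrace\.patch_all|require\('dd-trace'\)|GlobalTracer\.register|Tracer\.Instance|Datadog\.configure" /tmp/ssi-scan-demo; then
  result="existing instrumentation found"
else
  result="no tracer signatures found"
fi
echo "$result"
```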
Fix: remove the import/package, rebuild image, reload into cluster, restart pod.
Is the base image Alpine (musl libc)? SSI's injected library requires glibc. Alpine uses musl — ABI-incompatible, fails silently.
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- sh -c "ldd --version 2>&1 | head -1"
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- sh -c "cat /etc/os-release | grep -i 'ID\|NAME' | head -3"
Fix: rebuild with a glibc-based image (python:3.x-slim, node:x-bookworm, eclipse-temurin).
Is the runtime version supported?
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- python --version
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- node --version
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- java -version
Verify against the SSI compatibility matrix.
Is the admission webhook registered?
kubectl get mutatingwebhookconfigurations | grep datadog
kubectl get pods -n <AGENT_NAMESPACE> -l app=datadog-cluster-agent
kubectl logs -n <AGENT_NAMESPACE> -l app=datadog-cluster-agent --tail=100
Did injection produce errors? Get the node hostname first, then query Datadog for injection errors:
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o jsonpath='{.spec.nodeName}'
pup apm troubleshooting list --hostname <NODE_HOSTNAME> --timeframe 1h
Is the Agent sending data to Datadog?
kubectl exec -n <AGENT_NAMESPACE> \
$(kubectl get pod -n <AGENT_NAMESPACE> -l app=datadog-agent -o name | head -1) \
-- agent status | grep -A 5 "APM Agent"
Is the tracer reporting?
pup fleet tracers list --filter "service:<SERVICE_NAME>"
Does APM recognise the service?
pup apm services list --env <ENV>
Are traces arriving?
pup traces search --query "service:<SERVICE_NAME>" --from 1h --limit 10
Which Agent is the tracer connected to? Use these when connectivity between the tracer and the Agent is suspected.
pup fleet agents list --filter "hostname:<NODE_HOSTNAME>"
pup fleet agents tracers <AGENT_KEY> --filter "service:<SERVICE_NAME>"
**Step 4: Verify the conclusion.** Before applying any fix, answer:
If the conclusion doesn't hold up, return to Step 2 with new hypotheses. Keep iterating until you can defend the conclusion against all three questions.
**Step 5: Fix.** Apply the fix for the confirmed root cause. If the fix requires a code or Dockerfile change, rebuild and reload:
docker build -f <DOCKERFILE_PATH> -t <IMAGE_NAME> <BUILD_CONTEXT>
[DECISION: cluster type]
kind load docker-image <IMAGE_NAME> --name <CLUSTER_NAME>
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
kubectl wait --for=condition=Ready pod -l app=<APP_LABEL> -n <APP_NAMESPACE> --timeout=120s
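The `[DECISION: cluster type]` step can be sketched as a helper that prints the matching image-load command. Only the `kind` branch comes from this runbook; the `minikube` and remote-registry branches are my assumptions about common alternatives:

```shell
# Print the image-load command for the detected cluster type.
# Only the kind branch comes from this runbook; the others are common alternatives.
load_cmd() {
  case "$1" in
    kind)     echo "kind load docker-image $2 --name $3" ;;
    minikube) echo "minikube image load $2" ;;
    *)        echo "docker push $2  # remote registry: requires user confirmation" ;;
  esac
}

load_cmd kind my-app:dev my-cluster
```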
**Step 6: Verify the fix.** Re-run triage to confirm the fix worked:
pup traces search --query "service:<SERVICE_NAME>" --from 1h --limit 5
pup fleet instrumented-pods list <CLUSTER_NAME>
If traces are arriving and the pod is in the instrumented list — resolved. Automatically proceed to onboarding-summary now — do not ask the user for permission.
ERROR: Still not resolved — return to Step 2 with the new triage data and form updated hypotheses.
Guardrails:

- Never run `kubectl delete` without user confirmation.
- Never edit `admissionController` settings directly.
- `docker push` to a registry always requires user confirmation.