Diagnose and fix Single Step Instrumentation (SSI) issues on Kubernetes. SSI automatically instruments applications for APM without code changes. Use this skill only when the Agent and SSI are already configured but traces are missing or instrumentation is not working.
Invoke this skill when the user expresses intent to verify that SSI is working (`verify-ssi`).

Do NOT invoke this skill if SSI has not been enabled yet — run `enable-ssi` first.

Confirm the target cluster context, then check that the `pup` CLI is installed:

kubectl config current-context
pup --version
If not found:
brew tap datadog-labs/pack
brew install pup
Check auth:
pup auth status
If not authenticated:
pup auth login
Confirm with pup auth status. If no browser available: export DD_APP_KEY=<your-app-key>.
| Variable | How to resolve |
|---|---|
| `AGENT_NAMESPACE` | Namespace where the Datadog Agent is installed |
| `APP_NAMESPACE` | Namespace of the application with missing traces |
| `CLUSTER_NAME` | `kubectl config current-context` or `spec.global.clusterName` in `datadog-agent.yaml` |
| `SERVICE_NAME` | `tags.datadoghq.com/service` label on the Deployment, or ask the user |
| `ENV` | `tags.datadoghq.com/env` label on the Deployment, or ask the user |
| `POD_NAME` | `kubectl get pods -n <APP_NAMESPACE>` — use the specific pod the user mentioned |
| `DEPLOYMENT_NAME` | `metadata.name` in the Deployment manifest, or ask the user |
| `APP_LABEL` | `spec.selector.matchLabels.app` in the Deployment manifest |
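Where the label-derived variables come from a `--show-labels` string, the extraction can be scripted. A minimal sketch — the label values (`web`, `checkout`, `prod`) are illustrative, not from a real cluster:

```shell
# Parse a "k=v,k=v" label string as printed by `kubectl get pods --show-labels`.
# The label values below are illustrative, not from a real cluster.
labels="app=web,tags.datadoghq.com/service=checkout,tags.datadoghq.com/env=prod"

SERVICE_NAME=$(printf '%s' "$labels" | tr ',' '\n' | grep '^tags.datadoghq.com/service=' | cut -d= -f2)
ENV=$(printf '%s' "$labels" | tr ',' '\n' | grep '^tags.datadoghq.com/env=' | cut -d= -f2)

echo "SERVICE_NAME=$SERVICE_NAME ENV=$ENV"
```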
Read this before investigating. It gives you the mental model to reason about novel failures, not just known ones.
Injection chain:

1. The admission webhook mutates the pod spec at creation time.
2. A `datadog-lib-<language>-init` init container copies the tracer library into the pod.
3. The `LD_PRELOAD` env var is set pointing to the library `.so` file.
4. At process start, `LD_PRELOAD` loads the tracer into the application.
What healthy looks like:
- `pup fleet instrumented-pods list` shows the pod with the correct language/version
- `pup fleet tracers list` shows the service as active
- `kubectl get pod -o jsonpath='{.spec.initContainers[*].name}'` includes `datadog-lib-<language>-init`

Known silent failures — SSI produces no error when these occur:

- `LD_PRELOAD` fails silently: SSI's `.so` is compiled against glibc; musl (Alpine Linux) is ABI-incompatible
- An `admission.datadoghq.com/enabled: "false"` annotation — the webhook skips the pod entirely
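Two of the healthy markers — the init container and `LD_PRELOAD` — can be checked mechanically from a pod spec dump. A sketch against a saved, illustrative spec; on a live cluster you would generate the file with `kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o json`:

```shell
# Check a saved pod spec for the two injection markers.
# On a live cluster, produce the file first (names are placeholders):
#   kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o json > /tmp/pod.json
# The excerpt below is illustrative, including the .so path.
cat > /tmp/pod.json <<'EOF'
{"spec":{"initContainers":[{"name":"datadog-lib-python-init"}],
 "containers":[{"env":[{"name":"LD_PRELOAD","value":"/datadog-lib/auto_inject.so"}]}]}}
EOF

injected=yes
grep -q 'datadog-lib-.*-init' /tmp/pod.json || injected=no   # init container present?
grep -q '"LD_PRELOAD"' /tmp/pod.json || injected=no          # preload env var set?
echo "injection markers present: $injected"
```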
**Step 1: Triage.** Run all four checks simultaneously. Everything after this is driven by what you find here.
pup traces search --query "service:<SERVICE_NAME>" --from 1h --limit 5
pup fleet instrumented-pods list <CLUSTER_NAME>
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> \
-o jsonpath='{.spec.initContainers[*].name}'
kubectl describe pod <POD_NAME> -n <APP_NAMESPACE> | grep -A 10 "Events:"
**Step 2: Form hypotheses.** Before investigating, explicitly state your ranked hypotheses based on the triage output. Do not skip this step.
| Triage signal | Strong hypothesis |
|---|---|
| Traces arriving + pod in instrumented list | Not a real problem — likely a UI filter or time window. Tell the user and stop |
| No traces + pod NOT in instrumented list + no init container | Injection never happened — investigate: namespace targeting, webhook, pod-selector, opt-out annotation, pod not restarted |
| No traces + pod NOT in instrumented list + init container present | Injection attempted but failed — check pup apm troubleshooting list for injection errors |
| No traces + pod in instrumented list + init container present | Tracer injected but not reporting — investigate: connectivity, DD_SITE, API key |
| Pod events show CrashLoopBackOff or init container errors | Init container failure — check libc (Alpine/musl), existing ddtrace, runtime version |
| Traces arriving but wrong service/env | UST labels missing or misconfigured on the Deployment |
State your top 1-3 hypotheses explicitly: "Based on triage, I think the most likely cause is X because Y."
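The signal-to-hypothesis table can be encoded as a small helper so triage output maps mechanically to a starting hypothesis. A sketch — the argument names and wording are mine; the mapping mirrors the table:

```shell
# Map the three boolean triage signals to the table's starting hypothesis.
# Arguments: traces_arriving in_instrumented_list init_container_present ("yes"/"no").
hypothesis() {
  case "$1/$2/$3" in
    yes/yes/*)  echo "not a real problem: check UI filter/time window" ;;
    no/no/no)   echo "injection never happened" ;;
    no/no/yes)  echo "injection attempted but failed" ;;
    no/yes/yes) echo "tracer injected but not reporting" ;;
    *)          echo "unmapped combination: investigate manually" ;;
  esac
}

hypothesis no no yes
```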
**Step 3: Investigate.** Use only the tools relevant to your hypotheses. Each observation informs your next action.
Is the pod in the Agent namespace? SSI never instruments pods in the same namespace as the Datadog Agent.
kubectl get pods -n <AGENT_NAMESPACE>
Were pods restarted after SSI was enabled?
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
kubectl wait --for=condition=Ready pod -l app=<APP_LABEL> -n <APP_NAMESPACE> --timeout=120s
pup fleet instrumented-pods list <CLUSTER_NAME>
Is namespace targeting filtering the pod out?
kubectl get datadogagent datadog -n <AGENT_NAMESPACE> -o yaml | grep -A 15 instrumentation
Fix: update enabledNamespaces in datadog-agent.yaml.
kubectl apply -f datadog-agent.yaml
Is a podSelector target filtering the pod out?
If a `targets` entry with a `podSelector` is configured, only pods whose labels match the selector are instrumented. Check whether the app pod's labels match any target:
kubectl get datadogagent datadog -n <AGENT_NAMESPACE> -o yaml | grep -A 20 targets
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> --show-labels
Fix: add a matching label to the pod template, or broaden the podSelector, then apply and restart.
Is a pod annotation opting it out?
admission.datadoghq.com/enabled: "false" tells the webhook to skip this pod.
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o yaml | grep -A 5 annotations
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> --show-labels
Fix: remove the annotation from the Deployment pod template, then apply and restart.
kubectl apply -f <your-app-deployment.yaml>
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
Does the app have existing custom instrumentation? SSI silently disables itself when it detects existing tracer code. Scan source files for:
- Python: `import ddtrace`, `ddtrace.patch_all()`
- Node.js: `require('dd-trace')`, `DD.init()`
- Java: `GlobalTracer.register(`, `dd-java-agent`
- .NET: `Tracer.Instance`, `DD.Trace`
- Ruby: `require 'ddtrace'`, `Datadog.configure`
- PHP: `DDTrace\`

Also check dependency manifests: `requirements.txt`, `package.json`, `Gemfile`, `pom.xml`.
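The signature scan can be scripted with grep. A sketch over a throwaway demo tree — the demo file and the pattern list are illustrative and not exhaustive; point the grep at the real repository instead:

```shell
# Scan a source tree for existing Datadog tracer signatures.
# Demo tree and file are illustrative; replace /tmp/ssi-scan-demo with the repo path.
mkdir -p /tmp/ssi-scan-demo
printf 'import ddtrace\nddtrace.patch_all()\n' > /tmp/ssi-scan-demo/app.py

if grep -rqE "import ddtrace|ddtrace\.patch_all|require\('dd-trace'\)|GlobalTracer\.register|Tracer\.Instance|Datadog\.configure" /tmp/ssi-scan-demo; then
  result="existing instrumentation found"
else
  result="no tracer signatures found"
fi
echo "$result"
```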
Fix: remove the import/package, rebuild image, reload into cluster, restart pod.
Is the base image Alpine (musl libc)? SSI's injected library requires glibc. Alpine uses musl — ABI-incompatible, fails silently.
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- sh -c "ldd --version 2>&1 | head -1"
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- sh -c "cat /etc/os-release | grep -i 'ID\|NAME' | head -3"
Fix: rebuild with a glibc-based image (python:3.x-slim, node:x-bookworm, eclipse-temurin).
Is the runtime version supported?
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- python --version
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- node --version
kubectl exec -n <APP_NAMESPACE> <POD_NAME> -- java -version
Verify against the SSI compatibility matrix.
Is the admission webhook registered?
kubectl get mutatingwebhookconfigurations | grep datadog
kubectl get pods -n <AGENT_NAMESPACE> -l app=datadog-cluster-agent
kubectl logs -n <AGENT_NAMESPACE> -l app=datadog-cluster-agent --tail=100
Did injection produce errors? Get the node hostname first, then query Datadog for injection errors:
kubectl get pod <POD_NAME> -n <APP_NAMESPACE> -o jsonpath='{.spec.nodeName}'
pup apm troubleshooting list --hostname <NODE_HOSTNAME> --timeframe 1h
Is the Agent sending data to Datadog?
kubectl exec -n <AGENT_NAMESPACE> \
$(kubectl get pod -n <AGENT_NAMESPACE> -l app=datadog-agent -o name | head -1) \
-- agent status | grep -A 5 "APM Agent"
Is the tracer reporting?
pup fleet tracers list --filter "service:<SERVICE_NAME>"
Does APM recognise the service?
pup apm services list --env <ENV>
Are traces arriving?
pup traces search --query "service:<SERVICE_NAME>" --from 1h --limit 10
Which Agent is the tracer connected to? Use these when connectivity between the tracer and the Agent is suspected.
pup fleet agents list --filter "hostname:<NODE_HOSTNAME>"
pup fleet agents tracers <AGENT_KEY> --filter "service:<SERVICE_NAME>"
**Step 4: Verify the conclusion.** Before applying any fix, answer:
If the conclusion doesn't hold up, return to Step 2 with new hypotheses. Keep iterating until you can defend the conclusion against all three questions.
**Step 5: Fix.** Apply the fix for the confirmed root cause. If the fix requires a code or Dockerfile change, rebuild and reload:
docker build -f <DOCKERFILE_PATH> -t <IMAGE_NAME> <BUILD_CONTEXT>
[DECISION: cluster type]
kind load docker-image <IMAGE_NAME> --name <CLUSTER_NAME>
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <APP_NAMESPACE>
kubectl wait --for=condition=Ready pod -l app=<APP_LABEL> -n <APP_NAMESPACE> --timeout=120s
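The `[DECISION: cluster type]` step can be sketched as a helper that prints the matching image-load command. Only the `kind` branch comes from this runbook; the `minikube` and remote-registry branches are my assumptions about common alternatives:

```shell
# Print the image-load command for the detected cluster type.
# Only the kind branch comes from this runbook; the others are common alternatives.
load_cmd() {
  case "$1" in
    kind)     echo "kind load docker-image $2 --name $3" ;;
    minikube) echo "minikube image load $2" ;;
    *)        echo "docker push $2  # remote registry: requires user confirmation" ;;
  esac
}

load_cmd kind my-app:dev my-cluster
```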
**Step 6: Verify the fix.** Re-run triage to confirm the fix worked:
pup traces search --query "service:<SERVICE_NAME>" --from 1h --limit 5
pup fleet instrumented-pods list <CLUSTER_NAME>
If traces are arriving and the pod is in the instrumented list — resolved. Automatically proceed to onboarding-summary now — do not ask the user for permission.
ERROR: Still not resolved — return to Step 2 with the new triage data and form updated hypotheses.
Guardrails:

- Never run `kubectl delete` without user confirmation.
- Never edit `admissionController` settings directly.
- `docker push` to a registry always requires user confirmation.