Diagnoses Kubernetes issues and provides remediation steps. Use when pods are failing, services are unreachable, or cluster health is degraded.
This skill provides systematic approaches to diagnosing and resolving common Kubernetes issues including pod failures, networking problems, and resource constraints.
# Check pod status across all namespaces
kubectl get pods -A | grep -v Running
# View recent events sorted by time
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
# Check node health
kubectl get nodes
kubectl top nodes
# Describe problematic pod
kubectl describe pod <pod-name> -n <namespace>
# Check pod logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous # Logs from the previous (crashed) container instance
# Check resource usage
kubectl top pod <pod-name> -n <namespace>
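The pod status listing above can be summarized per status when a cluster has many pods. A minimal sketch, assuming the default `kubectl get pods -A` column layout (STATUS is the fourth column); the helper name `count_by_status` is hypothetical:

```shell
# Count pods per STATUS from `kubectl get pods -A` output read on stdin.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
count_by_status() {
  awk 'NR > 1 { count[$4]++ } END { for (s in count) print count[s], s }' | sort -rn
}

# Usage against a live cluster:
# kubectl get pods -A | count_by_status
```

Statuses other than Running or Completed are the ones worth describing next.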
Symptoms: Pod repeatedly crashes and restarts
Diagnostic Steps:
kubectl logs <pod> --previous
kubectl describe pod <pod>
kubectl get events --field-selector involvedObject.name=<pod>
Common Causes & Fixes:
| Cause | Indicator | Fix |
|---|---|---|
| OOMKilled | Exit code 137 | Increase memory limits |
| Missing config | ConfigMap/Secret errors | Verify CM/Secret exists |
| Failed probes | Liveness probe failed | Adjust probe thresholds |
| App crash | Application error in logs | Fix application code |
| Bad command | Error starting container | Verify command/args |
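The exit code referenced in the table can be read from the pod status and mapped to a likely cause. A minimal sketch; the helper name `explain_exit_code` is hypothetical, and the mapping is a heuristic based on common shell and signal conventions, not an exhaustive list:

```shell
# Map a container exit code to a likely cause (heuristic only).
explain_exit_code() {
  case "$1" in
    0)       echo "clean exit: check restartPolicy and whether the command is long-running" ;;
    137)     echo "SIGKILL (128+9): often OOMKilled, confirm with the terminated reason" ;;
    143)     echo "SIGTERM (128+15): container was asked to stop" ;;
    126|127) echo "command not executable or not found: verify command/args" ;;
    *)       echo "application error: inspect logs from the previous container" ;;
  esac
}

# Usage against a live cluster (first container only; placeholders as above):
# code=$(kubectl get pod <pod> -n <namespace> \
#   -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
# explain_exit_code "$code"
```

Exit code 137 alone does not prove an OOM kill; the `reason` field under the same `lastState.terminated` path should read `OOMKilled`.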
Symptoms: Container image cannot be pulled
Diagnostic Steps:
docker pull <image>
Fixes:
# Check secret exists
kubectl get secret <registry-secret> -n <namespace>
# Create registry secret
kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<password> \
  -n <namespace>
# Verify pod has imagePullSecrets (the secret must be in the pod's namespace)
kubectl get pod <pod> -n <namespace> -o jsonpath='{.spec.imagePullSecrets}'
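A secret can exist and still hold credentials for a different registry host. A minimal sketch that checks whether a decoded `.dockerconfigjson` mentions a given registry; the helper name `has_registry` is hypothetical, and the check is a plain string match, not a JSON parse:

```shell
# Succeeds if the decoded docker config on stdin contains the registry host
# as a quoted JSON key. Plain fixed-string match; does not validate the JSON.
has_registry() {
  grep -F -q -- "\"$1\""
}

# Usage against a live cluster (placeholders as above):
# kubectl get secret <registry-secret> -n <namespace> \
#   -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode | has_registry <registry>
```

The host must match the registry part of the image reference exactly, including any port.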
Symptoms: Pod stuck in Pending, not scheduled
Diagnostic Steps:
kubectl describe nodes | grep -A 5 "Allocated resources"
kubectl get pvc -n <namespace>
kubectl describe node <node> | grep Taint
Common Causes:
| Cause | Check | Fix |
|---|---|---|
| Insufficient CPU/Memory | kubectl describe node | Add nodes or reduce requests |
| PVC not bound | kubectl get pvc | Check storage class |
| Node selector miss | Pod spec nodeSelector | Update selector or label nodes |
| Taint not tolerated | Node taints | Add toleration to pod |
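The scheduler usually states which of these causes applies in a FailedScheduling event. A minimal sketch that maps such a message to a row of the table; the helper name `classify_scheduling_message` is hypothetical, and the matched substrings are assumptions, since the scheduler's wording varies between Kubernetes versions:

```shell
# Map a FailedScheduling message to a cause from the table above.
# Only the first matching cause is reported; a message may list several.
classify_scheduling_message() {
  case "$1" in
    *Insufficient*)                 echo "Insufficient CPU/Memory" ;;
    *PersistentVolumeClaim*)        echo "PVC not bound" ;;
    *taint*)                        echo "Taint not tolerated" ;;
    *"node affinity"*|*selector*)   echo "Node selector miss" ;;
    *)                              echo "unknown" ;;
  esac
}

# Usage against a live cluster (placeholders as above):
# msg=$(kubectl get events -n <namespace> \
#   --field-selector involvedObject.name=<pod>,reason=FailedScheduling \
#   -o jsonpath='{.items[*].message}')
# classify_scheduling_message "$msg"
```

An "unknown" result means the raw message should be read directly.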
Symptoms: Service returns no endpoints, traffic not reaching pods
Diagnostic Steps:
# Check endpoints
kubectl get endpoints <service> -n <namespace>
# Verify pod labels match selector
kubectl get pods -n <namespace> --show-labels
kubectl get svc <service> -n <namespace> -o jsonpath='{.spec.selector}'
Fix: Ensure pod labels match service selector exactly.
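"Match exactly" means every key=value pair in the selector must appear among the pod's labels; extra pod labels are fine. A minimal sketch of that rule, taking both sides as comma-separated `key=value` lists (the format printed by `--show-labels`); the helper name `selector_matches` is hypothetical:

```shell
# Succeeds if every key=value pair in the selector ($1) is present
# in the pod's label list ($2). Both are comma-separated key=value lists.
selector_matches() {
  selector=$1
  labels=",$2,"
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                       # this pair is present
      *) IFS=$old_ifs; return 1 ;;          # missing pair: no match
    esac
  done
  IFS=$old_ifs
  return 0
}

# Example:
# selector_matches "app=web,tier=frontend" "app=web,tier=frontend,pod-template-hash=abc12"
```

On a live cluster, `kubectl get pods -n <namespace> -l <key>=<value>` performs the same check directly: if it lists no pods, the service will have no endpoints.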
Symptoms: Pods cannot resolve service names
Diagnostic Steps:
# Test DNS from pod
kubectl exec -it <pod> -- nslookup kubernetes.default
kubectl exec -it <pod> -- cat /etc/resolv.conf
# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
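A quick way to read the `resolv.conf` output is to compare its nameserver with the cluster DNS service IP. A minimal sketch; the helper name `nameservers_from_resolv_conf` is hypothetical, and the usage assumes the DNS service is named `kube-dns` in `kube-system`, which is common but not universal:

```shell
# Print the nameserver addresses from resolv.conf content read on stdin.
nameservers_from_resolv_conf() {
  awk '$1 == "nameserver" { print $2 }'
}

# Usage against a live cluster (placeholders as above):
# kubectl exec <pod> -- cat /etc/resolv.conf | nameservers_from_resolv_conf
# kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

If the two addresses differ, check the pod's `dnsPolicy` and whether a node-local DNS cache is in use before suspecting CoreDNS itself.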
Symptoms: Container killed due to memory limits
Diagnostic Steps:
# Check container status
kubectl describe pod <pod> | grep -A 10 "Last State"
# Check memory usage
kubectl top pod <pod>
Fix: