Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.
Expert debugging and diagnostics for Kubernetes clusters using kubectl-mcp-server tools.
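Use this skill when pods are failing, containers crash, nodes are unhealthy, or a user asks for help debugging, troubleshooting, or diagnosing Kubernetes issues.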
| Priority | Rule | Impact | Tools |
|---|---|---|---|
| 1 | Check pod status first | CRITICAL | get_pods, describe_pod |
| 2 | View recent events | CRITICAL | get_events |
| 3 | Inspect logs (including previous) | HIGH | get_pod_logs |
| 4 | Check resource metrics | HIGH | get_pod_metrics |
| 5 | Verify endpoints | MEDIUM | get_endpoints |
| 6 | Review network policies | MEDIUM | get_network_policies |
| 7 | Examine node status | LOW | get_nodes, describe_node |
| Symptom | First Tool | Next Steps |
|---|---|---|
| Pod Pending | describe_pod | Check events, node capacity, resource requests |
| CrashLoopBackOff | get_pod_logs(previous=True) | Check exit code, resources, liveness probes |
| ImagePullBackOff | describe_pod | Verify image name, registry auth, network |
| OOMKilled | get_pod_metrics | Increase memory limits, check for memory leaks |
| ContainerCreating | describe_pod | Check PVC binding, secrets, configmaps |
| Terminating (stuck) | describe_pod | Check finalizers, PDBs, preStop hooks |
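The symptom table can be applied mechanically to a pod object. The sketch below assumes the pod is available as raw Kubernetes API JSON (the shape returned by `kubectl get pod <name> -o json`); whether the MCP tools return exactly this shape is an assumption. `classify_pod_symptom`, `FIRST_TOOL`, and `WAITING_REASONS` are hypothetical helpers, not part of kubectl-mcp-server.

```python
from typing import Optional

# First tool for each symptom, copied from the symptom table.
FIRST_TOOL = {
    "Pending": "describe_pod",
    "CrashLoopBackOff": "get_pod_logs(previous=True)",
    "ImagePullBackOff": "describe_pod",
    "OOMKilled": "get_pod_metrics",
    "ContainerCreating": "describe_pod",
    "Terminating (stuck)": "describe_pod",
}

WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ContainerCreating"}


def classify_pod_symptom(pod: dict) -> Optional[str]:
    """Map a pod object (Kubernetes API JSON) to a row of the symptom table, or None."""
    # deletionTimestamp is set once deletion starts; it is "stuck" only if it outlives the grace period.
    if pod.get("metadata", {}).get("deletionTimestamp"):
        return "Terminating (stuck)"
    status = pod.get("status", {})
    for cs in status.get("containerStatuses", []):
        # Check OOMKilled before waiting reasons: a container that is repeatedly
        # OOM-killed also shows CrashLoopBackOff in its current state.
        for state_key in ("state", "lastState"):
            if cs.get(state_key, {}).get("terminated", {}).get("reason") == "OOMKilled":
                return "OOMKilled"
        waiting = cs.get("state", {}).get("waiting", {}).get("reason")
        if waiting == "ErrImagePull":  # usually becomes ImagePullBackOff after retries
            return "ImagePullBackOff"
        if waiting in WAITING_REASONS:
            return waiting
    # Pending with no more specific container reason, e.g. not yet scheduled.
    if status.get("phase") == "Pending":
        return "Pending"
    return None
```

When the result is not `None`, `FIRST_TOOL[classify_pod_symptom(pod)]` gives the tool to start with.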
1. get_pods(namespace, label_selector) - Get pod status
2. describe_pod(name, namespace) - See events and conditions
3. get_events(namespace, field_selector="involvedObject.name=<pod>") - Check events
4. get_pod_logs(name, namespace, previous=True) - For crash loops
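Step 3 narrows events to a single pod. As a sketch of the same filtering done client-side, assuming events arrive as the `items` list of `kubectl get events -o json` (the hypothetical `events_for` helper is not a kubectl-mcp-server tool):

```python
from typing import Any


def events_for(events: list[dict[str, Any]], pod_name: str) -> list[str]:
    """Events for one pod, newest first; same filter as field_selector='involvedObject.name=<pod>'."""
    matching = [e for e in events if e.get("involvedObject", {}).get("name") == pod_name]
    # lastTimestamp may be null on newer events, so fall back to eventTime.
    # RFC 3339 UTC strings in the same format sort correctly as plain strings.
    matching.sort(key=lambda e: e.get("lastTimestamp") or e.get("eventTime") or "", reverse=True)
    return [f'{e.get("type", "")} {e.get("reason", "")}: {e.get("message", "")}' for e in matching]
```

`Warning` entries near the top of the result are usually the ones worth reading first.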
| State | Likely Cause | Tools to Use |
|---|---|---|
| Pending | Scheduling issues | describe_pod, get_nodes, get_events |
| ImagePullBackOff | Registry/auth | describe_pod, check image name |
| CrashLoopBackOff | App crash | get_pod_logs(previous=True) |
| OOMKilled | Memory limit | get_pod_metrics, adjust limits |
| ContainerCreating | Volume/network | describe_pod, get_pvc |
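For the Pending row, the scheduler records why it could not place the pod in the pod's `PodScheduled` condition. A minimal sketch, assuming Kubernetes API JSON for the pod; `scheduling_failure` is a hypothetical helper:

```python
from typing import Optional


def scheduling_failure(pod: dict) -> Optional[str]:
    """Return 'reason: message' from a failed PodScheduled condition, else None."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            # reason is typically "Unschedulable"; message names the blocking constraint.
            return f'{cond.get("reason", "Unknown")}: {cond.get("message", "")}'
    return None
```

A `None` result for a Pending pod means scheduling succeeded, so the problem is more likely image pulls or volumes.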
1. get_nodes() - List nodes and status
2. describe_node(name) - See conditions and capacity
3. Check conditions: Ready should be True; MemoryPressure, DiskPressure, and PIDPressure should be False
4. node_logs_tool(name, "kubelet") - Kubelet logs
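The condition check in step 3 can be expressed as a comparison against each condition's healthy value. This sketch assumes the node object is Kubernetes API JSON (`kubectl get node <name> -o json`); `HEALTHY` and `unhealthy_conditions` are hypothetical names:

```python
# Healthy status for each condition checked in step 3.
HEALTHY = {"Ready": "True", "MemoryPressure": "False", "DiskPressure": "False", "PIDPressure": "False"}


def unhealthy_conditions(node: dict) -> dict[str, str]:
    """Return {condition: status} for every checked condition that is not at its healthy value."""
    seen = {
        c["type"]: c.get("status", "Unknown")
        for c in node.get("status", {}).get("conditions", [])
        if "type" in c
    }
    # A condition missing from the node object is reported as "Unknown".
    return {name: seen.get(name, "Unknown") for name, ok in HEALTHY.items() if seen.get(name, "Unknown") != ok}
```

An empty result means the node looks healthy on these four conditions; anything else is worth following up with the kubelet logs.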
1. get_pod_logs(name, namespace, previous=True) - See why it crashed
2. describe_pod(name, namespace) - Check resource limits, probes
3. get_pod_metrics(name, namespace) - Current memory/CPU usage (metrics-server reports live values, not usage at the moment of the crash)
4. If OOM: compare requests/limits to actual usage
5. If app error: check logs for stack trace
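Step 4 requires converting Kubernetes memory quantities such as `512Mi` into comparable numbers. The sketch below handles only plain integers and the common decimal (`k`, `M`, `G`, `T`) and binary (`Ki`, `Mi`, `Gi`, `Ti`) suffixes, not the full quantity grammar (no exponents or milli units). It assumes usage arrives as a quantity string, as in the metrics API. `memory_bytes` and `memory_fraction` are hypothetical helpers:

```python
import re

# Multipliers for the supported suffixes ("" means plain bytes).
_SUFFIX = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
           "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes (subset of the Kubernetes format)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if not m:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(m.group(1)) * _SUFFIX[m.group(2) or ""])


def memory_fraction(usage: str, limit: str) -> float:
    """Fraction of the memory limit in use; values near 1.0 suggest the container is close to OOM."""
    return memory_bytes(usage) / memory_bytes(limit)
```

For example, `memory_fraction("480Mi", "512Mi")` is 0.9375, so the container is running within about 6% of its limit.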
1. get_services(namespace) - Verify service exists
2. get_endpoints(namespace) - Check endpoint backends
3. If endpoints are empty: no pods match the service selector, or the matching pods are not Ready
4. get_network_policies(namespace) - Check traffic rules
5. For Cilium: cilium_endpoints_list_tool(), hubble_flows_query_tool()
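Step 3 can be decided from an Endpoints object, because ready backends are listed under `addresses` and unready ones under `notReadyAddresses`. A sketch assuming a v1 Endpoints object in Kubernetes API JSON; `endpoint_summary` and `diagnose_endpoints` are hypothetical helpers:

```python
def endpoint_summary(endpoints: dict) -> dict[str, int]:
    """Count ready and not-ready backend addresses in a v1 Endpoints object."""
    ready = not_ready = 0
    for subset in endpoints.get("subsets") or []:
        ready += len(subset.get("addresses") or [])
        not_ready += len(subset.get("notReadyAddresses") or [])
    return {"ready": ready, "not_ready": not_ready}


def diagnose_endpoints(summary: dict[str, int]) -> str:
    """Suggest the next check based on the endpoint counts."""
    if summary["ready"]:
        return "ready backends exist; look at network policies next"
    if summary["not_ready"]:
        return "pods match the selector but are not Ready; check readiness probes"
    return "no pods match the selector; compare the service selector with pod labels"
```

Separating the two empty cases matters because they have different fixes: a label mismatch versus a failing readiness probe.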
1. get_pvc(namespace) - Check PVC status
2. describe_pvc(name, namespace) - See binding issues
3. get_storage_classes() - Verify provisioner exists
4. If Pending: check storage class, access modes
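The PVC checks can be combined as follows. This is a sketch assuming the PVC is Kubernetes API JSON and `storage_classes` is a list of StorageClass names (for example, taken from the output of get_storage_classes); `pvc_problem` is a hypothetical helper:

```python
from typing import Optional


def pvc_problem(pvc: dict, storage_classes: list[str]) -> Optional[str]:
    """Return a short description of why a PVC is not Bound, or None if it is Bound."""
    phase = pvc.get("status", {}).get("phase")
    if phase == "Bound":
        return None
    spec = pvc.get("spec", {})
    # Absent storageClassName means the default class (if any) applies;
    # an empty string disables dynamic provisioning. Only a named class is checked here.
    sc = spec.get("storageClassName")
    if sc and sc not in storage_classes:
        return f"StorageClass {sc!r} not found"
    return f"phase={phase}: check provisioner and accessModes {spec.get('accessModes', [])}"
```

A missing StorageClass is the cheapest cause to rule out, so it is checked before provisioner and access-mode issues.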
1. kubectl_exec(pod, namespace, "nslookup kubernetes.default") - Test DNS
2. If it fails: check the CoreDNS pods in kube-system
3. get_pods(namespace="kube-system", label_selector="k8s-app=kube-dns")
4. get_pod_logs(name="coredns-*", namespace="kube-system") - Read logs for each CoreDNS pod found in step 3
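Step 3 is mainly looking for CoreDNS pods that are not Ready. A sketch assuming a list of pods in Kubernetes API JSON from kube-system; `unready_dns_pods` is a hypothetical helper that applies the same `k8s-app=kube-dns` label filter client-side:

```python
def unready_dns_pods(pods: list[dict]) -> list[str]:
    """Names of k8s-app=kube-dns pods whose Ready condition is not True."""
    bad = []
    for pod in pods:
        meta = pod.get("metadata", {})
        if meta.get("labels", {}).get("k8s-app") != "kube-dns":
            continue  # same filter as label_selector="k8s-app=kube-dns"
        conds = {c.get("type"): c.get("status") for c in pod.get("status", {}).get("conditions", [])}
        if conds.get("Ready") != "True":
            bad.append(meta.get("name", "<unknown>"))
    return bad
```

Any names returned are the pods whose logs to read in step 4.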
All tools accept a `context` parameter for targeting a different cluster:

```python
get_pods(namespace="kube-system", context="production-cluster")
get_events(namespace="default", context="staging-cluster")
describe_pod(name="myapp-xyz", namespace="prod", context="prod-east")
```
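When cross-checking results with plain kubectl, the closest equivalent is kubectl's global `--context` flag. The sketch assumes that the tools' `context` parameter selects a kubeconfig context in the same way, which this document does not confirm. `kubectl_cmd` and `run_kubectl` are hypothetical helpers:

```python
import subprocess
from typing import Optional


def kubectl_cmd(args: list[str], context: Optional[str] = None) -> list[str]:
    """Build a kubectl argument list, adding --context only when a non-default cluster is targeted."""
    cmd = ["kubectl"]
    if context:
        cmd += ["--context", context]
    return cmd + args


def run_kubectl(args: list[str], context: Optional[str] = None) -> str:
    """Run kubectl and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(kubectl_cmd(args, context), capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_kubectl(["get", "pods", "-n", "kube-system", "-o", "json"], context="production-cluster")` mirrors the first tool call above.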
For comprehensive diagnostics, run the bundled scripts:
See references/DECISION-TREE.md for visual troubleshooting flowcharts.
See references/COMMON-ERRORS.md for error message explanations and fixes.
- get_pods, describe_pod, get_pod_logs, get_pod_metrics
- get_events, get_nodes, describe_node
- get_resource_usage, compare_namespaces
- cilium_endpoints_list_tool, hubble_flows_query_tool
- istio_proxy_status_tool, istio_analyze_tool