Kubernetes debugging patterns. Use for pod crashes, CrashLoopBackOff, OOMKilled, ImagePullBackOff, scheduling failures, deployment issues.
When debugging Kubernetes issues, ALWAYS check events first:
- get_pod_events - Shows scheduling, pulling, starting, probes, OOM
- get_pod_logs - Application-level errors

Events explain most crash/scheduling issues faster than logs.
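As a sketch of what the events-first check looks like at the kubectl level (assuming get_pod_events and get_pod_logs wrap the standard commands; namespace and pod names are placeholders):

```shell
# Hedged sketch: kubectl equivalents of the two first-line tools.
# $1 = namespace, $2 = pod name (both placeholders).
pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}

pod_logs() {
  # --previous shows the crashed container's output, which is what
  # you want for CrashLoopBackOff; --tail keeps it readable.
  kubectl logs -n "$1" "$2" --previous --tail=50
}
```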
1. list_pods → Get overview of pod health in namespace
2. get_pod_events → Understand WHY pods are in their state
3. get_pod_logs → Only if events don't explain the issue
4. get_pod_resources → For performance/resource issues
5. describe_deployment → Check deployment status and conditions
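The five steps above can be sketched as one triage function, assuming each tool maps to a single kubectl command (get_pod_resources is approximated here with `kubectl top`, which needs metrics-server):

```shell
# Hedged sketch of the workflow as raw kubectl. Arguments are
# placeholders: $1 = namespace, $2 = pod of interest.
triage() {
  NS="$1"; POD="$2"
  kubectl get pods -n "$NS"                                # 1. list_pods
  kubectl get events -n "$NS" \
    --field-selector "involvedObject.name=$POD" \
    --sort-by=.lastTimestamp                               # 2. get_pod_events
  kubectl logs -n "$NS" "$POD" --tail=50                   # 3. get_pod_logs
  kubectl top pod "$POD" -n "$NS"                          # 4. get_pod_resources
  kubectl describe deployment -n "$NS"                     # 5. describe_deployment
}
```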
First check: get_pod_events
| Event Reason | Likely Cause | Next Step |
|---|---|---|
| OOMKilled | Memory limit too low or memory leak | Check get_pod_resources, increase limits |
| Error | Application crash | Check get_pod_logs for stack trace |
| BackOff | Repeated failures | Check logs for startup errors |
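The table maps mechanically from event reason to next tool, so a dispatcher is a reasonable sketch (the reasons come straight from pod events; anything unrecognized falls back to re-reading events):

```shell
# Hedged sketch: event reason -> next debugging step, per the table.
next_step() {
  case "$1" in
    OOMKilled) echo "get_pod_resources: compare usage to limits" ;;
    Error)     echo "get_pod_logs: look for a stack trace" ;;
    BackOff)   echo "get_pod_logs: look for startup errors" ;;
    *)         echo "get_pod_events: gather more context" ;;
  esac
}

next_step OOMKilled   # → get_pod_resources: compare usage to limits
```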
Checklist:
- get_deployment_history

First check: get_pod_events (confirms OOMKilled)
Then: get_pod_resources (compare usage to limits)
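Once get_pod_resources returns usage and limit, the comparison is simple; this sketch (values assumed pre-normalized to MiB) flags containers running hot, since sustained usage above ~90% of the limit usually means the limit is too low or the app leaks:

```shell
# Hedged sketch: flag memory pressure from usage vs. limit.
# $1 = usage in MiB, $2 = limit in MiB (normalization is assumed
# to have happened upstream).
mem_pressure() {
  usage_mib="$1"; limit_mib="$2"
  pct=$((usage_mib * 100 / limit_mib))
  if [ "$pct" -ge 90 ]; then
    echo "HIGH ($pct%): raise limits or investigate a leak"
  else
    echo "OK ($pct%)"
  fi
}

mem_pressure 950 1024   # → HIGH (92%): raise limits or investigate a leak
```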
Common causes:
First check: get_pod_events
Common causes:
First check: get_pod_events
Look for:
- FailedScheduling - Insufficient resources
- Unschedulable - Node affinity/taints

First check: describe_pod (shows probe config)
Then: get_pod_events (probe failure events)
Then: get_pod_logs (why endpoint isn't responding)
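For probe debugging, a jsonpath query pulls just the probe spec instead of the full describe output, and probe failures surface as `Unhealthy` events; a sketch assuming the tools wrap kubectl:

```shell
# Hedged sketch: probe config and probe-failure events.
# $1 = namespace, $2 = pod name (placeholders).
probe_config() {
  # One line per container: "<name>: <livenessProbe spec>".
  kubectl get pod "$2" -n "$1" -o \
    jsonpath='{range .spec.containers[*]}{.name}{": "}{.livenessProbe}{"\n"}{end}'
}

probe_failures() {
  # Failed liveness/readiness probes are recorded as Unhealthy events.
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2,reason=Unhealthy"
}
```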
First check: get_pod_events
Causes:
describe_deployment → Check replicas (desired vs ready vs available)
get_deployment_history → Compare current vs previous revision
get_pod_events → For pods in new ReplicaSet
Common causes:
Use get_deployment_history to see previous working versions.
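Assuming get_deployment_history wraps `kubectl rollout history`, the check-then-rollback sequence looks roughly like this (all names and the revision number are placeholders; roll back only after confirming the earlier revision was healthy):

```shell
# Hedged sketch: inspect rollout state, then revert to a known-good
# revision. $1 = deployment name, $2 = namespace, $3 = revision.
rollback_check() {
  app="$1"; ns="$2"; rev="$3"
  kubectl rollout history "deployment/$app" -n "$ns"          # past revisions
  kubectl rollout status  "deployment/$app" -n "$ns" \
    --timeout=30s                                             # stuck rollout?
  kubectl rollout undo    "deployment/$app" -n "$ns" \
    --to-revision="$rev"                                      # revert
}
```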
For memory/CPU issues:
1. get_pod_resources → See allocation vs usage
2. describe_pod → See full container spec
3. get_cloudwatch_metrics/query_datadog_metrics → Historical usage
4. detect_anomalies on historical data → Find when issue started
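Step 4 can be approximated crudely: flag samples more than two standard deviations above the mean, which is often enough to spot when usage started climbing (a stand-in sketch, not the detect_anomalies implementation):

```shell
# Hedged sketch: print "index value" for each sample > mean + 2*stddev.
# Reads one numeric sample per line from stdin.
detect_spikes() {
  awk '{ v[NR]=$1; s+=$1; q+=$1*$1 }
       END {
         m = s/NR; sd = sqrt(q/NR - m*m)
         for (i = 1; i <= NR; i++)
           if (v[i] > m + 2*sd) print i, v[i]
       }'
}

printf '100\n102\n98\n101\n250\n99\n' | detect_spikes   # → 5 250
```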