Kubernetes Troubleshooting Guide

Overview

This skill provides a systematic approach to diagnosing and resolving common Kubernetes issues, including pod failures, networking problems, and resource constraints.

Diagnostic Workflow

Step 1: Identify the Problem

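# List pods whose STATUS is not Running, across all namespaces
kubectl get pods -A | grep -v Running

# View the 20 most recent events in the current namespace, oldest first
# (add -A to include all namespaces)
kubectl get events --sort-by='.lastTimestamp' | tail -20

# Check node health (kubectl top requires the metrics-server add-on)
kubectl get nodes
kubectl top nodes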
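
The `grep -v Running` filter also lists pods that finished normally (`Completed`) and hides pods that are `Running` but not fully ready. A minimal sketch of a stricter filter follows, assuming the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...). The function name `unhealthy_pods` is hypothetical and not part of kubectl.

```shell
# Reads `kubectl get pods -A` output on stdin and prints the header plus
# every pod that is neither Completed nor fully ready.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '
    NR == 1 { print; next }            # keep the header row
    $4 == "Completed" { next }         # finished Jobs are not failures
    {
      split($3, ready, "/")            # READY looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2]) print
    }
  '
}

# Usage (requires cluster access):
#   kubectl get pods -A | unhealthy_pods
```

Because the function only reads text from stdin, it can be tried against saved output before running it on a live cluster.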

Step 2: Gather Details

# Describe the problematic pod (state, events, last termination reason)
kubectl describe pod <pod-name> -n <namespace>

# Check pod logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous  # Logs from the previous container instance (after a restart)

# Check resource usage (requires the metrics-server add-on)
kubectl top pod <pod-name> -n <namespace>
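
The commands in this step can be bundled so that all details for one pod end up in a single directory. The following is a sketch only: the function name `collect_pod_diagnostics`, its arguments, and the output file layout are illustrative assumptions, not an established tool.

```shell
# Saves describe output, current logs, previous-container logs, and events
# for one pod into a directory so they can be compared or shared.
# Arguments: <pod-name> <namespace> [output-dir]
collect_pod_diagnostics() {
  pod="$1"
  ns="$2"
  out_dir="${3:-./diag-$pod}"
  mkdir -p "$out_dir" || return 1

  kubectl describe pod "$pod" -n "$ns" > "$out_dir/describe.txt" 2>&1
  kubectl logs "$pod" -n "$ns" --all-containers > "$out_dir/logs.txt" 2>&1
  # --previous fails when no container has restarted; that is expected.
  kubectl logs "$pod" -n "$ns" --previous > "$out_dir/logs-previous.txt" 2>&1 || true
  # Events that reference this pod, oldest first.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by='.lastTimestamp' > "$out_dir/events.txt" 2>&1

  echo "Diagnostics written to $out_dir"
}

# Usage (requires cluster access):
#   collect_pod_diagnostics <pod-name> <namespace>
```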

Pod Issue Resolution

CrashLoopBackOff

Cause	Indicator	Fix
OOMKilled	Exit code 137	Increase memory limits
Missing config	ConfigMap/Secret errors	Verify the ConfigMap/Secret exists
Failed probes	Liveness probe failed	Adjust probe thresholds
App crash	Application error in logs	Fix application code
Bad command	Error starting container	Verify command/args
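
Exit code 137 in the table follows the common convention that codes above 128 mean the process was killed by a signal (code minus 128), so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A small sketch of that arithmetic follows; the helper name `explain_exit_code` is hypothetical, and the hints it prints are starting points rather than a diagnosis.

```shell
# Translates a container exit code into a short hint.
# Codes above 128 conventionally mean "killed by signal (code - 128)".
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exited normally"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  echo "killed by signal 9 (SIGKILL); check for OOMKilled" ;;
      15) echo "killed by signal 15 (SIGTERM); graceful stop requested" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  else
    echo "application exited with error code $code; check the logs"
  fi
}

# Usage (requires cluster access): read the last termination reason and
# exit code of the first container, then explain the code.
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
#   code=$(kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```

A SIGKILL is not always an out-of-memory kill, so the `reason` field (for example `OOMKilled`) is the more reliable indicator when it is present.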

Pending State

Cause	Check	Fix
Insufficient CPU/Memory	`kubectl describe node`	Add nodes or reduce requests
PVC not bound	`kubectl get pvc`	Check storage class
Node selector miss	Pod spec nodeSelector	Update selector or label nodes
Taint not tolerated	Node taints	Add toleration to pod
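
The checks in this table can be narrowed down by first asking the scheduler why the pod is unschedulable. The sketch below assumes the pod carries a `PodScheduled` condition with a message, which is typical for Pending pods; the function names `pending_reason` and `node_taints` are hypothetical wrappers around standard `kubectl get` output options.

```shell
# Prints the scheduler's explanation for why a pod has not been scheduled.
# Arguments: <pod-name> <namespace>
pending_reason() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}'
  echo   # jsonpath output has no trailing newline
}

# Lists each node with the keys of its taints, for comparison with the
# tolerations in the pod spec.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Usage (requires cluster access):
#   pending_reason <pod-name> <namespace>
#   node_taints
#   kubectl get pvc -n <namespace>    # confirm claims are Bound
```

The scheduler message usually names the failing constraint (insufficient resources, unmatched node selector, untolerated taint, or unbound volume claim), which points to the matching row in the table.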
