Kubernetes debugging patterns. Use for pod crashes, CrashLoopBackOff, OOMKilled, ImagePullBackOff, scheduling failures, deployment issues.
When debugging Kubernetes issues, ALWAYS check events first:
- get_pod_events - Shows scheduling, pulling, starting, probes, OOM
- get_pod_logs - Application-level errors

Events explain most crash/scheduling issues faster than logs.
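As a sketch of what the events-first check looks like at the kubectl level (assuming get_pod_events and get_pod_logs wrap the standard commands; namespace and pod names are placeholders):

```shell
# Hedged sketch: kubectl equivalents of the two first-line tools.
# $1 = namespace, $2 = pod name (both placeholders).
pod_events() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2" \
    --sort-by=.lastTimestamp
}

pod_logs() {
  # --previous shows the crashed container's output, which is what
  # you want for CrashLoopBackOff; --tail keeps it readable.
  kubectl logs -n "$1" "$2" --previous --tail=50
}
```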
1. list_pods → Get overview of pod health in namespace
2. get_pod_events → Understand WHY pods are in their state
3. get_pod_logs → Only if events don't explain the issue
4. get_pod_resources → For performance/resource issues
5. describe_deployment → Check deployment status and conditions
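The five steps above can be sketched as one triage function, assuming each tool maps to a single kubectl command (get_pod_resources is approximated here with `kubectl top`, which needs metrics-server):

```shell
# Hedged sketch of the workflow as raw kubectl. Arguments are
# placeholders: $1 = namespace, $2 = pod of interest.
triage() {
  NS="$1"; POD="$2"
  kubectl get pods -n "$NS"                                # 1. list_pods
  kubectl get events -n "$NS" \
    --field-selector "involvedObject.name=$POD" \
    --sort-by=.lastTimestamp                               # 2. get_pod_events
  kubectl logs -n "$NS" "$POD" --tail=50                   # 3. get_pod_logs
  kubectl top pod "$POD" -n "$NS"                          # 4. get_pod_resources
  kubectl describe deployment -n "$NS"                     # 5. describe_deployment
}
```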
First check: get_pod_events
| Event Reason | Likely Cause | Next Step |
|---|---|---|
| OOMKilled | Memory limit too low or memory leak | Check get_pod_resources, increase limits |
| Error | Application crash | Check get_pod_logs for stack trace |
| BackOff | Repeated failures | Check logs for startup errors |
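The table maps mechanically from event reason to next tool, so a dispatcher is a reasonable sketch (the reasons come straight from pod events; anything unrecognized falls back to re-reading events):

```shell
# Hedged sketch: event reason -> next debugging step, per the table.
next_step() {
  case "$1" in
    OOMKilled) echo "get_pod_resources: compare usage to limits" ;;
    Error)     echo "get_pod_logs: look for a stack trace" ;;
    BackOff)   echo "get_pod_logs: look for startup errors" ;;
    *)         echo "get_pod_events: gather more context" ;;
  esac
}

next_step OOMKilled   # → get_pod_resources: compare usage to limits
```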
Checklist:
- get_deployment_history

First check: get_pod_events (confirms OOMKilled)
Then: get_pod_resources (compare usage to limits)
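Once get_pod_resources returns usage and limit, the comparison is simple; this sketch (values assumed pre-normalized to MiB) flags containers running hot, since sustained usage above ~90% of the limit usually means the limit is too low or the app leaks:

```shell
# Hedged sketch: flag memory pressure from usage vs. limit.
# $1 = usage in MiB, $2 = limit in MiB (normalization is assumed
# to have happened upstream).
mem_pressure() {
  usage_mib="$1"; limit_mib="$2"
  pct=$((usage_mib * 100 / limit_mib))
  if [ "$pct" -ge 90 ]; then
    echo "HIGH ($pct%): raise limits or investigate a leak"
  else
    echo "OK ($pct%)"
  fi
}

mem_pressure 950 1024   # → HIGH (92%): raise limits or investigate a leak
```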
Common causes:
First check: get_pod_events
Common causes:
First check: get_pod_events
Look for:
- FailedScheduling - Insufficient resources
- Unschedulable - Node affinity/taints

First check: describe_pod (shows probe config)
Then: get_pod_events (probe failure events)
Then: get_pod_logs (why endpoint isn't responding)
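For probe debugging, a jsonpath query pulls just the probe spec instead of the full describe output, and probe failures surface as `Unhealthy` events; a sketch assuming the tools wrap kubectl:

```shell
# Hedged sketch: probe config and probe-failure events.
# $1 = namespace, $2 = pod name (placeholders).
probe_config() {
  # One line per container: "<name>: <livenessProbe spec>".
  kubectl get pod "$2" -n "$1" -o \
    jsonpath='{range .spec.containers[*]}{.name}{": "}{.livenessProbe}{"\n"}{end}'
}

probe_failures() {
  # Failed liveness/readiness probes are recorded as Unhealthy events.
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2,reason=Unhealthy"
}
```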
First check: get_pod_events
Causes:
describe_deployment → Check replicas (desired vs ready vs available)
get_deployment_history → Compare current vs previous revision
get_pod_events → For pods in new ReplicaSet
Common causes:
Use get_deployment_history to see previous working versions.
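Assuming get_deployment_history wraps `kubectl rollout history`, the check-then-rollback sequence looks roughly like this (all names and the revision number are placeholders; roll back only after confirming the earlier revision was healthy):

```shell
# Hedged sketch: inspect rollout state, then revert to a known-good
# revision. $1 = deployment name, $2 = namespace, $3 = revision.
rollback_check() {
  app="$1"; ns="$2"; rev="$3"
  kubectl rollout history "deployment/$app" -n "$ns"          # past revisions
  kubectl rollout status  "deployment/$app" -n "$ns" \
    --timeout=30s                                             # stuck rollout?
  kubectl rollout undo    "deployment/$app" -n "$ns" \
    --to-revision="$rev"                                      # revert
}
```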
For memory/CPU issues:
1. get_pod_resources → See allocation vs usage
2. describe_pod → See full container spec
3. get_cloudwatch_metrics/query_datadog_metrics → Historical usage
4. detect_anomalies on historical data → Find when issue started
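Step 4 can be approximated crudely: flag samples more than two standard deviations above the mean, which is often enough to spot when usage started climbing (a stand-in sketch, not the detect_anomalies implementation):

```shell
# Hedged sketch: print "index value" for each sample > mean + 2*stddev.
# Reads one numeric sample per line from stdin.
detect_spikes() {
  awk '{ v[NR]=$1; s+=$1; q+=$1*$1 }
       END {
         m = s/NR; sd = sqrt(q/NR - m*m)
         for (i = 1; i <= NR; i++)
           if (v[i] > m + 2*sd) print i, v[i]
       }'
}

printf '100\n102\n98\n101\n250\n99\n' | detect_spikes   # → 5 250
```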