Kubernetes Troubleshooter & Incident Response

Systematic approach to diagnosing and resolving Kubernetes issues in production environments.

When to Use This Skill

Use this skill when:

Investigating pod failures (CrashLoopBackOff, ImagePullBackOff, Pending, etc.)
Responding to production incidents or outages
Troubleshooting cluster health issues
Diagnosing networking or service connectivity problems
Investigating storage/volume issues
Analyzing performance degradation
Conducting post-incident analysis
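
As a rough illustration of how the pod failure states named above map to a first diagnostic step, here is a minimal sketch. The mapping and the `first_check` helper are assumptions made for illustration, not part of this skill's scripts. It only builds the `kubectl` argument list and does not run anything.

```python
# Hypothetical helper: maps a pod status to a sensible first kubectl command.
# The mapping is an illustrative assumption, not an exhaustive or official list.

def first_check(status: str, pod: str, namespace: str = "default") -> list:
    """Return the argv for a reasonable first diagnostic command."""
    describe = ["kubectl", "describe", "pod", pod, "-n", namespace]
    if status == "CrashLoopBackOff":
        # Logs from the previous (crashed) container instance usually show why it exited.
        return ["kubectl", "logs", pod, "-n", namespace, "--previous"]
    if status in ("ImagePullBackOff", "Pending"):
        # The Events section of `describe` reports image pull and scheduling failures.
        return describe
    # Fallback for any other status: start from the pod description.
    return describe


if __name__ == "__main__":
    print(" ".join(first_check("CrashLoopBackOff", "web-0", "prod")))
```

The pod name `web-0` and namespace `prod` are placeholders. Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` later.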

Core Troubleshooting Workflow

Follow this systematic approach for any Kubernetes issue:

1. Gather Context

What is the observed symptom?
When did it start?
What changed recently (deployments, config, infrastructure)?
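
The three questions above can be captured in a small record and turned into a starting set of read-only commands. This is a minimal sketch under stated assumptions: the `IncidentContext` class and `context_commands` function are hypothetical names introduced here, and the choice of commands is one plausible starting point rather than the skill's prescribed list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IncidentContext:
    """Hypothetical record of the answers gathered in step 1."""
    symptom: str                      # what is observed
    started_at: str                   # when it started (free-form timestamp)
    namespace: str = "default"
    deployment: Optional[str] = None  # set if a recent rollout is suspected


def context_commands(ctx: IncidentContext) -> List[List[str]]:
    """Build read-only kubectl commands that help answer 'what changed recently?'."""
    cmds = [
        # Current pod state in the affected namespace.
        ["kubectl", "get", "pods", "-n", ctx.namespace, "-o", "wide"],
        # Recent events, oldest first, to line up against the start time.
        ["kubectl", "get", "events", "-n", ctx.namespace,
         "--sort-by=.lastTimestamp"],
    ]
    if ctx.deployment:
        # Revision history shows whether a rollout coincides with the symptom.
        cmds.append(["kubectl", "rollout", "history",
                     "deployment/" + ctx.deployment, "-n", ctx.namespace])
    return cmds


if __name__ == "__main__":
    ctx = IncidentContext("5xx errors on checkout", "2024-01-01T10:00Z",
                          namespace="shop", deployment="checkout")
    for cmd in context_commands(ctx):
        print(" ".join(cmd))
```

The symptom, timestamp, namespace, and deployment name in the usage example are placeholders. The function only builds commands, so it can be reviewed before anything touches the cluster.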

K8s Troubleshooter

Kubernetes Troubleshooter & Incident Response

When to Use This Skill

Core Troubleshooting Workflow

1. Gather Context

2. Initial Triage

3. Deep Dive Investigation

4. Identify Root Cause

5. Apply Remediation

6. Verify & Monitor

Incident Response

Quick Reference Commands

Cluster Overview

Pod Diagnostics

Node Diagnostics

Service & Network

Storage

Resource & Configuration

Diagnostic Scripts

cluster_health.py

check_namespace.py

diagnose_pod.py

Reference Documentation

references/common_issues.md

references/incident_response.md

references/performance_troubleshooting.md

references/helm_troubleshooting.md

Best Practices
