K8s Cluster Upgrade Runbook

Use when performing k8s cluster upgrade runbook — kubernetes cluster upgrade runbook covering pre-upgrade checks, API deprecation review, node upgrade procedure, validation tests, and rollback steps. Supports EKS, GKE, AKS, and self-managed clusters. Use for minor/major version upgrades.

Beruf
Kategorien: Container

Kubernetes Cluster Upgrade Runbook Skill

Upgrade cluster {{ cluster_name }} from {{ current_version }} to {{ target_version }} ({{ distribution | "self-managed" }}).

Workflow

Step 1 — Pre-Upgrade Assessment

PRE-UPGRADE CHECKS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CLUSTER HEALTH
[ ] All nodes in Ready state
[ ] No pods in CrashLoopBackOff or Pending state
[ ] Cluster autoscaler healthy (if enabled)
[ ] etcd cluster healthy (self-managed only)
[ ] Control plane components healthy
[ ] PodDisruptionBudgets reviewed (won't block node drain)

API COMPATIBILITY
[ ] Reviewed deprecated APIs between {{ current_version }} and {{ target_version }}
[ ] No workloads using removed APIs
[ ] Ran: kubectl deprecations (or pluto scan)
[ ] Helm charts compatible with {{ target_version }}
[ ] CRDs compatible with target version
[ ] Admission webhooks compatible

ADD-ONS & COMPONENTS
[ ] CoreDNS version compatible
[ ] kube-proxy version compatible
[ ] CNI plugin compatible (Calico, Cilium, VPC-CNI)
[ ] CSI drivers compatible
[ ] Ingress controller compatible
[ ] Cert-manager compatible
[ ] Monitoring stack compatible (Prometheus, etc.)

Kubernetes Cluster Upgrade Runbook Skill

Upgrade cluster {{ cluster_name }} from {{ current_version }} to {{ target_version }} ({{ distribution | "self-managed" }}).

Workflow

Step 1 — Pre-Upgrade Assessment

PRE-UPGRADE CHECKS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CLUSTER HEALTH
[ ] All nodes in Ready state
[ ] No pods in CrashLoopBackOff or Pending state
[ ] Cluster autoscaler healthy (if enabled)
[ ] etcd cluster healthy (self-managed only)
[ ] Control plane components healthy
[ ] PodDisruptionBudgets reviewed (won't block node drain)

API COMPATIBILITY
[ ] Reviewed deprecated APIs between {{ current_version }} and {{ target_version }}
[ ] No workloads using removed APIs
[ ] Ran: kubectl deprecations (or pluto scan)
[ ] Helm charts compatible with {{ target_version }}
[ ] CRDs compatible with target version
[ ] Admission webhooks compatible

ADD-ONS & COMPONENTS
[ ] CoreDNS version compatible
[ ] kube-proxy version compatible
[ ] CNI plugin compatible (Calico, Cilium, VPC-CNI)
[ ] CSI drivers compatible
[ ] Ingress controller compatible
[ ] Cert-manager compatible
[ ] Monitoring stack compatible (Prometheus, etc.)

Shortcut	Counter	Why
"We can skip some steps for this case"	Adapt the workflow steps, don't skip them	Skipped steps are where incidents and oversights originate
"The user seems to already know what to do"	Complete all workflow phases with the user	The workflow catches blind spots that experience alone misses
"This is a minor case, full process is overkill"	Scale the process down, don't turn it off	Minor cases become major when unstructured; the process scales, not disappears
"I'll fill in the details later"	Complete each section before moving on	Deferred details are forgotten; real-time capture is more accurate
"The template output isn't necessary"	Always produce the structured output format	Structured output enables comparison, audit trails, and handoff to other teams

K8s Cluster Upgrade Runbook

Kubernetes Cluster Upgrade Runbook Skill

Workflow

Step 1 — Pre-Upgrade Assessment

K8s Cluster Upgrade Runbook

Kubernetes Cluster Upgrade Runbook Skill

Workflow

Step 1 — Pre-Upgrade Assessment

Step 2 — Backup & Snapshot

Step 3 — Control Plane Upgrade

Step 4 — Worker Node Upgrade

Step 5 — Add-On Upgrades

Step 6 — Post-Upgrade Validation

Step 7 — Rollback Procedure

Counter-Rationalizations

Output Format

Helm Chart Scaffolding

Python Observability

K8s Manifest Generator

Istio Traffic Management

Secrets Management

Gitops Workflow