Manage Kubernetes clusters using kubectl — pods, deployments, services, namespaces, nodes, secrets, configmaps, and RBAC. Use this skill whenever the user mentions Kubernetes, K8s, kubectl, pods, deployments, services, namespaces, nodes, clusters, container orchestration, pod logs, pod debugging, scaling deployments, cluster health, node status, drain/cordon, rollouts, configmaps, secrets, RBAC, service accounts, ingress, or any request involving managing workloads on a Kubernetes cluster. Also use when the user pastes kubectl output and asks for help interpreting it, wants to troubleshoot a CrashLoopBackOff or ImagePullBackOff, needs to write or review K8s YAML manifests, or asks about resource quotas and limit ranges. Even if they don't say 'Kubernetes' explicitly but describe container orchestration tasks like 'scale this up', 'check why the pod is failing', or 'deploy this image', use this skill.
Manage Kubernetes clusters directly from the terminal using kubectl. This skill covers the full lifecycle of cluster operations: pod management, deployments, services, namespace isolation, node diagnostics, secrets/configmaps, and RBAC.
Before running any commands, verify cluster access:
kubectl cluster-info
kubectl config current-context
If the user hasn't specified a context or namespace, ask which cluster and namespace they want to target. Default to the current context if they say "just use whatever I'm connected to."
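To avoid running commands against the wrong cluster, a small guard can compare the active context against the expected one before doing anything destructive. This is a sketch; the function name and the optional second argument (which lets the check run without a live cluster) are conveniences of this example, not kubectl features.

```shell
# Abort unless the active kubectl context matches the expected one.
# Usage: check_context <expected> [actual]
# The second argument is an override for testing without a cluster;
# it defaults to the real current context.
check_context() {
    expected="$1"
    actual="${2:-$(kubectl config current-context 2>/dev/null)}"
    if [ "$actual" != "$expected" ]; then
        echo "refusing to run: context is '$actual', expected '$expected'" >&2
        return 1
    fi
    echo "context OK: $actual"
}
```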
| Task | Command |
|---|---|
| List all pods | kubectl get pods -n <ns> |
| Pod details | kubectl describe pod <name> -n <ns> |
| Pod logs | kubectl logs <pod> -n <ns> --tail=100 |
| Exec into pod | kubectl exec -it <pod> -n <ns> -- /bin/sh |
| List deployments | kubectl get deployments -n <ns> |
| Scale deployment | kubectl scale deployment <name> --replicas=<N> -n <ns> |
| Rollout status | kubectl rollout status deployment/<name> -n <ns> |
| Rollout undo | kubectl rollout undo deployment/<name> -n <ns> |
| List services | kubectl get svc -n <ns> |
| List nodes | kubectl get nodes -o wide |
| Node details | kubectl describe node <name> |
| Drain node | kubectl drain <node> --ignore-daemonsets --delete-emptydir-data |
| Cordon node | kubectl cordon <node> |
| List namespaces | kubectl get namespaces |
| Create namespace | kubectl create namespace <name> |
Show pods with useful context (status, restarts, age, node placement):
# All pods in a namespace with wide output
kubectl get pods -n <namespace> -o wide
# All pods across all namespaces
kubectl get pods --all-namespaces -o wide
# Filter by label
kubectl get pods -n <namespace> -l app=<label>
# Watch for changes in real-time
kubectl get pods -n <namespace> -w
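When the user pastes a pod listing instead of giving cluster access, the same triage can start from the text itself. A minimal sketch that filters captured kubectl get pods output down to pods that are not Running or Completed (the helper name is this example's, and the sample data in the usage comment is fabricated):

```shell
# Print pods that are not Running/Completed from `kubectl get pods` output.
# Reads the listing on stdin, so it also works on pasted output:
#   kubectl get pods -n <namespace> | unhealthy_pods
unhealthy_pods() {
    # Skip the header row; column 3 is STATUS in the default output.
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```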
When a user asks "why is my pod failing?" or "what's wrong with this pod?", follow this diagnostic sequence:
1. kubectl get pod <name> -n <ns> -o wide — overall status, restart count, node placement
2. kubectl describe pod <name> -n <ns> — look at the Events section at the bottom
3. kubectl logs <name> -n <ns> --tail=200 — recent application logs
4. kubectl logs <name> -n <ns> --previous — logs from the previous (crashed) container instance
5. kubectl logs <name> -n <ns> -c <container> — logs for a specific container in multi-container pods

| Status | Likely Cause | Action |
|---|---|---|
| CrashLoopBackOff | App crashes on startup | Check logs with --previous, look for config errors or missing env vars |
| ImagePullBackOff | Wrong image name/tag or registry auth | Verify image exists, check imagePullSecrets |
| Pending | No schedulable node | Check node resources with kubectl describe nodes, look for taints/tolerations |
| OOMKilled | Memory limit exceeded | Increase resources.limits.memory in the pod spec |
| CreateContainerConfigError | Missing configmap/secret | Verify all referenced configmaps and secrets exist in the namespace |
| Init:Error | Init container failed | Check init container logs: kubectl logs <pod> -c <init-container> |
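The status table above can also be encoded as a small lookup for triage scripts. A sketch; the function name and hint wording are this example's, not kubectl's:

```shell
# Map a pod status to a first debugging step, mirroring the table above.
triage_hint() {
    case "$1" in
        CrashLoopBackOff)  echo "check logs with --previous for startup errors" ;;
        ImagePullBackOff)  echo "verify image name/tag and imagePullSecrets" ;;
        Pending)           echo "check node resources and taints/tolerations" ;;
        OOMKilled)         echo "raise resources.limits.memory in the pod spec" ;;
        CreateContainerConfigError) echo "verify referenced configmaps/secrets exist" ;;
        Init:Error)        echo "check init container logs with -c <init-container>" ;;
        *)                 echo "run kubectl describe pod and read Events" ;;
    esac
}
```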
When the user wants to deploy a pod from a YAML manifest:
# Apply a manifest
kubectl apply -f <manifest.yaml>
# Create from image directly (quick testing only)
kubectl run <name> --image=<image> -n <ns>
# Dry-run to validate before applying
kubectl apply -f <manifest.yaml> --dry-run=client
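As a concrete starting point for kubectl apply, here is a minimal pod manifest sketch. Every name, label, image, and resource value below is a placeholder to adjust:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: <name>
  labels:
    app: <label>            # labels for identification
spec:
  restartPolicy: Always     # choose per workload
  securityContext:
    runAsNonRoot: true      # unless root is explicitly needed
  containers:
    - name: app
      image: <image>        # placeholder image reference
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```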
When generating YAML manifests, always include:
- metadata.labels for identification
- resources.requests and resources.limits for CPU/memory
- securityContext.runAsNonRoot: true unless root is explicitly needed
- restartPolicy appropriate to the workload
# Delete a single pod
kubectl delete pod <name> -n <ns>
# Delete pods by label
kubectl delete pods -l app=<label> -n <ns>
# Force delete a stuck pod (use with caution)
kubectl delete pod <name> -n <ns> --grace-period=0 --force
Always warn the user before force-deleting. Explain that force-delete skips graceful shutdown and the workload may not clean up properly.
# Interactive shell
kubectl exec -it <pod> -n <ns> -- /bin/sh
# Run a specific command
kubectl exec <pod> -n <ns> -- <command>
# Ephemeral debug container (K8s 1.23+)
kubectl debug -it <pod> -n <ns> --image=busybox --target=<container>
# List deployments
kubectl get deployments -n <ns>
# Deployment details (includes replica sets and rollout history)
kubectl describe deployment <name> -n <ns>
# Scale replicas
kubectl scale deployment <name> --replicas=<N> -n <ns>
# Update image (triggers rolling update)
kubectl set image deployment/<name> <container>=<new-image> -n <ns>
# Check rollout status
kubectl rollout status deployment/<name> -n <ns>
# View rollout history
kubectl rollout history deployment/<name> -n <ns>
# Rollback to previous revision
kubectl rollout undo deployment/<name> -n <ns>
# Rollback to specific revision
kubectl rollout undo deployment/<name> --to-revision=<N> -n <ns>
# Restart all pods in a deployment (rolling restart)
kubectl rollout restart deployment/<name> -n <ns>
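The image-update and rollback commands above can be combined into a safer update flow: push the new image, wait for the rollout, and undo automatically if it stalls. A sketch; the KUBECTL variable is a convenience of this example (set KUBECTL=echo to preview the commands without touching a cluster), not a kubectl feature.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Update a deployment's image and roll back if the rollout does not
# complete within the timeout.
safe_set_image() {
    name="$1"; container="$2"; image="$3"; ns="$4"
    $KUBECTL set image "deployment/$name" "$container=$image" -n "$ns"
    if ! $KUBECTL rollout status "deployment/$name" -n "$ns" --timeout=120s; then
        echo "rollout failed, rolling back" >&2
        $KUBECTL rollout undo "deployment/$name" -n "$ns"
        return 1
    fi
}
```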
# List services
kubectl get svc -n <ns>
# Service details (endpoints, ports)
kubectl describe svc <name> -n <ns>
# Expose a deployment as a service
kubectl expose deployment <name> --port=<port> --target-port=<target> --type=ClusterIP -n <ns>
# Delete a service
kubectl delete svc <name> -n <ns>
When the user asks to "expose" or "make accessible" a deployment, ask what type of service they need:
- ClusterIP: internal-only access from within the cluster (the default)
- NodePort: a static port opened on every node
- LoadBalancer: an external load balancer provisioned by the cloud provider
- ExternalName: a DNS alias pointing at an external service
# List ingress resources
kubectl get ingress -n <ns>
# Describe ingress
kubectl describe ingress <name> -n <ns>
# Cluster info
kubectl cluster-info
# Component status (etcd, scheduler, controller-manager; deprecated since K8s 1.19)
kubectl get componentstatuses
# API server health (newer clusters also expose /livez and /readyz)
kubectl get --raw='/healthz'
# List nodes with status and resource info
kubectl get nodes -o wide
# Detailed node info (capacity, allocatable, conditions, taints)
kubectl describe node <name>
# Node resource usage (requires metrics-server)
kubectl top nodes
# Pod resource usage
kubectl top pods -n <ns>
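Recent kubectl versions can sort this output natively (kubectl top pods --sort-by=memory). When working from pasted output instead, a small filter does the same job; the helper name is this example's, and the sample data in the test is fabricated. Note it assumes Mi units, as kubectl top commonly prints:

```shell
# Sort `kubectl top pods` output by memory, heaviest first.
# Works on pasted output: kubectl top pods -n <ns> | top_by_memory
top_by_memory() {
    # Skip the header; strip the Mi suffix from column 3, sort
    # numerically descending, then restore the suffix.
    awk 'NR > 1 { mem = $3; sub(/Mi$/, "", mem); print mem, $1 }' \
      | sort -rn \
      | awk '{ print $2, $1 "Mi" }'
}
```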
When a user needs to take a node out of rotation for maintenance:
# 1. Cordon — mark node as unschedulable (no new pods)
kubectl cordon <node>
# 2. Drain — evict existing pods gracefully
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# 3. Perform maintenance...
# 4. Uncordon — mark node as schedulable again
kubectl uncordon <node>
Warn the user that drain will evict all non-DaemonSet pods. Pods managed by a Deployment/ReplicaSet will be rescheduled on other nodes; standalone pods will be deleted.
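The maintenance steps above can be wrapped so cordon and drain always happen together. A sketch; as before, the KUBECTL variable is a convenience of this example (KUBECTL=echo previews the commands), not a kubectl feature.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Cordon then drain a node ahead of maintenance.
node_maintenance_start() {
    $KUBECTL cordon "$1"
    $KUBECTL drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Return the node to service once maintenance is done.
node_maintenance_end() {
    $KUBECTL uncordon "$1"
}
```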
| Condition | Meaning | Action |
|---|---|---|
| Ready=False | Kubelet not reporting healthy | SSH into node, check kubelet logs: journalctl -u kubelet |
| MemoryPressure | Node running low on memory | Check top consumers with kubectl top pods, consider eviction or scaling |
| DiskPressure | Node disk space low | Clean up images: docker system prune / crictl rmi --prune |
| NetworkUnavailable | CNI plugin issue | Check CNI pods (e.g., kubectl get pods -n kube-system -l k8s-app=calico-node) |
# List namespaces
kubectl get namespaces
# Create namespace
kubectl create namespace <name>
# Delete namespace (deletes ALL resources within it)
kubectl delete namespace <name>
# Set default namespace for current context
kubectl config set-context --current --namespace=<name>
Always confirm with the user before deleting a namespace — this is destructive and removes everything in it.
# View quotas in a namespace
kubectl get resourcequotas -n <ns>
# Describe quota details
kubectl describe resourcequota <name> -n <ns>
When creating a resource quota manifest:
apiVersion: v1