Manage Kubernetes clusters using kubectl — pods, deployments, services, namespaces, nodes, secrets, configmaps, and RBAC. Use this skill whenever the user mentions Kubernetes, K8s, kubectl, pods, deployments, services, namespaces, nodes, clusters, container orchestration, pod logs, pod debugging, scaling deployments, cluster health, node status, drain/cordon, rollouts, configmaps, secrets, RBAC, service accounts, ingress, or any request involving managing workloads on a Kubernetes cluster. Also use when the user pastes kubectl output and asks for help interpreting it, wants to troubleshoot a CrashLoopBackOff or ImagePullBackOff, needs to write or review K8s YAML manifests, or asks about resource quotas and limit ranges. Even if they don't say 'Kubernetes' explicitly but describe container orchestration tasks like 'scale this up', 'check why the pod is failing', or 'deploy this image', use this skill.
Manage Kubernetes clusters directly from the terminal using kubectl. This skill covers the full lifecycle of cluster operations: pod management, deployments, services, namespace isolation, node diagnostics, secrets/configmaps, and RBAC.
Before running any commands, verify cluster access:
kubectl cluster-info
kubectl config current-context
If the user hasn't specified a context or namespace, ask which cluster and namespace they want to target. Default to the current context if they say "just use whatever I'm connected to."
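To avoid running commands against the wrong cluster, a small guard can compare the active context against the expected one before doing anything destructive. This is a sketch; the function name and the optional second argument (which lets the check run without a live cluster) are conveniences of this example, not kubectl features.

```shell
# Abort unless the active kubectl context matches the expected one.
# Usage: check_context <expected> [actual]
# The second argument is an override for testing without a cluster;
# it defaults to the real current context.
check_context() {
    expected="$1"
    actual="${2:-$(kubectl config current-context 2>/dev/null)}"
    if [ "$actual" != "$expected" ]; then
        echo "refusing to run: context is '$actual', expected '$expected'" >&2
        return 1
    fi
    echo "context OK: $actual"
}
```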
| Task | Command |
|---|---|
| List all pods | kubectl get pods -n <ns> |
| Pod details | kubectl describe pod <name> -n <ns> |
| Pod logs | kubectl logs <pod> -n <ns> --tail=100 |
| Exec into pod | kubectl exec -it <pod> -n <ns> -- /bin/sh |
| List deployments | kubectl get deployments -n <ns> |
| Scale deployment | kubectl scale deployment <name> --replicas=<N> -n <ns> |
| Rollout status | kubectl rollout status deployment/<name> -n <ns> |
| Rollout undo | kubectl rollout undo deployment/<name> -n <ns> |
| List services | kubectl get svc -n <ns> |
| List nodes | kubectl get nodes -o wide |
| Node details | kubectl describe node <name> |
| Drain node | kubectl drain <node> --ignore-daemonsets --delete-emptydir-data |
| Cordon node | kubectl cordon <node> |
| List namespaces | kubectl get namespaces |
| Create namespace | kubectl create namespace <name> |
Show pods with useful context (status, restarts, age, node placement):
# All pods in a namespace with wide output
kubectl get pods -n <namespace> -o wide
# All pods across all namespaces
kubectl get pods --all-namespaces -o wide
# Filter by label
kubectl get pods -n <namespace> -l app=<label>
# Watch for changes in real-time
kubectl get pods -n <namespace> -w
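When the user pastes a pod listing instead of giving cluster access, the same triage can start from the text itself. A minimal sketch that filters captured kubectl get pods output down to pods that are not Running or Completed (the helper name is this example's, and the sample data in the usage comment is fabricated):

```shell
# Print pods that are not Running/Completed from `kubectl get pods` output.
# Reads the listing on stdin, so it also works on pasted output:
#   kubectl get pods -n <namespace> | unhealthy_pods
unhealthy_pods() {
    # Skip the header row; column 3 is STATUS in the default output.
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```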
When a user asks "why is my pod failing?" or "what's wrong with this pod?", follow this diagnostic sequence:
1. kubectl get pod <name> -n <ns> -o wide — overall status, restart count, node placement
2. kubectl describe pod <name> -n <ns> — look at the Events section at the bottom
3. kubectl logs <name> -n <ns> --tail=200 — recent application logs
4. kubectl logs <name> -n <ns> --previous — logs from the previous (crashed) container instance
5. kubectl logs <name> -n <ns> -c <container> — logs for a specific container in multi-container pods

| Status | Likely Cause | Action |
|---|---|---|
| CrashLoopBackOff | App crashes on startup | Check logs with --previous, look for config errors or missing env vars |
| ImagePullBackOff | Wrong image name/tag or registry auth | Verify image exists, check imagePullSecrets |
| Pending | No schedulable node | Check node resources with kubectl describe nodes, look for taints/tolerations |
| OOMKilled | Memory limit exceeded | Increase resources.limits.memory in the pod spec |
| CreateContainerConfigError | Missing configmap/secret | Verify all referenced configmaps and secrets exist in the namespace |
| Init:Error | Init container failed | Check init container logs: kubectl logs <pod> -c <init-container> |
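The status table above can also be encoded as a small lookup for triage scripts. A sketch; the function name and hint wording are this example's, not kubectl's:

```shell
# Map a pod status to a first debugging step, mirroring the table above.
triage_hint() {
    case "$1" in
        CrashLoopBackOff)  echo "check logs with --previous for startup errors" ;;
        ImagePullBackOff)  echo "verify image name/tag and imagePullSecrets" ;;
        Pending)           echo "check node resources and taints/tolerations" ;;
        OOMKilled)         echo "raise resources.limits.memory in the pod spec" ;;
        CreateContainerConfigError) echo "verify referenced configmaps/secrets exist" ;;
        Init:Error)        echo "check init container logs with -c <init-container>" ;;
        *)                 echo "run kubectl describe pod and read Events" ;;
    esac
}
```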
When the user wants to deploy a pod from a YAML manifest:
# Apply a manifest
kubectl apply -f <manifest.yaml>
# Create from image directly (quick testing only)
kubectl run <name> --image=<image> -n <ns>
# Dry-run to validate before applying
kubectl apply -f <manifest.yaml> --dry-run=client
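As a concrete starting point for kubectl apply, here is a minimal pod manifest sketch. Every name, label, image, and resource value below is a placeholder to adjust:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: <name>
  labels:
    app: <label>            # labels for identification
spec:
  restartPolicy: Always     # choose per workload
  securityContext:
    runAsNonRoot: true      # unless root is explicitly needed
  containers:
    - name: app
      image: <image>        # placeholder image reference
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
```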
When generating YAML manifests, always include:
- metadata.labels for identification
- resources.requests and resources.limits for CPU/memory
- securityContext.runAsNonRoot: true unless root is explicitly needed
- restartPolicy appropriate to the workload
# Delete a single pod
kubectl delete pod <name> -n <ns>
# Delete pods by label
kubectl delete pods -l app=<label> -n <ns>
# Force delete a stuck pod (use with caution)
kubectl delete pod <name> -n <ns> --grace-period=0 --force
Always warn the user before force-deleting. Explain that force-delete skips graceful shutdown and the workload may not clean up properly.
# Interactive shell
kubectl exec -it <pod> -n <ns> -- /bin/sh
# Run a specific command
kubectl exec <pod> -n <ns> -- <command>
# Ephemeral debug container (K8s 1.23+)
kubectl debug -it <pod> -n <ns> --image=busybox --target=<container>
# List deployments
kubectl get deployments -n <ns>
# Deployment details (includes replica sets and rollout history)
kubectl describe deployment <name> -n <ns>
# Scale replicas
kubectl scale deployment <name> --replicas=<N> -n <ns>
# Update image (triggers rolling update)
kubectl set image deployment/<name> <container>=<new-image> -n <ns>
# Check rollout status
kubectl rollout status deployment/<name> -n <ns>
# View rollout history
kubectl rollout history deployment/<name> -n <ns>
# Rollback to previous revision
kubectl rollout undo deployment/<name> -n <ns>
# Rollback to specific revision
kubectl rollout undo deployment/<name> --to-revision=<N> -n <ns>
# Restart all pods in a deployment (rolling restart)
kubectl rollout restart deployment/<name> -n <ns>
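The image-update and rollback commands above can be combined into a safer update flow: push the new image, wait for the rollout, and undo automatically if it stalls. A sketch; the KUBECTL variable is a convenience of this example (set KUBECTL=echo to preview the commands without touching a cluster), not a kubectl feature.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Update a deployment's image and roll back if the rollout does not
# complete within the timeout.
safe_set_image() {
    name="$1"; container="$2"; image="$3"; ns="$4"
    $KUBECTL set image "deployment/$name" "$container=$image" -n "$ns"
    if ! $KUBECTL rollout status "deployment/$name" -n "$ns" --timeout=120s; then
        echo "rollout failed, rolling back" >&2
        $KUBECTL rollout undo "deployment/$name" -n "$ns"
        return 1
    fi
}
```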
# List services
kubectl get svc -n <ns>
# Service details (endpoints, ports)
kubectl describe svc <name> -n <ns>
# Expose a deployment as a service
kubectl expose deployment <name> --port=<port> --target-port=<target> --type=ClusterIP -n <ns>
# Delete a service
kubectl delete svc <name> -n <ns>
When the user asks to "expose" or "make accessible" a deployment, ask what type of service they need:
- ClusterIP: internal-only access from within the cluster (the default)
- NodePort: a static port opened on every node
- LoadBalancer: an external load balancer provisioned by the cloud provider
- ExternalName: a DNS alias pointing at an external service
# List ingress resources
kubectl get ingress -n <ns>
# Describe ingress
kubectl describe ingress <name> -n <ns>
# Cluster info
kubectl cluster-info
# Component status (etcd, scheduler, controller-manager; deprecated since K8s 1.19)
kubectl get componentstatuses
# API server health (newer clusters also expose /livez and /readyz)
kubectl get --raw='/healthz'
# List nodes with status and resource info
kubectl get nodes -o wide
# Detailed node info (capacity, allocatable, conditions, taints)
kubectl describe node <name>
# Node resource usage (requires metrics-server)
kubectl top nodes
# Pod resource usage
kubectl top pods -n <ns>
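Recent kubectl versions can sort this output natively (kubectl top pods --sort-by=memory). When working from pasted output instead, a small filter does the same job; the helper name is this example's, and the sample data in the test is fabricated. Note it assumes Mi units, as kubectl top commonly prints:

```shell
# Sort `kubectl top pods` output by memory, heaviest first.
# Works on pasted output: kubectl top pods -n <ns> | top_by_memory
top_by_memory() {
    # Skip the header; strip the Mi suffix from column 3, sort
    # numerically descending, then restore the suffix.
    awk 'NR > 1 { mem = $3; sub(/Mi$/, "", mem); print mem, $1 }' \
      | sort -rn \
      | awk '{ print $2, $1 "Mi" }'
}
```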
When a user needs to take a node out of rotation for maintenance:
# 1. Cordon — mark node as unschedulable (no new pods)
kubectl cordon <node>
# 2. Drain — evict existing pods gracefully
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
# 3. Perform maintenance...
# 4. Uncordon — mark node as schedulable again
kubectl uncordon <node>
Warn the user that drain will evict all non-DaemonSet pods. Pods managed by a Deployment/ReplicaSet will be rescheduled on other nodes; standalone pods will be deleted.
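The maintenance steps above can be wrapped so cordon and drain always happen together. A sketch; as before, the KUBECTL variable is a convenience of this example (KUBECTL=echo previews the commands), not a kubectl feature.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Cordon then drain a node ahead of maintenance.
node_maintenance_start() {
    $KUBECTL cordon "$1"
    $KUBECTL drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Return the node to service once maintenance is done.
node_maintenance_end() {
    $KUBECTL uncordon "$1"
}
```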
| Condition | Meaning | Action |
|---|---|---|
| Ready=False | Kubelet not reporting healthy | SSH into node, check kubelet logs: journalctl -u kubelet |
| MemoryPressure | Node running low on memory | Check top consumers with kubectl top pods, consider eviction or scaling |
| DiskPressure | Node disk space low | Clean up images: docker system prune / crictl rmi --prune |
| NetworkUnavailable | CNI plugin issue | Check CNI pods (e.g., kubectl get pods -n kube-system -l k8s-app=calico-node) |
# List namespaces
kubectl get namespaces
# Create namespace
kubectl create namespace <name>
# Delete namespace (deletes ALL resources within it)
kubectl delete namespace <name>
# Set default namespace for current context
kubectl config set-context --current --namespace=<name>
Always confirm with the user before deleting a namespace — this is destructive and removes everything in it.
# View quotas in a namespace
kubectl get resourcequotas -n <ns>
# Describe quota details
kubectl describe resourcequota <name> -n <ns>
When creating a resource quota manifest:
apiVersion: v1