DevOps and Infrastructure-as-Code patterns: Terraform modules/state/workspaces, Kubernetes resources/Helm/troubleshooting, Docker multi-stage builds/compose, CI/CD pipelines (GitHub Actions, GitLab CI). Use when the request mentions Terraform, Kubernetes, K8s, Docker, Dockerfile, Helm, CI/CD, pipeline, infrastructure, IaC, deploy, container, pod, service, ingress, GitHub Actions, or GitLab CI.
Practical patterns for Terraform, Kubernetes, Docker, and CI/CD pipelines.
Pass the mode as the first argument:
| Mode | Usage | Purpose |
|---|---|---|
| terraform | /devops terraform | Module structure, state, workspaces, provider patterns |
| kubernetes | /devops kubernetes | Resources, Helm, troubleshooting, namespace strategies |
| docker | /devops docker | Dockerfile best practices, multi-stage, compose, security |
| cicd | /devops cicd | GitHub Actions, GitLab CI, caching, deployment strategies |
$ARGUMENTS
modules/
  vpc/
    main.tf            # Resources
    variables.tf       # Input variables
    outputs.tf         # Outputs
    versions.tf        # Required providers + terraform block
    README.md          # Auto-generated by terraform-docs
environments/
  dev/
    main.tf            # Module calls with dev values
    backend.tf         # S3 backend config
    terraform.tfvars   # Environment-specific values
  prod/
    ...
| Pattern | Do | Don't |
|---|---|---|
| State | Remote backend (S3 + DynamoDB lock) | Local terraform.tfstate in repo |
| Secrets | aws_ssm_parameter / vault_generic_secret | Hardcoded in .tfvars |
| Modules | Pin versions: source = "git::...?ref=v1.2.0" | Unpinned ref=main |
| Workspaces | One workspace per env OR separate dirs per env | Mix both approaches |
| Providers | Lock in .terraform.lock.hcl, commit it | Ignore lock file in .gitignore |
| Data sources | Use to reference existing infra | Hardcode ARNs/IDs |
# Conditional value: scale the service by environment
resource "aws_ecs_service" "app" {
  # name, cluster, task_definition, and network settings omitted
  desired_count = var.env == "prod" ? 3 : 1
  launch_type   = "FARGATE"
}

# Prevent accidental destruction (place inside a resource block)
lifecycle {
  prevent_destroy = true
  ignore_changes  = [engine_version] # Managed externally
}

# Dynamic blocks for repeated nested blocks (place inside a resource such as aws_security_group)
dynamic "ingress" {
  for_each = var.allowed_ports
  content {
    from_port = ingress.value
    to_port   = ingress.value
    protocol  = "tcp"
  }
}
| Anti-Pattern | Why It Breaks | Fix |
|---|---|---|
| Single state for everything | Blast radius too large, slow plans | Split by service/layer |
| terraform import without code | State drift on next apply | Always write resource block first |
| -target in CI | Partial applies cause drift | Full plan/apply only |
| terraform taint | Deprecated, forces recreation | Use terraform apply -replace=ADDR |
Production Deployment checklist — every Deployment should have:
| Field | Why |
|---|---|
| resources.requests (cpu + memory) | Scheduler needs this to place pods |
| resources.limits.memory (skip CPU limit) | Caps a leaking container before it starves the node; CPU limits cause throttling |
| livenessProbe + readinessProbe | Restart crashed containers, stop routing to unready ones |
| securityContext.runAsNonRoot: true | Principle of least privilege |
| topologySpreadConstraints | Spread across AZs for HA |
| Pinned image tag (not :latest) | Reproducible deployments |
| secretKeyRef for secrets | Never hardcode in env |
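
As a rough illustration of the checklist above, the manifest below puts every field in one place. It is a minimal sketch, not a canonical template: the name orders-api, the image registry.example.com/orders-api:1.4.2, the Secret orders-db, port 8080, and the /healthz and /readyz probe paths are all hypothetical, and the resource numbers are placeholders to tune per workload.

```yaml
# Hypothetical Deployment: names, image, port, probe paths, and sizes are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  labels:
    app: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      securityContext:
        runAsNonRoot: true            # kubelet refuses to start containers running as root
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread replicas across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: orders-api
      containers:
        - name: orders-api
          image: registry.example.com/orders-api:1.4.2   # pinned tag, not :latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi           # memory limit only; no CPU limit, per the checklist
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:         # value comes from a Secret, not the manifest
                  name: orders-db
                  key: password
```

Note that runAsNonRoot: true only verifies the user; the image itself must declare a numeric non-root user, or the pod spec must set runAsUser explicitly. A client-side check such as kubectl apply --dry-run=client -f deployment.yaml can catch schema mistakes before anything reaches the cluster.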
| Symptom | Likely Cause | Debug Command |
|---|---|---|
| CrashLoopBackOff | App crashes on startup, bad config, missing env var | kubectl logs <pod> --previous |
| OOMKilled | Memory limit too low or memory leak | kubectl describe pod <pod> (check Last State) |
| Pending | No node matches requests/affinity/taints | kubectl describe pod <pod> (check Events) |
| ImagePullBackOff | Wrong image name, missing creds, private registry | kubectl get events --field-selector reason=Failed |
| Evicted | Node disk pressure or memory pressure | kubectl get pods --field-selector status.phase=Failed |
| CreateContainerConfigError | Missing ConfigMap/Secret referenced by pod | kubectl get events -n <ns> |
| Strategy | When | Example |
|---|---|---|
| Per-environment | Small teams, simple | dev, staging, prod |
| Per-team | Multi-team, need isolation | team-platform, team-payments |
| Per-service | Microservices, strict RBAC | svc-auth, svc-orders |
Always set: ResourceQuota, LimitRange, NetworkPolicy per namespace.
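
One way the three per-namespace objects might look is sketched below. It assumes a namespace called team-payments (borrowed from the table above); the object names and every quota and default value are hypothetical placeholders, not recommendations.

```yaml
# Hypothetical baseline for the team-payments namespace; all values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.memory: 40Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-payments
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:              # applied when a container omits limits
        memory: 256Mi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-payments
spec:
  podSelector: {}           # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress               # no ingress rules listed, so all inbound traffic is denied
```

The LimitRange defaults matter here because a quota on requests.cpu or limits.memory rejects pods that leave those fields unset. The default-deny policy is meant as a starting point, with explicit allow policies added per service, and it only takes effect on clusters whose network plugin enforces NetworkPolicy.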
# Debug without deploying
helm template my-release ./chart -f values-prod.yaml | kubectl apply --dry-run=client -f -
# Diff before upgrade (requires helm-diff plugin)
helm diff upgrade my-release ./chart -f values-prod.yaml
# Roll back to revision 1 (omit the revision to return to the previous release)
helm rollback my-release 1
# Multi-stage: build + runtime
FROM node:22-slim AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts # Deterministic, no post-install scripts
COPY . .
RUN npm run build
RUN npm prune --omit=dev # Drop devDependencies before node_modules is copied to the runtime stage

# Runtime stage: distroless image whose entrypoint is already node
FROM gcr.io/distroless/nodejs22-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER nonroot
EXPOSE 8080
CMD ["dist/server.js"]
| Rule | Why |
|---|---|
| COPY dependency files first, then RUN install | Cache deps layer when only code changes |
| Group related RUN commands with && | Fewer layers, smaller image |
| Put least-changing layers first | Maximize cache hits |
| Use .dockerignore aggressively | Smaller build context; exclude node_modules, .git, dist, *.md |
| Check | Implementation |
|---|---|
| Non-root user | USER nonroot (distroless), or RUN adduser --system app followed by a separate USER app instruction |
| Minimal base | distroless, alpine, or -slim variants |
| No secrets in image | Multi-stage, or --mount=type=secret in BuildKit |
| Pin base image digest | FROM node:22-slim@sha256:abc... for reproducibility |
| Scan images | trivy image myapp:latest or docker scout cves |
# docker-compose.yml — local dev