Tames system entropy by designing resilient architectures that thrive under unpredictable conditions
Chaos Curator systematically validates and improves system resilience through controlled chaos experiments in distributed infrastructure.
Chaos Curator provides the following commands:
chaos-curator experiment generate <template> --target <resource> --duration <time> --interval <check-frequency> --metrics-threshold <thresholds> --blast-radius <percentage>
chaos-curator experiment inject <experiment-id> --namespace <ns> --wait-for-completion --notify-slack --rollback-on-failure
chaos-curator experiment status <experiment-id> --watch --output json
chaos-curator experiment timeline <experiment-id> --format timeline
chaos-curator experiment abort <experiment-id> --grace-period <seconds> --force
chaos-curator validate infrastructure <cluster-name> --check-pod-disruption-budgets --check-hpa --check-network-policies
chaos-curator validate metrics <service-name> --query "rate(http_requests_total[5m])" --alert-rules --slo-baseline <value>
chaos-curator resilience score <namespace> --services <svc1,svc2> --include-chaos-metrics --report-format html
chaos-curator blast-radius calculate <experiment-yaml> --dependencies-from-istio --max-impact-percentage <value>
chaos-curator rollback create <experiment-id> --type helm --release-name <release> --namespace <ns> --timeout <seconds>
chaos-curator rollback execute <rollback-id> --verify-post-conditions --dry-run
chaos-curator dashboard open --experiment <id> --grafana-dashboard-id <id> --telemetry-channel <slack-channel>
chaos-curator hypothesis verify <hypothesis-id> --compare-baseline --statistical-significance <p-value>
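The `hypothesis verify --statistical-significance <p-value>` step compares experiment metrics against a baseline. A minimal sketch of the underlying idea, a two-proportion z-test on error counts done with awk; the counts below are made-up illustrative values, not chaos-curator output:

```shell
# Two-proportion z-test: did the error rate change significantly during chaos?
# x1/n1 = baseline errors/requests, x2/n2 = during-experiment errors/requests.
# |z| > 1.96 corresponds to p < 0.05 (two-sided).
verdict=$(awk -v x1=5 -v n1=10000 -v x2=15 -v n2=10000 'BEGIN {
  p1 = x1 / n1; p2 = x2 / n2
  p  = (x1 + x2) / (n1 + n2)                 # pooled proportion
  se = sqrt(p * (1 - p) * (1/n1 + 1/n2))     # pooled standard error
  z  = (p2 - p1) / se
  printf "%s", ((z > 1.96 || z < -1.96) ? "significant" : "not-significant")
}')
echo "$verdict"
```

With these example counts (0.05% baseline vs 0.15% during the experiment, z ≈ 2.24) the change crosses the p < 0.05 threshold, so the hypothesis "error rate is unchanged under chaos" would be rejected.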
Supported experiment types (real chaos actions):
pod-kill: kubectl delete pod --force --grace-period=0 with selector matching
network-delay: tc qdisc add with netem causing delay 100ms 20ms distribution normal
network-loss: tc qdisc add with loss 10%
network-corruption: tc qdisc add with corrupt 1%
network-bandwidth: tc qdisc add with tbf rate 1mbit burst 32kbit latency 400ms
io-stress: stress-ng --io 2 --io-method randwrite --timeout 30s
cpu-stress: stress-ng --cpu 4 --cpu-method matrix --timeout 30s
memory-stress: stress-ng --vm 2 --vm-bytes 1G --timeout 30s
dns-chaos: dnsmasq returning NXDOMAIN or delayed responses for specific domains
time-skew: date -s "+100 seconds" inside container with SYS_TIME capability
http-fault: Envoy/Istio fault injection with abort or delay percentages
gcp-zone-failure: gcloud compute instances delete with --zone targeting
aws-az-failure: aws ec2 describe-instances + terminate-instances for specific AZ
azure-fault: az vm deallocate for VMSS instances in specific fault domain
etcd-follower-down: systemctl stop etcd on follower nodes with leader protection
redis-master-failover: redis-cli -h <master-ip> debug segfault with sentinel promotion
postgres-replica-promotion: pg_ctl promote -D /var/lib/postgresql/data
kafka-broker-down: stop Kafka broker process with ISR management verification
disk-fill: dd if=/dev/zero of=/tmp/fill bs=1M count=5000 until 95% disk usage
Real workflow for testing payment service resilience (E2E example):
# STEP 1: Pre-experiment safety checks
chaos-curator validate infrastructure payment-cluster \
--check-pod-disruption-budgets \
--check-hpa \
--check-network-policies
# Expected output:
# ✓ PDB for payment-service: minAvailable 70% (3/10 pods)
# ✓ HPA payment-service: target CPU 70%, current 45%, minReplicas 5, maxReplicas 20
# ✓ NetworkPolicies restrict ingress to trusted sources only
# ✓ All critical services have anti-affinity rules
# ✓ Database connection pool size: 50 (with 30% headroom)
# ⚠️ Alert for payment-service-error-rate > 1% configured (currently 0.05%)
# ✓ Backup completed 2 hours ago (verified restore test weekly)
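The PDB line in the output above ("minAvailable 70% (3/10 pods)") is simple arithmetic: with 10 replicas and minAvailable 70%, at most 3 pods may be disrupted at once. A sketch of that check, using the values from the example output:

```shell
# How many pods can an experiment disrupt without violating the PDB?
replicas=10
min_available_pct=70
# ceil(replicas * pct / 100) pods must stay available; the rest are disruptable.
min_available=$(( (replicas * min_available_pct + 99) / 100 ))
disruptable=$(( replicas - min_available ))
echo "PDB allows disrupting up to $disruptable of $replicas pods"
```

This is the number a pod-kill experiment's blast radius must stay under for the eviction-style safety check to pass.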
# STEP 2: Calculate blast radius before injecting
chaos-curator blast-radius calculate <<EOF
apiVersion: chaos-mesh.org/v1alpha1