Guide for load balancing architecture and configuration. Use when: configuring load balancers, Nginx upstream blocks, HAProxy frontends/backends, AWS ALB/NLB/CLB, GCP/Azure LBs, round robin, weighted round robin, least connections, IP hash, consistent hashing, health checks, session affinity, sticky sessions, reverse proxy setup, SSL termination, connection draining, rate limiting at LB layer, WebSocket/gRPC balancing, GSLB, or auto-scaling integration. Do NOT use for: single-server deployments without distribution needs, CDN caching configuration, API gateway routing that does not involve load balancing, DNS-only failover without LB, service mesh sidecar traffic management (Istio/Linkerd), or firewall/WAF-only setups.
Use Layer 4 (transport) load balancing when operating on TCP/UDP connections without inspecting application payloads. L4 is faster, adds less latency, and is protocol-agnostic. Use it for raw TCP services, database connections, and high-throughput streaming.
Use Layer 7 (application) load balancing when routing decisions depend on HTTP headers, paths, cookies, or host names. L7 enables content-based routing, header injection, URL rewrites, and WAF integration. Use it for HTTP/HTTPS APIs, microservice routing, and WebSocket upgrades.
| Feature | L4 | L7 |
|---|---|---|
| Protocols | TCP, UDP | HTTP, HTTPS, gRPC, WS |
| Routing granularity | IP + port | Path, header, cookie, host |
| TLS handling | Passthrough or terminate | Terminate + inspect |
| Performance | Higher throughput | More CPU per request |
| Use case | DB, gaming, IoT | APIs, web apps, microsvcs |
**Round robin:** Distribute requests sequentially across backends. Use for homogeneous server pools with uniform request cost. Simple, stateless, zero overhead.
**Weighted round robin:** Assign weights proportional to server capacity; a server with weight=3 receives 3x the traffic of one with weight=1. Use when backend servers have different CPU/memory specs.
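The weighted scheme above can be sketched in Python using the "smooth" weighted round-robin variant (the interleaving approach nginx's default balancer is based on), which spreads a server's extra picks out instead of sending them back to back. The function name and signature here are illustrative:

```python
def smooth_weighted_rr(weights: dict, n: int) -> list:
    """Smooth weighted round robin: on each pick, every server gains its
    own weight; the current leader is chosen and pays back the total."""
    current = {s: 0 for s in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for s, w in weights.items():
            current[s] += w
        best = max(current, key=current.get)  # ties: first in dict order
        current[best] -= total
        picks.append(best)
    return picks

# weight=3 server gets 3 of every 6 picks, evenly interleaved
print(smooth_weighted_rr({"a": 3, "b": 2, "c": 1}, 6))
# → ['a', 'b', 'a', 'c', 'b', 'a']
```

Note how "a" never appears twice in a row despite holding half the weight; naive weighted round robin would emit `a, a, a, b, b, c`.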
**Least connections:** Route to the backend with fewest active connections. Requires real-time connection tracking. Use when request durations vary significantly (file uploads, long-polling).
**IP hash:** Hash the client source IP to select a backend. Ensures the same client always hits the same server. Use for basic session persistence without cookies. Breaks when clients share IPs (NAT).
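The idea is just modulo hashing; a minimal sketch (CRC32 is an arbitrary stand-in for the hash function). Note that resizing the pool remaps almost every client, which is the weakness consistent hashing addresses:

```python
import zlib

def ip_hash(client_ip: str, backends: list) -> str:
    """Deterministically map a client IP to one backend via modulo hashing."""
    return backends[zlib.crc32(client_ip.encode()) % len(backends)]

pool = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
# Same IP always lands on the same backend while the pool is unchanged
assert ip_hash("203.0.113.9", pool) == ip_hash("203.0.113.9", pool)
```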
**Consistent hashing:** Map keys to a hash ring so adding/removing servers displaces minimal sessions. Use for caching layers, stateful services, and scenarios where the backend pool changes frequently.
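A minimal hash-ring sketch (MD5 and 100 virtual nodes per server are illustrative choices, not requirements): each server is placed on the ring at many points, and a key routes to the first server point clockwise from its hash.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for n in nodes:
            self.add(n)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Place `vnodes` virtual points for this node on the ring
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrap at the end)
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

Removing a server only remaps the keys that were on it; every other key keeps its backend, which is exactly the property plain modulo hashing lacks.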
**Power of two choices:** Pick two random backends, route to the one with fewer connections. Provides near-optimal distribution with minimal state. Use as a modern default for large backend pools.
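The selection step fits in a few lines; this sketch assumes a shared map of active connection counts (the names are illustrative):

```python
import random

def power_of_two_choices(backends: list, active: dict, rng=random) -> str:
    """Sample two distinct backends; route to the less-loaded one."""
    a, b = rng.sample(backends, 2)
    return a if active[a] <= active[b] else b

# An overloaded backend loses any pairing it appears in,
# so new requests flow to the idle servers.
active = {"a": 10, "b": 0, "c": 0}
print(power_of_two_choices(["a", "b", "c"], active))
```

Compared with full least-connections, this avoids scanning the whole pool and avoids the thundering-herd effect of every balancer instance picking the same "least loaded" server at once.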
```nginx
upstream app_backend {
    least_conn;
    server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 backup;
    keepalive 32;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required for upstream keepalive
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_next_upstream error timeout http_502 http_503;
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
    }
}
```
```nginx
upstream sticky_backend {
    ip_hash;
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}
```
```nginx
location /ws {
    proxy_pass http://app_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;
}
```
```nginx
upstream grpc_backend {
    server 10.0.1.10:50051;
    server 10.0.1.11:50051;
}

server {
    listen 443 ssl http2;

    location / {
        grpc_pass grpc://grpc_backend;
        grpc_set_header X-Real-IP $remote_addr;
    }
}
```
```nginx
# limit_req_zone belongs in the http {} context
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

location /api/ {
    limit_req zone=api burst=20 nodelay;
    proxy_pass http://app_backend;
}
```
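The rate/burst semantics above can be approximated by a token bucket (nginx's `limit_req` is documented as a leaky-bucket model, but the observable rate-plus-burst behavior is analogous). A sketch with an injectable clock so it can be tested without sleeping:

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second, up to a capacity of `burst`."""

    def __init__(self, rate: float, burst: float, now=time.monotonic):
        self.rate, self.burst, self.now = rate, burst, now
        self.tokens = burst       # start full: an initial burst is allowed
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=10, burst=20` this admits 20 requests immediately, then sustains 10 per second, mirroring `rate=10r/s` with `burst=20`.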
```haproxy
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/site.pem
    acl is_api path_beg /api
    acl is_static path_beg /static /assets
    acl is_ws hdr(Upgrade) -i websocket
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    use_backend api_servers if is_api
    use_backend static_servers if is_static
    use_backend ws_servers if is_ws
    default_backend web_servers

backend api_servers
    balance leastconn
    option httpchk GET /healthz
    http-check expect status 200
    default-server inter 5s fall 3 rise 2 on-marked-down shutdown-sessions
    server api1 10.0.1.10:8080 check weight 3
    server api2 10.0.1.11:8080 check weight 2
    server api3 10.0.1.12:8080 check backup
```
```haproxy
# Rate limiting: add these directives to the http_front frontend
# (a second `frontend http_front` block would be a duplicate-name error)
    stick-table type ip size 100k expire 10m store http_req_rate(10s)
    tcp-request connection track-sc0 src
    acl rate_abuse sc_http_req_rate(0) gt 50
    http-request deny deny_status 429 if rate_abuse
```
```haproxy
backend web_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server web1 10.0.1.10:8080 check cookie w1
    server web2 10.0.1.11:8080 check cookie w2
```
```haproxy
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:securepass
```
The LB periodically probes each backend and takes failing servers out of rotation. Configure the probe interval, timeout, and rise/fall thresholds.
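The threshold logic matches the `fall 3 rise 2` style of the HAProxy example above: a backend is marked down only after several consecutive failures, and back up only after several consecutive successes. A sketch (class and method names are illustrative):

```python
class HealthChecker:
    """Track one backend's health with fall/rise thresholds."""

    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall, self.rise = fall, rise
        self.failures = self.successes = 0
        self.healthy = True

    def record(self, ok: bool) -> bool:
        """Record one probe result; return current health state."""
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.rise:
                self.healthy = True   # rise threshold met: back in rotation
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.fall:
                self.healthy = False  # fall threshold met: drain traffic
        return self.healthy
```

The hysteresis (fall > 1, rise > 1) prevents a single dropped probe from flapping a backend in and out of rotation.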
# Typical parameters