private_key: { filename: "/etc/envoy/certs/server.key" }
tls_params:
tls_minimum_protocol_version: TLSv1_2
cipher_suites: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
tls_session_ticket_keys:
- filename: "/etc/envoy/tickets/key.bin"
filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
http2_protocol_options:
max_concurrent_streams: 100
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: backend
domains: ["*"]
routes:
- match: { prefix: "/api/" }
route:
cluster: backend_cluster
timeout: 10s
retry_policy:
retry_on: "5xx,reset,connect-failure"
num_retries: 2
per_try_timeout: 3s
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
stat_prefix: http_local_rate_limiter
token_bucket:
max_tokens: 1000
tokens_per_fill: 500
fill_interval: 1s
- name: envoy.filters.http.circuit_breakers
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.circuit_breakers.v3.CircuitBreakers
thresholds:
- priority: DEFAULT
max_connections: 1024
max_pending_requests: 512
max_requests: 2048
max_retries: 3
clusters:
- name: backend_cluster
connect_timeout: 2s
type: STRICT_DNS
lb_policy: LEAST_REQUEST
outlier_detection:
consecutive_5xx: 5
interval: 10s
base_ejection_time: 30s
max_ejection_percent: 50
health_checks:
- timeout: 3s
interval: 5s
unhealthy_threshold: 2
healthy_threshold: 2
http_health_check:
path: "/healthz"
expected_statuses:
range:
start: 200
end: 299
load_assignment:
cluster_name: backend_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: backend-svc.default.svc.cluster.local
port_value: 8080
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 2048
max_pending_requests: 1024
max_requests: 4096
max_retries: 2
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
sni: backend.internal
### Backend Health Check Endpoint (Go)
Load balancers require accurate backend state. A naive `/healthz` returning `200 OK` is insufficient. Implement a composite health check that verifies dependencies, connection pools, and queue depth.
```go
package main
import (
"encoding/json"
"net/http"
"sync/atomic"
"time"
)
var (
ready int32
connections int64
queueDepth int64
)
func healthHandler(w http.ResponseWriter, r *http.Request) {
status := map[string]interface{}{
"status": "healthy",
"timestamp": time.Now().UTC().Format(time.RFC3339),
"connections": atomic.LoadInt64(&connections),
"queue_depth": atomic.LoadInt64(&queueDepth),
"ready": atomic.LoadInt32(&ready) == 1,
}
// Fail if queue depth exceeds threshold or not ready
if atomic.LoadInt64(&queueDepth) > 500 || atomic.LoadInt32(&ready) != 1 {
status["status"] = "degraded"
w.WriteHeader(http.StatusServiceUnavailable)
} else {
w.WriteHeader(http.StatusOK)
}
json.NewEncoder(w).Encode(status)
}
func main() {
http.HandleFunc("/healthz", healthHandler)
http.ListenAndServe(":8080", nil)
}
Integration Notes
- Metrics Export: Expose Prometheus metrics (
http_requests_total, request_duration_seconds, backend_queue_depth) to feed Envoy's dynamic weighting or external autoscalers.
- Connection Pooling: Envoy's
max_connections and max_requests thresholds prevent backend saturation. Tune based on your runtime's goroutine/thread limits.
- TLS Session Resumption: Pre-generate session ticket keys and rotate them weekly to balance security and handshake overhead.
Pitfall Guide (6 Critical Mistakes)
1. Ignoring Connection Exhaustion & Keep-Alive Misconfiguration
Symptom: Backend threads pile up, file descriptor limits hit, and latency spikes despite low request rates.
Root Cause: Load balancers keep connections open indefinitely, or backends close them prematurely, causing TCP handshake storms.
Mitigation: Align keepalive_timeout (LB) with keep-alive (backend). Set max_requests_per_connection to force periodic renegotiation. Monitor TIME_WAIT sockets and tune tcp_tw_reuse/tcp_fin_timeout at the OS level.
2. Health Check Misconfiguration & Thundering Herds
Symptom: All healthy backends receive a traffic surge simultaneously after a failed node recovers.
Root Cause: Synchronous active health checks or zero jitter in recovery timing.
Mitigation: Add jitter to health check intervals (interval: 5s ± 1s). Use passive failure detection alongside active checks. Implement gradual traffic injection (e.g., max_ejection_percent: 20 initially, scaling up).
3. Over-Reliance on Round-Robin Without Backend Awareness
Symptom: CPU-heavy instances saturate while I/O-bound nodes sit idle. P99 latency degrades unevenly.
Root Cause: Round-robin assumes homogeneous backends, which is false in microservices and autoscaled environments.
Mitigation: Switch to LEAST_REQUEST or RING_HASH with weighted endpoints. Inject real-time metrics (CPU, memory, queue depth) via gRPC or Prometheus to dynamically adjust weights.
4. TLS Termination Bottlenecks
Symptom: Load balancer CPU hits 90%+ during traffic spikes, increasing handshake latency.
Root Cause: Full TLS handshakes for every connection, missing session resumption, or weak cipher suites.
Mitigation: Enable TLS session tickets and OCSP stapling. Prefer ECDHE ciphers. Offload TLS to dedicated hardware or use kernel TLS (kTLS) where supported. Monitor ssl_handshakes vs ssl_session_reuses.
5. Lack of Observability & Blind Routing
Symptom: Traffic shifts to degraded nodes without alerting; latency spikes go undetected until user complaints.
Root Cause: Load balancers operate on static configs without real-time backend telemetry.
Mitigation: Integrate distributed tracing (OpenTelemetry) and expose LB metrics to Prometheus. Implement latency-based routing and alert on P95/P99 breaches. Use canary deployments with traffic mirroring for safe rollouts.
6. Static Scaling in Bursty Environments
Symptom: Backends scale too late, causing 5xx errors during traffic spikes; over-provisioning during lulls.
Root Cause: Autoscaling reacts to CPU/memory thresholds rather than queue depth or request latency.
Mitigation: Use predictive autoscaling (KEDA, Prometheus Adapter) that scales on http_requests_in_flight or custom queue metrics. Configure LB to drain terminating pods gracefully (drain_timeout: 30s).
Production Bundle
✅ Deployment Checklist
Pre-Flight
Runtime
Post-Deployment
📊 Decision Matrix
| Criteria | L4 Load Balancer (IPVS/iptables) | L7 Load Balancer (Envoy/Nginx) | Cloud Provider LB (ALB/NLB) |
|---|
| Protocol Support | TCP/UDP only | HTTP/1.1, HTTP/2, gRPC, WebSocket | HTTP/HTTPS, TCP, UDP |
| Intelligent Routing | ❌ | ✅ (path, header, weight, canary) | ✅ (path, host, weight) |
| TLS Termination | ❌ (requires sidecar) | ✅ (native, session resumption) | ✅ (managed, ACM integration) |
| Observability | Limited (conntrack stats) | Rich (metrics, tracing, logs) | Moderate (CloudWatch, basic logs) |
| Cost & Ops | Low infra, high ops | Medium infra, medium ops | Low ops, pay-per-use |
| Best For | High-throughput TCP/UDP, gaming, IoT | Microservices, API gateways, complex routing | Quick deployment, managed TLS, standard web apps |
📄 Config Template (Envoy Cluster + Rate Limit + Circuit Breaker)
# envoy-production.yaml
static_resources:
listeners:
- name: https_listener
address: { socket_address: { address: 0.0.0.0, port_value: 443 } }
filter_chains:
- transport_socket: { name: envoy.transport_sockets.tls, typed_config: { "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.ServerTlsContext } }
filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: prod_ingress
route_config:
name: prod_route
virtual_hosts:
- name: api
domains: ["api.example.com"]
routes:
- match: { prefix: "/v1/" }
route:
cluster: prod_backend
timeout: 8s
retry_policy: { retry_on: "5xx,reset", num_retries: 2, per_try_timeout: 2s }
http_filters:
- name: envoy.filters.http.local_ratelimit
typed_config: { "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit, stat_prefix: rl, token_bucket: { max_tokens: 2000, tokens_per_fill: 1000, fill_interval: 1s } }
- name: envoy.filters.http.router
clusters:
- name: prod_backend
connect_timeout: 1.5s
type: STRICT_DNS
lb_policy: LEAST_REQUEST
outlier_detection: { consecutive_5xx: 4, interval: 8s, base_ejection_time: 20s, max_ejection_percent: 40 }
health_checks: [{ timeout: 2s, interval: 4s, unhealthy_threshold: 2, healthy_threshold: 2, http_health_check: { path: "/healthz" } }]
load_assignment:
cluster_name: prod_backend
endpoints: [{ lb_endpoints: [{ endpoint: { address: { socket_address: { address: "backend-prod.default.svc.cluster.local", port_value: 8080 } } } }] }]
circuit_breakers: { thresholds: [{ priority: DEFAULT, max_connections: 1500, max_pending_requests: 750, max_requests: 3000, max_retries: 2 }] }
🚀 Quick Start Guide (5 Steps)
-
Containerize & Expose Health Endpoints
- Ensure backend services expose
/healthz with readiness, liveness, and dependency checks.
- Add Prometheus metrics exporter to track request latency, error rates, and queue depth.
-
Deploy Envoy as Sidecar or Edge Proxy
kubectl apply -f envoy-configmap.yaml
kubectl apply -f envoy-deployment.yaml
kubectl apply -f envoy-service.yaml
Mount TLS certs and session ticket keys via Kubernetes secrets.
-
Configure Dynamic Routing & Resilience
- Set
lb_policy: LEAST_REQUEST with weighted endpoints.
- Enable
outlier_detection and circuit_breakers to prevent cascade failures.
- Apply
local_ratelimit to protect backend from burst traffic.
-
Validate Under Load
- Run synthetic traffic:
wrk -t12 -c400 -d60s https://api.example.com/v1/test
- Monitor P99 latency, error rates, and connection counts in Grafana.
- Simulate backend failure:
kubectl delete pod <backend-pod> and verify traffic rerouting.
-
Automate & Observe
- Integrate with Prometheus Adapter for HPA: scale on
http_requests_in_flight.
- Configure alerts on
envoy_cluster_upstream_cx_destroy_remote, ssl_handshake_errors, and P95 > SLA.
- Implement GitOps for LB config changes; validate with
envoy --mode validate -c envoy.yaml.
Final Thoughts
Load balancing for high-traffic backends is no longer a set-and-forget infrastructure component. It is a dynamic, metrics-driven control plane that must adapt to backend health, network conditions, and traffic patterns in real time. By combining intelligent algorithms, rigorous health management, TLS optimization, and deep observability, you transform the load balancer from a simple router into a resilience engine. Use the configurations, pitfalls, and production bundle provided here to deploy systems that absorb spikes, isolate failures, and maintain sub-100ms P99 latency under sustained load.