ne as the backend services to minimize cross-AZ traffic. Use a Global Server Load Balancer (GSLB) for DNS-based routing and a regional load balancer for local distribution.
- Control Plane: Decouple configuration from data plane. Use a push-based model with delta updates to minimize sync overhead. Implement configuration versioning and rollback capabilities.
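A minimal sketch of the delta-plus-rollback idea on the gateway side, assuming configuration can be modeled as a flat map of route names to clusters. The `ConfigDelta` shape and the `VersionedConfigStore` class are hypothetical names used for illustration, not part of any gateway's API.

```typescript
// Hypothetical model: route name -> upstream cluster name.
type RouteConfig = Record<string, string>;

interface ConfigDelta {
  version: number;      // version this delta produces
  upserts: RouteConfig; // routes added or changed
  removals: string[];   // route names to delete
}

interface Snapshot {
  version: number;
  config: RouteConfig;
}

class VersionedConfigStore {
  private history: Snapshot[] = [{ version: 0, config: {} }];
  private readonly maxHistory: number;

  constructor(maxHistory = 10) {
    this.maxHistory = maxHistory;
  }

  get current(): Snapshot {
    return this.history[this.history.length - 1];
  }

  // Apply a delta only if it directly follows the current version.
  // A gap or replay returns false so the caller can request a full resync.
  applyDelta(delta: ConfigDelta): boolean {
    const { version, config } = this.current;
    if (delta.version !== version + 1) return false;
    const next: RouteConfig = { ...config, ...delta.upserts };
    for (const name of delta.removals) delete next[name];
    this.history.push({ version: delta.version, config: next });
    if (this.history.length > this.maxHistory) this.history.shift();
    return true;
  }

  // Discard the newest snapshot and return to the previous one.
  rollback(): boolean {
    if (this.history.length < 2) return false;
    this.history.pop();
    return true;
  }
}
```

Keeping a bounded history of full snapshots trades a little memory for an instant rollback that does not depend on the control plane being reachable.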
Step 2: Connection Pooling and Keep-Alive
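Connection exhaustion is a common cause of gateway failure under load. Configure keep-alive so that idle upstream connections are reused instead of re-established, and cap connections and in-flight requests with limits sized to backend capacity.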
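```yaml
# Envoy Cluster Configuration for Scale
clusters:
- name: backend_service
  connect_timeout: 0.25s
  type: STRICT_DNS
  lb_policy: LEAST_REQUEST
  # Cluster-level field: share one upstream pool across all downstream connections.
  connection_pool_per_downstream_connection: false
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 1024
      max_pending_requests: 1024
      max_requests: 1024
      max_retries: 3
  # Deprecated in recent Envoy releases in favor of typed_extension_protocol_options.
  http2_protocol_options:
    max_concurrent_streams: 100
  upstream_connection_options:
    tcp_keepalive:
      keepalive_time: 60      # seconds idle before the first probe
      keepalive_interval: 10  # seconds between probes
      keepalive_probes: 3     # unanswered probes before the connection is dropped
```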
- Rationale: `LEAST_REQUEST` load balancing prevents hot spots. `max_connections` is tuned based on backend capacity; setting it too low causes queuing, while setting it too high risks overwhelming backends. HTTP/2 multiplexing reduces connection overhead significantly, so with HTTP/2 upstreams `max_requests` is usually the limit that binds first.
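To illustrate why least-request balancing avoids hot spots, here is a sketch of the "power of two choices" selection that Envoy applies by default when hosts have equal weights. The `Host` shape and `pickLeastRequest` function are illustrative names, and the injectable `random` parameter exists only to make the behavior testable.

```typescript
interface Host {
  address: string;
  activeRequests: number; // requests currently in flight to this host
}

// Sample two hosts at random and keep the one with fewer active requests.
// This avoids scanning every host while still steering load away from busy ones.
function pickLeastRequest(
  hosts: Host[],
  random: () => number = Math.random,
): Host {
  if (hosts.length === 0) throw new Error("no hosts available");
  const first = hosts[Math.floor(random() * hosts.length)];
  const second = hosts[Math.floor(random() * hosts.length)];
  return first.activeRequests <= second.activeRequests ? first : second;
}
```

Because each decision compares only two candidates, the cost per request stays constant regardless of cluster size.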
Step 3: Distributed Rate Limiting
Local rate limiting fails in distributed environments due to uneven traffic distribution. Implement a distributed rate limiter using a token bucket algorithm with a shared state store (e.g., Redis).
```typescript
// Distributed Rate Limiter Implementation
// TypeScript / Node.js context for custom gateway logic or sidecar agent
import { Redis } from 'ioredis';

interface RateLimitConfig {
  requestsPerSecond: number; // sustained refill rate; must be greater than 0
  burstSize: number;         // bucket capacity
  keyPrefix: string;
}

// Lua script for atomic token bucket operation
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local burst = tonumber(ARGV[3])
local window = tonumber(ARGV[4])

local last_refill = tonumber(redis.call('hget', key, 'last_refill')) or now
local tokens = tonumber(redis.call('hget', key, 'tokens')) or burst

-- Clamp at zero so clock skew between gateways never removes tokens
local elapsed = math.max(0, now - last_refill)
local new_tokens = math.min(burst, tokens + (elapsed / window) * limit)

local allowed = 0
if new_tokens >= 1 then
  new_tokens = new_tokens - 1
  allowed = 1
end

redis.call('hset', key, 'tokens', new_tokens)
redis.call('hset', key, 'last_refill', now)
-- Expire idle buckets once they would have refilled completely
redis.call('pexpire', key, math.ceil((burst / limit) * window) + window)
return allowed
`;

export class DistributedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  async isAllowed(clientId: string, config: RateLimitConfig): Promise<boolean> {
    const key = `${config.keyPrefix}:${clientId}`;
    const now = Date.now(); // gateway clock; instances should be time-synced
    const windowMs = 1000;

    try {
      const result = await this.redis.eval(
        TOKEN_BUCKET_SCRIPT,
        1,
        key,
        now,
        config.requestsPerSecond,
        config.burstSize,
        windowMs
      );
      return result === 1;
    } catch (error) {
      // Fail-open or fail-closed based on policy
      console.error('Rate limiter error:', error);
      return true; // Fail-open for availability
    }
  }
}
```
- Rationale: The Lua script ensures atomicity. Storing state in Redis allows multiple gateway instances to share rate limit counters. Running the whole check inside a single `eval` call minimizes network round trips. Fail-open logic ensures that rate limiter unavailability does not block all traffic, though fail-closed may be required for security-sensitive contexts.
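The refill arithmetic in the Lua script can be checked without Redis. The sketch below mirrors it as a pure function with an injected clock; `BucketState` and `takeToken` are hypothetical names, and the clamp of elapsed time at zero is a guard assumed here against clock skew.

```typescript
interface BucketState {
  tokens: number;       // may be fractional between refills
  lastRefillMs: number; // timestamp of the last update
}

// Pure token bucket step: refill by elapsed time, then try to spend one token.
function takeToken(
  state: BucketState | undefined,
  nowMs: number,
  ratePerSecond: number,
  burst: number,
): { allowed: boolean; state: BucketState } {
  // A client seen for the first time starts with a full bucket.
  const prev = state ?? { tokens: burst, lastRefillMs: nowMs };
  const elapsedMs = Math.max(0, nowMs - prev.lastRefillMs);
  const refilled = Math.min(burst, prev.tokens + (elapsedMs / 1000) * ratePerSecond);
  const allowed = refilled >= 1;
  return {
    allowed,
    state: { tokens: allowed ? refilled - 1 : refilled, lastRefillMs: nowMs },
  };
}
```

With a rate of 2 requests per second and a burst of 2, two immediate requests pass, a third is rejected, and a request 500 ms later passes because exactly one token has been refilled.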
Step 4: Observability and Egress
At scale, logging every request to disk or a central collector destroys performance. Implement sampling and asynchronous egress.
- Metrics: Export histograms for latency, request counts, and error rates. Use OpenTelemetry for distributed tracing.
- Logs: Sample access logs at 1% for standard traffic and 100% for error responses. Flush logs asynchronously to avoid blocking the request path.
- Health Checks: Implement active health checking with interval jitter to prevent sync storms. Configure outlier detection to eject unhealthy hosts automatically.
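The sampling and jitter rules above can be sketched as two small functions. This assumes "error responses" means any status code of 400 or higher and that jitter is expressed as a fraction of the base interval; both function names and the injectable `random` parameter are illustrative.

```typescript
// Decide whether to emit an access log entry for a response.
// Errors are always logged; everything else is sampled at sampleRate.
function shouldLogAccess(
  statusCode: number,
  sampleRate = 0.01,
  random: () => number = Math.random,
): boolean {
  if (statusCode >= 400) return true; // assumption: 4xx and 5xx count as errors
  return random() < sampleRate;
}

// Spread health checks out by adding up to jitterFraction of the base interval,
// so a fleet of gateways does not probe every backend at the same instant.
function jitteredIntervalMs(
  baseMs: number,
  jitterFraction = 0.2,
  random: () => number = Math.random,
): number {
  return baseMs + random() * baseMs * jitterFraction;
}
```

In a real gateway the sampling decision would run in the logging path after the response code is known, so that error responses are never dropped by the sampler.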
Pitfall Guide
- Blocking I/O in Plugins:
  - Mistake: Performing synchronous HTTP calls or database queries within gateway plugins.
  - Impact: Worker threads block, causing cascading latency spikes.
  - Fix: Use async I/O patterns or offload heavy logic to sidecar services. Envoy's external authorization service should be non-blocking.
- Ignoring Cross-AZ Costs:
  - Mistake: Routing traffic across availability zones unnecessarily.
  - Impact: Increased latency and significant cloud egress costs.
  - Fix: Configure locality-weighted load balancing. Route to the nearest healthy endpoint.
- TLS Session Resumption Failure:
  - Mistake: Not sharing TLS session caches or ticket keys across gateway instances.
  - Impact: Every new connection that lands on a different instance requires a full TLS handshake, which can increase CPU load by 30-50%.
  - Fix: Enable TLS session tickets with the same ticket keys distributed to every instance, or share a Redis-backed session cache.
- Config Sync Thundering Herd:
  - Mistake: Broadcasting full configuration snapshots to all gateways simultaneously.
  - Impact: Control plane overload and gateway restarts.
  - Fix: Use delta updates. Implement exponential backoff and jitter in configuration clients.
- Memory Leaks in Dynamic Loading:
  - Mistake: Dynamically loading plugins or filters without proper lifecycle management.
  - Impact: Gradual memory exhaustion and OOM kills.
  - Fix: Validate memory usage during load testing. Use WASM for isolated plugin execution to prevent leaks from affecting the core process.
- Single Point of Failure in Control Plane:
  - Mistake: Relying on a single control plane instance for configuration.
  - Impact: The gateway fleet becomes stale or unconfigurable during a control plane outage.
  - Fix: Deploy the control plane with high availability. Gateways should cache configuration locally and operate independently if the control plane is unreachable.
- Improper Timeout Configuration:
  - Mistake: Setting gateway timeouts lower than backend processing times.
  - Impact: Premature 504 errors, causing retries that overwhelm backends.
  - Fix: Align gateway timeouts with backend SLAs. Implement retry budgets to prevent retry storms.
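Two of the fixes above, exponential backoff with jitter and retry budgets, can be sketched as follows. This is a simplified illustration under stated assumptions: the budget is counted per window that the caller resets, which is not how Envoy computes its own retry budget, and all names and default values are hypothetical.

```typescript
// Exponential backoff with full jitter: wait a random time in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return random() * ceiling;
}

// Retry budget: retries may not exceed a fixed ratio of requests in the current window.
class RetryBudget {
  private requests = 0;
  private retries = 0;
  private readonly ratio: number;
  private readonly minRetries: number;

  constructor(ratio = 0.2, minRetries = 3) {
    this.ratio = ratio;
    this.minRetries = minRetries;
  }

  recordRequest(): void {
    this.requests++;
  }

  // Returns true and consumes budget if a retry is still permitted.
  tryAcquireRetry(): boolean {
    const allowed = Math.max(this.minRetries, Math.floor(this.requests * this.ratio));
    if (this.retries >= allowed) return false;
    this.retries++;
    return true;
  }

  // Call at each window boundary (for example, once per second).
  reset(): void {
    this.requests = 0;
    this.retries = 0;
  }
}
```

The budget bounds the extra load that retries can add during an outage, while the jittered backoff keeps clients from reconnecting in lockstep.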
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Throughput, Low Latency | Envoy Distributed Edge | Maximizes throughput via non-blocking I/O and locality routing. | High infrastructure cost, low latency cost. |
| Legacy Monolith Migration | NGINX Reverse Proxy | Simpler configuration, proven stability for basic proxying. | Low infrastructure cost, higher latency risk. |
| Multi-Cloud Strategy | Cloud ALB + WAF | Leverages managed services for global routing and security. | High managed service cost, low ops overhead. |
| Cost-Sensitive Startup | Open Source (Kong/K3s) | Self-hosted with community support; scales with K8s. | Low license cost, high engineering overhead. |
| Regulatory Compliance | On-Prem Gateway Mesh | Full control over data path and encryption keys. | High hardware cost, high compliance assurance. |
Configuration Template
Below is a production-ready Envoy configuration snippet focusing on scale optimizations.
```yaml
static_resources:
  listeners:
  - name: main_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_params:
              tls_minimum_protocol_version: TLSv1_3
            # Assumed file names under the mounted /etc/envoy/tls directory; adjust to your certificates.
            tls_certificates:
            - certificate_chain:
                filename: /etc/envoy/tls/tls.crt
              private_key:
                filename: /etc/envoy/tls/tls.key
          # Ticket keys are a DownstreamTlsContext field; share the same key file across instances.
          session_ticket_keys:
            keys:
            - filename: /etc/envoy/tls_ticket_key
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend_service
                  timeout: 5s
                  retry_policy:
                    retry_on: "5xx"
                    num_retries: 2
                    per_try_timeout: 2s
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
              log_format:
                json_format:
                  timestamp: "%START_TIME%"
                  method: "%REQ(:METHOD)%"
                  path: "%REQ(:PATH)%"
                  response_code: "%RESPONSE_CODE%"
                  duration: "%DURATION%"
  clusters:
  - name: backend_service
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: LEAST_REQUEST
    # load_assignment (backend endpoints) is not shown here; add your service's DNS name and port.
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 1024
        max_pending_requests: 1024
        max_requests: 1024
        max_retries: 3
    # Deprecated in recent Envoy releases in favor of typed_extension_protocol_options.
    http2_protocol_options:
      max_concurrent_streams: 100
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
```
Quick Start Guide
- Deploy Envoy: Run Envoy using Docker or Kubernetes. Mount the configuration file and TLS certificates.

  ```bash
  docker run -d -p 443:443 -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" -v "$(pwd)/tls:/etc/envoy/tls" envoyproxy/envoy:v1.28-latest
  ```

- Apply Base Config: Use the configuration template above. Ensure the TLS certificates and the session ticket key file referenced in the configuration exist inside the container and are valid. Verify the listener starts without errors.
- Validate Health: Curl the gateway endpoint to verify routing and TLS termination.

  ```bash
  curl -k https://localhost/health
  ```

- Load Test: Use a tool like `wrk` or `k6` to simulate traffic. Monitor metrics for latency, error rates, and connection counts.

  ```bash
  wrk -t12 -c400 -d30s https://localhost
  ```

- Tune Parameters: Adjust `max_connections`, `max_requests`, and timeouts based on load test results and backend capacity. Iterate until p99 latency meets SLA requirements.
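One way to pick a starting point for the tuning loop is Little's law, which relates steady-state concurrency to throughput and mean latency. The sketch below is only a first estimate under that assumption; the function names and the headroom factor are hypothetical and should be replaced with values derived from your own load tests.

```typescript
// Little's law: in-flight requests ≈ arrival rate × mean latency.
// The headroom multiplier leaves room for bursts above the measured average.
function estimateMaxRequests(
  requestsPerSecond: number,
  meanLatencySeconds: number,
  headroom = 1.5,
): number {
  return Math.ceil(requestsPerSecond * meanLatencySeconds * headroom);
}

// With HTTP/2, each upstream connection carries several concurrent streams,
// so the connection count is the concurrency divided by streams per connection.
function estimateConnections(
  maxRequests: number,
  streamsPerConnection: number,
): number {
  return Math.ceil(maxRequests / streamsPerConnection);
}
```

For example, 2000 requests per second at a mean latency of 250 ms with 1.5x headroom suggests a `max_requests` near 750, which at 100 concurrent streams per connection corresponds to about 8 upstream HTTP/2 connections.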