accuracy while preserving O(1) memory per key, making it the only algorithm that scales linearly with unique API keys without sacrificing burst tolerance.
- Float-based token accounting prevents precision loss at high refill rates (e.g., 10 req/s β 0.1s cost per request), eliminating false denials that plague integer counters.
- Redis Lua execution guarantees atomic read-modify-write, reducing distributed race conditions to near-zero under concurrent load balancer routing.
Core Solution
The token bucket model assigns each key a bucket with four properties: capacity (C), refill rate (R), current tokens, and lastRefill timestamp. On every request, the system computes elapsed time, refills tokens capped at C, and either allows or denies based on cost. Denied requests receive a precise retryAfter delay calculated as (cost - tokens) / R.
In-Memory Implementation (Single-Instance Node)
For single-process services, an in-memory Map<string, BucketState> eliminates network hops and serialization overhead. Node's single-threaded event loop guarantees atomicity within a tick, making this approach safe for non-clustered deployments.
type BucketState = { tokens: number; lastRefill: number };
export type TokenBucketResult = {
allowed: boolean;
remaining: number;
retryAfterMs: number;
};
export class TokenBucket {
private buckets = new Map<string, BucketState>();
constructor(
private readonly capacity: number,
private readonly refillPerSecond: number,
) {}
consume(key: string, cost = 1, now = Date.now()): TokenBucketResult {
const state = this.buckets.get(key) ?? {
tokens: this.capacity,
lastRefill: now,
};
const elapsedSec = (now - state.lastRefill) / 1000;
const refilled = Math.min(
this.capacity,
state.tokens + elapsedSec * this.refillPerSecond,
);
if (refilled >= cost) {
const next: BucketState = {
tokens: refilled - cost,
lastRefill: now,
};
this.buckets.set(key, next);
return { allowed: true, remaining: next.tokens, retryAfterMs: 0 };
}
const deficit = cost - refilled;
const retryAfterMs = Math.ceil(
(deficit / this.refillPerSecond) * 1000,
);
this.buckets.set(key, { tokens: refilled, lastRefill: now });
return { allowed: false, remaining: refilled, retryAfterMs };
}
// Optional: prune idle buckets to bound memory.
prune(now = Date.now(), idleMs = 60 * 60 * 1000): number {
let removed = 0;
for (const [k, s] of this.buckets) {
if (now - s.lastRefill > idleMs) {
this.buckets.delete(k);
removed++;
}
}
return removed;
}
}
Operational pruning is mandatory to prevent unbounded memory growth. Schedule it independently of the idleMs threshold:
const limiter = new TokenBucket(100, 10); // 100 capacity, 10/s refill
setInterval(() => limiter.prune(), 10 * 60 * 1000).unref();
unref() ensures the cleanup interval does not block graceful process shutdowns. The prune cadence and idleMs are decoupled: the interval merely bounds how long tombstones persist before garbage collection.
Distributed Implementation (Redis + Lua)
When scaling behind a load balancer, in-memory state becomes incorrect. Clients can route across instances to multiply effective limits. Redis provides a single-threaded event loop that executes Lua scripts atomically, eliminating the classic GET β compute β SET race condition where concurrent processes read identical token counts and both decrement, causing token leakage.
The Lua script below performs the entire read-modify-write cycle in one indivisible step:
-- token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill = tonumber(ARGV[2]) -- tokens per second
local now_ms = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local data = redis.call("HMGET", key,
Note: The script continues with atomic token calculation, state update, and return of allowed/denied status with retry delay. The Lua execution guarantees O(1) distributed atomicity without external locks or Redis transactions.
Architecture Decisions:
- Use in-memory
Map for single-instance services to avoid Redis latency and cost.
- Switch to Redis + Lua when deploying PM2 cluster mode, worker threads, or multi-node load-balanced architectures.
- Always return
retryAfter in milliseconds derived from (deficit / refillRate) * 1000 to prevent thundering herd retries.
- Keep tokens as
number (float) to maintain sub-second refill precision.
Pitfall Guide
- Window Boundary Bursts: Fixed-wall-clock counters reset at arbitrary timestamps, allowing clients to bypass limits by splitting traffic across window edges. Always prefer time-relative algorithms like token bucket.
- Ignoring Distributed Atomicity: Naive
GET/SET patterns in Redis create race conditions under concurrent load. Two processes reading tokens=5 both write tokens=4, leaking capacity. Use Lua or Redis transactions for atomic state mutations.
- Unbounded Memory Growth: In-memory implementations without pruning accumulate state for every unique key indefinitely. Schedule periodic
prune() calls to reclaim memory for idle keys.
- Integer Token Rounding: Using integers for token counts causes precision loss at high refill rates (e.g., 10 req/s = 0.1s cost). Floats prevent false denials and ensure smooth refill math.
- Cluster/Worker Mode Blind Spots: Assuming single-instance atomicity in PM2 cluster or Node worker threads breaks rate limiting. Each thread/process maintains isolated state, multiplying effective limits. Route to Redis or use a shared-nothing consistent hashing strategy.
- Static or Incorrect Retry-After Headers: Returning fixed delays or omitting
retryAfter causes clients to retry simultaneously, recreating the burst. Calculate dynamically: (cost - currentTokens) / refillRate.
- Forgetting
unref() on Cleanup Timers: Leaving setInterval referenced prevents Node from exiting during graceful shutdowns or deployments. Always call .unref() on background maintenance intervals.
Deliverables
- Blueprint: Token Bucket Rate Limiter Deployment Blueprint β Covers architecture selection (single-node vs distributed), Redis/Lua atomicity setup, Node.js integration patterns, monitoring hooks (Prometheus/Grafana metrics for allowed/denied ratios), and capacity planning formulas.
- Checklist: Pre-flight & Operational Checklist β Verifies float-based token math, Lua script atomicity validation,
prune() scheduling with unref(), correct retryAfter header formatting, load balancer sticky-session awareness, and cluster mode compatibility verification.
- Configuration Template: Rate Limit Policy YAML/JSON β Standardized structure defining
capacity, refillRate, keyStrategy (API key, IP, tenant), costPerEndpoint, retryAfterPrecision, and pruneIdleMs. Includes environment-specific overrides for staging vs production burst tolerance.