Step 1: Decouple Build-Time Configuration from Runtime Execution
Next.js evaluates next.config.ts at build time, which in a containerized pipeline means during the Docker build phase. If the config conditionally resolves handler paths using environment variables, the build environment's values decide the result, and the wrong path can get baked into the standalone bundle. The fix is a router module at a fixed path that reads environment variables when the server process loads it, not when the image is built.
// src/cache/cache-router.mjs
import { createRedisComponentHandler } from './redis-component-handler.mjs';
import { createFallbackMemoryHandler } from './memory-fallback-handler.mjs';

// Evaluated when this module is loaded, so the value comes from the
// environment of the process that imports it.
const ENABLE_DISTRIBUTED_CACHE = process.env.ENABLE_DISTRIBUTED_CACHE === '1';

export default ENABLE_DISTRIBUTED_CACHE
  ? createRedisComponentHandler({
      redisUrl: process.env.REDIS_CLUSTER_URL,
      buildId: process.env.NEXT_BUILD_SHA,
      abortTimeoutMs: 1200,
    })
  : createFallbackMemoryHandler();
This router exports a single default handler. Next.js resolves the path once during build, but the internal logic evaluates environment variables at runtime, guaranteeing the correct backend activates on each replica.
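One way to keep the backend decision testable is to pull it into a pure function that takes an env-like object. The sketch below is illustrative only: `selectBackend`, the returned shape, the `'unknown-build'` default, and the choice to fall back to memory when the URL is missing are all assumptions, not part of the router above.

```javascript
// Hypothetical helper (not in the original router): decide which backend to
// use from an env-like object, so the choice can be tested without real env vars.
function selectBackend(env) {
  const wantsRedis = env.ENABLE_DISTRIBUTED_CACHE === '1';

  if (!wantsRedis) {
    return { kind: 'memory', reason: 'distributed cache disabled' };
  }
  if (!env.REDIS_CLUSTER_URL) {
    // Assumption: prefer a working in-memory cache over a crash at startup.
    return { kind: 'memory', reason: 'REDIS_CLUSTER_URL is not set' };
  }
  return {
    kind: 'redis',
    redisUrl: env.REDIS_CLUSTER_URL,
    buildId: env.NEXT_BUILD_SHA ?? 'unknown-build',
  };
}

// At server start the router could call: selectBackend(process.env)
```

Because the function never touches `process.env` directly, the same logic can be exercised with any combination of variables in a unit test.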
Step 2: Implement the Plural Cache Handler Interface
The cacheHandlers interface expects specific lifecycle methods. We'll structure the Redis adapter to handle serialization, tag management, and SWR boundaries explicitly.
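// src/cache/redis-component-handler.mjs
import { createClient } from 'redis';
import { serialize, deserialize } from '../utils/serializer.mjs';

export function createRedisComponentHandler(config) {
  const client = createClient({ url: config.redisUrl });
  client.on('error', (err) => console.error('[cache] redis error', err));
  const namespace = `nc:${config.buildId}:comp`;

  // node-redis clients must be connected before use; connect once, lazily.
  let connecting;
  const ready = () => (connecting ??= client.connect());

  async function get(key) {
    await ready();
    const raw = await client.get(`${namespace}:${key}`);
    if (raw == null) return undefined; // node-redis returns null on a miss
    return deserialize(raw);
  }

  async function set(key, data, options) {
    await ready();
    // revalidate is already in seconds, which is the unit Redis EX expects.
    const ttl = options?.revalidate > 0 ? Math.floor(options.revalidate) : undefined;
    await client.set(`${namespace}:${key}`, serialize(data), ttl ? { EX: ttl } : {});
  }

  // `delete` is a reserved word and cannot name a function declaration,
  // so the function is called `remove` and exposed as `delete` below.
  async function remove(key) {
    await ready();
    await client.del(`${namespace}:${key}`);
  }

  async function revalidateTag(tags) {
    const tagKeys = [tags].flat().map((t) => `${namespace}:tag:${t}`);
    if (tagKeys.length === 0) return; // DEL with no keys is an error
    await ready();
    await client.del(tagKeys);
  }

  return { get, set, delete: remove, revalidateTag };
}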
Architecture Rationale:
- Namespace Injection: The buildId prefix ensures that cache entries from previous deployments are automatically orphaned. No manual purge required.
- Explicit TTL Mapping: Next.js passes revalidate in seconds. Redis EX is also expressed in seconds, so the value maps across directly with no unit conversion.
- Tag Deletion Strategy: Tags are stored as separate keys. Deleting them invalidates the association without scanning the entire keyspace.
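The key layout and TTL mapping described above can be sketched as two small pure functions. The names `cacheKeys` and `ttlOptions` are hypothetical; the `nc:<buildId>:comp` prefix and the `:tag:` segment follow the handler code.

```javascript
// Hypothetical helpers illustrating the key layout used by the handler.
function cacheKeys(buildId) {
  const namespace = `nc:${buildId}:comp`;
  return {
    namespace,
    entry: (key) => `${namespace}:${key}`,
    tag: (tag) => `${namespace}:tag:${tag}`,
  };
}

// Map a revalidate value (seconds) to options for SET.
// Anything that is not a positive, finite number yields no expiry.
function ttlOptions(revalidate) {
  const seconds = Math.floor(Number(revalidate));
  return Number.isFinite(seconds) && seconds > 0 ? { EX: seconds } : {};
}

const previous = cacheKeys('a1b2c3');
const current = cacheKeys('d4e5f6');
// Same logical key, different physical keys: the old deploy's entries are
// never read by the new deploy.
console.log(previous.entry('page:/home')); // nc:a1b2c3:comp:page:/home
console.log(current.entry('page:/home')); // nc:d4e5f6:comp:page:/home
```

Note that orphaned entries without an expiry stay in Redis until evicted, so pairing the namespace with a TTL or an eviction policy keeps memory bounded.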
Step 3: Enforce Atomic Tag Updates with Lua
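A read-then-write sequence issued from the client is not atomic: another replica can change the tag set between the read and the write. MULTI/EXEC alone does not solve this, because queued commands cannot branch on a value read inside the transaction; it needs WATCH plus client-side retries. Lua scripts execute atomically within Redis, eliminating TOCTOU (time-of-check to time-of-use) bugs.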
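-- scripts/atomic-tag-update.lua
-- KEYS[1] = namespace, ARGV[1] = tag, ARGV[2] = entry key, ARGV[3] = TTL in seconds
-- Note: key names are built inside the script, which works on a single node.
-- Redis Cluster expects every key a script touches to be passed via KEYS.
local namespace = KEYS[1]
local tag = ARGV[1]
local entryKey = ARGV[2]
local ttl = tonumber(ARGV[3]) or 0 -- treat a missing TTL as "no expiry"
local tagKey = namespace .. ':tag:' .. tag
local entryKeyFull = namespace .. ':entry:' .. entryKey
redis.call('SADD', tagKey, entryKeyFull)
if ttl > 0 then
  redis.call('EXPIRE', tagKey, ttl)
end
return 1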
The handler loads this script once during initialization and executes it via EVALSHA. This guarantees that tag-to-entry mappings are updated without partial writes, even under high concurrency.
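The load-once, call-by-SHA flow might look like the sketch below. It assumes a node-redis v4 style client exposing `scriptLoad(source)` and `evalSha(sha, { keys, arguments })`; `createTagUpdater` and `updateTag` are hypothetical names. The NOSCRIPT retry covers the case where Redis has lost its script cache, for example after a restart or failover.

```javascript
// Sketch: load a Lua script lazily, call it by SHA, and reload on NOSCRIPT.
// `client` is assumed to expose scriptLoad() and evalSha() like node-redis v4.
function createTagUpdater(client, scriptSource) {
  let shaPromise = null;

  const loadSha = () => {
    shaPromise ??= client.scriptLoad(scriptSource).catch((err) => {
      shaPromise = null; // do not cache a failed load
      throw err;
    });
    return shaPromise;
  };

  return async function updateTag(namespace, tag, entryKey, ttlSeconds = 0) {
    const call = async () =>
      client.evalSha(await loadSha(), {
        keys: [namespace],
        // Script arguments are sent as strings.
        arguments: [tag, entryKey, String(ttlSeconds)],
      });

    try {
      return await call();
    } catch (err) {
      // Redis replies NOSCRIPT when the SHA is not in its script cache.
      if (!String(err?.message).startsWith('NOSCRIPT')) throw err;
      shaPromise = null;
      return call();
    }
  };
}
```

A handler would typically create the updater once at initialization and call it from `set` for each tag attached to an entry.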
Step 4: Implement Single-Flight SWR Coordination
When multiple replicas hit the SWR boundary simultaneously, they all trigger background revalidation. This creates a thundering herd against your origin. A leader-follower pattern solves this using a distributed lock.
// src/cache/single-flight-lock.mjs
export class SwrLock {
  constructor(client, namespace) {
    this.client = client; // an already-connected node-redis client
    this.namespace = namespace;
    this.lockTtlMs = 8000; // 8 seconds
  }

  async acquire(key) {
    const lockKey = `${this.namespace}:swr-lock:${key}`;
    // NX: set only if the key is absent. PX: expiry in milliseconds
    // (EX would interpret 8000 as seconds, holding the lock for over two hours).
    const acquired = await this.client.set(lockKey, '1', { NX: true, PX: this.lockTtlMs });
    return acquired === 'OK';
  }

  async release(key) {
    const lockKey = `${this.namespace}:swr-lock:${key}`;
    await this.client.del(lockKey);
  }
}
When a replica detects a stale entry, it attempts to acquire the lock. If successful, it becomes the leader and fetches fresh data. Followers serve the stale response while the leader refreshes. For each stale key this cuts origin revalidation fetches from N to 1, where N is the replica count.
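The leader-follower flow can be sketched as a small function around any lock with the same `acquire(key)` and `release(key)` shape as SwrLock. `revalidateOnce` and the `refresh` callback are hypothetical names used for illustration.

```javascript
// Sketch of the leader/follower decision at the SWR boundary.
// `lock` needs acquire(key) -> Promise<boolean> and release(key) -> Promise<void>.
// `refresh` is a caller-supplied function that fetches and stores fresh data.
async function revalidateOnce(lock, key, refresh) {
  const isLeader = await lock.acquire(key);
  if (!isLeader) {
    // Another replica is already refreshing; keep serving the stale entry.
    return 'follower';
  }
  try {
    await refresh();
  } finally {
    // Release even if the refresh throws, so the next stale hit can retry.
    await lock.release(key);
  }
  return 'leader';
}
```

The lock TTL still matters as a safety net: if the leader crashes before reaching the `finally` block, the lock expires on its own and another replica can take over.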
Pitfall Guide
1. Build-Time Environment Evaluation
Explanation: Using process.env inside next.config.ts to conditionally require.resolve() a handler path causes the build environment's variables to dictate the runtime behavior. If the env var isn't set during Docker build, the wrong handler gets baked into the standalone bundle.
Fix: Keep the path in next.config.ts static and point it at a router module. The router reads environment variables when the server process loads it, so the runtime environment, not the build environment, selects the handler.
2. Standalone Bundle Omission
Explanation: Next.js's output: 'standalone' mode uses static analysis to trace dependencies. If your cache handler or Lua scripts aren't explicitly imported in traced code paths, they get excluded from .next/standalone/, causing runtime MODULE_NOT_FOUND errors.
Fix: Use outputFileTracingIncludes in next.config.ts to force inclusion of cache adapters, router modules, and script directories.
3. SWR Stampede Overload
Explanation: Without coordination, every replica independently detects staleness and triggers a background fetch. This multiplies origin load by the number of replicas, potentially causing cascading failures.
Fix: Implement a distributed lock (Redis SET with NX and an expiry, or similar) at the SWR boundary. Only the lock holder refreshes; others serve stale data until the lock expires or the refresh completes.
4. Tag Key Collision Across Deploys
Explanation: If cache keys lack a deployment identifier, a new release will read stale tags from the previous version. This causes incorrect invalidation or prevents fresh data from propagating.
Fix: Prefix all cache keys with a build SHA or deployment timestamp. Auto-inject this value from CI/CD pipelines rather than hardcoding it.
5. Silent Timeout Degradation
Explanation: Redis operations can hang due to network partitions or cluster failovers. Without explicit timeouts, cache reads block request threads, increasing p99 latency and triggering gateway timeouts.
Fix: Wrap all Redis calls with AbortController or library-specific timeout options. Fail open to origin fetches if the cache layer exceeds the threshold.
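A minimal fail-open wrapper might look like the following. It uses Promise.race rather than AbortController, so it only stops waiting; it does not cancel the in-flight Redis command. `withTimeout` is a hypothetical helper, not a library API.

```javascript
// Sketch: bound how long a cache call may take and fail open on timeout or error.
// `operation` is a function returning a promise; `fallback` is what the caller
// receives instead (undefined reads as a cache miss).
function withTimeout(operation, timeoutMs, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });

  return Promise.race([
    // Errors from the cache layer are swallowed on purpose: the request
    // should continue to the origin rather than fail.
    Promise.resolve().then(operation).catch(() => fallback),
    timeout,
  ]).finally(() => clearTimeout(timer));
}

// Usage inside a handler (names assumed):
//   const raw = await withTimeout(() => client.get(cacheKey), 1200, undefined);
//   if (raw === undefined) { /* treat as a miss and fetch from origin */ }
```

Swallowing errors hides failures from callers, so this pattern works best alongside metrics or logging on the timeout and error paths.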
6. Inconsistent Serialization
Explanation: Next.js expects cache handlers to return plain objects or undefined. Storing raw Buffer or custom class instances causes hydration mismatches or runtime type errors.
Fix: Use a deterministic serializer (e.g., JSON.stringify with replacers, or msgpack) that strips non-serializable metadata. Validate payload shape before storage and after retrieval.
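As one possible shape for such a serializer, the sketch below uses JSON with a replacer and reviver to round-trip binary fields as base64, and treats unparseable or non-object payloads as a miss. The function names and the `__b64` marker are assumptions made for this example. Key order follows insertion order, so callers that need byte-identical output should build entries consistently.

```javascript
// Sketch: JSON-based serializer that round-trips Buffers and rejects bad payloads.
function serializeEntry(entry) {
  return JSON.stringify(entry, (_key, value) => {
    // Buffer#toJSON runs before the replacer, so Buffers arrive here as
    // { type: 'Buffer', data: [...] }. Plain Uint8Arrays arrive unchanged.
    if (value && value.type === 'Buffer' && Array.isArray(value.data)) {
      return { __b64: Buffer.from(value.data).toString('base64') };
    }
    if (value instanceof Uint8Array) {
      return { __b64: Buffer.from(value).toString('base64') };
    }
    return value;
  });
}

function deserializeEntry(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw, (_key, value) =>
      value && typeof value === 'object' && typeof value.__b64 === 'string'
        ? Buffer.from(value.__b64, 'base64')
        : value
    );
  } catch {
    return undefined; // a corrupt payload is treated as a cache miss
  }
  // Only plain objects (or arrays) are considered valid entries.
  return parsed !== null && typeof parsed === 'object' ? parsed : undefined;
}
```

Returning `undefined` for bad payloads keeps the failure mode the same as a miss, so a corrupted key leads to a refetch rather than a runtime type error.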
7. Missing Observability Hooks
Explanation: Distributed caches operate as black boxes. Without metrics, you cannot detect lock contention, tag invalidation failures, or timeout spikes until users report degraded performance.
Fix: Instrument the handler with OpenTelemetry spans or custom metric callbacks. Track cache.hit, cache.miss, swr.leader, swr.follower, and redis.latency to establish baseline behavior.
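With custom metric callbacks, instrumentation can be a thin wrapper around the handler. The sketch below covers only `get` and emits the `cache.hit`, `cache.miss`, and `redis.latency` names listed above; `instrumentHandler`, the `onMetric(name, value)` signature, and the injectable clock are assumptions made for illustration.

```javascript
// Sketch: wrap a cache handler so every get() reports hit/miss and latency.
// `onMetric(name, value)` is a caller-supplied callback, for example a thin
// adapter over an OpenTelemetry counter or histogram.
function instrumentHandler(handler, onMetric, now = () => Date.now()) {
  return {
    ...handler,
    async get(key) {
      const start = now();
      try {
        const value = await handler.get(key);
        onMetric(value === undefined ? 'cache.miss' : 'cache.hit', 1);
        return value;
      } finally {
        // Recorded on success and on failure, in milliseconds.
        onMetric('redis.latency', now() - start);
      }
    },
  };
}
```

The same wrapping approach extends to `set` and to the lock, where the `swr.leader` and `swr.follower` counters would be emitted.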
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single replica / dev environment | In-memory LRU | Zero infrastructure overhead, fastest latency | $0 |
| Multi-replica staging | Standard Redis adapter | Shared state, basic tag support | Low (managed Redis) |
| Production multi-replica with SWR | Lua-atomic distributed handler | Prevents stampedes, guarantees consistency, auto-isolates deploys | Medium (Redis + monitoring) |
| Strict compliance / air-gapped | Local cache with periodic sync | Avoids external dependencies, meets data residency | High (engineering overhead) |
Configuration Template
// next.config.ts
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  cacheComponents: true,
  cacheHandlers: {
    // Static path: the router decides which backend to use at runtime.
    default: require.resolve('./src/cache/cache-router.mjs'),
  },
  outputFileTracingIncludes: {
    '/**/*': [
      './src/cache/**/*',
      './scripts/**/*',
      './node_modules/redis/**/*',
    ],
  },
  // Cache profiles take timing fields (stale, revalidate, expire) in seconds.
  // Tags are attached per entry in application code, not in the profile.
  // On Next.js 16 this option is top-level; earlier releases nested it under `experimental`.
  cacheLife: {
    default: { revalidate: 3600 },
  },
};

export default nextConfig;
Quick Start Guide
- Install dependencies: Add redis and your preferred serializer to your project. Ensure Next.js 16 is installed.
- Create the router module: Build a handler router that evaluates environment variables at runtime and exports the appropriate cache adapter.
- Wire the configuration: Point cacheHandlers.default in next.config.ts to the router. Add outputFileTracingIncludes for cache and script directories.
- Deploy with build identifiers: Pass NEXT_BUILD_SHA or equivalent during CI/CD. Verify Redis connectivity and timeout thresholds.
- Validate with traffic: Run load tests or route production traffic through the new handler. Monitor OpenTelemetry metrics for hit rates, lock acquisition, and latency. Confirm tag invalidation propagates across replicas.