erification, deadlock prevention, and renewal. The following implementation uses Redis with Lua scripting to ensure atomicity, supporting auto-renewal via a watchdog pattern.
Architecture Decisions
- Atomicity via Lua: All lock operations must be atomic. Separate GET and SET commands introduce race conditions. Lua scripts execute atomically within Redis, guaranteeing check-and-set semantics.
- Unique Ownership: Lock values must contain a unique owner identifier (UUID). This prevents a client whose lock has expired from releasing a lock that another client has since acquired on the same key.
- TTL and Deadlock Prevention: Every lock must have a Time-To-Live. If a client crashes, the lock expires automatically. Hardcoded infinite locks cause permanent deadlocks.
- Watchdog Renewal: For operations with unpredictable duration, a background process must renew the TTL before expiration. This prevents the lock from expiring while the critical section is still running, which would let a second client acquire it and execute concurrently.
Implementation (TypeScript)
This implementation provides a robust DistributedLock class with acquisition, release, and renewal capabilities.
import * as crypto from 'node:crypto'; // randomUUID() for the owner ID
import Redis from 'ioredis';
export interface LockConfig {
ttl: number; // Lock validity in ms
retryInterval: number; // Wait time between retries in ms
maxRetries: number; // Max attempts to acquire
watchdogInterval: number; // Interval to renew TTL in ms
}
export interface LockResult {
success: boolean;
lockKey: string;
ownerId: string;
}
export class DistributedLock {
private readonly redis: Redis;
private readonly config: Required<LockConfig>;
private readonly ownerId: string;
// Lua Script: Atomic acquisition
// KEYS[1]: lock_key
// ARGV[1]: owner_id
// ARGV[2]: ttl_ms
// Returns 1 if acquired, 0 if exists
private static readonly ACQUIRE_SCRIPT = `
if redis.call('SET', KEYS[1], ARGV[1], 'PX', ARGV[2], 'NX') then
return 1
else
return 0
end
`;
// Lua Script: Atomic release with ownership check
// Returns 1 if released, 0 if owner mismatch
private static readonly RELEASE_SCRIPT = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('DEL', KEYS[1])
else
return 0
end
`;
// Lua Script: Atomic renewal with ownership check
// Returns 1 if renewed, 0 if owner mismatch
private static readonly RENEW_SCRIPT = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
return redis.call('PEXPIRE', KEYS[1], ARGV[2])
else
return 0
end
`;
// Partial: any omitted field falls back to the defaults below.
constructor(redis: Redis, config: Partial<LockConfig> = {}) {
this.redis = redis;
this.config = {
ttl: config.ttl || 5000,
retryInterval: config.retryInterval || 100,
maxRetries: config.maxRetries || 3,
watchdogInterval: config.watchdogInterval || 1000,
};
this.ownerId = `${process.pid}-${crypto.randomUUID()}`;
}
async acquire(key: string): Promise<LockResult | null> {
const lockKey = `lock:${key}`;
for (let attempt = 0; attempt < this.config.maxRetries; attempt++) {
const acquired = await this.redis.eval(
DistributedLock.ACQUIRE_SCRIPT,
1,
lockKey,
this.ownerId,
this.config.ttl
);
if (acquired === 1) {
return { success: true, lockKey, ownerId: this.ownerId };
}
// Exponential backoff for retries
const delay = this.config.retryInterval * Math.pow(2, attempt);
await new Promise(resolve => setTimeout(resolve, delay));
}
return null;
}
async release(lockResult: LockResult): Promise<boolean> {
const result = await this.redis.eval(
DistributedLock.RELEASE_SCRIPT,
1,
lockResult.lockKey,
lockResult.ownerId
);
return result === 1;
}
async renew(lockResult: LockResult): Promise<boolean> {
const result = await this.redis.eval(
DistributedLock.RENEW_SCRIPT,
1,
lockResult.lockKey,
lockResult.ownerId,
this.config.ttl
);
return result === 1;
}
/**
* Starts a watchdog timer to automatically renew the lock.
* Returns a stop function to cancel the watchdog.
*/
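startWatchdog(lockResult: LockResult): () => void {
const interval = setInterval(async () => {
try {
const renewed = await this.renew(lockResult);
if (!renewed) {
// Ownership was lost (expired or taken over), so stop renewing.
clearInterval(interval);
console.error(`Watchdog failed to renew lock ${lockResult.lockKey}. Lock may be lost.`);
}
} catch (err) {
// A rejected renew() would otherwise surface as an unhandled rejection.
console.error(`Watchdog error while renewing lock ${lockResult.lockKey}:`, err);
}
}, this.config.watchdogInterval);
return () => clearInterval(interval);
}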
}
Usage Pattern
const lock = new DistributedLock(redisClient, { ttl: 10000, retryInterval: 200 });
const result = await lock.acquire('inventory:item-123');
if (!result) {
throw new Error('Failed to acquire lock');
}
// Start watchdog for long-running operations
const stopWatchdog = lock.startWatchdog(result);
try {
// Critical Section
await processInventoryUpdate();
} finally {
stopWatchdog();
await lock.release(result);
}
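The acquire, watchdog, and release steps above tend to repeat at every call site, so they can be folded into one wrapper. The following is a minimal sketch, not part of the original class: `withLock`, `LockLike`, and `LockHandle` are hypothetical names, and `LockHandle` is assumed to mirror `LockResult` so that a `DistributedLock` instance fits the interface structurally.

```typescript
// Assumed to mirror LockResult from the implementation above.
interface LockHandle {
  success: boolean;
  lockKey: string;
  ownerId: string;
}

// Minimal shape this helper needs from a lock object.
interface LockLike {
  acquire(key: string): Promise<LockHandle | null>;
  release(handle: LockHandle): Promise<boolean>;
  startWatchdog(handle: LockHandle): () => void;
}

// Hypothetical helper: runs `task` while holding the lock for `key`.
async function withLock<T>(
  lock: LockLike,
  key: string,
  task: () => Promise<T>,
): Promise<T> {
  const handle = await lock.acquire(key);
  if (!handle) {
    throw new Error(`Failed to acquire lock for ${key}`);
  }
  const stopWatchdog = lock.startWatchdog(handle);
  try {
    return await task();
  } finally {
    // Stop renewing first so the watchdog cannot extend a lock being released.
    stopWatchdog();
    await lock.release(handle);
  }
}
```

With such a wrapper, the usage pattern would shrink to something like `await withLock(lock, 'inventory:item-123', processInventoryUpdate)`, and the `finally` block guarantees the release runs even when the task throws.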
Pitfall Guide
1. Non-Atomic Check-and-Set
Mistake: Implementing locks using separate GET and SET commands.
Explanation: Between checking if a key exists and setting it, another client can acquire the lock. This breaks mutual exclusion.
Fix: Always use a Lua script or a single Redis SET command with the NX and PX options. The check and the write must be one indivisible operation.
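For plain acquisition, the single-command form can be sketched without Lua. This is an illustration under assumptions: `tryAcquire` and `SetNxPxClient` are hypothetical names, and the interface describes only the one `set` call shape used here. ioredis is expected to accept this argument order, but check the typings of whichever client you use.

```typescript
// Structural subset of a Redis client: just the SET ... PX ... NX call shape.
interface SetNxPxClient {
  set(
    key: string,
    value: string,
    px: 'PX',
    ttlMs: number,
    nx: 'NX',
  ): Promise<'OK' | null>;
}

// Hypothetical helper: one indivisible command instead of GET followed by SET.
async function tryAcquire(
  client: SetNxPxClient,
  key: string,
  ownerId: string,
  ttlMs: number,
): Promise<boolean> {
  // Redis replies OK when the key was set and a null reply when it already exists.
  const reply = await client.set(key, ownerId, 'PX', ttlMs, 'NX');
  return reply === 'OK';
}
```

Release and renewal still need a Lua script, because they must compare the stored owner ID and then act on the key in one step.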
2. Ignoring Clock Skew
Mistake: Assuming system clocks are perfectly synchronized across nodes.
Explanation: NTP adjustments can jump time forward or backward. If a node's clock jumps forward, it may expire a lock prematurely, allowing another node to acquire it while the first node is still active.
Fix: Use monotonically increasing clocks where possible, or implement Redlock quorum to mitigate single-node clock skew. For critical safety, prefer CP systems like ZooKeeper that use logical clocks.
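On the client side, the monotonic-clock advice can be applied when estimating how much of the TTL is left. The sketch below is one possible reading, with assumptions: `remainingValidityMs` and the safety margin are hypothetical, and the technique only protects against jumps in the client's own wall clock. It does nothing about the clock on the Redis server.

```typescript
// Hypothetical helper: milliseconds of lock validity left from the holder's
// point of view. Both timestamps must come from the same monotonic source.
function remainingValidityMs(
  acquiredAtMs: number,   // monotonic reading taken just before the acquire call
  ttlMs: number,          // TTL requested for the lock
  nowMs: number,          // current monotonic reading
  safetyMarginMs: number, // assumed allowance for network delay and drift
): number {
  const elapsed = nowMs - acquiredAtMs;
  return Math.max(0, ttlMs - elapsed - safetyMarginMs);
}

// performance.now() is monotonic; Date.now() follows the wall clock and can jump.
const acquiredAt = performance.now();
// ... send the acquire command and run part of the critical section ...
const safeToWrite =
  remainingValidityMs(acquiredAt, 5000, performance.now(), 500) > 0;
```

Taking the first reading before the acquire call is sent keeps the estimate conservative: the lock can only have more time left than the helper reports, never less.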
3. Missing Watchdog Renewal
Mistake: Setting a fixed TTL without renewal for operations of variable duration.
Explanation: If the critical section takes longer than the TTL, the lock expires. A second client acquires the lock, leading to concurrent execution. The original client may still write data, causing corruption.
Fix: Implement a watchdog that renews the TTL at intervals shorter than the TTL. Ensure the watchdog stops immediately upon completion or failure.
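"Shorter than the TTL" can be made concrete by counting how many watchdog ticks land strictly before the first expiry. The check below is a sketch under assumptions: both function names are hypothetical, and the default threshold of two ticks is a rule of thumb chosen here, not a value from the implementation above.

```typescript
// Number of watchdog ticks that fire strictly before the TTL would expire,
// counted from acquisition and assuming no renewal has succeeded yet.
function renewalAttemptsBeforeExpiry(
  ttlMs: number,
  watchdogIntervalMs: number,
): number {
  if (ttlMs <= 0 || watchdogIntervalMs <= 0) {
    throw new RangeError('ttl and watchdogInterval must be positive');
  }
  return Math.ceil(ttlMs / watchdogIntervalMs) - 1;
}

// Assumed threshold: at least two ticks, so a renewal that errors once
// still leaves another tick before the lock expires.
function assertWatchdogFitsTtl(
  ttlMs: number,
  watchdogIntervalMs: number,
  minAttempts = 2,
): void {
  const attempts = renewalAttemptsBeforeExpiry(ttlMs, watchdogIntervalMs);
  if (attempts < minAttempts) {
    throw new Error(
      `watchdogInterval ${watchdogIntervalMs}ms leaves only ${attempts} ` +
        `renewal attempt(s) within a ${ttlMs}ms TTL`,
    );
  }
}

renewalAttemptsBeforeExpiry(5000, 1500); // 3 (ticks at 1500, 3000, 4500)
renewalAttemptsBeforeExpiry(2000, 1000); // 1 (the tick at 2000 is too late)
```

Running a check like this where the lock is constructed turns a misconfigured interval into an immediate error instead of a silent expiry under load.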
4. Releasing Locks Owned by Others
Mistake: Releasing a lock without verifying ownership.
Explanation: If a lock expires and is re-acquired by another client, the original client might call DEL thinking it is releasing its own lock. This releases the new client's lock, breaking safety.
Fix: Store a unique owner ID in the lock value. The release script must verify the value matches the owner ID before deleting.
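The failure mode is easy to reproduce without Redis. The simulation below uses an in-memory `Map` as a stand-in for the key space; `naiveRelease`, `checkedRelease`, and the client names are hypothetical and exist only to illustrate the scenario.

```typescript
// In-memory stand-in for a Redis key space, for illustration only.
const store = new Map<string, string>();

// Unsafe: deletes whatever is stored under the key.
function naiveRelease(key: string): boolean {
  return store.delete(key);
}

// Safe: deletes only when the stored value is the caller's owner ID.
// In Redis the compare and the delete must run in one Lua script to stay atomic.
function checkedRelease(key: string, ownerId: string): boolean {
  if (store.get(key) !== ownerId) return false;
  return store.delete(key);
}

// Scenario: client A's lock expired and client B re-acquired the same key.
store.set('lock:item-123', 'client-B');

// Client A finishes late and tries to release "its" lock.
const releasedByA = checkedRelease('lock:item-123', 'client-A'); // false
const bStillHolds = store.get('lock:item-123') === 'client-B';   // true

// Without the ownership check, A would delete B's lock while B is still working.
const naiveResult = naiveRelease('lock:item-123'); // true
```

The checked version mirrors the logic of the release script in the implementation: a mismatch returns a falsy result and leaves the key untouched.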
5. Thundering Herd on Release
Mistake: Having many clients waiting with identical retry intervals.
Explanation: When a lock is released, all waiting clients may attempt acquisition at the same moment. This thundering herd causes a load spike on Redis and adds latency for every waiter.
Fix: Implement randomized jitter in retry intervals. Add exponential backoff to spread out acquisition attempts.
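One common way to combine the two is "full jitter": compute the exponential ceiling, then pick a random delay below it. The helper below is a sketch, not the retry logic of the class above. `backoffWithJitter` is a hypothetical name, and `capMs` is an assumed extra parameter that does not exist in `LockConfig`.

```typescript
// Hypothetical helper: random delay in [0, ceiling), where the ceiling grows
// exponentially with the attempt number and is capped at capMs.
function backoffWithJitter(
  attempt: number,                    // 0-based retry attempt
  baseMs: number,                     // plays the role of retryInterval
  capMs: number,                      // assumed upper bound on any single delay
  random: () => number = Math.random, // injectable so tests are deterministic
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// With a fixed "random" source the result is predictable:
backoffWithJitter(3, 100, 10000, () => 0.5); // 400 (half of 100 * 2^3)
backoffWithJitter(10, 100, 2000, () => 0.5); // 1000 (ceiling capped at 2000)
```

Because each waiter draws its own delay, clients that started retrying at the same time stop hitting Redis in lockstep.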
6. Lock Contention as Performance Bottleneck
Mistake: Using distributed locks for read-heavy workloads or large granularities.
Explanation: Locks serialize access. High contention drastically reduces throughput. Locking entire tables or large key ranges creates bottlenecks.
Fix: Lock at the finest granularity possible. Use read-write locks if the system supports them. Consider lock-free algorithms or CRDTs for mergeable state.
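Granularity mostly comes down to how lock keys are named. The toy comparison below assumes an inventory with per-item updates; `coarseKey`, `itemKey`, and `distinctLocks` are hypothetical helpers, and the key format follows the `inventory:item-123` example used earlier.

```typescript
// Coarse: a single key for the whole inventory, so every update serializes.
function coarseKey(): string {
  return 'inventory';
}

// Fine: one key per item, so updates to different items do not contend.
function itemKey(itemId: string): string {
  return `inventory:item-${itemId}`;
}

// Number of distinct locks a batch of updates would need. More distinct
// locks means more of the updates can proceed in parallel.
function distinctLocks(keys: string[]): number {
  return new Set(keys).size;
}

const updates = ['123', '456', '123', '789'];
const coarse = distinctLocks(updates.map(() => coarseKey())); // 1
const fine = distinctLocks(updates.map((id) => itemKey(id))); // 3
```

In this batch, the coarse scheme forces all four updates through one lock, while the per-item scheme only serializes the two updates to item 123.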
7. Deadlocks from Missing TTL
Mistake: Allowing locks to persist indefinitely on client crash.
Explanation: If a client crashes without releasing the lock, and no TTL is set, the lock remains forever. No other client can proceed.
Fix: Always set a TTL. Even for short-lived operations, a TTL is the safety net against client failures.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Financial Transaction | Postgres Advisory Lock or ZooKeeper | Requires strict CP safety; duplicate processing is unacceptable. | Medium (DB load or ZK infra). |
| Cache Invalidation | Redis SETNX | Idempotent operation; duplicate invalidation is safe. Low latency critical. | Low (Redis overhead). |
| Leader Election | ZooKeeper / Etcd | Requires strong consistency and reliable failure detection. | Medium (CP infra). |
| Rate Limiting | Redis Sliding Window | Locks are too coarse; atomic counters or sliding windows are more efficient. | Low. |
| Batch Job Coordination | Redis Redlock | Balances safety and performance for distributed task scheduling. | Low-Medium. |
| Session State Management | Database Row Lock | Tied to transactional boundaries; ensures consistency with user data. | Low (DB bound). |
Configuration Template
// config/lock-config.ts
// Path assumes DistributedLock.ts sits one level above this config/ directory.
import type { LockConfig } from '../DistributedLock';
export const lockConfigs: Record<string, LockConfig> = {
// High safety, moderate latency tolerance
financial: {
ttl: 5000,
retryInterval: 200,
maxRetries: 5,
watchdogInterval: 1500,
},
// Low latency, best-effort safety
cache: {
ttl: 2000,
retryInterval: 50,
maxRetries: 2,
watchdogInterval: 800,
},
// Long-running batch processing
batch: {
ttl: 30000,
retryInterval: 500,
maxRetries: 10,
watchdogInterval: 5000,
},
};
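Since the acquire loop doubles its delay on every failed attempt, `maxRetries` has a large effect on how long a caller can block. The arithmetic can be checked with a small helper. This is a sketch: `worstCaseBackoffMs` is a hypothetical name, and it assumes a delay of `retryInterval * 2^attempt` follows every failed attempt, with Redis round-trip time ignored.

```typescript
// Sum of retryInterval * 2^attempt for attempt = 0 .. maxRetries - 1.
function worstCaseBackoffMs(retryIntervalMs: number, maxRetries: number): number {
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += retryIntervalMs * Math.pow(2, attempt);
  }
  return total;
}

worstCaseBackoffMs(200, 5);  // financial: 6200 ms
worstCaseBackoffMs(50, 2);   // cache: 150 ms
worstCaseBackoffMs(500, 10); // batch: 511500 ms, roughly 8.5 minutes
```

Under these assumptions the batch profile can wait far longer than its 30-second TTL before giving up, which may or may not be intended. Capping the per-attempt delay or lowering `maxRetries` would bound it.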
Quick Start Guide
- Install Dependencies:
npm install ioredis
- Copy Implementation:
Save the DistributedLock class and Lua scripts into your project structure (e.g., src/infrastructure/lock/DistributedLock.ts).
- Initialize Lock Instance:
import Redis from 'ioredis';
import { DistributedLock } from './DistributedLock';
import { lockConfigs } from './config/lock-config';
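// REDIS_URL may be undefined; the fallback assumes a local Redis on the default port.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://127.0.0.1:6379');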
const lock = new DistributedLock(redis, lockConfigs.financial);
- Wrap Critical Section:
const result = await lock.acquire('order:12345');
if (!result) return { status: 409, message: 'Resource busy' };
const stopWatchdog = lock.startWatchdog(result);
try {
await processOrder(result);
} finally {
stopWatchdog();
await lock.release(result);
}
- Verify:
Use redis-cli to inspect lock keys:
redis-cli> GET lock:order:12345
# Output should show the owner ID: the process ID followed by a UUID
This guide provides the foundational pattern for safe distributed locking. Adapt the configuration and primitive selection based on your specific consistency and availability requirements. Always validate lock behavior under failure conditions before deploying to production.