Why Veltrix Thought It Could Buy Its Way Out of a Distributed Lock Problem
Architecting Monotonic ID Generation: From Redis Races to Raft-Based Sequencers
Current Situation Analysis
High-throughput event systems frequently face a deceptively simple requirement: generate unique, monotonically increasing identifiers for non-fungible assets. Whether these are voucher codes, transaction receipts, or reward tokens, the business logic often demands strict linearizability. A duplicate ID is not merely a bug; it can represent financial loss, data corruption, or a violation of user trust.
The industry pain point lies in the mismatch between performance expectations and consistency models. Engineering teams routinely default to distributed caches like Redis Cluster for ID generation due to their low latency and familiarity. However, Redis Cluster is an AP (Availability/Partition Tolerance) system. It provides eventual consistency within shard slots and cannot guarantee atomic cross-slot operations. When clusters rebalance or experience network partitions, the assumption of a single atomic register breaks down.
This misunderstanding leads to severe production incidents. In a documented case involving a global event engine, the reliance on Redis Cluster resulted in race conditions during slot rebalancing. The system emitted duplicate keys at a rate of 0.04%, necessitating the manual backfilling of 14,000 items in the account database. The root cause was not the database itself, but the architectural error of treating an eventually consistent store as a linearizable counter under partial failure. Optimistic locking strategies without fencing tokens allowed concurrent clients to believe they held exclusive access, compounding the duplication rate.
WOW Moment: Key Findings
The transition from AP caches to CP (Consistency/Partition Tolerance) sequencers reveals a critical trade-off landscape. While CP systems introduce slightly higher latency, they eliminate the catastrophic cost of duplicate generation and manual recovery. The following data compares four architectural approaches evaluated during the resolution of high-velocity ID generation failures.
| Approach | p99 Latency | Duplication Risk | Operational Complexity | Cost per 1M IDs |
|---|---|---|---|---|
| Redis Cluster | ~2.0 ms | High (Races on rebalance) | Low | $0.00005 |
| Redlock (AP) | ~5.0 ms | Medium (Clock drift failures) | Medium | $0.00008 |
| FoundationDB | ~300 ms+ | Near Zero | High | $0.00050 |
| Raft Sequencer | 2.9 ms | Near Zero | Medium | $0.00012 |
Why this matters: The Raft-based sequencer (implemented via etcd) achieves the "sweet spot" for ID generation. It delivers latency comparable to Redis (2.9 ms vs 2.0 ms) while providing strict consistency guarantees that prevent duplicates. FoundationDB, while consistent, introduced unacceptable latency spikes (300 ms+) and retry storms under load, making it unsuitable for sub-5 ms SLAs. The data confirms that for ID generation, a lightweight CP consensus layer is superior to both AP caches and heavy-weight distributed SQL engines.
Core Solution
The robust solution is a dedicated Sequencer Service backed by a Raft consensus cluster. This architecture decouples ID generation from the application logic and enforces a single owner per namespace, ensuring linearizable ordering.
Architecture Overview
- Sequencer Service: A stateless gRPC service deployed per region. It acts as the sole interface for ID requests.
- Consensus Layer: A local etcd cluster manages leader election and state persistence. The service uses etcd's lease-based leadership to ensure only one node appends to the sequence at any time.
- Namespace Partitioning: IDs are scoped by namespace (e.g.,
voucher,receipt,reward). This isolates traffic and allows independent scaling and compaction policies. - Fencing via Revisions: etcd's global revision numbers act as implicit fencing tokens. Clients can verify the revision to ensure they are communicating with the current leader, preventing split-brain duplicates.
Implementation Details
The following TypeScript implementation demonstrates the server-side logic for a TokenGenerator service. This example highlights leader election, monotonic increment, and revision-based fencing.
1. gRPC Protocol Definition
syntax = "proto3";
package token.v1;
service TokenGenerator {
// Generates the next unique token for a specific namespace.
rpc NextToken (TokenRequest) returns (TokenResponse);
// Streams tokens for high-throughput batch scenarios.
rpc StreamTokens (StreamRequest) returns (stream TokenResponse);
}
message TokenRequest {
string namespace = 1;
int64 count = 2; // Request multiple tokens atomically
}
message TokenResponse {
string namespace = 1;
int64 start_token = 2;
int64 end_token = 3;
int64 revision = 4; // Fencing token for client verification
}
2. Sequencer Server Logic
import { Etcd3, Lease } from 'etcd3';
import { Server, ServerCredentials } from '@grpc/grpc-js';
import { TokenGeneratorService } from './generated/token_grpc_pb';
import { TokenRequest, TokenResponse } from './generated/token_pb';
export class SequencerServer {
private client: Etcd3;
private lease: Lease;
private isLeader: boolean = false;
private leaderId: string;
constructor(private region: string) {
this.client = new Etcd3();
this.leaderId = `${this.region}-sequencer-${process.pid}`;
}
async start() {
// Establish lease-based leadership
this.lease = this.client.lease('sequencer-leader', { TTL: 10 });
this.lease.on('granted', () => {
this.isLeader = true;
console.log('Leader elected. Serving requests.');
});
this.lease.on('lost', () => {
this.isLeader = false;
console.warn('Leadership lost. Stepping down.');
});
// Keep lease alive
this.lease.keepAlive();
// Start gRPC server...
}
async nextToken(call: any, callback: any) {
if (!this.isLeader) {
// Reject if not leader to prevent races
return callback({
code: 12, // UNIMPLEMENTED
message: 'Not the current leader'
});
}
const { namespace, count } = call.request;
const key = `seq:${namespace}`;
try {
// Atomic read-modify-write within Raft
const current = await this.client.get(key).number();
const nextStart = current || 0;
const nextEnd = nextStart + count;
// Update KV store; revision is returned automatically
const putResult = await this.client.put(key).value(nextEnd.toString());
const response = new TokenResponse();
response.setNamespace(namespace);
response.setStartToken(nextStart);
response.setEndToken(nextEnd);
response.setRevision(putResult.revision); // Fencing token
callback(null, response);
} catch (err) {
callback(err);
}
}
}
3. Client-Side Usage with Retry and Fencing
import { TokenGeneratorClient } from './generated/token_grpc_pb';
import { TokenRequest } from './generated/token_pb';
export class TokenClient {
private client: TokenGeneratorClient;
private lastRevision: number = 0;
constructor(endpoint: string) {
this.client = new TokenGeneratorClient(endpoint, grpc.credentials.createInsecure());
}
async acquireTokens(namespace: string, count: number): Promise<number[]> {
const request = new TokenRequest();
request.setNamespace(namespace);
request.setCount(count);
// Retry logic with exponential backoff
const maxRetries = 3;
for (let i = 0; i < maxRetries; i++) {
try {
const response = await this.client.nextToken(request);
// Fencing check: Ensure revision is monotonically increasing
// If revision < lastRevision, we hit an old leader or stale state
if (response.getRevision() < this.lastRevision) {
throw new Error('Stale revision detected. Possible split-brain.');
}
this.lastRevision = response.getRevision();
// Generate token range
const tokens: number[] = [];
for (let t = response.getStartToken(); t < response.getEndToken(); t++) {
tokens.push(t);
}
return tokens;
} catch (err) {
if (i === maxRetries - 1) throw err;
await this.sleep(Math.pow(2, i) * 100);
}
}
throw new Error('Max retries exceeded');
}
private sleep(ms: number) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
Architecture Rationale
- Why etcd over Redis? etcd provides strong consistency via Raft. Every write is replicated to a quorum before acknowledgment. This eliminates the race conditions inherent in Redis Cluster's slot-based sharding.
- Why gRPC? gRPC offers binary serialization and HTTP/2 multiplexing, reducing overhead compared to REST/JSON. This is critical for maintaining sub-3 ms latency.
- Why Lease-Based Leadership? Leases automatically expire if a node crashes, triggering immediate leader election without manual intervention. This prevents the "zombie leader" problem where a dead node holds a lock until a long timeout expires.
- Namespace Isolation: Partitioning by namespace prevents a thundering herd on a single key. It also allows for targeted compaction. If the
vouchernamespace generates tokens rapidly whilereceiptis idle, compaction can be tuned per namespace.
Pitfall Guide
Production deployments of sequencer services encounter specific failure modes. The following pitfalls are derived from real-world incidents and best practices.
| Pitfall | Explanation | Fix |
|---|---|---|
| Raft Log Bloat | High-throughput namespaces can cause the Raft log to grow rapidly, consuming disk and memory. In one instance, a namespace generated 2.1 GB of log data in 72 hours at 120k IDs/min. | Implement automated compaction. Schedule nightly snapshots and truncate the log to the last committed index. Monitor log size and trigger compaction if thresholds are breached. |
| Clock Drift in Locks | Algorithms like Redlock rely on wall-clock time for lease expiration. If nodes have clock drift (e.g., >50 ms), locks can expire prematurely or overlap, causing duplicates. | Avoid time-based distributed locks for critical state. Use lease-based consensus (etcd/ZooKeeper) where leadership is managed by the cluster, not client clocks. |
| Retry Storms | During leader election, clients may retry aggressively, overwhelming the new leader and prolonging the outage. | Implement exponential backoff with jitter on the client side. Add a circuit breaker to fail fast if the service is unavailable for an extended period. |
| Fencing Token Neglect | Without fencing tokens, a client might accept a response from an old leader that hasn't yet realized it has stepped down, leading to duplicate IDs. | Use etcd revisions as fencing tokens. Clients must verify that the revision in the response is greater than the previous revision. Reject responses with stale revisions. |
| Namespace Hotspots | Routing all traffic to a single namespace can create a bottleneck, limiting throughput to the Raft cluster's write capacity. | Design namespaces based on traffic patterns. If a namespace approaches Raft write limits, shard the namespace (e.g., voucher-01, voucher-02) and aggregate at the application layer, accepting a loss of global monotonicity within that shard. |
| Client State Staleness | Clients caching mapping tables (e.g., realm-to-slot) can become stale after cluster rebalancing, causing requests to hit wrong nodes. | Avoid client-side sharding maps for CP systems. Let the sequencer service handle routing. If client-side routing is necessary, implement a robust invalidation mechanism tied to cluster change events. |
| Misinterpreting AP Consistency | Assuming Redis Cluster provides atomicity for cross-slot operations leads to data corruption during rebalancing. | Reserve AP stores for caching and non-critical state. Use CP stores for any data requiring strict ordering, uniqueness, or transactional integrity. |
Production Bundle
Action Checklist
- Deploy etcd Cluster: Provision a 3-node etcd cluster per region with disk-backed storage. Ensure low-latency network between nodes.
- Configure Compaction: Set up a cron job or background process to compact the etcd store daily. Monitor disk usage to prevent log bloat.
- Implement gRPC Service: Build the sequencer service with lease-based leadership. Include health checks and metrics for leader changes.
- Add Fencing Logic: Ensure clients track and verify etcd revisions. Reject responses with stale revisions to prevent split-brain duplicates.
- Tune Retry Policies: Configure client retries with exponential backoff and jitter. Set appropriate timeouts for leader election scenarios.
- Monitor Latency and Errors: Track p99 latency, duplication rate, and leader election frequency. Alert on latency spikes or increased error rates.
- Test Partition Scenarios: Simulate network partitions and node failures to verify that the system recovers without generating duplicates.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Low Volume, Eventual Consistency OK | Redis Cluster | Low latency, simple setup. Duplicates are acceptable or handled downstream. | Low |
| High Volume, Strict Uniqueness Required | Raft Sequencer (etcd) | Provides linearizability with low latency. Best balance of performance and consistency. | Medium |
| Multi-Region ACID Transactions | FoundationDB / CockroachDB | Strong consistency across regions. Suitable for complex transactions, not just ID generation. | High |
| Batch ID Generation | Raft Sequencer with Streaming | gRPC streaming allows efficient batch requests, reducing round-trip overhead. | Medium |
Configuration Template
etcd Configuration (etcd.conf)
# etcd configuration for sequencer cluster
name: sequencer-node-1
data-dir: /var/lib/etcd
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://node1:2379
initial-cluster: sequencer-node-1=http://node1:2380,sequencer-node-2=http://node2:2380,sequencer-node-3=http://node3:2380
initial-cluster-token: sequencer-cluster
initial-advertise-peer-urls: http://node1:2380
listen-peer-urls: http://0.0.0.0:2380
# Compaction settings
auto-compaction-mode: periodic
auto-compaction-retention: "24h"
gRPC Server Configuration (grpc_server.yaml)
server:
port: 50051
max_concurrent_streams: 100
keepalive_time_ms: 30000
keepalive_timeout_ms: 10000
etcd:
endpoints:
- http://node1:2379
- http://node2:2379
- http://node3:2379
lease_ttl: 10
election_timeout_ms: 100
logging:
level: info
format: json
Quick Start Guide
- Initialize etcd Cluster: Deploy three etcd nodes and verify cluster health using
etcdctl endpoint health. - Build Sequencer Service: Compile the gRPC service and configure it to connect to the etcd cluster. Set the region and namespace parameters.
- Start Service: Launch the service and verify that a leader is elected. Check logs for "Leader elected" messages.
- Test Client: Run a client script to request tokens. Verify that tokens are monotonically increasing and revisions are valid.
- Validate Consistency: Simulate a leader failure by stopping the leader node. Ensure the service recovers within the election timeout and continues generating unique tokens without duplicates.
This architecture provides a robust, scalable solution for ID generation in distributed systems. By leveraging Raft consensus and proper fencing mechanisms, teams can eliminate duplicate generation risks while maintaining the low latency required for high-throughput applications.
Mid-Year Sale β Unlock Full Article
Base plan from just $4.99/mo or $49/yr
Sign in to read the full article and unlock all tutorials.
Sign In / Register β Start Free Trial7-day free trial Β· Cancel anytime Β· 30-day money-back
