Why Veltrix Thought It Could Buy Its Way Out of a Distributed Lock Problem

Architecting Monotonic ID Generation: From Redis Races to Raft-Based Sequencers

Current Situation Analysis

High-throughput event systems frequently face a deceptively simple requirement: generate unique, monotonically increasing identifiers for non-fungible assets. Whether these are voucher codes, transaction receipts, or reward tokens, the business logic often demands strict linearizability. A duplicate ID is not merely a bug; it can represent financial loss, data corruption, or a violation of user trust.

The industry pain point lies in the mismatch between performance expectations and consistency models. Engineering teams routinely default to distributed caches like Redis Cluster for ID generation due to their low latency and familiarity. However, Redis Cluster is an AP (Availability/Partition Tolerance) system. It provides eventual consistency within shard slots and cannot guarantee atomic cross-slot operations. When clusters rebalance or experience network partitions, the assumption of a single atomic register breaks down.

This misunderstanding leads to severe production incidents. In a documented case involving a global event engine, the reliance on Redis Cluster resulted in race conditions during slot rebalancing. The system emitted duplicate keys at a rate of 0.04%, necessitating the manual backfilling of 14,000 items in the account database. The root cause was not the database itself, but the architectural error of treating an eventually consistent store as a linearizable counter under partial failure. Optimistic locking strategies without fencing tokens allowed concurrent clients to believe they held exclusive access, compounding the duplication rate.

WOW Moment: Key Findings

The transition from AP caches to CP (Consistency/Partition Tolerance) sequencers reveals a critical trade-off landscape. While CP systems introduce slightly higher latency, they eliminate the catastrophic cost of duplicate generation and manual recovery. The following data compares four architectural approaches evaluated during the resolution of high-velocity ID generation failures.

Approach	p99 Latency	Duplication Risk	Operational Complexity	Cost per 1M IDs
Redis Cluster	~2.0 ms	High (Races on rebalance)	Low	$0.00005
Redlock (AP)	~5.0 ms	Medium (Clock drift failures)	Medium	$0.00008
FoundationDB	~300 ms+	Near Zero	High	$0.00050
Raft Sequencer	2.9 ms	Near Zero	Medium	$0.00012

Why this matters: The Raft-based sequencer (implemented via etcd) achieves the "sweet spot" for ID generation. It delivers latency comparable to Redis (2.9 ms vs 2.0 ms) while providing strict consistency guarantees that prevent duplicates. FoundationDB, while consistent, introduced unacceptable latency spikes (300 ms+) and retry storms under load, making it unsuitable for sub-5 ms SLAs. The data confirms that for ID generation, a lightweight CP consensus layer is superior to both AP caches and heavy-weight distributed SQL engines.

Core Solution

The robust solution is a dedicated Sequencer Service backed by a Raft consensus cluster. This architecture decouples ID generation from the application logic and enforces a single owner per namespace, ensuring linearizable ordering.

Architecture Overview

Sequencer Service: A stateless gRPC service deployed per region. It acts as the sole interface for ID requests.
Consensus Layer: A local etcd cluster manages leader election and state persistence. The service uses etcd's lease-based leadership to ensure only one node appends to the sequence at any time.
Namespace Partitioning: IDs are scoped by namespace (e.g., voucher, receipt, reward). This isolates traffic and allows independent scaling and compaction policies.
Fencing via Revisions: etcd's global revision numbers act as implicit fencing tokens. Clients can verify the revision to ensure they are communicating with the current leader, preventing split-brain duplicates.

Implementation Details

The following TypeScript implementation demonstrates the server-side logic for a TokenGenerator service. This example highlights leader election, monotonic increment, and revision-based fencing.

1. gRPC Protocol Definition

syntax = "proto3";

package token.v1;

service TokenGenerator {
  // Generates the next unique token for a specific namespace.
  rpc NextToken (TokenRequest) returns (TokenResponse);
  
  // Streams tokens for high-throughput batch scenarios.
  rpc StreamTokens (StreamRequest) returns (stream TokenResponse);
}

message TokenRequest {
  string namespace = 1;
  int64 count = 2; // Request multiple tokens atomically
}

message TokenResponse {
  string namespace = 1;
  int64 start_token = 2;
  int64 end_token = 3;
  int64 revision = 4; // Fencing token for client verification
}

2. Sequencer Server Logic

import { Etcd3, Lease } from 'etcd3';
import { Server, ServerCredentials } from '@grpc/grpc-js';
import { TokenGeneratorService } from './generated/token_grpc_pb';
import { TokenRequest, TokenResponse } from './generated/token_pb';

export class SequencerServer {
  private client: Etcd3;
  private lease: Lease;
  private isLeader: boolean = false;
  private leaderId: string;

  constructor(private region: string) {
    this.client = new Etcd3();
    this.leaderId = `${this.region}-sequencer-${process.pid}`;
  }

  async start() {
    // Establish lease-based leadership
    this.lease = this.client.lease('sequencer-leader', { TTL: 10 });
    
    this.lease.on('granted', () => {
      this.isLeader = true;
      console.log('Leader elected. Serving requests.');
    });

    this.lease.on('lost', () => {
      this.isLeader = false;
      console.warn('Leadership lost. Stepping down.');
    });

    // Keep lease alive
    this.lease.keepAlive();
    
    // Start gRPC server...
  }

  async nextToken(call: any, callback: any) {
    if (!this.isLeader) {
      // Reject if not leader to prevent races
      return callback({
        code: 12, // UNIMPLEMENTED
        message: 'Not the current leader'
      });
    }

    const { namespace, count } = call.request;
    const key = `seq:${namespace}`;

    try {
      // Atomic read-modify-write within Raft
      const current = await this.client.get(key).number();
      const nextStart = current || 0;
      const nextEnd = nextStart + count;

      // Update KV store; revision is returned automatically
      const putResult = await this.client.put(key).value(nextEnd.toString());
      
      const response = new TokenResponse();
      response.setNamespace(namespace);
      response.setStartToken(nextStart);
      response.setEndToken(nextEnd);
      response.setRevision(putResult.revision); // Fencing token

      callback(null, response);
    } catch (err) {
      callback(err);
    }
  }
}

3. Client-Side Usage with Retry and Fencing

import { TokenGeneratorClient } from './generated/token_grpc_pb';
import { TokenRequest } from './generated/token_pb';

export class TokenClient {
  private client: TokenGeneratorClient;
  private lastRevision: number = 0;

  constructor(endpoint: string) {
    this.client = new TokenGeneratorClient(endpoint, grpc.credentials.createInsecure());
  }

  async acquireTokens(namespace: string, count: number): Promise<number[]> {
    const request = new TokenRequest();
    request.setNamespace(namespace);
    request.setCount(count);

    // Retry logic with exponential backoff
    const maxRetries = 3;
    for (let i = 0; i < maxRetries; i++) {
      try {
        const response = await this.client.nextToken(request);
        
        // Fencing check: Ensure revision is monotonically increasing
        // If revision < lastRevision, we hit an old leader or stale state
        if (response.getRevision() < this.lastRevision) {
          throw new Error('Stale revision detected. Possible split-brain.');
        }
        this.lastRevision = response.getRevision();

        // Generate token range
        const tokens: number[] = [];
        for (let t = response.getStartToken(); t < response.getEndToken(); t++) {
          tokens.push(t);
        }
        return tokens;
      } catch (err) {
        if (i === maxRetries - 1) throw err;
        await this.sleep(Math.pow(2, i) * 100);
      }
    }
    throw new Error('Max retries exceeded');
  }

  private sleep(ms: number) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

Architecture Rationale

Why etcd over Redis? etcd provides strong consistency via Raft. Every write is replicated to a quorum before acknowledgment. This eliminates the race conditions inherent in Redis Cluster's slot-based sharding.
Why gRPC? gRPC offers binary serialization and HTTP/2 multiplexing, reducing overhead compared to REST/JSON. This is critical for maintaining sub-3 ms latency.
Why Lease-Based Leadership? Leases automatically expire if a node crashes, triggering immediate leader election without manual intervention. This prevents the "zombie leader" problem where a dead node holds a lock until a long timeout expires.
Namespace Isolation: Partitioning by namespace prevents a thundering herd on a single key. It also allows for targeted compaction. If the voucher namespace generates tokens rapidly while receipt is idle, compaction can be tuned per namespace.

Pitfall Guide

Production deployments of sequencer services encounter specific failure modes. The following pitfalls are derived from real-world incidents and best practices.

Pitfall	Explanation	Fix
Raft Log Bloat	High-throughput namespaces can cause the Raft log to grow rapidly, consuming disk and memory. In one instance, a namespace generated 2.1 GB of log data in 72 hours at 120k IDs/min.	Implement automated compaction. Schedule nightly snapshots and truncate the log to the last committed index. Monitor log size and trigger compaction if thresholds are breached.
Clock Drift in Locks	Algorithms like Redlock rely on wall-clock time for lease expiration. If nodes have clock drift (e.g., >50 ms), locks can expire prematurely or overlap, causing duplicates.	Avoid time-based distributed locks for critical state. Use lease-based consensus (etcd/ZooKeeper) where leadership is managed by the cluster, not client clocks.
Retry Storms	During leader election, clients may retry aggressively, overwhelming the new leader and prolonging the outage.	Implement exponential backoff with jitter on the client side. Add a circuit breaker to fail fast if the service is unavailable for an extended period.
Fencing Token Neglect	Without fencing tokens, a client might accept a response from an old leader that hasn't yet realized it has stepped down, leading to duplicate IDs.	Use etcd revisions as fencing tokens. Clients must verify that the revision in the response is greater than the previous revision. Reject responses with stale revisions.
Namespace Hotspots	Routing all traffic to a single namespace can create a bottleneck, limiting throughput to the Raft cluster's write capacity.	Design namespaces based on traffic patterns. If a namespace approaches Raft write limits, shard the namespace (e.g., `voucher-01`, `voucher-02`) and aggregate at the application layer, accepting a loss of global monotonicity within that shard.
Client State Staleness	Clients caching mapping tables (e.g., realm-to-slot) can become stale after cluster rebalancing, causing requests to hit wrong nodes.	Avoid client-side sharding maps for CP systems. Let the sequencer service handle routing. If client-side routing is necessary, implement a robust invalidation mechanism tied to cluster change events.
Misinterpreting AP Consistency	Assuming Redis Cluster provides atomicity for cross-slot operations leads to data corruption during rebalancing.	Reserve AP stores for caching and non-critical state. Use CP stores for any data requiring strict ordering, uniqueness, or transactional integrity.

Production Bundle

Action Checklist

Deploy etcd Cluster: Provision a 3-node etcd cluster per region with disk-backed storage. Ensure low-latency network between nodes.
Configure Compaction: Set up a cron job or background process to compact the etcd store daily. Monitor disk usage to prevent log bloat.
Implement gRPC Service: Build the sequencer service with lease-based leadership. Include health checks and metrics for leader changes.
Add Fencing Logic: Ensure clients track and verify etcd revisions. Reject responses with stale revisions to prevent split-brain duplicates.
Tune Retry Policies: Configure client retries with exponential backoff and jitter. Set appropriate timeouts for leader election scenarios.
Monitor Latency and Errors: Track p99 latency, duplication rate, and leader election frequency. Alert on latency spikes or increased error rates.
Test Partition Scenarios: Simulate network partitions and node failures to verify that the system recovers without generating duplicates.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Low Volume, Eventual Consistency OK	Redis Cluster	Low latency, simple setup. Duplicates are acceptable or handled downstream.	Low
High Volume, Strict Uniqueness Required	Raft Sequencer (etcd)	Provides linearizability with low latency. Best balance of performance and consistency.	Medium
Multi-Region ACID Transactions	FoundationDB / CockroachDB	Strong consistency across regions. Suitable for complex transactions, not just ID generation.	High
Batch ID Generation	Raft Sequencer with Streaming	gRPC streaming allows efficient batch requests, reducing round-trip overhead.	Medium

Configuration Template

etcd Configuration (etcd.conf)

# etcd configuration for sequencer cluster
name: sequencer-node-1
data-dir: /var/lib/etcd
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://node1:2379
initial-cluster: sequencer-node-1=http://node1:2380,sequencer-node-2=http://node2:2380,sequencer-node-3=http://node3:2380
initial-cluster-token: sequencer-cluster
initial-advertise-peer-urls: http://node1:2380
listen-peer-urls: http://0.0.0.0:2380

# Compaction settings
auto-compaction-mode: periodic
auto-compaction-retention: "24h"

gRPC Server Configuration (grpc_server.yaml)

server:
  port: 50051
  max_concurrent_streams: 100
  keepalive_time_ms: 30000
  keepalive_timeout_ms: 10000

etcd:
  endpoints:
    - http://node1:2379
    - http://node2:2379
    - http://node3:2379
  lease_ttl: 10
  election_timeout_ms: 100

logging:
  level: info
  format: json

Quick Start Guide

Initialize etcd Cluster: Deploy three etcd nodes and verify cluster health using etcdctl endpoint health.
Build Sequencer Service: Compile the gRPC service and configure it to connect to the etcd cluster. Set the region and namespace parameters.
Start Service: Launch the service and verify that a leader is elected. Check logs for "Leader elected" messages.
Test Client: Run a client script to request tokens. Verify that tokens are monotonically increasing and revisions are valid.
Validate Consistency: Simulate a leader failure by stopping the leader node. Ensure the service recovers within the election timeout and continues generating unique tokens without duplicates.

This architecture provides a robust, scalable solution for ID generation in distributed systems. By leveraging Raft consensus and proper fencing mechanisms, teams can eliminate duplicate generation risks while maintaining the low latency required for high-throughput applications.

Mid-Year Sale — Unlock Full Article