that enforces session consistency and handles replica degradation.
Architecture Decisions
- Router Layer: The router sits between the application and the database driver. It intercepts queries, classifies them, and selects the appropriate connection.
- Lag Monitoring: The router maintains a rolling average of replication lag for each replica. Lag is determined by comparing the primary's write position (e.g., LSN or GTID) with the replica's applied position.
- Session Affinity: To support "read-your-writes," the router tracks the last write position for a session. Reads following a write are routed to the primary until the replica catches up, or forced to the primary if the lag threshold is breached.
- Fallback Circuit: If a replica's lag exceeds a critical threshold or the node becomes unreachable, the circuit opens, and traffic is rerouted to the primary or another healthy replica.
Implementation: Lag-Aware Database Router
import { Pool, PoolClient, QueryResult } from 'pg';
interface DbNode {
id: string;
pool: Pool;
type: 'primary' | 'replica';
lagThresholdMs: number;
currentLagMs: number;
lastHealthCheck: Date;
isHealthy: boolean;
}
interface RoutingConfig {
nodes: DbNode[];
globalLagThresholdMs: number;
sessionConsistency: 'strong' | 'eventual';
fallbackToPrimaryOnLag: boolean;
}
export class ReadWriteRouter {
private config: RoutingConfig;
private sessionWritePositions: Map<string, number> = new Map();
constructor(config: RoutingConfig) {
this.config = config;
this.initializeHealthChecks();
}
private async initializeHealthChecks() {
// In production, use a dedicated interval to query pg_stat_replication
// or equivalent to update currentLagMs for each node.
setInterval(async () => {
for (const node of this.config.nodes) {
if (node.type === 'replica') {
node.currentLagMs = await this.measureReplicationLag(node);
node.isHealthy = node.currentLagMs < this.config.globalLagThresholdMs;
}
}
}, 1000);
}
private async measureReplicationLag(node: DbNode): Promise<number> {
// Implementation depends on DB engine.
// PostgreSQL: SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000;
return 0; // Placeholder
}
public async getConnection(
sessionId: string,
queryType: 'read' | 'write',
requiresConsistency: boolean = false
): Promise<PoolClient> {
if (queryType === 'write') {
return this.getPrimaryConnection();
}
// Read path logic
const lastWritePos = this.sessionWritePositions.get(sessionId) || 0;
const now = Date.now();
// Session consistency check: if recent write occurred, enforce consistency
if (this.config.sessionConsistency === 'strong' || requiresConsistency) {
// Route to primary if session has recent writes or consistency is mandatory
if (lastWritePos > 0 && (now - lastWritePos) < 5000) {
return this.getPrimaryConnection();
}
}
// Select best replica
const replica = this.selectBestReplica();
if (replica) {
return replica.pool.connect();
}
// Fallback: No healthy replicas
if (this.config.fallbackToPrimaryOnLag) {
return this.getPrimaryConnection();
}
throw new Error('No healthy read replicas available and fallback disabled');
}
private selectBestReplica(): DbNode | null {
const healthyReplicas = this.config.nodes
.filter(n => n.type === 'replica' && n.isHealthy);
if (healthyReplicas.length === 0) return null;
// Load balancing: weighted random or least connections
// Here we select the replica with lowest lag
return healthyReplicas.reduce((best, current) =>
current.currentLagMs < best.currentLagMs ? current : best
);
}
private async getPrimaryConnection(): Promise<PoolClient> {
const primary = this.config.nodes.find(n => n.type === 'primary');
if (!primary) throw new Error('Primary node not configured');
return primary.pool.connect();
}
public recordWrite(sessionId: string) {
this.sessionWritePositions.set(sessionId, Date.now());
}
public releaseConnection(client: PoolClient, error?: Error) {
if (error) {
client.release(error);
} else {
client.release();
}
}
}
Usage Example
const router = new ReadWriteRouter({
nodes: [
{ id: 'primary', pool: new Pool({ host: 'db-primary' }), type: 'primary', lagThresholdMs: 0, currentLagMs: 0, lastHealthCheck: new Date(), isHealthy: true },
{ id: 'replica-1', pool: new Pool({ host: 'db-replica-1' }), type: 'replica', lagThresholdMs: 500, currentLagMs: 0, lastHealthCheck: new Date(), isHealthy: true },
{ id: 'replica-2', pool: new Pool({ host: 'db-replica-2' }), type: 'replica', lagThresholdMs: 500, currentLagMs: 0, lastHealthCheck: new Date(), isHealthy: true },
],
globalLagThresholdMs: 1000,
sessionConsistency: 'strong',
fallbackToPrimaryOnLag: true,
});
async function handleUserUpdate(userId: string, data: any) {
const session = userId;
// Write
const writeClient = await router.getConnection(session, 'write');
try {
await writeClient.query('UPDATE users SET data = $1 WHERE id = $2', [data, userId]);
router.recordWrite(session); // Mark session for consistency
} finally {
router.releaseConnection(writeClient);
}
// Immediate Read (will route to primary due to session consistency)
const readClient = await router.getConnection(session, 'read', true);
try {
const result = await readClient.query('SELECT * FROM users WHERE id = $1', [userId]);
return result.rows[0];
} finally {
router.releaseConnection(readClient);
}
}
Pitfall Guide
-
Ignoring Replication Lag in Critical Reads:
Sending a read query to a replica immediately after a write results in stale data. This breaks user expectations (e.g., profile update not reflecting instantly). Always implement session consistency or force primary routing for critical post-write reads.
-
Executing Locking Queries on Replicas:
Queries like SELECT ... FOR UPDATE or LOCK TABLE require write access. Routing these to read replicas causes immediate errors. The router must parse SQL or rely on ORM hints to detect locking clauses and route them to the primary.
-
Connection Pool Imbalance:
Replicas often receive significantly higher connection counts than the primary. If pool sizes are identical, replicas may exhaust connections during traffic spikes. Configure larger pools for replicas and smaller, protected pools for the primary to reserve capacity for writes.
-
Blind Fallback to Primary:
When replicas fail, automatic fallback to the primary can overwhelm the primary instance, causing a cascading failure that takes down the entire database. Implement circuit breakers that degrade gracefully (e.g., serve cached data or reject reads) rather than flooding the primary.
-
Replication Lag Monitoring Blindness:
Relying solely on Seconds_Behind_Master is insufficient. This metric can report 0 even when replication is stalled if the SQL thread is idle. Monitor GTID positions or WAL lag bytes to detect true replication health.
-
DNS Caching Issues:
If using DNS-based routing for replicas, TTL settings can delay failover. Clients may continue hitting dead replicas based on cached DNS records. Use short TTLs or service discovery mechanisms for dynamic replica management.
-
Transaction Spanning Nodes:
In multi-statement transactions, mixing reads and writes can cause errors if the transaction starts on a replica and attempts a write, or vice versa. Ensure transaction boundaries are respected; once a transaction begins on a node, all subsequent statements in that transaction must route to the same node.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Financial Transactions | App-Level Routing + Strong Consistency | Zero tolerance for stale data; strict session affinity required. | High (Dev effort for routing logic) |
| Social Media Feed | Replica + Cache-Aside | High read volume; stale data is acceptable; latency is critical. | Low (Infra cost for replicas/cache) |
| Legacy Monolith | Proxy-Based Split (e.g., ProxySQL) | Minimal code changes; quick win for read scaling. | Medium (Proxy infra + latency overhead) |
| Global Distribution | Geo-Replicated Read Nodes + DNS/GSLB | Reduces latency for international users; splits traffic by region. | High (Multi-region infra cost) |
| Bursty Analytics | Dedicated Analytics Replica | Isolates heavy analytical queries from OLTP workload. | Medium (Additional replica cost) |
Configuration Template
# database_router.yaml
router:
session_consistency: strong
fallback_to_primary: true
lag_check_interval_ms: 1000
global_lag_threshold_ms: 1000
nodes:
primary:
host: "db-primary.internal"
port: 5432
pool_size: 50
max_overflow: 10
type: primary
replicas:
- id: "replica-us-east-1"
host: "db-replica-1.internal"
port: 5432
pool_size: 200
max_overflow: 50
weight: 60
lag_threshold_ms: 500
- id: "replica-us-west-2"
host: "db-replica-2.internal"
port: 5432
pool_size: 200
max_overflow: 50
weight: 40
lag_threshold_ms: 500
circuit_breaker:
failure_threshold: 5
recovery_timeout_ms: 30000
fallback_strategy: primary
Quick Start Guide
- Provision Replica: Create a read replica of your production database. Ensure network connectivity and replication is active.
- Install Router Library: Integrate a database routing library (e.g.,
pg-router for Node.js, or custom implementation as shown) into your application dependencies.
- Configure Endpoints: Update your configuration file with primary and replica connection strings, pool sizes, and lag thresholds based on the template.
- Enable Lag Monitoring: Deploy the lag monitoring agent or configure the router's health check interval to track replica status.
- Deploy and Verify: Deploy the updated application. Monitor routing metrics to confirm read traffic is hitting replicas and that fallback mechanisms trigger correctly during simulated lag.