aga + Circuit Breaker | 4.7 | 1.1 | 1.8s | 38 |
Interpretation:
- Deployment frequency increases when services decouple via asynchronous patterns and bounded failure domains.
- Blast radius shrinks because circuit breakers and saga compensations isolate dependency failures.
- Consistency latency drops because the transactional outbox makes each event available for publishing as soon as its transaction commits, so downstream services no longer poll other services for state changes.
- Operational overhead collapses as idempotency, dead-letter routing, and standardized retry policies replace manual reconciliation.
Patterns are not theoretical. They are measurable engineering multipliers.
Core Solution
We implement a production-grade distributed architecture using three interlocking patterns: Transactional Outbox, Event-Driven Saga, and Circuit Breaker. This combination solves consistency, coordination, and resilience simultaneously.
Step 1: Transactional Outbox Pattern
The outbox pattern eliminates distributed transactions by recording domain events in the same database transaction as the state change. A separate process (poller or CDC) publishes events to the message broker.
Architecture Decision: Strong consistency at the aggregate boundary, eventual consistency across services. No two-phase commit.
Implementation (TypeScript/Node.js):
// 1. Outbox table schema (PostgreSQL)
// CREATE TABLE outbox (
// id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
// aggregate_type VARCHAR(50) NOT NULL,
// aggregate_id UUID NOT NULL,
// event_type VARCHAR(100) NOT NULL,
// payload JSONB NOT NULL,
// created_at TIMESTAMPTZ DEFAULT NOW(),
// published BOOLEAN DEFAULT FALSE
// );
import { Pool } from 'pg';

// Assumed shape of the order payload; adjust to your domain model.
interface OrderDTO {
  id: string;
  customerId: string;
  total: number;
  items: unknown[];
}

// 2. Transactional write with outbox
async function createOrder(db: Pool, order: OrderDTO): Promise<void> {
  // Check out a single client so BEGIN, both INSERTs, and COMMIT share one connection.
  const client = await db.connect();
  try {
    await client.query('BEGIN');
    // Domain state mutation
    await client.query(
      'INSERT INTO orders (id, customer_id, total) VALUES ($1, $2, $3)',
      [order.id, order.customerId, order.total]
    );
    // Outbox event record (same transaction)
    await client.query(
      `INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
       VALUES ($1, $2, $3, $4)`,
      ['order', order.id, 'OrderCreated', JSON.stringify(order)]
    );
    await client.query('COMMIT');
  } catch (err) {
    // Either both rows are written or neither is.
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
Poller/CDC Strategy: Use Debezium or a lightweight background worker that queries published = FALSE, batches publishes, and updates the flag. Idempotency is enforced via id uniqueness in the consumer.
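The background-worker variant of the relay described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `OutboxRow`, `OutboxStore`, `Publish`, and `relayOnce` are hypothetical names introduced for illustration, and the store is an interface so the same loop could sit on top of the `outbox` table or an in-memory fake.

```typescript
// Hypothetical shapes for illustration; not part of any library.
interface OutboxRow {
  id: string;
  eventType: string;
  payload: unknown;
  published: boolean;
}

interface OutboxStore {
  // e.g. SELECT ... FROM outbox WHERE published = FALSE ORDER BY created_at LIMIT $1
  fetchUnpublished(limit: number): Promise<OutboxRow[]>;
  // e.g. UPDATE outbox SET published = TRUE WHERE id = ANY($1)
  markPublished(ids: string[]): Promise<void>;
}

type Publish = (row: OutboxRow) => Promise<void>;

// One relay pass: publish a batch in order, then flag the rows that were sent.
// A crash between publish and markPublished re-sends rows on the next pass,
// so delivery is at-least-once and consumers must deduplicate by row id.
async function relayOnce(
  store: OutboxStore,
  publish: Publish,
  batchSize = 100
): Promise<number> {
  const rows = await store.fetchUnpublished(batchSize);
  const sent: string[] = [];
  try {
    for (const row of rows) {
      await publish(row); // a throw here stops the pass, preserving order
      sent.push(row.id);
    }
  } finally {
    // Flag whatever was published, even if a later row failed.
    if (sent.length > 0) {
      await store.markPublished(sent);
    }
  }
  return sent.length;
}
```

A worker would call `relayOnce` on a timer and back off when it throws. The CDC route (Debezium) replaces this loop entirely by reading the database log instead of querying the table.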
Step 2: Event-Driven Saga Pattern
Sagas coordinate business transactions across services without distributed locks. We use choreography for decoupled workflows and orchestration for complex state machines.
Architecture Decision: Choreography for linear workflows (Order → Inventory → Payment). Orchestration for branching/rollback-heavy flows (Subscription → Billing → Provisioning).
Choreography Implementation:
// Inventory Service: consumes OrderCreated, emits InventoryReserved or InventoryFailed
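async function handleOrderCreated(event: OutboxEvent, db: Pool, publisher: EventBus) {
  const { aggregate_id: orderId, payload } = event;
  const order = payload as OrderDTO;
  // Use one client: BEGIN/COMMIT sent through the pool may run on different connections.
  const client = await db.connect();
  try {
    await client.query('BEGIN');
    await reserveInventory(client, order.items);
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    await publisher.publish('InventoryFailed', {
      correlation_id: orderId,
      reason: err instanceof Error ? err.message : String(err)
    });
    return;
  } finally {
    client.release();
  }
  // Publish only after a successful commit, outside the try block, so a broker
  // error cannot be reported as InventoryFailed for stock that was reserved.
  await publisher.publish('InventoryReserved', {
    correlation_id: orderId,
    items_reserved: order.items
  });
}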
// Compensation: Order service consumes InventoryFailed, emits OrderCancelled
async function handleInventoryFailed(event: OutboxEvent, publisher: EventBus) {
  await publisher.publish('OrderCancelled', {
    correlation_id: event.correlation_id,
    reason: event.payload.reason
  });
}
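For the orchestrated case (Subscription → Billing → Provisioning), the core of an orchestrator is a loop that runs steps in order and, on failure, compensates the completed steps in reverse. The following is a minimal sketch, not a definitive implementation: `SagaStep`, `SagaResult`, and `runSaga` are hypothetical names, and a real orchestrator would also persist saga state and retry failed compensations.

```typescript
// Hypothetical step shape: each step pairs an action with its compensation.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

type SagaResult =
  | { status: 'completed' }
  | { status: 'compensated'; failedStep: string };

// Runs steps in order. If one fails, undoes the steps that already
// completed, most recent first, and reports which step failed.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch (err) {
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { status: 'compensated', failedStep: step.name };
    }
  }
  return { status: 'completed' };
}
```

If provisioning fails in a three-step saga, billing is refunded first and the subscription is cancelled second, which mirrors how the choreographed flow above cancels the order after `InventoryFailed`.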
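Idempotency Enforcement: Consumers must deduplicate using correlation_id + event_type. Store processed event keys in a dedicated table, or rely on broker features where they apply (e.g., Kafka idempotent producers and transactions, which cover writes within Kafka but not side effects in external systems). RabbitMQ publisher confirms only acknowledge that the broker accepted a message; they do not remove duplicates, so consumer-side deduplication is still required.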
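The deduplication check can be sketched as a small wrapper around any handler. This is a minimal sketch under stated assumptions: `SagaEvent`, `ProcessedEvents`, and `handleOnce` are hypothetical names, and the in-memory set stands in for a processed-events table. In a real consumer the dedup record and the handler's state change belong in the same database transaction.

```typescript
// Hypothetical event envelope; field names follow the handlers above.
interface SagaEvent {
  correlation_id: string;
  event_type: string;
  payload: unknown;
}

// In-memory stand-in for a processed-events table.
class ProcessedEvents {
  private readonly keys = new Set<string>();

  has(key: string): boolean {
    return this.keys.has(key);
  }

  add(key: string): void {
    this.keys.add(key);
  }
}

// Returns true if the handler ran, false if the event was a duplicate.
function handleOnce(
  processed: ProcessedEvents,
  event: SagaEvent,
  handler: (event: SagaEvent) => void
): boolean {
  const key = `${event.correlation_id}:${event.event_type}`;
  if (processed.has(key)) {
    return false; // duplicate delivery: acknowledge and skip
  }
  handler(event); // if this throws, the key is not recorded and a retry can succeed
  processed.add(key);
  return true;
}
```

Delivering the same `InventoryReserved` event twice runs the handler once; the second delivery returns `false` and can simply be acknowledged.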
Step 3: Circuit Breaker Pattern
Circuit breakers prevent cascading failures by failing fast when a dependency degrades. States: CLOSED (normal), OPEN (fail fast), HALF_OPEN (test recovery).
Architecture Decision: Apply at service-to-service HTTP/gRPC boundaries and external API calls. Not for internal event consumption.
Implementation (Generic Resilience Config):
circuit_breaker:
  service: payment-gateway
  failure_threshold: 5              # consecutive failures to open
  recovery_timeout: 30s             # wait before half-open
  permitted_calls_in_half_open: 3
  sliding_window_size: 10           # requests to evaluate
  minimum_calls: 5                  # minimum requests before evaluating
  slow_call_duration_threshold: 2s
  slow_call_rate_threshold: 0.8     # 80% slow calls trigger open
Runtime Behavior:
- CLOSED: Requests pass through. Failures increment counter. Threshold met → OPEN.
- OPEN: All requests fail immediately. Fallback executes. After recovery_timeout → HALF_OPEN.
- HALF_OPEN: Limited requests allowed. Success → CLOSED. Failure → OPEN.
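The three-state behavior can be sketched as a small state machine. This is a minimal sketch, not a definitive implementation: `CircuitBreaker` and `BreakerConfig` are hypothetical names, only the consecutive-failure threshold, recovery timeout, and half-open call limit are modeled (the sliding window and slow-call thresholds from the config are left out), and the clock is injectable so transitions can be tested without waiting.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface BreakerConfig {
  failureThreshold: number;         // consecutive failures that open the breaker
  recoveryTimeoutMs: number;        // time spent OPEN before probing
  permittedCallsInHalfOpen: number; // probe calls allowed, all must succeed to close
}

class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private halfOpenCalls = 0;
  private halfOpenSuccesses = 0;
  private openedAt = 0;

  constructor(
    private readonly config: BreakerConfig,
    private readonly now: () => number = () => Date.now() // injectable clock
  ) {}

  getState(): BreakerState {
    const waited = this.now() - this.openedAt;
    if (this.state === 'OPEN' && waited >= this.config.recoveryTimeoutMs) {
      this.state = 'HALF_OPEN';
      this.halfOpenCalls = 0;
      this.halfOpenSuccesses = 0;
    }
    return this.state;
  }

  // Returns false when the caller should fail fast and use its fallback.
  allowRequest(): boolean {
    const state = this.getState();
    if (state === 'CLOSED') return true;
    if (state === 'OPEN') return false;
    if (this.halfOpenCalls >= this.config.permittedCallsInHalfOpen) return false;
    this.halfOpenCalls += 1;
    return true;
  }

  recordSuccess(): void {
    if (this.state === 'HALF_OPEN') {
      this.halfOpenSuccesses += 1;
      if (this.halfOpenSuccesses >= this.config.permittedCallsInHalfOpen) {
        this.state = 'CLOSED';
        this.failures = 0;
      }
    } else if (this.state === 'CLOSED') {
      this.failures = 0; // a success breaks the consecutive-failure streak
    }
  }

  recordFailure(): void {
    if (this.state === 'OPEN') return; // late result from before the trip
    this.failures += 1;
    if (this.state === 'HALF_OPEN' || this.failures >= this.config.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```

A caller checks `allowRequest()` before each outbound call, runs its fallback when that returns `false`, and reports the outcome with `recordSuccess()` or `recordFailure()`.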
Architecture Decisions Summary
| Decision | Rationale |
|---|---|
| Outbox over 2PC | Avoids distributed locking, scales horizontally, survives broker outages |
| Choreography for linear flows | Lower coupling and easier horizontal scaling; orchestration is reserved for branching flows, where a central state machine is easier to trace |
| Idempotent Consumers | Network retries are inevitable; state must converge regardless of delivery count |
| Circuit Breakers at Boundaries | Internal async flows use backpressure; external sync calls need fast failure |
| Event Schema Versioning | Backward-compatible fields prevent consumer breakage during deployments |
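The schema-versioning row can be illustrated with an upcasting function that converts older payloads to the newest shape before a handler sees them. This is a minimal sketch under stated assumptions: the two `OrderCreated` payload versions, the added `currency` field, and its `'USD'` default are all hypothetical and chosen only for the example.

```typescript
// Hypothetical payload versions for an OrderCreated event.
interface OrderCreatedV1 {
  version: 1;
  orderId: string;
  total: number;
}

interface OrderCreatedV2 {
  version: 2;
  orderId: string;
  total: number;
  currency: string; // field added in v2
}

type OrderCreated = OrderCreatedV1 | OrderCreatedV2;

// Upcast older payloads so handlers only deal with the newest shape.
// The 'USD' default is an assumption made for this example.
function upcastOrderCreated(event: OrderCreated): OrderCreatedV2 {
  if (event.version === 1) {
    return {
      version: 2,
      orderId: event.orderId,
      total: event.total,
      currency: 'USD',
    };
  }
  return event;
}
```

Because v2 only adds a field with a safe default, producers and consumers can be deployed in either order without breaking each other.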
Pitfall Guide
- Treating Distributed Transactions Like Local Ones: Assuming BEGIN/COMMIT spans services leads to lock contention and partial rollbacks. Use outbox + saga instead.
- Skipping Idempotency in Consumers: At-least-once delivery is standard. Without deduplication, duplicate events cause double charges, inventory oversell, and state corruption.
- Misconfiguring Circuit Breaker Thresholds: Too aggressive: healthy services flap open/closed. Too lenient: thread pools exhaust before breaker triggers. Base thresholds on p95 latency and historical failure rates.
- Using Events as RPC Calls: Publishing an event and immediately waiting for a synchronous response defeats async decoupling. Events represent state changes, not queries.
- Ignoring Clock Skew in Ordering: Distributed systems lack a global clock. Rely on logical timestamps (Lamport/vector clocks) or broker-provided offsets for event ordering, not wall-clock time.
- Over-Engineering Prematurely: Applying saga, CQRS, and event sourcing to a CRUD service adds complexity without benefit. Start with outbox + circuit breaker. Add patterns only when failure modes demand them.
- Neglecting Dead-Letter Queues (DLQ): Poison messages block consumer progress. Route malformed, unparseable, or repeatedly failing events to a DLQ with alerting and manual replay capabilities.
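For the clock-skew pitfall, a Lamport clock shows how logical timestamps order events without wall-clock time. This is a minimal sketch; the class and method names are illustrative rather than taken from any library.

```typescript
// Minimal Lamport clock: a counter that only moves forward.
class LamportClock {
  private time = 0;

  // Call before a local event or before sending a message.
  // Returns the timestamp to attach to the outgoing event.
  tick(): number {
    this.time += 1;
    return this.time;
  }

  // Call when receiving a message stamped with the sender's timestamp.
  // Jumps past the sender's time so the receive is ordered after the send.
  receive(remoteTime: number): number {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }
}

// Two services exchanging one event.
const orderClock = new LamportClock();
const inventoryClock = new LamportClock();

const sentAt = orderClock.tick();                  // order service emits an event
const receivedAt = inventoryClock.receive(sentAt); // inventory service consumes it
```

Here `receivedAt` is always greater than `sentAt`, whatever the two machines' wall clocks say. Lamport timestamps give an ordering consistent with causality but cannot detect concurrent events; vector clocks are needed for that.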
Production Bundle
Action Checklist
Decision Matrix
| Pattern | Use When | Avoid When | Consistency Model | Operational Cost |
|---|---|---|---|---|
| Transactional Outbox | Cross-service state mutation, broker reliability uncertain | Single-service CRUD, strict ACID required | Eventual | Low |
| Event-Driven Saga | Multi-step business workflow, partial failure tolerance | Simple linear transaction, real-time rollback needed | Eventual | Medium |
| Circuit Breaker | External dependencies, synchronous service calls | Internal async event flows, idempotent retries | N/A (resilience) | Low |
| 2PC / XA | Legacy monolith migration, regulatory compliance | Cloud-native, high-throughput, horizontal scaling | Strong | High |
| Polling / Retry | Temporary fallback, non-critical paths | Production data pipelines, financial operations | Eventual (delayed) | Medium |
Configuration Template
Copy-paste ready for production deployment.
Outbox Table (PostgreSQL):
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(50) NOT NULL,
aggregate_id UUID NOT NULL,
event_type VARCHAR(100) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
published BOOLEAN DEFAULT FALSE,
version INT DEFAULT 1
);
CREATE INDEX idx_outbox_published ON outbox (published, created_at);
Consumer Idempotency Table:
CREATE TABLE processed_events (
event_id UUID PRIMARY KEY,
event_type VARCHAR(100) NOT NULL,
processed_at TIMESTAMPTZ DEFAULT NOW()
);
Retry Policy (YAML):
retry_policy:
  max_attempts: 3
  initial_interval: 1s
  max_interval: 30s
  multiplier: 2.0
  jitter: true
  retryable_exceptions:
    - "NetworkTimeout"
    - "ServiceUnavailable"
    - "DeadlockDetected"
  non_retryable_exceptions:
    - "ValidationError"
    - "AuthenticationFailed"
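How a client might interpret this policy can be sketched as two small functions. This is a minimal sketch under stated assumptions: `RetryPolicy`, `backoffDelayMs`, and `shouldRetry` are hypothetical names, and because the YAML only says `jitter: true`, the "full jitter" strategy used here (a random delay between zero and the capped value) is an assumption.

```typescript
// Hypothetical in-code mirror of the YAML policy above.
interface RetryPolicy {
  maxAttempts: number;
  initialIntervalMs: number;
  maxIntervalMs: number;
  multiplier: number;
  jitter: boolean;
  retryableErrors: string[];
}

const defaultPolicy: RetryPolicy = {
  maxAttempts: 3,
  initialIntervalMs: 1000,
  maxIntervalMs: 30000,
  multiplier: 2.0,
  jitter: true,
  retryableErrors: ['NetworkTimeout', 'ServiceUnavailable', 'DeadlockDetected'],
};

// Delay before retry number `attempt` (1-based):
// initial * multiplier^(attempt - 1), capped at maxIntervalMs.
// With jitter enabled, a random value in [0, capped) is returned instead.
function backoffDelayMs(
  policy: RetryPolicy,
  attempt: number,
  random: () => number = Math.random // injectable for deterministic tests
): number {
  const exponential =
    policy.initialIntervalMs * Math.pow(policy.multiplier, attempt - 1);
  const capped = Math.min(exponential, policy.maxIntervalMs);
  return policy.jitter ? Math.floor(random() * capped) : capped;
}

// Retry only listed error types, and only while attempts remain.
function shouldRetry(
  policy: RetryPolicy,
  errorName: string,
  attemptsMade: number
): boolean {
  return (
    attemptsMade < policy.maxAttempts &&
    policy.retryableErrors.indexOf(errorName) !== -1
  );
}
```

Jitter spreads retries from many clients over time, so a recovering dependency is not hit by synchronized retry waves. An event for which `shouldRetry` returns `false` is a candidate for the dead-letter queue.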
Quick Start Guide
- Add Outbox Schema: Execute the PostgreSQL schema above. Modify your primary write path to insert domain events into outbox within the same transaction as state changes.
- Deploy Outbox Publisher: Run a lightweight worker or enable CDC (e.g., Debezium). The worker queries published = FALSE, batches 50-200 events, publishes them to the broker, and marks them published = TRUE. Implement exponential backoff on broker failures.
- Implement Consumer Idempotency: Wrap event handlers in a transaction that checks processed_events. If event_id exists, return success. Otherwise, process, write state, insert into processed_events, and commit.
- Add Circuit Breakers: Wrap all synchronous external calls with the provided YAML config. Implement fallback responses (cache, default, queued request). Monitor OPEN state transitions in your observability stack.
- Validate with Failure Injection: Use chaos engineering tools (Toxiproxy, Litmus, Gremlin) to simulate latency, packet loss, and broker downtime. Verify that outbox retries, circuit breaker transitions, and saga compensations execute without data corruption.
Distributed system patterns are not optional abstractions. They are the engineering contracts that guarantee consistency, resilience, and observability when networks fail, brokers lag, and dependencies degrade. Implement them deliberately, measure their impact, and treat them as first-class infrastructure. Your production environment will reflect the difference.