// Immediate handoff to broker
await eventRouter.enqueue({
topic,
shopDomain,
eventId,
payload: req.body,
receivedAt: Date.now()
});
// Return before any downstream work begins
res.status(200).json({ status: 'accepted' });
});
function verifyPayloadSignature(rawBody: Buffer, receivedHmac: string, secret: string): boolean {
const hash = crypto.createHmac('sha256', secret).update(rawBody).digest('base64');
return crypto.timingSafeEqual(Buffer.from(hash), Buffer.from(receivedHmac));
}
**Architecture Rationale:** Returning within 50ms guarantees compliance with Shopify’s timeout. HMAC verification uses `timingSafeEqual` to prevent timing attacks. The broker acts as a single ingestion point, enabling centralized logging, metrics, and backpressure handling.
### Step 2: Implement Tiered Queue Topology
A single queue forces all events to share the same retry policy, concurrency limits, and failure handling. This creates cross-contamination: a slow fulfillment API can block order processing, which in turn blocks inventory updates.
```typescript
import { Queue } from 'bullmq';
import { redisConnection } from '../infrastructure/redis';
export const ingestionBroker = new Queue('shopify:ingestion', { connection: redisConnection });
export const domainProcessors = {
orders: new Queue('shopify:orders', {
connection: redisConnection,
defaultJobOptions: { attempts: 5, backoff: { type: 'exponential', delay: 120000 } }
}),
catalog: new Queue('shopify:catalog', {
connection: redisConnection,
defaultJobOptions: { attempts: 3, backoff: { type: 'fixed', delay: 30000 } }
}),
logistics: new Queue('shopify:logistics', {
connection: redisConnection,
defaultJobOptions: { attempts: 4, backoff: { type: 'exponential', delay: 60000 } }
})
};
export const outboundNotifier = new Queue('shopify:notifications', {
connection: redisConnection,
defaultJobOptions: { attempts: 2, backoff: { type: 'fixed', delay: 15000 }, priority: 10 }
});
Architecture Rationale: Domain isolation prevents cascade failures. Order processing receives aggressive retry policies because revenue impact is high. Notifications receive lighter policies because they are non-critical. Each queue can scale independently based on consumer worker count and concurrency settings.
Step 3: Apply Saga Pattern for Multi-Step Workflows
Workflows spanning multiple external systems require explicit compensation logic. Without it, a mid-process failure leaves the system in an inconsistent state with no automated recovery path.
import { EventEmitter } from 'events';
import { domainProcessors, outboundNotifier } from './queues';
const workflowCoordinator = new EventEmitter();
// Phase 1: Reserve inventory
workflowCoordinator.on('order.ingested', async (payload) => {
try {
const reservationId = await inventoryService.reserve(payload.shopDomain, payload.lineItems);
workflowCoordinator.emit('inventory.reserved', { ...payload, reservationId });
} catch (err) {
workflowCoordinator.emit('inventory.failed', payload);
}
});
// Phase 2: Submit to fulfillment provider
workflowCoordinator.on('inventory.reserved', async (payload) => {
try {
const shipmentRef = await fulfillmentProvider.submit(payload.shopDomain, payload.lineItems);
await outboundNotifier.add('order.confirmed', { shop: payload.shopDomain, ref: shipmentRef });
workflowCoordinator.emit('fulfillment.submitted', payload);
} catch (err) {
// Compensating transaction: release reservation
await inventoryService.release(payload.shopDomain, payload.reservationId);
workflowCoordinator.emit('fulfillment.failed', payload);
}
});
// Phase 3: Handle failures gracefully
workflowCoordinator.on('inventory.failed', async (payload) => {
await outboundNotifier.add('order.rejected', { shop: payload.shopDomain, reason: 'insufficient_stock' });
});
workflowCoordinator.on('fulfillment.failed', async (payload) => {
await outboundNotifier.add('order.rejected', { shop: payload.shopDomain, reason: 'fulfillment_error' });
});
Architecture Rationale: Each phase is an independent event handler. Failures trigger explicit rollback operations (e.g., inventoryService.release). This guarantees that partial state never persists without a recovery path. The event-driven model also allows adding new phases (e.g., tax calculation, fraud screening) without modifying existing logic.
Step 4: Enforce Idempotency at the Worker Layer
Shopify guarantees at-least-once delivery. Network retries, worker crashes, and queue redeliveries mean the same payload will be processed multiple times. Application-level idempotency is mandatory.
import { Redis } from 'ioredis';
import { redisClient } from '../infrastructure/redis';
export async function executeWithIdempotency<T>(
shopDomain: string,
eventId: string,
operation: () => Promise<T>
): Promise<{ status: 'executed' | 'duplicate'; result?: T }> {
const idempotencyKey = `exec:shopify:${shopDomain}:${eventId}`;
// Atomic lock: only one worker can proceed
const acquired = await redisClient.set(idempotencyKey, 'locked', 'NX', 'EX', 86400);
if (!acquired) {
return { status: 'duplicate' };
}
try {
const result = await operation();
return { status: 'executed', result };
} catch (err) {
// Release lock on failure to allow retry
await redisClient.del(idempotencyKey);
throw err;
}
}
Architecture Rationale: SET NX EX provides atomic lock acquisition. Two workers racing on the same key cannot both proceed. Deleting the key on failure ensures that transient errors don’t permanently block retries. The 24-hour expiration prevents key accumulation in Redis.
Step 5: Distribute Temporal Load with Sharded Scheduling
Batch operations (reconciliation, billing, reporting) scheduled at fixed times create thundering herds. When thousands of shops trigger simultaneously, database connection pools saturate and API rate limits are exhausted within seconds.
import { CronJob } from 'cron';
import { domainProcessors } from './queues';
function computeExecutionOffset(shopId: string): number {
// Deterministic distribution across 60-minute window
const hash = Buffer.from(shopId).reduce((acc, byte) => acc + byte, 0);
return hash % 60;
}
export function scheduleReconciliation(shopId: string, shopDomain: string): void {
const offsetMinute = computeExecutionOffset(shopId);
const cronExpression = `${offsetMinute} 2 * * *`; // 2:00 AM + offset
domainProcessors.orders.add(
`reconcile:${shopId}`,
{ shopDomain, type: 'reconciliation' },
{ repeat: { cron: cronExpression }, jobId: `reconcile:${shopId}` }
);
}
Architecture Rationale: Hash-based staggering distributes jobs evenly across a time window. Shop A might run at 2:00, Shop B at 2:03, Shop C at 2:47. This eliminates connection pool saturation and smooths API rate limit consumption. The deterministic hash ensures consistent scheduling across deployments.
Pitfall Guide
1. Blocking the HTTP Thread
Explanation: Performing database writes, external API calls, or heavy JSON parsing inside the webhook handler.
Fix: Validate HMAC, enqueue to broker, return 200. Move all business logic to background workers.
2. Monolithic Retry Policies
Explanation: Using a single queue for all event types forces a compromise on retry attempts and backoff intervals.
Fix: Implement domain-specific queues with tailored retry policies. High-impact events (orders) get aggressive retries; low-impact events (notifications) get lighter policies.
3. Silent Partial State
Explanation: Multi-step workflows that fail mid-execution without compensation logic leave systems in inconsistent states.
Fix: Apply the saga pattern. Every external API call must have a corresponding rollback handler. Log compensation events for auditability.
4. Idempotency Key Collisions
Explanation: Using predictable or non-unique keys (e.g., just orderId) causes cross-shop collisions or false duplicate detection.
Fix: Use composite keys: shopDomain + eventId + operationType. Ensure atomic Redis operations (SET NX) and explicit failure cleanup.
5. Thundering Herd on Cron Jobs
Explanation: Scheduling all shops at the exact same minute saturates infrastructure and triggers rate limits.
Fix: Hash-based staggering distributes jobs across a time window. Use deterministic offsets to ensure consistency across restarts.
6. HMAC Validation Bypass
Explanation: Skipping signature verification or using weak comparison methods exposes the endpoint to spoofed payloads.
Fix: Always verify x-shopify-hmac-sha256 using crypto.timingSafeEqual. Never compare strings directly. Store secrets in environment variables, never in code.
7. Dead Letter Queue Neglect
Explanation: Failed jobs that exhaust retries disappear silently, making debugging impossible.
Fix: Route exhausted jobs to a DLQ. Implement monitoring alerts, manual replay tooling, and structured error logging. Treat DLQs as production-critical infrastructure.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High-volume order ingestion | Tiered queue + aggressive retries | Revenue-critical; requires guaranteed delivery | Higher worker count, but prevents lost sales |
| Low-priority notifications | Single queue + light retries | Non-blocking; duplicates are acceptable | Minimal infrastructure cost |
| Multi-step fulfillment | Saga pattern with explicit rollback | Prevents partial state and inventory leaks | Moderate complexity, high reliability ROI |
| Nightly reconciliation | Hash-staggered cron jobs | Eliminates thundering herd and DB saturation | Near-zero incremental cost |
| Real-time inventory sync | Event bus + idempotency locks | Prevents overselling and race conditions | Redis overhead, but prevents stock discrepancies |
Configuration Template
// infrastructure/queue-config.ts
import { Queue, Worker } from 'bullmq';
import { redisConnection } from './redis';
export const queueConfig = {
ingestion: {
name: 'shopify:ingestion',
options: { connection: redisConnection, defaultJobOptions: { removeOnComplete: 100 } }
},
orders: {
name: 'shopify:orders',
options: {
connection: redisConnection,
defaultJobOptions: {
attempts: 5,
backoff: { type: 'exponential', delay: 120000 },
removeOnFail: { age: 86400 }
}
}
},
notifications: {
name: 'shopify:notifications',
options: {
connection: redisConnection,
defaultJobOptions: {
attempts: 2,
backoff: { type: 'fixed', delay: 15000 },
priority: 10,
removeOnComplete: 50
}
}
},
dlq: {
name: 'shopify:dlq',
options: { connection: redisConnection }
}
};
export const workers = {
orders: new Worker(queueConfig.orders.name, async (job) => {
// Process order logic with idempotency guard
}, { connection: redisConnection, concurrency: 10 }),
notifications: new Worker(queueConfig.notifications.name, async (job) => {
// Send notification logic
}, { connection: redisConnection, concurrency: 20 })
};
// Move exhausted jobs to DLQ automatically
workers.orders.on('failed', (job, err) => {
if (job.attemptsMade >= job.opts.attempts) {
queueConfig.dlq.add('failed_order', { ...job.data, error: err.message });
}
});
Quick Start Guide
- Initialize Redis & Broker: Deploy a Redis instance (managed or self-hosted). Configure
ioredis and bullmq with connection pooling and TLS if required.
- Deploy Ingestion Handler: Set up the Express/Fastify webhook endpoint. Implement HMAC verification and immediate broker enqueueing. Test with Shopify’s webhook simulator.
- Spin Up Domain Workers: Launch background workers for each queue tier. Configure concurrency limits based on downstream API rate limits and database connection pool size.
- Enable Idempotency & DLQ: Integrate the atomic lock pattern into all worker handlers. Configure failed-job routing to a monitored DLQ with alerting.
- Validate Under Load: Use a webhook replay tool to simulate burst traffic. Verify response times stay under 50ms, retries behave as configured, and no jobs are lost. Monitor Redis memory and queue depths.