res a layered approach: robust queue selection, idempotent worker design, concurrency control, and lag-driven auto-scaling.
1. Queue Architecture Selection
Choose the transport based on ordering guarantees and throughput requirements.
- Redis-based (BullMQ, Sidekiq): Best for general-purpose jobs, rich job metadata, and priority queues. Requires Redis persistence for durability.
- SQS/RabbitMQ: Ideal for high durability, exactly-once processing (with constraints), and enterprise integration.
- Kafka: Mandatory for streaming data, replayability, and massive throughput where jobs are events rather than discrete tasks.
2. Worker Implementation (TypeScript / BullMQ)
The worker must handle concurrency, idempotency, and graceful shutdown. BullMQ is selected for its Lua-script-based atomicity and Redis persistence.
import { Worker, Queue, Job } from 'bullmq';
import IORedis from 'ioredis';
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
const connection = new IORedis({ maxRetriesPerRequest: null });
// Queue definition with retry strategy
const emailQueue = new Queue('email-notifications', {
connection,
defaultJobOptions: {
attempts: 3,
backoff: {
type: 'exponential',
delay: 2000,
},
removeOnComplete: 100,
removeOnFail: false, // Retain failed jobs for inspection
},
});
// Worker with concurrency and idempotency
export const emailWorker = new Worker(
'email-notifications',
async (job: Job) => {
const { userId, templateId, payload } = job.data;
const idempotencyKey = `${job.id}:${job.attemptsMade}`;
// 1. Idempotency Check
// Prevent duplicate processing if the job is retried or requeued
const processed = await prisma.jobExecution.findUnique({
where: { idempotencyKey },
});
if (processed) {
console.log(`Job ${job.id} already processed. Skipping.`);
return processed.result;
}
try {
// 2. Business Logic
const result = await sendEmail(userId, templateId, payload);
// 3. Record Success Atomically
await prisma.jobExecution.create({
data: {
idempotencyKey,
jobId: job.id,
status: 'COMPLETED',
result: JSON.stringify(result),
},
});
return result;
} catch (error) {
// 4. Error Handling
// BullMQ will retry based on configuration.
// If error is due to downstream rate limit, throw specific error.
if (isRateLimitError(error)) {
throw new Error('Rate limited');
}
throw error;
}
},
{
connection,
concurrency: 20, // Tune based on downstream dependency limits
lockDuration: 30000, // 30s lock, auto-renews if job takes longer
limiter: {
max: 20,
duration: 1000, // Rate limit: 20 jobs per second
},
}
);
// Graceful Shutdown
process.on('SIGTERM', async () => {
console.log('Shutting down worker...');
await emailWorker.close();
await connection.quit();
process.exit(0);
});
3. Auto-Scaling Strategy
Horizontal Pod Autoscaler (HPA) in Kubernetes should scale based on queue lag, not CPU. CPU is a lagging indicator; queue depth is the leading indicator of capacity shortage.
Architecture Decision: Use a custom metrics adapter (e.g., Prometheus Adapter) to expose queue length as a Kubernetes metric. Configure the HPA to target a specific queue length per worker replica.
- Target:
queue_length / replicas <= 50.
- Behavior: If queue grows to 500 and target is 50, HPA scales to 10 replicas.
- Stabilization: Set
stabilizationWindowSeconds to 60 to prevent flapping during transient spikes.
4. Backpressure and Circuit Breaking
Workers must not overwhelm downstream services. Implement circuit breakers for database and API calls. If a downstream service degrades, the circuit opens, and jobs fail fast, allowing the queue to hold them rather than consuming worker threads on doomed requests.
import CircuitBreaker from 'opossum';
const emailBreaker = new CircuitBreaker(sendEmail, {
timeout: 5000,
errorThresholdPercentage: 50,
resetTimeout: 30000,
});
// Usage in worker
const result = await emailBreaker.fire(userId, templateId, payload);
Pitfall Guide
1. Missing Idempotency Guarantees
Mistake: Assuming retries are safe without deduplication.
Impact: Duplicate charges, duplicate emails, or data corruption.
Fix: Implement idempotency keys stored in a database or cache. Check existence before processing and write completion atomically.
2. Scaling on CPU Instead of Lag
Mistake: Configuring HPA based on CPU utilization.
Impact: Latency spikes occur before scaling triggers. Workers may be idle waiting on I/O while the queue grows.
Fix: Always scale on queue depth/age. CPU scaling is only useful for CPU-bound transcoding jobs.
3. Unbounded Memory Leaks
Mistake: Long-running worker processes accumulating state.
Impact: OOMKilled pods, restart storms, and job loss if not persisted.
Fix: Monitor memory usage. Implement periodic worker recycling or use frameworks that manage process lifecycles. Avoid global mutable state.
4. Thundering Herd on Database
Mistake: All workers hitting the database simultaneously upon startup or spike.
Impact: Database connection pool exhaustion and lock contention.
Fix: Implement connection pooling with limits. Add jitter to retry delays. Use batch processing where possible.
5. Ignoring Dead Letter Queues (DLQ)
Mistake: Failing jobs are discarded or retried indefinitely.
Impact: Poison pills block worker threads; critical errors go unnoticed.
Fix: Route jobs exceeding max retries to a DLQ. Set up alerts for DLQ growth. Implement a replay mechanism for DLQ jobs after fixing root causes.
6. Blocking the Event Loop
Mistake: Running synchronous heavy computation in Node.js workers.
Impact: All concurrent jobs stall; latency spikes.
Fix: Offload CPU-bound work to worker threads or external services. Keep the main thread asynchronous.
7. Lack of Job Age Visibility
Mistake: Monitoring only job success/failure rates.
Impact: Silent queue buildup goes undetected until SLA breach.
Fix: Expose metrics for job_age_p99 and queue_length. Alert on job age exceeding thresholds, not just queue length.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| High Volume, No Ordering | Kafka + Consumer Groups | Massive throughput, replayability, partitioned scaling. | High infrastructure cost, optimized per-throughput. |
| Priority Jobs, Mixed Types | BullMQ with Priority Queues | Redis supports priority levels; rich job metadata. | Low cost; Redis memory scales with job count. |
| Strict Ordering Required | SQS FIFO or Kafka Partitioned | Guarantees order within a group/partition. | Medium cost; throughput limited by partition count. |
| Bursty Traffic, Variable Load | Dynamic Lag-Based HPA | Scales instantly to spikes, shrinks to save cost. | Lowest cost; pays only for active processing. |
| CPU-Intensive Tasks | Dedicated Worker Nodes + CPU HPA | Isolates heavy compute from I/O bound workers. | High compute cost; requires node pooling. |
Configuration Template
Kubernetes HPA with Prometheus Adapter (Queue Lag Scaling)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: worker-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: background-worker
minReplicas: 2
maxReplicas: 50
metrics:
- type: Pods
pods:
metric:
name: bullmq_queue_length
target:
type: AverageValue
averageValue: "50" # Scale to maintain 50 jobs per replica
behavior:
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
Note: Ensure Prometheus Adapter is configured to scrape bullmq_queue_length from your metrics endpoint.
Quick Start Guide
- Initialize Queue & Worker: Install
bullmq and ioredis. Create a Queue instance with retry options and a Worker with concurrency and idempotency logic.
- Add Metrics: Instrument the worker to export
queue_length, jobs_completed_total, and job_duration_seconds to Prometheus via prom-client.
- Deploy to K8s: Containerize the worker. Deploy with a base replica count. Apply the HPA manifest configured for queue lag scaling.
- Verify Scaling: Push a burst of jobs to the queue. Monitor HPA logs and worker replica count. Confirm replicas scale up as queue depth exceeds
target * minReplicas.
- Implement DLQ: Add a listener for
failed events in the worker. On failure, push the job payload to a separate DLQ queue or database table. Set up a dashboard alert for DLQ depth > 0.