d idempotency (event_id as PK) completely prevents duplicate processing without Redis complexity.
- State machine validation automatically drops illegal transitions, preserving payment integrity during out-of-order delivery.
Sweet Spot: ~50ms HTTP acknowledgment + async message queue + strict state transition validation + DB-backed idempotency. This combination has produced zero missed events over 14 months of production open banking traffic.
Core Solution
The production-tested architecture enforces four non-negotiable layers. Each layer addresses a specific failure mode in distributed webhook delivery.
1. Verify the signature before you parse the body
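An HMAC signature is computed over the exact bytes the sender transmitted. Parsing the JSON and re-serialising it changes whitespace, so the signature no longer matches, and acting on a payload before verifying it exposes the system to spoofed events. Always verify against the raw request body.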
// webhook.controller.ts
@Post('atoa')
async handle(
  @Headers('x-atoa-signature') signature: string,
  @RawBody() body: Buffer, // raw bytes, not the parsed object
) {
  // Verify against the raw bytes before touching the payload.
  if (!this.crypto.verify(body, signature, this.secret)) {
    throw new UnauthorizedException();
  }
  // Safe to parse only after the signature checks out.
  const event = JSON.parse(body.toString());
  await this.queue.enqueue('payment.webhook', event);
  return { received: true };
}
Two non-negotiables:
- Use the raw body for HMAC verification. NestJS's default JSON parser hands you a parsed object, and re-serialising it will not reproduce the original bytes, which breaks your signature check. Enable rawBody: true on the app.
- Reject before you do anything else. No DB hits, no logging the payload at info level, nothing.
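The controller above delegates to this.crypto.verify without showing its body. Here is a minimal sketch of what such a check could look like, assuming the provider signs the raw body with HMAC-SHA256 and sends the digest hex-encoded in the header. Both the algorithm and the encoding are assumptions, so check the provider's documentation. verifySignature is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumption: hex-encoded HMAC-SHA256 over the raw request bytes.
function verifySignature(rawBody: Buffer, signatureHex: string, secret: string): boolean {
  if (!signatureHex) return false; // missing header

  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');

  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (received.length !== expected.length) return false;

  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(expected, received);
}
```

Feeding this function a re-serialised copy of the payload instead of the original bytes fails the check whenever the sender's formatting differs from JSON.stringify output, which is why the raw buffer matters.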
2. Acknowledge fast. Process slow.
The webhook controller's sole responsibility is verification and enqueuing. Business logic must run asynchronously.
async handle(...) {
  // verify (above)
  await this.queue.enqueue('payment.webhook', event);
  return { received: true }; // 200 within ~50ms
}
If your handler takes 8 seconds because you're hitting Stripe + your DB + sending an email, the sender will time out and retry. Now you have two events. Then four. Then the on-call engineer.
We use BullMQ on Redis. You can use SQS, NATS, Kafka; pick your poison. The point is: the HTTP response is decoupled from the work.
3. Idempotency keys are not optional
Every event carries an event_id. Before executing business logic, enforce first-seen semantics at the database layer.
@Processor('payment.webhook')
export class WebhookProcessor {
  async process(job: Job<WebhookEvent>) {
    const { event_id, payment_id, status } = job.data;
    // firstSeen resolves to true only the first time this event_id is recorded.
    const isFirstDelivery = await this.events.firstSeen(event_id);
    if (!isFirstDelivery) {
      this.logger.log(`Duplicate event ${event_id}, skipping`);
      return;
    }
    await this.applyStatus(payment_id, status, event_id);
  }
}
firstSeen is a write to a Postgres table with event_id as the primary key. If the insert succeeds, this is the first time we've seen this event. If it conflicts, we've processed it before. No race conditions, no Redis dance: just let the database do the work it's good at.
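The insert-or-conflict idea can be sketched as follows. This is an illustration under assumptions, not the original implementation: the processed_events table name, the Queryable interface, and the ProcessedEvents class are hypothetical, with Queryable modelled loosely on the query(text, values) call that node-postgres exposes.

```typescript
// Assumed schema (hypothetical name):
//   CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);

// Minimal client shape, loosely modelled on node-postgres's query(text, values).
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

class ProcessedEvents {
  private readonly db: Queryable;

  constructor(db: Queryable) {
    this.db = db;
  }

  // Resolves to true only for the call whose INSERT actually created the row.
  async firstSeen(eventId: string): Promise<boolean> {
    const result = await this.db.query(
      'INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT (event_id) DO NOTHING',
      [eventId],
    );
    // ON CONFLICT DO NOTHING reports zero affected rows for a duplicate key.
    return result.rowCount === 1;
  }
}
```

Because the primary key decides the winner, two workers that receive the same event_id at the same moment cannot both get true. Running this insert and the status change in one transaction also keeps a crash between the two steps from marking an unprocessed event as seen.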
4. State machines, not status updates
Payments require deterministic state transitions, not flat status fields. Invalid transitions must be dropped, not thrown.
const ALLOWED: Record<PaymentStatus, PaymentStatus[]> = {
  initiated: ['authorising', 'failed'],
  authorising: ['succeeded', 'failed'],
  succeeded: [], // terminal
  failed: [], // terminal
};

async applyStatus(id: string, next: PaymentStatus, eventId: string) {
  const payment = await this.payments.findById(id);
  if (!ALLOWED[payment.status].includes(next)) {
    this.logger.warn(`Illegal transition: ${payment.status} → ${next}`);
    return; // do not update, do not throw; this is normal
  }
  await this.payments.transition(id, next, eventId);
}
Why this matters: events arrive out of order (and they will). When a stale failed event shows up after succeeded, your code shouldn't downgrade a succeeded payment to failed. With a state machine, the invalid transition is dropped. The reconciler picks it up later. The customer's payment stays correct.
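To see the drop-don't-throw behaviour in isolation, the transition table can be exercised as a pure function with no database involved. In this sketch the ALLOWED table mirrors the one above, while nextStatus and replay are hypothetical helper names introduced only for illustration.

```typescript
type PaymentStatus = 'initiated' | 'authorising' | 'succeeded' | 'failed';

const ALLOWED: Record<PaymentStatus, readonly PaymentStatus[]> = {
  initiated: ['authorising', 'failed'],
  authorising: ['succeeded', 'failed'],
  succeeded: [], // terminal
  failed: [], // terminal
};

// Returns the incoming status if the transition is legal, otherwise keeps the current one.
function nextStatus(current: PaymentStatus, incoming: PaymentStatus): PaymentStatus {
  return ALLOWED[current].includes(incoming) ? incoming : current;
}

// Applies a sequence of delivered statuses in arrival order.
function replay(delivered: PaymentStatus[], start: PaymentStatus = 'initiated'): PaymentStatus {
  return delivered.reduce(nextStatus, start);
}
```

With these definitions, replay(['authorising', 'succeeded', 'failed']) ends at succeeded because the late failed event is dropped. replay(['succeeded', 'authorising']) ends at authorising: the early succeeded event is dropped too, which is exactly the gap the reconciler exists to close.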
Pitfall Guide
- Parsing JSON Before HMAC Verification: Mutates whitespace/encoding, breaks signature validation, and exposes the endpoint to spoofed payloads. Always verify against the raw byte buffer.
- Synchronous Processing in Webhook Handlers: Ties HTTP lifecycle to business logic, causing timeouts, provider retries, and duplicate event storms. Acknowledge in <100ms, process asynchronously.
- Treating Idempotency as Optional: Without database-enforced event_id constraints, retries cause double-processing, financial discrepancies, and reconciliation debt.
- Using Flat Status Fields Instead of State Machines: Allows illegal transitions (e.g., succeeded → failed), corrupts payment state, and requires manual intervention to fix.
- Logging Full Payloads at Info Level: Violates PSD2/GDPR compliance, exposes PII, increases storage costs, and creates audit liabilities. Log only event_id and status.
- Polling as a Webhook Replacement: Burns provider rate limits, misses real-time customer-facing windows, and wastes compute resources on 99% idle checks.
- Replaying Webhooks Without Idempotency Guards: Re-executes side effects (emails, DB writes, external API calls), causing cascading failures and data corruption. Idempotency keys make replays free.
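For the logging pitfall in the list above, one way to make "log only event_id and status" hard to get wrong is to build the log record through an explicit allowlist instead of passing the event object to the logger. This is a sketch under assumptions: toLogFields is a hypothetical helper, and the WebhookEvent shape is inferred from the fields the processor destructures.

```typescript
// Shape inferred from the processor; real payloads carry more fields.
interface WebhookEvent {
  event_id: string;
  payment_id: string;
  status: string;
  [field: string]: unknown;
}

// Copies only the allowlisted fields, so new payload fields never leak into logs by default.
function toLogFields(event: WebhookEvent): { event_id: string; status: string } {
  return { event_id: event.event_id, status: event.status };
}
```

A call such as this.logger.log(JSON.stringify(toLogFields(event))) then emits the two identifiers and nothing else, however much PII the provider adds to the payload later.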
Deliverables
- NestJS Webhook Architecture Blueprint: Complete reference architecture covering raw body configuration, BullMQ queue topology, Postgres idempotency schema, and state machine validation layer. Includes deployment topology for high-availability webhook ingestion.
- Pre-Deployment Webhook Validation Checklist: 12-point verification matrix covering signature verification order, ack latency targets (<100ms), idempotency constraint enforcement, state transition rule coverage, PII logging policy, and retry storm simulation tests.
- Configuration Templates: Production-ready NestJS modules including app.module.ts (raw body + compression settings), webhook.processor.ts (BullMQ worker + DB idempotency), and payment.state.ts (typed state machine constants + transition guards). Ready for direct integration into existing codebases.