t near-zero cost.
Phase 2: Payload Sanitization & Structural Validation
Once traffic passes triage, the body must be parsed safely. Multipart forms require streaming validation, not naive JSON conversion.
const MAGIC_SIGNATURES: Record<string, number[]> = {
  jpeg: [0xFF, 0xD8, 0xFF],
  png: [0x89, 0x50, 0x4E, 0x47],
  pdf: [0x25, 0x50, 0x44, 0x46],
  gif: [0x47, 0x49, 0x46, 0x38]
};

// Compare the leading bytes of the upload with the signature for the claimed subtype.
function verifyFileSignature(buffer: Buffer, claimedMime: string): boolean {
  const expected = MAGIC_SIGNATURES[claimedMime.split('/')[1]];
  if (!expected || buffer.length < expected.length) return false;
  return expected.every((byte, index) => buffer[index] === byte);
}

interface SanitizationResult {
  valid: boolean;
  errors: string[];
  payload: Record<string, any>;
}

export function sanitizePayload(rawBody: Record<string, any>, files: FileUpload[]): SanitizationResult {
  const errors: string[] = [];

  // 1. Honeypot Detection: Invisible fields trap automated form fillers
  const trapFields = ['_secondary_email', 'company_reg', 'fax_number', 'site_url'];
  const trapTriggered = trapFields.some(field => rawBody[field] !== undefined && rawBody[field] !== '');
  if (trapTriggered) errors.push('HONEYPOT_TRIGGERED');

  // 2. Timing Validation: Human interaction requires minimum form age.
  // A missing or non-numeric _ts is rejected instead of being read as a very old form.
  const submittedAt = Number(rawBody['_ts']);
  if (!Number.isFinite(submittedAt) || submittedAt <= 0) {
    errors.push('INVALID_TIMESTAMP');
  } else if (Date.now() - submittedAt < 2500) {
    errors.push('SUBMISSION_TOO_FAST');
  }

  // 3. File Validation: Magic bytes override client-controlled MIME headers
  for (const file of files) {
    if (!verifyFileSignature(file.header, file.mimeType)) {
      errors.push(`INVALID_SIGNATURE:${file.name}`);
    }
    if (file.size > 25 * 1024 * 1024) {
      errors.push(`SIZE_EXCEEDED:${file.name}`);
    }
  }

  // Strip internal tracking fields before storage
  const { _ts, _ts_client, ...publicData } = rawBody;
  return { valid: errors.length === 0, errors, payload: publicData };
}

interface FileUpload {
  name: string;
  mimeType: string;
  size: number;
  header: Buffer; // first bytes of the file, used for the signature check
}
Architecture Rationale: Client-supplied Content-Type headers are untrusted. Magic byte validation reads the first 3–4 bytes to confirm actual file type. Honeypot fields combined with form-age checks catch >80% of bots without external dependencies. Size limits are enforced during streaming to prevent memory exhaustion.
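The point about enforcing size limits during streaming can be illustrated with a small Transform stream. This is a minimal sketch rather than the pipeline's actual parser: `createSizeLimiter` and `countBytes` are hypothetical names, and a real multipart parser would sit in front of them.

```typescript
import { Readable, Transform, TransformCallback, Writable } from 'stream';
import { pipeline } from 'stream/promises';

// Hypothetical helper: passes chunks through while counting bytes, and fails
// the stream as soon as the running total exceeds maxBytes.
export function createSizeLimiter(maxBytes: number): Transform {
  let seen = 0;
  return new Transform({
    transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error('SIZE_EXCEEDED'));
        return;
      }
      callback(null, chunk);
    }
  });
}

// Usage sketch: drain a source through the limiter and report the bytes accepted.
// pipeline() tears down every stream in the chain when the limiter errors.
export async function countBytes(source: Readable, maxBytes: number): Promise<number> {
  let total = 0;
  const sink = new Writable({
    write(chunk: Buffer, _encoding, callback) {
      total += chunk.length;
      callback();
    }
  });
  await pipeline(source, createSizeLimiter(maxBytes), sink);
  return total;
}
```

Because the limiter rejects as soon as the threshold is crossed, an oversized upload never has to be fully buffered before it is refused.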
Phase 3: Risk Assessment & Deduplication
Legitimate-looking submissions still require heuristic scoring and replay protection.
import { createHash } from 'crypto';
import Redis from 'ioredis';

// Assumption: a Redis instance reachable with the ioredis defaults (localhost:6379).
const redis = new Redis();

interface RiskProfile {
  score: number;
  blocked: boolean;
  signals: string[];
}

export function assessRisk(
  payload: Record<string, any>,
  userAgent: string,
  headers: Record<string, string | undefined>
): RiskProfile {
  let score = 0;
  const signals: string[] = [];

  // Temp email detection: match on the domain part only, so that
  // "user@notmailinator.com" is not flagged by a bare suffix match.
  const email = String(payload.email || '').trim().toLowerCase();
  const emailDomain = email.slice(email.lastIndexOf('@') + 1);
  const disposableDomains = ['mailinator.com', '10minutemail.com', 'guerrillamail.com', 'tempmail.dev'];
  if (disposableDomains.some(domain => emailDomain === domain || emailDomain.endsWith(`.${domain}`))) {
    score += 3;
    signals.push('DISPOSABLE_EMAIL');
  }

  // Header completeness check
  if (!headers.origin || !headers.referer) {
    score += 1;
    signals.push('MISSING_REFERRAL_HEADERS');
  }

  // Link density heuristic: each link counts as roughly 15 characters of the message
  const message = String(payload.message || '');
  const linkMatches = message.match(/https?:\/\//g);
  const linkDensity = linkMatches ? (linkMatches.length * 15) / Math.max(message.length, 1) : 0;
  if (linkDensity > 0.3) {
    score += 4;
    signals.push('HIGH_LINK_DENSITY');
  }

  return { score, blocked: score >= 5, signals };
}

// Resolves to true when this submission is the first with its fingerprint in the last 60 seconds.
export async function preventReplay(
  formId: string,
  payload: Record<string, any>,
  fileHash: string
): Promise<boolean> {
  const fingerprint = createHash('sha256')
    .update(`${formId}|${JSON.stringify(payload)}|${fileHash}`)
    .digest('hex');
  const lockKey = `dedup:${fingerprint}`;
  // SET ... EX 60 NX only succeeds if the key does not exist yet.
  const acquired = await redis.set(lockKey, '1', 'EX', 60, 'NX');
  return acquired === 'OK';
}
Architecture Rationale: Heuristic scoring avoids blocking humans on a single weak signal, because only a combined score of 5 or more blocks a submission. The deduplication window uses a SHA-256 fingerprint of form ID, payload, and file hash. Redis SET with NX and EX provides atomic lock acquisition. If a duplicate is detected within 60 seconds, the pipeline returns a synthetic 201 response to waste bot retry cycles without storing data.
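One caveat with fingerprinting via `JSON.stringify` is that it is sensitive to key order, so two payloads with the same fields in a different order hash differently. The following is a minimal sketch of an order-independent fingerprint; `canonicalize` and `fingerprint` are hypothetical helpers, not part of the pipeline above.

```typescript
import { createHash } from 'crypto';

// Hypothetical helper: serialize with object keys sorted recursively, so that
// logically equal payloads produce identical strings.
export function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(item => canonicalize(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .filter(key => obj[key] !== undefined) // mirror JSON.stringify, which drops undefined fields
      .map(key => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(',')}}`;
  }
  // Primitives; undefined inside an array is rendered as null, as JSON does.
  return value === undefined ? 'null' : JSON.stringify(value);
}

// SHA-256 over form ID, canonical payload, and file hash, hex-encoded.
export function fingerprint(formId: string, payload: Record<string, unknown>, fileHash: string): string {
  return createHash('sha256')
    .update(`${formId}|${canonicalize(payload)}|${fileHash}`)
    .digest('hex');
}
```

Whether this matters depends on the client: a single browser form usually serializes fields in a stable order, while replays from different tooling may not.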
Phase 4: Persistence & Async Dispatch
Storage and notifications must be decoupled. Each submission is written with a single atomic insert; use an explicit transaction if related rows must be written together. Notifications run asynchronously with status tracking.
import { PrismaClient } from '@prisma/client';

const db = new PrismaClient();
// emailProvider and logger are assumed to be application-level singletons defined elsewhere.

interface SubmissionRecord {
  formId: string;
  endpoint: string;
  data: Record<string, any>;
  fileUrls: string[];
  metadata: {
    ip: string;
    userAgent: string;
    riskScore: number;
    riskSignals: string[];
  };
}

export async function persistSubmission(record: SubmissionRecord): Promise<string> {
  const result = await db.submission.create({
    data: {
      formId: record.formId,
      endpoint: record.endpoint,
      payload: record.data,
      attachments: record.fileUrls,
      meta: record.metadata,
      status: 'PENDING_NOTIFICATION'
    },
    select: { id: true }
  });
  return result.id;
}

// Async notification dispatcher (fire-and-forget with observability)
export async function dispatchNotifications(submissionId: string, payload: Record<string, any>): Promise<void> {
  let status: 'NOTIFIED' | 'NOTIFICATION_FAILED' = 'NOTIFIED';
  try {
    await emailProvider.send({
      to: process.env.NOTIFICATION_EMAIL,
      template: 'form_submission',
      // submissionId goes last so a payload field with the same name cannot override it
      variables: { ...payload, submissionId }
    });
  } catch (err) {
    status = 'NOTIFICATION_FAILED';
    logger.error({ submissionId, err }, 'Notification dispatch failed');
  }

  // Record the outcome separately, so a failed status write is not mistaken for a failed
  // send and cannot surface as an unhandled rejection in a fire-and-forget call.
  try {
    await db.submission.update({
      where: { id: submissionId },
      data: { status }
    });
  } catch (err) {
    logger.error({ submissionId, status, err }, 'Failed to record notification status');
  }
}
Architecture Rationale: The database insert is the commitment point. Metadata columns store risk scores and signals for post-incident analysis. Email dispatch runs after the HTTP response is sent. Status tracking (PENDING_NOTIFICATION → NOTIFIED / NOTIFICATION_FAILED) eliminates silent failures. A retry queue should replace direct await calls at scale.
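The retry behaviour a queue worker would apply can be sketched in-process. This is an illustration only, assuming exponential backoff; `withRetry` and `RetryOptions` are hypothetical names, and an in-memory loop is not a substitute for a durable queue, because pending retries are lost when the process restarts.

```typescript
export interface RetryOptions {
  attempts: number;    // total tries, including the first
  baseDelayMs: number; // delay before the second try; doubles after each failure
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Runs task until it succeeds or the attempts are used up, then rethrows the last error.
export async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < options.attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      const isLastAttempt = attempt === options.attempts - 1;
      if (!isLastAttempt) {
        await sleep(options.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

A caller could wrap the email send, for example `withRetry(() => emailProvider.send(message), { attempts: 3, baseDelayMs: 500 })`, and mark the submission NOTIFICATION_FAILED only after the final attempt fails.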
Phase 5: Secure Webhook Fanout
External integrations require cryptographic verification to prevent spoofing.
import { createHmac } from 'crypto';

interface WebhookPayload {
  event: 'form.submitted';
  formId: string;
  submissionId: string;
  timestamp: number;
}

export async function deliverWebhook(
  targetUrl: string,
  secret: string,
  payload: WebhookPayload
): Promise<void> {
  const bodyString = JSON.stringify(payload);
  // Sign "<timestamp>.<body>" so neither part can be altered independently.
  const signature = createHmac('sha256', secret)
    .update(`${payload.timestamp}.${bodyString}`)
    .digest('hex');

  const response = await fetch(targetUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Webhook-Timestamp': String(payload.timestamp),
      'X-Webhook-Signature': signature
    },
    body: bodyString,
    signal: AbortSignal.timeout(5000)
  });

  // fetch resolves on any HTTP status, so surface non-2xx responses to the caller.
  if (!response.ok) {
    throw new Error(`Webhook delivery failed with status ${response.status}`);
  }
}
Architecture Rationale: Webhooks execute asynchronously with explicit timeouts. The HMAC-SHA256 signature binds the timestamp and body to a shared secret. Receivers must validate the signature before processing and reject timestamps outside a short tolerance window. The signature check blocks unauthorized payload injection; the freshness check is what limits replay attacks.
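On the receiving side, verification could look like the sketch below. It assumes the timestamp is in milliseconds and uses a five-minute tolerance; both are assumptions, since the sender code does not fix a unit or a window. `verifyWebhook` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Returns true only if the timestamp is fresh and the signature matches
// HMAC-SHA256(secret, "<timestamp>.<rawBody>").
export function verifyWebhook(
  secret: string,
  rawBody: string,          // the exact bytes received, not a re-serialized object
  timestampHeader: string,  // value of X-Webhook-Timestamp
  signatureHeader: string,  // value of X-Webhook-Signature (hex)
  toleranceMs: number = 5 * 60 * 1000, // assumed window
  now: number = Date.now()
): boolean {
  const timestamp = Number(timestampHeader);
  if (!Number.isFinite(timestamp) || Math.abs(now - timestamp) > toleranceMs) {
    return false;
  }
  const expected = createHmac('sha256', secret)
    .update(`${timestampHeader}.${rawBody}`)
    .digest();
  const received = Buffer.from(signatureHeader, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The comparison uses `timingSafeEqual` rather than `===` so that the time taken does not reveal how many leading characters of a forged signature were correct.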
Pitfall Guide
1. Blocking the Hot Path with External APIs
Explanation: Awaiting CAPTCHA verification or email dispatch synchronously adds 200–800 ms per request. Provider outages directly degrade form availability.
Fix: Move all external network calls to async workers or message queues. Return 200 immediately after database persistence. Implement circuit breakers for email/webhook providers.
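A circuit breaker for a provider call can be sketched as follows. This is a simplified illustration with hypothetical names (`CircuitBreaker`, `failureThreshold`, `cooldownMs`); production libraries add features such as rolling error rates and concurrency limits for trial calls.

```typescript
export class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async call<T>(task: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.cooldownMs) {
        // Open: fail fast without touching the provider.
        throw new Error('CIRCUIT_OPEN');
      }
      // Cooldown elapsed: allow a trial call; one more failure re-opens the circuit.
      this.openedAt = null;
      this.failures = this.failureThreshold - 1;
    }
    try {
      const result = await task();
      this.failures = 0; // any success closes the circuit fully
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

While the circuit is open, workers skip the provider immediately instead of waiting for each call to time out.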
2. Trusting Client-Supplied MIME Types
Explanation: Bots upload executable scripts with Content-Type: image/jpeg. Relying on headers allows arbitrary code execution or storage corruption.
Fix: Validate magic bytes from the first 4–8 bytes of the stream. Maintain an allowlist of verified signatures. Reject mismatches before writing to disk.
3. Ignoring Temporary File Lifecycle
Explanation: Streaming parsers write to /tmp or memory buffers. Missing cleanup logic causes disk exhaustion within days of production traffic.
Fix: Wrap parser streams in try/finally blocks. Explicitly unlink temporary files. Monitor disk usage with alerts at 70% capacity.
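The try/finally pattern can be sketched with a small wrapper that owns the temporary file for the duration of a handler. `withTempFile` is a hypothetical helper, and the file name prefix is arbitrary.

```typescript
import { promises as fs } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';
import { randomUUID } from 'crypto';

// Writes data to a uniquely named temp file, runs the handler, and always
// removes the file afterwards, whether the handler resolves or throws.
export async function withTempFile<T>(
  data: Buffer,
  handler: (path: string) => Promise<T>
): Promise<T> {
  const path = join(tmpdir(), `upload-${randomUUID()}`);
  await fs.writeFile(path, data);
  try {
    return await handler(path);
  } finally {
    // force: true makes removal a no-op if the handler already moved or deleted the file.
    await fs.rm(path, { force: true });
  }
}
```

Keeping creation and deletion in one function means no code path can exit without the cleanup running.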
4. Race Conditions on Business Limits
Explanation: Concurrent submissions checking a max_submissions count can both pass validation before either writes, exceeding the limit.
Fix: Enforce limits inside the database using SELECT ... FOR UPDATE or atomic counters. Use database triggers or optimistic locking to serialize limit checks.
5. Silent Async Failures Without Status Tracking
Explanation: Fire-and-forget email dispatches fail silently when providers rate-limit or reject templates. No alerting means missing submissions go unnoticed.
Fix: Add a notification_status column. Log failures with structured metadata. Implement a dead-letter queue for retry. Review failed statuses weekly.
6. Over-Reliance on Honeypots Without Timing
Explanation: Advanced bots parse CSS and skip hidden fields. Honeypots alone miss sophisticated automation.
Fix: Combine honeypots with form-age validation. Require a minimum 2.5–3 second interval between page load and submission. Reject sub-second interactions.
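Form-age checks are only as trustworthy as the timestamp they read. If the timestamp is a plain client-supplied field, a bot can submit an old value. One option, sketched here under the assumption that the server renders the form and holds a secret, is to issue a signed token at render time. `issueFormToken` and `formAgeMs` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

const sign = (secret: string, value: string): Buffer =>
  createHmac('sha256', secret).update(value).digest();

// Called when the form is rendered; the token goes into a hidden field.
export function issueFormToken(secret: string, now: number = Date.now()): string {
  return `${now}.${sign(secret, String(now)).toString('hex')}`;
}

// Called on submission. Returns the form age in ms, or null if the token
// is malformed or was not signed with this secret.
export function formAgeMs(secret: string, token: string, now: number = Date.now()): number | null {
  const [issuedAt = '', signature = ''] = token.split('.');
  const expected = sign(secret, issuedAt);
  const received = Buffer.from(signature, 'hex');
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    return null;
  }
  return now - Number(issuedAt);
}
```

A handler would then reject the submission when `formAgeMs` returns null or a value below the minimum interval. A valid token can still be reused until some maximum age is enforced, so this complements the deduplication step and does not replace it.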
7. Hardcoded Rate Limit Keys
Explanation: Limiting by IP alone blocks shared networks (corporate NAT, mobile carriers). Limiting by endpoint alone allows single-IP exhaustion.
Fix: Use composite keys (IP:Endpoint). Implement sliding windows for burst protection. Adjust thresholds based on traffic patterns, not arbitrary numbers.
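A sliding window over a composite key can be sketched in memory as follows. This is a single-process illustration with a hypothetical `SlidingWindowLimiter` class; a multi-instance deployment would need shared storage such as Redis, and old keys would need periodic eviction.

```typescript
export class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true if the request is allowed and records it; false if the
  // caller has already made maxRequests within the trailing window.
  allow(ip: string, endpoint: string, now: number = Date.now()): boolean {
    const key = `rl:${ip}:${endpoint}`; // composite key: one budget per IP per endpoint
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter(t => t > cutoff);
    const allowed = recent.length < this.maxRequests;
    if (allowed) {
      recent.push(now);
    }
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Because the window trails the current time instead of resetting on a fixed boundary, a client cannot double its budget by bursting at the edge of two windows.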
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-volume lead capture | Async pipeline + Redis dedup | Prevents duplicate storage, reduces hot path latency | Low (Redis is cheap, saves DB compute) |
| File-heavy submissions | Streaming parser + magic bytes + temp cleanup | Prevents disk exhaustion and malicious uploads | Medium (storage costs, but avoids data loss) |
| Enterprise compliance | CAPTCHA gating + HMAC webhooks + audit metadata | Meets security requirements, enables forensic tracing | High (CAPTCHA costs, but reduces fraud) |
| Low-traffic internal forms | Sync handler + basic honeypot | Simplicity outweighs pipeline complexity | Minimal (no external dependencies) |
Configuration Template
// pipeline.config.ts
export const formPipelineConfig = {
  rateLimit: {
    windowSeconds: 60,
    maxRequests: 5,
    keyPattern: 'rl:{ip}:{endpoint}'
  },
  fileUpload: {
    maxSizeBytes: 25 * 1024 * 1024,
    allowedSignatures: ['jpeg', 'png', 'pdf', 'gif'],
    tempDir: '/var/tmp/form-uploads',
    cleanupOnExit: true
  },
  antiAbuse: {
    honeypotFields: ['_secondary_email', 'company_reg', 'fax_number'],
    minFormAgeMs: 2500,
    captcha: {
      enabled: false,
      provider: 'cloudflare_turnstile',
      timeoutMs: 1500,
      failStrategy: 'open' // 'open' or 'closed'
    },
    spamThreshold: 5,
    dedupWindowSeconds: 60
  },
  notifications: {
    email: {
      provider: 'resend',
      async: true,
      retryAttempts: 3,
      statusTracking: true
    },
    webhooks: {
      enabled: true,
      signatureAlgorithm: 'HMAC-SHA256',
      timeoutMs: 5000
    }
  }
};
Quick Start Guide
- Initialize the pipeline: Install ioredis and @prisma/client, then configure your database schema with status and meta columns.
- Deploy the triage middleware: Attach the rate limit and automation fingerprint checks to your router before body parsing.
- Configure file validation: Set up a streaming parser with magic byte verification and enforce size limits in your upload handler.
- Wire async dispatchers: Replace synchronous email/webhook calls with queue-based workers. Add status tracking to your submission table.
- Monitor and iterate: Track notification_status failures, review spam scores weekly, and adjust thresholds based on actual traffic patterns.