Request payload failed validation', 422, 'VALIDATION_FAILURE');
this.violations = violations;
}
}
export class AuthFailure extends BaseDomainError {
constructor(reason: string = 'Invalid credentials') {
super(reason, 401, 'AUTHENTICATION_DENIED');
}
}
export class RateThresholdExceeded extends BaseDomainError {
public readonly retryWindow: number;
constructor(seconds: number = 60) {
super('Request quota exhausted', 429, 'RATE_LIMIT_REACHED');
this.retryWindow = seconds;
}
}
```
**Why this works:** By centralizing HTTP mapping and error codes, you eliminate magic strings scattered across route handlers. The `isOperational` flag becomes the routing key for downstream middleware, allowing the system to treat expected failures differently from programming bugs.
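
To make the routing role of `isOperational` concrete, here is a minimal sketch. The `BaseDomainError` below is a stand-in whose shape is inferred from the constructor calls above (message, HTTP status, error code); the real base class may differ, and `classifyFailure` is a hypothetical helper name.

```typescript
// Stand-in base class: its shape is an assumption inferred from the subclasses above.
class BaseDomainError extends Error {
  public readonly isOperational = true;

  constructor(
    message: string,
    public readonly httpStatus: number,
    public readonly errorCode: string
  ) {
    super(message);
    this.name = new.target.name;
    // Keeps `instanceof` reliable when compiling to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AuthFailure extends BaseDomainError {
  constructor(reason: string = 'Invalid credentials') {
    super(reason, 401, 'AUTHENTICATION_DENIED');
  }
}

// Hypothetical helper: expected failures and programming bugs take different paths.
const classifyFailure = (err: unknown): 'operational' | 'programmer' =>
  err instanceof BaseDomainError && err.isOperational ? 'operational' : 'programmer';

console.log(classifyFailure(new AuthFailure()));               // operational
console.log(classifyFailure(new TypeError('x is undefined'))); // programmer
```

Downstream middleware can branch on the same check: operational errors are serialized for the client, everything else is logged and masked.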
### Step 2: Route-Level Async Guard
Express 4 and similar frameworks do not automatically propagate promise rejections to error middleware; Express 5 forwards rejections from handlers that return a promise. A higher-order wrapper bridges this gap on older versions without cluttering route logic.
```typescript
import { Request, Response, NextFunction } from 'express';
type RouteHandler = (req: Request, res: Response, next: NextFunction) => Promise<void>;
export const wrapAsync = (handler: RouteHandler) => {
return (req: Request, res: Response, next: NextFunction) => {
Promise.resolve(handler(req, res, next)).catch(next);
};
};
```
**Why this works:** It removes boilerplate `try/catch` blocks from every route. The wrapper guarantees that any rejected promise flows directly into Express's error pipeline, maintaining a clean separation between business logic and failure routing.
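
A sketch of the wrapper in use. To stay runnable without dependencies it swaps Express's types for minimal structural stand-ins (`Req`, `Res`, and `Next` are assumptions, as is the `loadProfile` handler); with Express installed, the `wrapAsync` defined above applies unchanged.

```typescript
// Minimal stand-ins for Express types so the sketch runs on its own (assumption).
type Req = { params: Record<string, string> };
type Res = { json: (body: unknown) => void };
type Next = (err?: unknown) => void;
type AsyncHandler = (req: Req, res: Res, next: Next) => Promise<void>;

const wrapAsync = (handler: AsyncHandler) =>
  (req: Req, res: Res, next: Next): void => {
    Promise.resolve(handler(req, res, next)).catch(next);
  };

// Hypothetical handler: rejects for unknown ids, with no try/catch in the route.
const loadProfile = wrapAsync(async (req, res) => {
  if (req.params.id !== '42') throw new Error('Profile not found');
  res.json({ id: req.params.id });
});

// Simulate Express invoking the route and record what reaches `next` and `res`.
const forwarded: unknown[] = [];
const sent: unknown[] = [];
const fakeRes: Res = { json: (body) => { sent.push(body); } };
const fakeNext: Next = (err) => { forwarded.push(err); };

loadProfile({ params: { id: '7' } }, fakeRes, fakeNext);  // rejection is forwarded to next(err)
loadProfile({ params: { id: '42' } }, fakeRes, fakeNext); // resolves and sends the response
```

The rejection reaches `fakeNext` asynchronously, on a later microtask, which mirrors how the error arrives at Express's error middleware after the handler's promise settles.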
### Step 3: Centralized Error Dispatcher
The final middleware in the stack acts as the single point of truth for error transformation. It must respect environment boundaries and never leak internal state to clients.
```typescript
import { Request, Response, NextFunction } from 'express';
import { BaseDomainError } from './errors';

export const errorDispatcher = (
  err: Error,
  req: Request,
  res: Response,
  next: NextFunction
) => {
  // Once the response has started, defer to Express's default handler
  if (res.headersSent) return next(err);

  const isDev = process.env.NODE_ENV === 'development';
  const correlationId = (req.headers['x-correlation-id'] as string) || 'unknown';

  // Operational errors: predictable shape with a client-safe message
  if (err instanceof BaseDomainError) {
    const payload: Record<string, unknown> = {
      error: err.message,
      code: err.errorCode,
      correlationId,
    };
    if ('violations' in err) payload.violations = err.violations;
    if ('retryWindow' in err) payload.retryAfter = err.retryWindow;
    return res.status(err.httpStatus).json(payload);
  }

  // Programming errors: log internally, return safe fallback
  console.error(`[FATAL] ${err.message} | Correlation: ${correlationId}`, err.stack);
  return res.status(500).json({
    error: isDev ? err.message : 'Service temporarily unavailable',
    code: 'SYSTEM_FAILURE',
    correlationId,
  });
};
```
**Why this works:** It enforces a strict contract: operational errors return predictable shapes with client-safe messages; programming errors trigger internal logging and return generic fallbacks. The correlation ID enables distributed tracing across microservices.
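
Tracing only works if every request carries an ID. One way to guarantee that is to resolve the ID once, early in the pipeline. A minimal sketch follows; `resolveCorrelationId` is a hypothetical helper, and taking the first value of a repeated header is an assumption rather than part of the design above.

```typescript
import { randomUUID } from 'node:crypto';

type HeaderValue = string | string[] | undefined;

// Hypothetical helper: reuse the caller's ID when present, otherwise mint a new one.
const resolveCorrelationId = (header: HeaderValue): string => {
  const candidate = Array.isArray(header) ? header[0] : header;
  const trimmed = candidate?.trim();
  return trimmed ? trimmed : randomUUID();
};

console.log(resolveCorrelationId('req-123'));        // req-123
console.log(resolveCorrelationId(['a-1', 'b-2']));   // a-1
console.log(resolveCorrelationId(undefined).length); // 36
```

Generating an ID instead of falling back to a shared placeholder keeps log lines from unrelated requests distinguishable.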
### Step 4: Process-Level Safety Net
Unhandled rejections and uncaught exceptions bypass Express middleware entirely. They must be intercepted at the Node.js process level.
```typescript
import { Server } from 'http';

export const attachProcessShields = (server: Server) => {
  process.on('unhandledRejection', (reason) => {
    console.error('[SHIELD] Unhandled promise rejection:', reason);
    // Forward to external telemetry (e.g., Sentry, Datadog)
  });

  process.on('uncaughtException', (err) => {
    console.error('[SHIELD] Uncaught exception:', err);
    initiateGracefulExit(server, err);
  });
};

const initiateGracefulExit = (server: Server, cause: Error) => {
  console.warn(`[SHUTDOWN] Initiating exit due to: ${cause.message}`);

  // Stop accepting new connections and wait for in-flight requests to finish
  server.close(() => {
    console.log('[SHUTDOWN] HTTP listeners drained');
    process.exit(1);
  });

  // Fallback force-quit to prevent zombie processes
  setTimeout(() => {
    console.error('[SHUTDOWN] Force termination after timeout');
    process.exit(1);
  }, 10_000);
};
```
**Why this works:** Registering an `unhandledRejection` listener lets the process keep running while the anomaly is logged; without one, Node.js 15 and later terminate the process on an unhandled rejection. An `uncaughtException` indicates potentially corrupted state, so the system stops accepting new traffic, drains active connections, and exits with a non-zero code. The timeout prevents indefinite hangs during shutdown.
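
If several fatal events fire in quick succession, the exit routine would be entered more than once. A minimal sketch of a run-once guard follows; `createShutdown`, the `ShutdownDeps` type, and the injected `close`/`exit` parameters are hypothetical names, chosen so the logic can be exercised without a real server.

```typescript
type ShutdownDeps = {
  close: (onClosed: () => void) => void; // e.g. (cb) => server.close(cb)
  exit: (code: number) => void;          // e.g. process.exit
  timeoutMs?: number;
};

// Hypothetical factory: returns a shutdown function that only runs once.
const createShutdown = ({ close, exit, timeoutMs = 10_000 }: ShutdownDeps) => {
  let shuttingDown = false;

  return (cause: Error): boolean => {
    if (shuttingDown) return false; // a shutdown is already in flight
    shuttingDown = true;
    console.warn(`[SHUTDOWN] Initiating exit due to: ${cause.message}`);

    // Force-quit fallback in case draining never completes.
    const timer = setTimeout(() => exit(1), timeoutMs);

    close(() => {
      clearTimeout(timer);
      exit(1);
    });
    return true;
  };
};

// Exercise the guard with fakes: connections "drain" instantly, exit codes are recorded.
const exitCodes: number[] = [];
const shutdown = createShutdown({
  close: (onClosed) => onClosed(),
  exit: (code) => { exitCodes.push(code); },
});

shutdown(new Error('first fatal error'));  // runs the sequence
shutdown(new Error('second fatal error')); // ignored by the guard
```

Injecting `close` and `exit` is a design choice for testability; in an application they would be bound to the HTTP server and `process.exit`.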
### Step 5: Structured Telemetry Pipeline
Console logging is insufficient for production. Errors must be serialized into machine-readable formats with request context.
```typescript
import { BaseDomainError } from './errors';

export const emitErrorTelemetry = (
  err: Error,
  context: { path: string; method: string; userId?: string }
) => {
  const record = {
    timestamp: new Date().toISOString(),
    // 4xx domain errors are warnings; everything else is an error
    level: err instanceof BaseDomainError && err.httpStatus < 500 ? 'warn' : 'error',
    message: err.message,
    code: err instanceof BaseDomainError ? err.errorCode : 'UNKNOWN',
    path: context.path,
    method: context.method,
    userId: context.userId || null,
    isOperational: err instanceof BaseDomainError ? err.isOperational : false,
  };

  // One JSON object per line, ready for log aggregators
  process.stdout.write(JSON.stringify(record) + '\n');

  // Route to external monitoring for critical failures
  if (record.level === 'error') {
    // telemetryService.capture(err, { extra: context });
  }
};
```
**Why this works:** Structured JSON logs integrate natively with log aggregators (ELK, Loki, CloudWatch). Level routing lets alerting rules keep 4xx errors from triggering PagerDuty alerts while 5xx failures escalate. Context enrichment enables filtering by user, endpoint, or operation type.
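
The level rule can be isolated into a small pure function, which makes the alerting contract easy to test. A sketch, with `levelForStatus` as a hypothetical name:

```typescript
type LogLevel = 'warn' | 'error';

// Hypothetical helper: client-caused failures (below 500) warn, everything else escalates.
// An undefined status models a non-domain error that has no HTTP mapping.
const levelForStatus = (httpStatus?: number): LogLevel =>
  httpStatus !== undefined && httpStatus < 500 ? 'warn' : 'error';

console.log(levelForStatus(422)); // warn
console.log(levelForStatus(429)); // warn
console.log(levelForStatus(503)); // error
console.log(levelForStatus());    // error
```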
## Pitfall Guide
### 1. Silent Catch Blocks
**Explanation:** Empty `catch` statements swallow failures, leaving the system in an inconsistent state with zero visibility.

**Fix:** Always log or re-throw. If suppression is intentional, document the business rationale and emit a warning-level telemetry event.
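
A sketch contrasting the two cases. `parseSilently`, `parseOptionalConfig`, and the `warn` sink are hypothetical; the point is that intentional suppression still leaves a trace.

```typescript
type Warn = (event: { message: string; reason: string }) => void;

// Anti-pattern: the failure vanishes and callers cannot tell "empty" from "broken".
const parseSilently = (raw: string): Record<string, unknown> => {
  try {
    return JSON.parse(raw);
  } catch {
    return {};
  }
};

// Intentional suppression: the fallback is documented and a warning is emitted.
const parseOptionalConfig = (raw: string, warn: Warn): Record<string, unknown> => {
  try {
    return JSON.parse(raw);
  } catch (err) {
    // Business rationale (assumed for this sketch): optional overrides must never block startup.
    warn({
      message: 'Optional config ignored: invalid JSON',
      reason: err instanceof Error ? err.message : String(err),
    });
    return {};
  }
};
```

Both functions return the same fallback; only the second one tells operators that something went wrong.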
### 2. Leaking Internal State
**Explanation:** Returning `err.stack`, database query strings, or internal variable names in production responses exposes attack surfaces and implementation details.

**Fix:** Implement environment-aware response masking. Strip stack traces and internal metadata before serialization. Use correlation IDs instead of raw error details.
### 3. Treating All Errors Equally
**Explanation:** Mapping every failure to `500 Internal Server Error` destroys HTTP semantics and prevents clients from implementing retry logic or user-friendly fallbacks.

**Fix:** Map errors to precise HTTP status codes (400, 401, 403, 404, 409, 422, 429, 500). Use error codes for machine parsing and messages for human readability.
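
Precise status codes pay off on the client side. The sketch below shows one possible retry policy built on them; `ApiFailure`, `retryDelayMs`, and the policy itself (retry only 429 and 503) are assumptions for illustration, with `retryAfter` read in seconds as in the dispatcher payload.

```typescript
type ApiFailure = { status: number; code: string; retryAfter?: number };

// Hypothetical client-side policy: only throttling and temporary unavailability are retried.
const retryDelayMs = (failure: ApiFailure): number | null => {
  if (failure.status === 429) return (failure.retryAfter ?? 60) * 1000;
  if (failure.status === 503) return 1000;
  return null; // all other statuses are not retried under this policy
};

console.log(retryDelayMs({ status: 429, code: 'RATE_LIMIT_REACHED', retryAfter: 5 })); // 5000
console.log(retryDelayMs({ status: 422, code: 'VALIDATION_FAILURE' }));                // null
```

If every failure arrived as a 500, this function could not distinguish a request worth retrying from one that needs user correction.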
### 4. Ignoring the Event Loop Boundary
**Explanation:** Relying solely on Express middleware leaves process-level failures unhandled. A single unhandled rejection in a background job can crash the entire server, which is the default behavior since Node.js 15.

**Fix:** Attach `unhandledRejection` and `uncaughtException` listeners at application bootstrap. Route them to telemetry and implement graceful shutdown for fatal exceptions.
### 5. Context-Starved Logging
**Explanation:** Logging only `err.message` strips away the request context needed to reproduce issues. Engineers waste hours reconstructing what triggered the failure.

**Fix:** Enrich every error log with `correlationId`, `path`, `method`, `userId`, and `timestamp`. Use structured JSON to enable automated querying and alerting.
### 6. Over-Nesting Try/Catch in Routes
**Explanation:** Wrapping every database call in `try/catch` creates verbose, hard-to-maintain route handlers that mix business logic with error plumbing.

**Fix:** Use async route wrappers to delegate error propagation to centralized middleware. Reserve `try/catch` for specific recovery logic (e.g., fallback caching, transaction rollbacks).
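
For contrast, here is a case where a local `try/catch` earns its place: recovery through a cached fallback. This is a minimal sketch; `createCachedReader` and the `Fetcher` type are hypothetical names.

```typescript
type Fetcher<T> = () => Promise<T>;

// Hypothetical helper: serve the last good value when the primary source fails.
const createCachedReader = <T>(fetchFresh: Fetcher<T>) => {
  let lastGood: { value: T } | undefined;

  return async (): Promise<T> => {
    try {
      const value = await fetchFresh();
      lastGood = { value };
      return value;
    } catch (err) {
      // Recovery is possible: fall back to the cached value.
      if (lastGood) return lastGood.value;
      // No recovery available: rethrow so centralized handling takes over.
      throw err;
    }
  };
};
```

The `catch` block either recovers or rethrows; it never swallows the failure, so the central dispatcher still sees every error that cannot be handled locally.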
## Production Bundle
### Action Checklist
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-throughput public API | Layered architecture + async wrappers + correlation IDs | Prevents cascading failures, enables distributed tracing, reduces MTTR | Low infrastructure cost, high engineering ROI |
| Internal microservice | Central dispatcher + structured logging + process shields | Simplifies cross-service error routing, maintains consistent contracts | Minimal overhead, improves team velocity |
| Legacy monolith migration | Gradual adoption: start with process listeners, then central middleware | Avoids breaking existing routes while hardening failure boundaries | Phased rollout reduces risk, moderate initial effort |
### Configuration Template
```typescript
// src/infrastructure/error-architecture.ts
import { Express } from 'express';
import { errorDispatcher } from '../middleware/error-dispatcher';
import { attachProcessShields } from './shields/process-shield';
import { Server } from 'http';

export const initializeErrorArchitecture = (app: Express, server: Server) => {
  // Register centralized error handler as the last middleware, after all routes
  app.use(errorDispatcher);

  // Attach process-level safety nets
  attachProcessShields(server);

  console.log('[ARCH] Error handling pipeline initialized');
};
```
```typescript
// src/middleware/error-dispatcher.ts
import { randomUUID } from 'node:crypto';
import { Request, Response, NextFunction } from 'express';
import { BaseDomainError } from '../errors/base-domain-error';
import { emitErrorTelemetry } from '../telemetry/error-logger';

export const errorDispatcher = (err: Error, req: Request, res: Response, next: NextFunction) => {
  // Once the response has started, defer to Express's default handler
  if (res.headersSent) return next(err);

  const isDev = process.env.NODE_ENV === 'development';
  const correlationId = (req.headers['x-correlation-id'] as string) || randomUUID();

  // Emit telemetry before responding
  emitErrorTelemetry(err, {
    path: req.originalUrl,
    method: req.method,
    userId: (req as any).user?.id,
  });

  // Operational errors: predictable shape with a client-safe message
  if (err instanceof BaseDomainError) {
    const payload: Record<string, unknown> = {
      error: err.message,
      code: err.errorCode,
      correlationId,
    };
    if ('violations' in err) payload.violations = err.violations;
    if ('retryWindow' in err) payload.retryAfter = err.retryWindow;
    return res.status(err.httpStatus).json(payload);
  }

  // Programming errors: log internally, return safe fallback
  console.error(`[FATAL] ${err.message} | Correlation: ${correlationId}`, err.stack);
  return res.status(500).json({
    error: isDev ? err.message : 'Service temporarily unavailable',
    code: 'SYSTEM_FAILURE',
    correlationId,
  });
};
```
### Quick Start Guide
- **Create the base error class:** Define `BaseDomainError` with `httpStatus`, `errorCode`, and `isOperational`. Extend it for domain-specific failures like `ResourceMissing` or `ValidationBreach`.
- **Wrap async routes:** Replace inline `try/catch` blocks with the `wrapAsync` higher-order function. This guarantees promise rejections flow to the error pipeline.
- **Register the dispatcher:** Add `errorDispatcher` as the final `app.use()` call. Ensure it checks `instanceof BaseDomainError` to route operational vs programming errors correctly.
- **Bootstrap process shields:** Call `attachProcessShields(server)` immediately after `server.listen()`. This catches unhandled rejections and triggers graceful shutdowns for fatal exceptions.
- **Validate with telemetry:** Trigger a test failure and verify that structured JSON logs appear in stdout, correlation IDs propagate to responses, and stack traces remain hidden in production mode.