Step 1: Define the Retry Decorator
The retry mechanism must execute the target function up to N additional times upon failure, preserve the final error, and avoid masking the root cause.
type AsyncFn<TArgs extends unknown[], TReturn> = (...args: TArgs) => Promise<TReturn>;

export function applyRetry<TArgs extends unknown[], TReturn>(
  maxRetries: number,
  target: AsyncFn<TArgs, TReturn>
): AsyncFn<TArgs, TReturn> {
  return async function (...args: TArgs): Promise<TReturn> {
    let executionCount = 0;
    let finalError: Error | undefined;
    // Runs maxRetries + 1 times in total: 1 initial attempt plus the retries.
    while (executionCount <= maxRetries) {
      try {
        // await is required so that a rejection is caught by the block below.
        return await target(...args);
      } catch (err) {
        // Keep Error instances untouched; wrap only non-Error throwables.
        finalError = err instanceof Error ? err : new Error(String(err));
        executionCount++;
      }
    }
    throw finalError!;
  };
}
Architecture Rationale:
- executionCount <= maxRetries ensures the function runs maxRetries + 1 times in total. This matches the natural-language expectation: "retry 3 times" means 1 initial attempt + 3 reattempts.
- await inside the try block is mandatory. Without it, the promise is returned before it settles, so a rejection bypasses the catch block entirely: no retry happens and the first failure goes straight to the caller.
- The decorator captures and rethrows finalError instead of generating a generic wrapper. This preserves stack traces, error codes, and HTTP status details for downstream logging or alerting systems. Only non-Error throwables are wrapped, so that the caller always receives an Error instance.
- TypeScript generics (TArgs, TReturn) ensure type safety across arbitrary function signatures without runtime overhead.
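The effect of the missing await can be reproduced in isolation. The sketch below is illustrative only: alwaysFails, withoutAwait, withAwait, and demo are hypothetical names that do not appear in the decorator itself.

```typescript
// Hypothetical target that always rejects.
const alwaysFails = (): Promise<string> => Promise.reject(new Error('boom'));

// Without await: the promise is returned as-is, so the rejection
// never enters the catch block and the fallback is skipped.
async function withoutAwait(): Promise<string> {
  try {
    return alwaysFails();
  } catch {
    return 'recovered';
  }
}

// With await: the rejection is thrown inside the try block,
// so the catch block runs.
async function withAwait(): Promise<string> {
  try {
    return await alwaysFails();
  } catch {
    return 'recovered';
  }
}

async function demo(): Promise<[string, string]> {
  const a = await withoutAwait().catch((e: Error) => `escaped: ${e.message}`);
  const b = await withAwait();
  return [a, b];
}
```

Calling demo() resolves to ['escaped: boom', 'recovered']: only the awaited version reaches its catch block, which is the path the retry loop depends on.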
Step 2: Define the Deadline Decorator
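The deadline mechanism enforces a strict time boundary using promise racing. It must stop waiting on hanging operations and clean up its own timer. Promise.race does not cancel the losing promise, so the underlying operation keeps running unless it is cancelled separately.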
export function applyDeadline<TArgs extends unknown[], TReturn>(
  timeoutMs: number,
  target: AsyncFn<TArgs, TReturn>
): AsyncFn<TArgs, TReturn> {
  return async function (...args: TArgs): Promise<TReturn> {
    let timerId: ReturnType<typeof setTimeout> | undefined;
    // Rejects after timeoutMs and never resolves.
    const deadlinePromise = new Promise<never>((_, reject) => {
      timerId = setTimeout(() => {
        reject(new Error('Operation exceeded deadline'));
      }, timeoutMs);
    });
    try {
      // Whichever promise settles first decides the outcome.
      const result = await Promise.race([target(...args), deadlinePromise]);
      return result as TReturn;
    } finally {
      // Always release the timer, whether the target won or lost the race.
      if (timerId !== undefined) clearTimeout(timerId);
    }
  };
}
Architecture Rationale:
- Promise.race settles with whichever promise resolves or rejects first. The target function and the deadline timer compete directly.
- The timer promise is typed as Promise<never> and only rejects. This prevents accidental resolution with undefined or a stale value.
- The finally block clears the timer regardless of outcome. Without this, every call that finishes early leaves its timer scheduled until it fires, which retains the timer's closure and, in Node.js, keeps the process alive until the timeout elapses.
- The type assertion (as TReturn) is safe because deadlinePromise never resolves, meaning the race can only fulfill with the target's actual return value.
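Because Promise.race only stops waiting, the losing operation keeps running after the deadline fires. One way to interrupt it, sketched here under the assumption that the target accepts an AbortSignal as its first argument, is to abort a controller when the timer fires. applyAbortableDeadline and SignalFn are hypothetical names, not part of the decorators above.

```typescript
// Hypothetical variant: the target receives an AbortSignal as its first argument.
type SignalFn<TArgs extends unknown[], TReturn> =
  (signal: AbortSignal, ...args: TArgs) => Promise<TReturn>;

function applyAbortableDeadline<TArgs extends unknown[], TReturn>(
  timeoutMs: number,
  target: SignalFn<TArgs, TReturn>
): (...args: TArgs) => Promise<TReturn> {
  return async (...args: TArgs): Promise<TReturn> => {
    const controller = new AbortController();
    let timerId: ReturnType<typeof setTimeout> | undefined;
    const deadlinePromise = new Promise<never>((_, reject) => {
      timerId = setTimeout(() => {
        controller.abort(); // ask the target to stop its work
        reject(new Error('Operation exceeded deadline'));
      }, timeoutMs);
    });
    try {
      const result = await Promise.race([
        target(controller.signal, ...args),
        deadlinePromise,
      ]);
      return result as TReturn;
    } finally {
      // Release the timer whether the target won or lost the race.
      if (timerId !== undefined) clearTimeout(timerId);
    }
  };
}
```

A fetch-based target can forward the signal through the signal option of fetch, so that a timed-out request is cancelled rather than left to finish in the background.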
Step 3: Compose the Policies
Both decorators return functions with identical signatures to their input. This enables clean composition without coupling.
const fetchUserProfile = async (userId: string) => {
  const res = await fetch(`/api/v2/users/${userId}`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
};

const resilientFetch = applyRetry(3, applyDeadline(5000, fetchUserProfile));

// Usage remains identical to the original function
const profile = await resilientFetch('usr_8842');
Execution Flow:
- applyDeadline wraps fetchUserProfile so that each call races against a 5-second timer.
- applyRetry wraps the deadline-wrapped function with a 3-reattempt policy.
- Each attempt operates under the 5-second ceiling. If the deadline triggers, the rejection bubbles to the retry decorator.
- The retry decorator catches the deadline error, increments the counter, and immediately starts the next attempt.
- If every attempt times out, the final deadline error is thrown to the caller after 4 total executions (1 initial + 3 retries).
- Maximum wall-clock time: 4 × 5 s = 20 seconds. The time the caller waits is bounded by the deadline, and failure is explicit.
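The 20-second ceiling is simply the number of attempts multiplied by the per-attempt deadline. A small helper makes the arithmetic explicit; worstCaseMs is a hypothetical name, and the bound assumes no delay between attempts.

```typescript
// Hypothetical helper: upper bound on wall-clock time for
// applyRetry(maxRetries, applyDeadline(deadlineMs, fn)) with no backoff.
function worstCaseMs(maxRetries: number, deadlineMs: number): number {
  const totalAttempts = maxRetries + 1; // 1 initial attempt + retries
  return totalAttempts * deadlineMs;
}

console.log(worstCaseMs(3, 5000)); // 20000
```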
Pitfall Guide
1. Omitting await Inside try/catch
Explanation: Returning a promise directly from a try block means the function leaves the block before the promise settles. A later rejection never reaches the catch block, so the retry logic is skipped and the caller receives the first failure.
Fix: Always await the target function inside the try block so that a rejection is thrown inside the block and can be caught.
2. Timer Memory Leaks
Explanation: setTimeout schedules a callback in the event loop. If the target function resolves before the deadline, the timer remains active. In long-running servers, this accumulates scheduled callbacks, increasing memory pressure and delaying process shutdown.
Fix: Store the timer ID and call clearTimeout in a finally block to guarantee cleanup regardless of success or failure.
3. Ignoring Idempotency During Retries
Explanation: Retrying non-idempotent operations (e.g., POST /charge, PUT /update-quantity) can cause duplicate side effects, financial discrepancies, or data corruption.
Fix: Apply retry policies only to idempotent operations (GET, HEAD, OPTIONS, idempotent PUT/DELETE). For state-mutating endpoints, implement explicit compensation logic or use distributed transaction patterns instead of blind retries.
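One way to enforce this at the call site is to derive the retry budget from the HTTP method. The sketch below assumes the idempotent method list from HTTP semantics (RFC 9110) and uses a hypothetical helper, retriesFor; it cannot tell whether a particular endpoint actually honors that contract.

```typescript
// Methods defined as idempotent by HTTP semantics (RFC 9110).
// Whether a given endpoint behaves that way is an assumption to verify.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'TRACE', 'PUT', 'DELETE']);

// Hypothetical helper: returns the retry budget to use for a request.
function retriesFor(method: string, maxRetries: number): number {
  return IDEMPOTENT_METHODS.has(method.toUpperCase()) ? maxRetries : 0;
}
```

The result can be passed as the first argument of applyRetry, so that a POST gets zero retries while a GET keeps the configured budget.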
4. Incorrect Composition Order
Explanation: applyRetry(n, applyDeadline(ms, fn)) applies the deadline to each attempt. applyDeadline(ms, applyRetry(n, fn)) applies a single deadline to the entire retry sequence. In the latter, the deadline can expire while retries are still pending: the caller receives a deadline error, and the remaining attempts keep running in the background with their results discarded.
Fix: Wrap the deadline around the target first, then wrap retry around the deadline. This ensures each attempt respects the SLA independently.
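The difference between the two orders can be observed by counting attempts. The sketch below uses compact stand-ins for the two decorators (miniRetry and miniDeadline, simplified for this demo) and a hypothetical target that fails after 30 ms; the 50 ms deadline is likewise an arbitrary choice.

```typescript
type Fn<A extends unknown[], R> = (...args: A) => Promise<R>;

// Compact stand-in for the retry decorator (simplified for this demo).
function miniRetry<A extends unknown[], R>(maxRetries: number, target: Fn<A, R>): Fn<A, R> {
  return async (...args: A): Promise<R> => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await target(...args);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}

// Compact stand-in for the deadline decorator (simplified for this demo).
function miniDeadline<A extends unknown[], R>(timeoutMs: number, target: Fn<A, R>): Fn<A, R> {
  return (...args: A): Promise<R> =>
    new Promise<R>((resolve, reject) => {
      const timerId = setTimeout(
        () => reject(new Error('Operation exceeded deadline')),
        timeoutMs
      );
      target(...args).then(resolve, reject).finally(() => clearTimeout(timerId));
    });
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Counts how many attempts have started by the time the composed call rejects.
async function countAttempts(compose: (fn: Fn<[], void>) => Fn<[], void>): Promise<number> {
  let attempts = 0;
  const flaky = async (): Promise<void> => {
    attempts++;
    await sleep(30);
    throw new Error('transient failure');
  };
  await compose(flaky)().catch(() => undefined);
  return attempts;
}

async function compareOrders(): Promise<[number, number]> {
  const perAttempt = await countAttempts((fn) => miniRetry(2, miniDeadline(50, fn)));
  const overall = await countAttempts((fn) => miniDeadline(50, miniRetry(2, fn)));
  return [perAttempt, overall];
}
```

With these timings, compareOrders() resolves to [3, 2]: the per-attempt order lets all three attempts run, while the overall deadline rejects during the second attempt and the caller never sees the third.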
5. Swapping Original Errors for Generic Wrappers
Explanation: Throwing new Error('Max retries reached') discards HTTP status codes, database error codes, and stack traces. Debugging becomes guesswork.
Fix: Capture the last caught error and rethrow it. Preserve the original error type and message to maintain observability and enable precise alerting rules.
6. Hardcoding Policy Values
Explanation: Embedding retry counts and timeouts directly in decorator calls creates configuration drift. Changing a policy requires code deployment rather than environment-driven adjustment.
Fix: Extract policy parameters into a configuration object or environment variables. Inject them at initialization time to support per-service tuning without code changes.
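A minimal sketch of environment-driven policy values follows. The variable names RETRY_MAX and DEADLINE_MS, the Policy shape, and policyFromEnv are assumptions made for illustration; the environment is passed in as a plain object so the function stays testable.

```typescript
interface Policy {
  maxRetries: number;
  deadlineMs: number;
}

// Hypothetical variable names; pick whatever fits your deployment.
function policyFromEnv(
  env: Record<string, string | undefined>,
  fallback: Policy
): Policy {
  const readInt = (key: string, defaultValue: number): number => {
    const parsed = Number.parseInt(env[key] ?? '', 10);
    // Fall back when the variable is missing, not a number, or negative.
    return Number.isInteger(parsed) && parsed >= 0 ? parsed : defaultValue;
  };
  return {
    maxRetries: readInt('RETRY_MAX', fallback.maxRetries),
    deadlineMs: readInt('DEADLINE_MS', fallback.deadlineMs),
  };
}
```

In Node.js, process.env can be passed as the first argument at startup, and the resulting values handed to applyRetry and applyDeadline.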
7. Missing Backoff Strategy
Explanation: Immediate retries amplify load on struggling downstream services. If a database is overwhelmed, 4 attempts in rapid succession will likely fail identically and worsen the outage.
Fix: Introduce exponential backoff with jitter between attempts. This gives downstream systems time to recover and reduces thundering herd effects.
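A minimal sketch of exponential backoff with full jitter follows. applyRetryWithBackoff, backoffDelayMs, and the 30-second cap are assumptions made for illustration; the applyRetry decorator shown earlier has no delay between attempts.

```typescript
// Full jitter: a random delay below min(capMs, baseMs * 2^attempt).
function backoffDelayMs(
  attempt: number, // 0 for the first retry, 1 for the second, ...
  baseMs: number,
  capMs: number,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical variant of the retry decorator that waits between attempts.
function applyRetryWithBackoff<TArgs extends unknown[], TReturn>(
  maxRetries: number,
  baseMs: number,
  target: (...args: TArgs) => Promise<TReturn>,
  capMs = 30_000
): (...args: TArgs) => Promise<TReturn> {
  return async (...args: TArgs): Promise<TReturn> => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await target(...args);
      } catch (err) {
        lastError = err;
        // Sleep only if another attempt will follow.
        if (attempt < maxRetries) {
          await sleep(backoffDelayMs(attempt, baseMs, capMs));
        }
      }
    }
    throw lastError;
  };
}
```

The random source is injectable so that delays can be made deterministic in tests. Backoff also raises the worst-case wall-clock time, since the delays add to the per-attempt deadlines.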
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Idempotent GET/HEAD requests | applyRetry(3, applyDeadline(5000, fn)) | Safe to repeat, bounded latency improves UX | Low (minimal compute overhead) |
| Non-idempotent POST/PUT mutations | applyDeadline(8000, fn) only | Prevents duplicate side effects while enforcing SLA | Medium (requires compensation logic on failure) |
| High-throughput internal service mesh | applyRetry(2, applyDeadline(2000, fn)) with backoff | Reduces cross-service latency spikes during partial outages | Low (improves overall throughput) |
| External third-party API with rate limits | applyRetry(1, applyDeadline(10000, fn)) + circuit breaker | Avoids exhausting quota, respects provider SLAs | High (prevents account suspension/overage fees) |
| Batch processing / ETL pipelines | applyRetry(5, applyDeadline(30000, fn)) with exponential backoff | Tolerates transient cloud storage/network blips | Medium (increases job duration but improves success rate) |
Configuration Template
// resilience.config.ts
import { applyRetry, applyDeadline } from './decorators';

export interface ResiliencePolicy {
  maxRetries: number;
  deadlineMs: number;
  // Reserved for a backoff-aware retry; applyRetry as defined above does not read it.
  backoffBaseMs?: number;
}

export const defaultPolicy: ResiliencePolicy = {
  maxRetries: 3,
  deadlineMs: 5000,
  backoffBaseMs: 200,
};

export const strictPolicy: ResiliencePolicy = {
  maxRetries: 1,
  deadlineMs: 3000,
  backoffBaseMs: 100,
};

export const bulkPolicy: ResiliencePolicy = {
  maxRetries: 5,
  deadlineMs: 15000,
  backoffBaseMs: 500,
};

// Usage factory: deadline per attempt first, retry around it.
export function createResilientCaller<TArgs extends unknown[], TReturn>(
  target: (...args: TArgs) => Promise<TReturn>,
  policy: ResiliencePolicy = defaultPolicy
) {
  const withDeadline = applyDeadline(policy.deadlineMs, target);
  return applyRetry(policy.maxRetries, withDeadline);
}
Quick Start Guide
- Install/Import: Copy the applyRetry and applyDeadline implementations into a shared utilities module. Ensure TypeScript is configured for strict mode to catch generic mismatches early.
- Identify Targets: Locate async functions that perform network I/O, database queries, or external service calls. Exclude mutation endpoints that lack idempotency guarantees.
- Wrap & Configure: Replace direct calls with createResilientCaller(targetFunction, policy). Adjust maxRetries and deadlineMs based on downstream SLA documentation.
- Validate Error Flow: Trigger controlled failures (e.g., mock timeouts, 503 responses) and verify that original error messages and stack traces reach your logging layer without transformation.
- Deploy & Monitor: Roll out to staging, observe retry attempt metrics in your APM dashboard, and tune policy values based on actual latency distributions and failure rates.