ls when the provider is degraded. This protects application threads and reduces cost during outages.
3. Schema Enforcement: We validate all structured outputs against a strict schema. Invalid outputs trigger a fallback mechanism rather than crashing the application.
4. Cost Guardrails: We track token consumption against a budget. This prevents runaway costs during retry storms or unexpected output lengths.
Implementation
The following TypeScript implementation sketches a resilience layer built from small, independent components that are composed into a single client at the end.
1. Retry Engine with Backoff
import { randomInt } from 'crypto';
export interface RetryConfig {
maxAttempts: number;
initialDelayMs: number;
maxDelayMs: number;
jitterFactor: number;
}
export class RetryEngine {
constructor(private readonly config: RetryConfig) {}
async execute<T>(operation: () => Promise<T>): Promise<T> {
let attempt = 0;
let lastError: Error | undefined;
while (attempt < this.config.maxAttempts) {
attempt++;
try {
return await operation();
} catch (err) {
lastError = err instanceof Error ? err : new Error(String(err));
if (!this.isRetryable(lastError)) {
throw lastError;
}
if (attempt === this.config.maxAttempts) {
throw lastError;
}
const delay = this.calculateDelay(attempt);
await this.sleep(delay);
}
}
throw lastError!;
}
private isRetryable(error: Error): boolean {
const apiError = error as { status?: number; code?: string };
// Retry on rate limits and transient server errors (including gateway timeouts)
const retryableStatuses = [429, 500, 502, 503, 504];
if (apiError.status && retryableStatuses.includes(apiError.status)) {
return true;
}
// Retry on network timeouts
if (apiError.code === 'ETIMEDOUT' || apiError.code === 'ECONNRESET') {
return true;
}
return false;
}
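private calculateDelay(attempt: number): number {
const exponential = this.config.initialDelayMs * Math.pow(2, attempt - 1);
const capped = Math.min(exponential, this.config.maxDelayMs);
// randomInt requires integer bounds with max > min, so round the jitter
// range down and skip jitter entirely when the range is empty.
const jitterRange = Math.floor(this.config.jitterFactor * capped);
const jitter = jitterRange > 0 ? randomInt(0, jitterRange + 1) : 0;
// Jitter is added on top of the cap, so the delay can exceed maxDelayMs by up to jitterRange.
return capped + jitter;
}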
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
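To make the backoff arithmetic concrete, the sketch below lists the base (pre-jitter) delay before each retry for a given configuration. The helper name `backoffSchedule` is hypothetical and not part of the RetryEngine class; it only mirrors the exponential-then-cap formula used in calculateDelay.

```typescript
// Hypothetical helper: lists the base (pre-jitter) delay before each retry.
// Mirrors initialDelayMs * 2^(attempt - 1), capped at maxDelayMs.
export function backoffSchedule(
  maxAttempts: number,
  initialDelayMs: number,
  maxDelayMs: number
): number[] {
  const delays: number[] = [];
  // No delay follows the final attempt, so there are maxAttempts - 1 waits.
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    const exponential = initialDelayMs * Math.pow(2, attempt - 1);
    delays.push(Math.min(exponential, maxDelayMs));
  }
  return delays;
}

// Five attempts, 1 s initial delay, 5 s cap: waits of 1000, 2000, 4000, 5000 ms.
console.log(backoffSchedule(5, 1000, 5000));
```

Summing the schedule gives a rough lower bound on how long a caller may block before the final error surfaces, which is useful when choosing maxAttempts for latency-sensitive paths.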
2. Rate Limit Handler
Rate limit errors require parsing the Retry-After header. Hardcoded delays are inefficient and may violate provider policies.
export async function handleQuotaExhaustion<T>(
request: () => Promise<T>,
maxWaitTimeMs: number
): Promise<T> {
const startTime = Date.now();
while (true) {
try {
return await request();
} catch (error) {
const apiError = error as { status?: number; headers?: Record<string, string> };
if (apiError.status !== 429) {
throw error;
}
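const retryAfterHeader = apiError.headers?.['retry-after'];
// Retry-After may be a number of seconds or an HTTP-date. parseInt yields NaN
// for the date form, so use the default wait instead of sleeping for NaN ms.
const retryAfterSeconds = retryAfterHeader ? parseInt(retryAfterHeader, 10) : NaN;
const waitTime = Number.isNaN(retryAfterSeconds)
? 5000 // Default fallback
: retryAfterSeconds * 1000;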
const elapsed = Date.now() - startTime;
if (elapsed + waitTime > maxWaitTimeMs) {
throw new Error('Rate limit wait time exceeds maximum allowed duration');
}
await new Promise(resolve => setTimeout(resolve, waitTime));
}
}
}
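The Retry-After header can carry either a delay in seconds or an HTTP-date. The handler above reads only the numeric form. As a minimal sketch, a parser that accepts both forms might look like the following; `parseRetryAfterMs` is a hypothetical helper name, and the caller is assumed to supply its own default when the result is undefined.

```typescript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
// Returns undefined when the value cannot be interpreted.
export function parseRetryAfterMs(
  headerValue: string | undefined,
  nowMs: number = Date.now()
): number | undefined {
  if (headerValue === undefined) return undefined;
  const trimmed = headerValue.trim();
  // Form 1: a non-negative integer number of seconds, e.g. "120".
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }
  // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means no further wait is needed.
  return Math.max(0, dateMs - nowMs);
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests; production callers can rely on the default.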
3. Structured Output Validation
Models can return malformed JSON or JSON that does not match the expected shape. Validate immediately after parsing, and fall back to a safe default when validation fails.
import { z } from 'zod';
const EntityExtractionSchema = z.object({
entities: z.array(z.object({
name: z.string(),
type: z.enum(['PERSON', 'ORG', 'LOCATION']),
confidence: z.number().min(0).max(1)
})),
summary: z.string().max(500)
});
export type ExtractedEntities = z.infer<typeof EntityExtractionSchema>;
export async function validateStructuredOutput(rawContent: string): Promise<ExtractedEntities> {
try {
const parsed = JSON.parse(rawContent);
return EntityExtractionSchema.parse(parsed);
} catch (validationError) {
// Fallback: Attempt to repair or return safe default
console.warn('Structured output validation failed:', validationError);
return {
entities: [],
summary: 'Extraction failed due to format error.'
};
}
}
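The fallback comment above mentions attempting a repair before returning a default. One common case is a model wrapping otherwise valid JSON in a markdown code fence or surrounding prose. The sketch below handles only that case; `extractJsonCandidate` and `parseLenient` are hypothetical names, and the approach does not fix JSON that is genuinely malformed.

```typescript
// Hypothetical helper: pulls the most likely JSON object out of raw model text.
// It does not repair broken JSON; it only strips surrounding fences and prose.
export function extractJsonCandidate(rawContent: string): string | undefined {
  const start = rawContent.indexOf('{');
  const end = rawContent.lastIndexOf('}');
  // Without an opening brace followed by a closing brace there is nothing to salvage.
  if (start === -1 || end <= start) return undefined;
  return rawContent.slice(start, end + 1);
}

// Tries a direct parse first, then the extracted candidate.
// Returns undefined when neither attempt yields parseable JSON.
export function parseLenient(rawContent: string): unknown {
  for (const candidate of [rawContent, extractJsonCandidate(rawContent)]) {
    if (candidate === undefined) continue;
    try {
      return JSON.parse(candidate);
    } catch {
      // Fall through to the next candidate.
    }
  }
  return undefined;
}
```

The result of `parseLenient` would still need to pass schema validation; extraction only increases the chance that parsing succeeds, not that the content is correct.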
4. Failure Guard (Circuit Breaker)
Prevent calls when the provider is consistently failing. This protects system resources.
export type GuardState = 'CLOSED' | 'TRIPPED' | 'PROBING';
export class FailureGuard {
private state: GuardState = 'CLOSED';
private failureCount: number = 0;
private lastTripTime: number = 0;
private probeCount: number = 0;
constructor(
private readonly failureThreshold: number,
private readonly recoveryTimeoutMs: number,
private readonly probeLimit: number = 1
) {}
async execute<T>(operation: () => Promise<T>): Promise<T> {
if (this.state === 'TRIPPED') {
if (Date.now() - this.lastTripTime > this.recoveryTimeoutMs) {
this.state = 'PROBING';
this.probeCount = 0;
} else {
throw new Error('Failure guard is tripped; requests blocked');
}
}
try {
const result = await operation();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess(): void {
if (this.state === 'PROBING') {
this.probeCount++;
if (this.probeCount >= this.probeLimit) {
this.state = 'CLOSED';
this.failureCount = 0;
}
} else {
this.failureCount = 0;
}
}
private onFailure(): void {
this.failureCount++;
this.lastTripTime = Date.now();
if (this.failureCount >= this.failureThreshold) {
this.state = 'TRIPPED';
}
}
}
5. Cost Controller
Track token usage to enforce budget limits.
export class CostController {
private accumulatedTokens: number = 0;
constructor(
private readonly tokenBudget: number,
private readonly costMultiplier: number
) {}
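async executeWithBudget<T>(operation: () => Promise<{ usage?: { tokens: number } } & T>): Promise<T> {
// Refuse to start new work once the budget is already spent.
if (this.accumulatedTokens >= this.tokenBudget) {
throw new Error(`Token budget exceeded. Current: ${this.accumulatedTokens}, Limit: ${this.tokenBudget}`);
}
const result = await operation();
// Record usage after the call. The response is still returned because its
// tokens have already been paid for; the next call is the one that gets blocked.
this.accumulatedTokens += result.usage?.tokens ?? 0;
return result;
}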
getRemainingBudget(): number {
return Math.max(0, this.tokenBudget - this.accumulatedTokens);
}
}
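Because retries multiply token consumption, it can help to check the worst case before sending a request at all. The following is a rough sketch under stated assumptions: both function names are hypothetical, the per-attempt token estimate is supplied by the caller, and costMultiplier is assumed to be a price per token.

```typescript
// Hypothetical pre-flight check: would the worst case for one logical request
// (every retry attempt consuming the estimated tokens) fit in the remaining budget?
export function fitsRemainingBudget(
  estimatedTokensPerAttempt: number,
  maxAttempts: number,
  remainingTokens: number
): boolean {
  const worstCaseTokens = estimatedTokensPerAttempt * maxAttempts;
  return worstCaseTokens <= remainingTokens;
}

// Assumption: costMultiplier is a price per token in the billing currency.
export function estimateCost(tokens: number, costMultiplier: number): number {
  return tokens * costMultiplier;
}
```

A caller could pass the value of getRemainingBudget() as `remainingTokens` and skip the request, or lower maxAttempts, when the check fails.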
6. Composed Resilient Client
Combine all components into a unified interface.
export class ResilientModelClient {
private retryEngine: RetryEngine;
private failureGuard: FailureGuard;
private costController: CostController;
constructor(config: {
retry: RetryConfig;
circuitBreaker: { threshold: number; timeout: number };
budget: { tokens: number; multiplier: number };
}) {
this.retryEngine = new RetryEngine(config.retry);
this.failureGuard = new FailureGuard(
config.circuitBreaker.threshold,
config.circuitBreaker.timeout
);
this.costController = new CostController(
config.budget.tokens,
config.budget.multiplier
);
}
async generateCompletion(
prompt: string,
operation: (p: string) => Promise<{ content: string; usage?: { tokens: number } }>
): Promise<ExtractedEntities> {
// Layers, outermost first:
// 1. Cost accounting, 2. Circuit breaker, 3. Retry logic, 4. Rate limit handling.
const rawResponse = await this.costController.executeWithBudget(() =>
this.failureGuard.execute(() =>
this.retryEngine.execute(() =>
handleQuotaExhaustion(() => operation(prompt), 30000)
)
)
);
// Layer 5: Validation. Runs outside the retry loop so a malformed response
// is not re-requested with an identical prompt.
return validateStructuredOutput(rawResponse.content);
}
}
Pitfall Guide
Production AI systems fail in predictable ways. Avoid these common mistakes to ensure stability.
| Pitfall | Explanation | Fix |
|---|---|---|
| Retrying Client Errors | A request that fails with HTTP 400 will fail the same way on every attempt, so retrying it only burns attempts and tokens. | Implement error classification. Only retry 429, 5xx, and network timeouts. |
| Ignoring Retry-After | Using fixed delays during rate limiting may violate provider policies or waste time. | Parse the Retry-After header and respect its value. |
| Validation Loops | Retrying invalid JSON without modifying the prompt leads to repeated failures. | Limit validation retries. On failure, return a safe default or mutate the prompt strategy. |
| Static Timeouts | Fixed timeouts may be too short for long contexts or too long for simple queries. | Use dynamic timeouts based on estimated token count or implement AbortSignal with configurable limits. |
| Circuit Breaker Flapping | Setting the threshold too low causes the breaker to trip on transient spikes. | Use a higher threshold (e.g., 5-10 failures) and implement hysteresis or a probing state. |
| Silent Hallucinations | Schema-valid JSON can still contain incorrect content; structural validation says nothing about factual accuracy. | Implement secondary validation, confidence scoring, or human-in-the-loop for critical outputs. |
| Cost Blindness | Not tracking token usage leads to unexpected bills during retry storms. | Implement a cost controller that tracks usage per request and enforces budget limits. |
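
The Static Timeouts fix in the table can be sketched as a timeout that scales with the estimated token count, enforced through an AbortSignal. Both function names are hypothetical, and the constants (base, per-token allowance, ceiling) are illustrative assumptions rather than provider guidance.

```typescript
// Hypothetical sizing rule: a fixed base plus a per-token allowance, clamped to a ceiling.
export function timeoutForTokens(
  estimatedTokens: number,
  baseMs = 5000,
  perTokenMs = 20,
  maxMs = 120000
): number {
  return Math.min(maxMs, baseMs + estimatedTokens * perTokenMs);
}

// Runs an operation with an AbortSignal that fires after timeoutMs.
// The operation is responsible for honoring the signal.
export async function withTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await operation(controller.signal);
  } finally {
    // Always clear the timer so it cannot fire after completion.
    clearTimeout(timer);
  }
}
```

The timeout only takes effect if the operation passes the signal to its HTTP client, for example through the `signal` option of fetch.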
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Throughput Batch Processing | Aggressive retry with circuit breaker | Maximizes completion rate while protecting against outages | Moderate; retries increase token usage |
| Real-time User Interaction | Strict timeouts with fallback | Ensures low latency; fallback maintains UX | Low; timeouts reduce wasted tokens |
| Critical Data Extraction | Schema validation with human review | Ensures accuracy; validation catches structural errors | High; human review adds operational cost |
| Exploratory/Debugging | Verbose logging with no retry | Captures full error context for analysis | Low; no retries, but logs may be large |
| Budget-Constrained Environment | Strict cost controller with early exit | Prevents runaway costs; stops processing when budget hit | Low; enforces spending limits |
Configuration Template
Use this template to configure your resilience layer. Adjust values based on your provider's SLA and your application's requirements.
export const resilienceConfig = {
retry: {
maxAttempts: 3,
initialDelayMs: 1000,
maxDelayMs: 30000,
jitterFactor: 0.1
},
circuitBreaker: {
threshold: 5,
timeout: 60000
},
budget: {
tokens: 1000000,
multiplier: 0.00001
},
timeouts: {
connection: 5000,
request: 120000
},
rateLimit: {
maxWaitTime: 60000
}
};
Quick Start Guide
- Install Dependencies: Add zod for validation and your preferred HTTP client.
npm install zod
- Create Client Wrapper: Implement the ResilientModelClient class using the code examples above.
- Configure Resilience: Define your retry, circuit breaker, and budget settings in a configuration object.
- Integrate Validation: Define Zod schemas for your expected outputs and wrap your parsing logic.
- Deploy and Monitor: Deploy the updated client and monitor error rates, latency, and token usage. Adjust thresholds based on observed behavior.