guration errors
Step 2: Implement Static Model Decoration (Vercel AI SDK)
Vercel's model treats middleware as a decorator. The wrapper returns a new model instance that transparently intercepts doGenerate and doStream. This is ideal for infrastructure concerns that don't change per request.
```typescript
import type {
  LanguageModelV3Middleware,
  LanguageModelV3StreamPart
} from '@ai-sdk/provider';
import { gateway, generateText, wrapLanguageModel } from 'ai';

// Factory creates a long-lived middleware instance
export function createTraceRetryMiddleware(traceId: string): LanguageModelV3Middleware {
  return {
    transformParams: async ({ params }) => {
      // Attach trace context before execution; call options carry it under providerOptions
      return {
        ...params,
        providerOptions: {
          ...params.providerOptions,
          traceRetry: { traceId, timestamp: Date.now() }
        }
      };
    },
    wrapGenerate: async ({ doGenerate }) => {
      const maxRetries = 3;
      let attempts = 0;
      while (attempts < maxRetries) {
        try {
          const result = await doGenerate();
          console.log(`[TRACE:${traceId}] Generate success`, { usage: result.usage });
          return result;
        } catch (error) {
          attempts++;
          if (attempts === maxRetries) throw error;
          // Exponential backoff: 200ms after the first failure, 400ms after the second
          await new Promise(r => setTimeout(r, 2 ** attempts * 100));
        }
      }
      throw new Error('Retry exhausted');
    },
    wrapStream: async ({ doStream }) => {
      const { stream, ...rest } = await doStream();
      let chunkCount = 0;
      // Pass every chunk through unchanged while counting it
      const tracedStream = new TransformStream<
        LanguageModelV3StreamPart,
        LanguageModelV3StreamPart
      >({
        transform(chunk, controller) {
          chunkCount++;
          controller.enqueue(chunk);
        },
        flush() {
          console.log(`[TRACE:${traceId}] Stream completed`, { chunks: chunkCount });
        }
      });
      return { stream: stream.pipeThrough(tracedStream), ...rest };
    }
  };
}

// Usage: Wrap once, reuse everywhere
const tracedModel = wrapLanguageModel({
  // wrapLanguageModel expects a model instance rather than a bare ID string;
  // the gateway provider is assumed here, any provider instance works
  model: gateway('anthropic/claude-sonnet-4.5'),
  middleware: createTraceRetryMiddleware('infra-001')
});
const response = await generateText({ model: tracedModel, prompt: 'Explain quantum entanglement.' });
```
Architecture Rationale:
- transformParams runs before execution, allowing metadata injection without touching the response pipeline.
- Streaming is explicitly separated because TransformStream requires chunk-level control that conflicts with non-streaming promise resolution.
- The middleware is stateless regarding request context, making it safe for connection pooling and cold-start optimization.
Step 3: Implement Dynamic Request Interception (Genkit)
Genkit's model treats middleware as a per-call interceptor. Configuration is validated at instantiation time using Zod, and the interceptor stack is rebuilt for every generate() call. This is ideal for business logic that depends on runtime context.
```typescript
import { generateMiddleware, z } from 'genkit';
import { ai } from './genkit-config';

// Factory enforces strict configuration contracts
export const contextualRetryInterceptor = generateMiddleware(
  {
    name: 'contextualRetryInterceptor',
    description: 'Attaches trace context and retries transient failures',
    configSchema: z.object({
      traceId: z.string().min(1),
      maxRetries: z.number().int().min(1).max(5).default(3),
      backoffBase: z.number().min(100).default(200)
    })
  },
  ({ config }) => ({
    model: async (req, ctx, next) => {
      const { traceId, maxRetries, backoffBase } = config;
      const startedAt = Date.now();
      // Inject trace context on a copy so the caller's request object is not mutated
      const tracedReq = { ...req, metadata: { ...req.metadata, traceId, startedAt } };
      let attempts = 0;
      while (attempts < maxRetries) {
        try {
          const response = await next(tracedReq, ctx);
          console.log(`[TRACE:${traceId}] Model call succeeded`, {
            status: response.finishReason,
            latency: Date.now() - startedAt
          });
          return response;
        } catch (error) {
          attempts++;
          if (attempts === maxRetries) throw error;
          // Exponential backoff with jitter in the range [0.5x, 1.5x)
          const delay = backoffBase * Math.pow(2, attempts) * (0.5 + Math.random());
          await new Promise(resolve => setTimeout(resolve, delay));
        }
      }
      throw new Error('Retry budget exhausted');
    },
    // Genkit exposes tool execution as a first-class hook
    tool: async (req, ctx, next) => {
      // Read the trace ID from the validated config rather than from call context
      console.log(`[TRACE:${config.traceId}] Tool intercepted: ${req.name}`);
      return next(req, ctx);
    }
  })
);

// Usage: Configure per request with full type safety
const response = await ai.generate({
  model: 'google/gemini-flash-latest',
  prompt: 'Draft a technical specification.',
  use: [
    contextualRetryInterceptor({
      traceId: `req-${crypto.randomUUID()}`,
      maxRetries: 2
    })
  ]
});
```
Architecture Rationale:
- Zod schema validation prevents silent misconfiguration. Invalid options throw immediately at call site.
- The model hook handles both streaming and non-streaming uniformly, abstracting away chunk management but sacrificing low-level backpressure control.
- The tool hook enables agent-specific logic (approval gates, rate limiting) that Vercel's spec deliberately excludes from the language model contract.
Pitfall Guide
1. Static/Dynamic Context Mismatch
Explanation: Attempting to inject per-request data (user ID, tenant quota) into Vercel's static model wrapper forces you to mutate shared state or rebuild the model on every request, defeating the purpose of static decoration.
Fix: Use Vercel's providerOptions namespace for lightweight per-call metadata, or switch to Genkit's dynamic use: [] array when middleware requires runtime context.
2. Streaming Backpressure Blind Spot
Explanation: Genkit's unified model hook abstracts streaming, which simplifies code but removes direct access to chunk flow control. High-throughput applications may experience memory pressure if the consumer cannot keep up with the LLM's token generation rate.
Fix: For latency-sensitive streaming UIs, prefer Vercel's explicit wrapStream with TransformStream to implement backpressure signaling, pause/resume logic, or chunk sampling.
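As a rough illustration of the chunk sampling mentioned in the fix, the sketch below builds a generic TransformStream that forwards every Nth chunk. The helper names (createSamplingStream, collect) are hypothetical, and plain strings stand in for the SDK's stream parts; in a real wrapStream you would sample only text deltas and always forward structural parts such as the finish event.

```typescript
// Hypothetical helper: forwards every Nth chunk and drops the rest.
function createSamplingStream<T>(every: number): TransformStream<T, T> {
  let seen = 0;
  return new TransformStream<T, T>(
    {
      transform(chunk, controller) {
        if (seen % every === 0) controller.enqueue(chunk);
        seen++;
      },
    },
    // Explicit, small queue sizes on both sides: once a queue is full,
    // the pipe stops asking a pull-based source for more chunks.
    new CountQueuingStrategy({ highWaterMark: 1 }),
    new CountQueuingStrategy({ highWaterMark: 1 }),
  );
}

// Drains a stream into an array (illustration only).
async function collect<T>(stream: ReadableStream<T>): Promise<T[]> {
  const reader = stream.getReader();
  const chunks: T[] = [];
  for (;;) {
    const result = await reader.read();
    if (result.done) return chunks;
    chunks.push(result.value);
  }
}

// Stand-in for a token stream coming back from doStream().
const source = new ReadableStream<string>({
  start(controller) {
    for (const token of ['a', 'b', 'c', 'd', 'e']) controller.enqueue(token);
    controller.close();
  },
});

// Keeps chunks 0, 2 and 4.
const sampled = collect(source.pipeThrough(createSamplingStream<string>(2)));
```

Sampling trades completeness for throughput, so it suits progress indicators or telemetry taps better than the stream that renders the final answer.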
3. Zod Schema Rigidity vs Loose Options
Explanation: Genkit's Zod validation fails fast on misconfiguration, which is excellent for developer experience but adds runtime overhead if middleware is instantiated thousands of times per second. Vercel's providerOptions is loosely typed, allowing silent data loss if keys are misspelled.
Fix: In high-frequency call sites, cache Genkit middleware instances or pre-validate configurations during application bootstrap. For Vercel, wrap providerOptions access in a type-safe helper that throws on missing keys.
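One possible shape for such a type-safe helper is sketched below. requireProviderOption and isString are hypothetical names, not part of the AI SDK, and the options type is a simplified stand-in for the SDK's providerOptions shape.

```typescript
// Simplified stand-in for the SDK's providerOptions shape (assumption).
type ProviderOptions = Record<string, Record<string, unknown>> | undefined;

// Hypothetical helper: reads one key and throws instead of returning undefined.
function requireProviderOption<T>(
  options: ProviderOptions,
  namespace: string,
  key: string,
  isValid: (value: unknown) => value is T,
): T {
  const value = options?.[namespace]?.[key];
  if (value === undefined) {
    throw new Error(`Missing providerOptions.${namespace}.${key}`);
  }
  if (!isValid(value)) {
    throw new Error(`Invalid type for providerOptions.${namespace}.${key}`);
  }
  return value;
}

const isString = (value: unknown): value is string => typeof value === 'string';

// A correctly spelled key resolves; a misspelled one fails loudly.
const options = { traceRetry: { traceId: 'req-42' } };
const traceId = requireProviderOption(options, 'traceRetry', 'traceId', isString);
```

Calling the helper with 'traceID' instead of 'traceId' throws a "Missing providerOptions.traceRetry.traceID" error, which surfaces the typo at the first call rather than as absent data in logs.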
4. Tool Interception at the Wrong Layer
Explanation: Vercel AI SDK deliberately excludes tool execution from its language model middleware contract. Attempting to implement tool gating or approval inside wrapGenerate will fail because tool calls are handled at the agent/orchestrator layer, not the model layer.
Fix: Use Vercel's prepareStep option (named experimental_prepareStep in earlier releases) or agent-level hooks for tool gating. In Genkit, leverage the native tool hook for first-class interception, or implement human-in-the-loop approval using ToolInterruptError.
5. Composition Order Confusion
Explanation: Both frameworks use an "onion model" where the first middleware in the array wraps the second, but execution flows inward then outward. Developers often assume left-to-right execution order, leading to retry logic running after logging, or guardrails bypassing rate limits.
Fix: Always document execution flow explicitly. Outer middleware runs first on entry, last on exit. Place infrastructure concerns (retry, fallback) on the outside, and business concerns (logging, tracing) on the inside.
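The onion ordering can be made concrete with a framework-independent sketch. The Handler and Middleware types and the compose function below are illustrative assumptions, not either SDK's actual contract; they only model "first in the array is the outermost layer".

```typescript
type Handler = (input: string) => string;
type Middleware = (next: Handler) => Handler;

// Compose so that the first middleware in the array is the outermost layer.
function compose(middleware: Middleware[], core: Handler): Handler {
  return middleware.reduceRight((next, layer) => layer(next), core);
}

const events: string[] = [];

// Builds a middleware that records when it is entered and exited.
const named = (name: string): Middleware => (next) => (input) => {
  events.push(`${name}:enter`);
  const output = next(input);
  events.push(`${name}:exit`);
  return output;
};

const handler = compose([named('retry'), named('logging')], (input) => {
  events.push('model');
  return input.toUpperCase();
});

const output = handler('ping');
console.log(events.join(' -> '));
// prints "retry:enter -> logging:enter -> model -> logging:exit -> retry:exit"
```

Because retry is listed first it wraps logging, so each retried attempt passes through the logging layer again, which is the behavior the placement advice above relies on.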
6. Multi-Language Portability Assumption
Explanation: Teams standardizing on Vercel AI SDK middleware patterns often hit a wall when extending AI logic to Go, Python, or Dart microservices. The middleware contract is TypeScript-specific and relies on Web Streams primitives such as TransformStream.
Fix: If your architecture spans multiple languages, adopt Genkit's middleware pattern at the service boundary, or implement a protocol-level interceptor (e.g., gRPC middleware, HTTP proxy) that operates independently of the SDK.
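A protocol-level interceptor can be as small as a wrapper around the HTTP client. The sketch below assumes a fetch-shaped transport; the FetchLike types, the x-trace-id header name, and the example URL are all hypothetical, and the same idea carries over to gRPC interceptors or a sidecar proxy in other languages.

```typescript
// Minimal fetch-shaped types so the sketch does not depend on DOM or Node typings.
type RequestInitLike = { method?: string; headers?: Record<string, string>; body?: string };
type ResponseLike = { status: number };
type FetchLike = (url: string, init?: RequestInitLike) => Promise<ResponseLike>;

// Hypothetical interceptor: stamps every outgoing request with a trace header.
function withTraceHeader(fetchImpl: FetchLike, traceId: string): FetchLike {
  return (url, init = {}) =>
    fetchImpl(url, {
      ...init,
      headers: { ...init.headers, 'x-trace-id': traceId },
    });
}

// Stand-in transport that records what it was asked to send.
const sent: RequestInitLike[] = [];
const fakeFetch: FetchLike = async (_url, init = {}) => {
  sent.push(init);
  return { status: 200 };
};

const tracedFetch = withTraceHeader(fakeFetch, 'req-42');
const pending = tracedFetch('https://example.com/v1/generate', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
});
```

Since the wrapper only touches the request envelope, any service that speaks HTTP can apply the same convention without sharing SDK code.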
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Multi-tenant SaaS with per-user quotas | Dynamic Interception (Genkit) | Runtime context injection requires per-call configuration | Low (Zod validation adds ~0.2ms/call) |
| Provider fallback network with jitter | Static Decoration (Vercel) | Infrastructure logic benefits from long-lived, cached instances | Negative (reduces cold-start latency by 15-30%) |
| Real-time streaming UI with backpressure | Static Decoration (Vercel) | Explicit wrapStream enables chunk-level flow control | Neutral (requires manual TransformStream management) |
| Agentic workflow with human approval | Dynamic Interception (Genkit) | tool hook natively supports interruption/resume cycles | Low (framework handles state machine) |
| Polyglot microservices architecture | Dynamic Interception (Genkit) | Multi-language SDK support enables consistent patterns | High (justifies framework migration cost) |
Configuration Template
Vercel AI SDK (Static Wrapper)
```typescript
import type { LanguageModelV3Middleware } from '@ai-sdk/provider';
import { gateway, wrapLanguageModel } from 'ai';

export const productionMiddleware: LanguageModelV3Middleware = {
  transformParams: async ({ params }) => ({
    ...params,
    // Preserve caller-supplied options and add an infra namespace
    providerOptions: {
      ...params.providerOptions,
      infra: {
        env: process.env.NODE_ENV ?? 'unknown',
        region: process.env.AWS_REGION ?? 'unknown'
      }
    }
  }),
  wrapGenerate: async ({ doGenerate }) => doGenerate(),
  wrapStream: async ({ doStream }) => {
    const { stream, ...rest } = await doStream();
    return { stream, ...rest }; // Pipe through TransformStream if needed
  }
};

export const wrappedModel = wrapLanguageModel({
  // wrapLanguageModel expects a model instance; the gateway provider is assumed here
  model: gateway('openai/gpt-4o'),
  middleware: [productionMiddleware]
});
```
Genkit (Dynamic Interceptor)
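```typescript
import { generateMiddleware, z } from 'genkit';

export const productionInterceptor = generateMiddleware(
  {
    name: 'productionInterceptor',
    configSchema: z.object({
      env: z.enum(['dev', 'staging', 'prod']),
      timeoutMs: z.number().default(30000)
    })
  },
  ({ config }) => ({
    model: async (req, ctx, next) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      // Rejects once the budget elapses; the underlying request is not cancelled
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`Model call exceeded ${config.timeoutMs}ms`)),
          config.timeoutMs
        );
      });
      try {
        // Whichever settles first wins: the model response or the timeout
        return await Promise.race([next(req, ctx), timeout]);
      } finally {
        clearTimeout(timer);
      }
    }
  })
);
```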
Quick Start Guide
- Initialize the framework: Install ai for Vercel or genkit for Google's ecosystem. Configure your provider credentials and default model.
- Define the middleware contract: Write a factory function that returns the middleware object. For Genkit, attach a Zod schema immediately. For Vercel, implement transformParams, wrapGenerate, and wrapStream.
- Attach to execution path: Wrap your model instance with wrapLanguageModel (Vercel) or pass the factory to the use: [] array on ai.generate (Genkit).
- Validate with integration tests: Mock a failing provider response and verify retry logic executes the correct number of times. Inject per-request metadata and confirm it appears in logs.
- Benchmark and tune: Measure cold-start latency and per-call overhead. Cache static wrappers. Pre-validate dynamic configurations. Adjust backoff parameters based on provider error rates.
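When tuning backoff parameters, it can help to print the delay schedule before deploying it. The sketch below mirrors the jittered exponential formula used in the Genkit interceptor; backoffDelay is a hypothetical helper, and the cap (capMs) and the injectable random source are assumptions added here for predictability.

```typescript
// Hypothetical helper mirroring the jittered exponential backoff in the retry loops.
function backoffDelay(
  attempt: number, // 1-based number of the attempt that just failed
  baseMs: number,
  capMs: number, // upper bound on any single delay (assumption, not in the original loops)
  random: () => number = Math.random,
): number {
  const exponential = baseMs * 2 ** attempt;
  // Jitter factor falls in [0.5, 1.5) so concurrent clients do not retry in lockstep
  const jittered = exponential * (0.5 + random());
  return Math.min(capMs, Math.round(jittered));
}

// Fixing random() at 0.5 gives a jitter factor of exactly 1 for a readable schedule.
const schedule = [1, 2, 3, 4].map((attempt) => backoffDelay(attempt, 200, 2000, () => 0.5));
console.log(schedule); // [400, 800, 1600, 2000]
```

Summing the schedule gives the worst-case added latency for a request that exhausts its retries, which is the number to compare against your request timeout.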