Adding observability to your Vercel AI SDK app in 30 seconds
Production-Grade Telemetry for AI Workflows Using OpenTelemetry
Current Situation Analysis
Shipping generative AI features into production without structured observability is a financial and operational liability. Teams routinely deploy streamText, generateObject, or tool-augmented workflows using the Vercel AI SDK, yet operate with zero visibility into token consumption, inference latency, or per-user cost attribution. The problem is rarely technical complexity; it is architectural misalignment.
Most engineering teams treat AI observability as an application-layer concern. They wrap LLM calls in custom middleware, manually instrument HTTP clients, or rely on vendor-specific SDKs that inject tracing logic directly into business routes. This approach fractures the abstraction layer, couples your codebase to a single observability provider, and forces developers to maintain duplicate instrumentation logic across every AI endpoint.
The oversight stems from a false assumption: that telemetry must be built manually. In reality, the Vercel AI SDK natively emits OpenTelemetry-compliant spans the moment you enable its internal telemetry flag. The instrumentation wire exists at the framework level. What is missing is simply a standardized receiver.
Data from production deployments consistently shows that untracked AI workloads exhibit three failure patterns:
- Cost drift: Token caching mechanisms (promptCacheId, cache hits/misses) are invisible without span-level metrics, causing budget overruns when retry loops or prompt variations bypass cache layers.
- Latency blindness: Streaming endpoints mask backend inference time. Without duration metrics attached to spans, teams cannot distinguish between network backpressure, provider throttling, and model bottlenecks.
- Attribution gaps: Per-request cost allocation requires contextual tags. Without structured metadata injection, finance and engineering cannot correlate usage with user tiers, organizations, or feature flags.
The solution is not another middleware wrapper. It is leveraging the OpenTelemetry standard that the SDK already speaks.
WOW Moment: Key Findings
Traditional AI observability setups force developers to choose between vendor lock-in and manual instrumentation. Native OpenTelemetry integration flips this trade-off. The following comparison illustrates the operational shift when moving from middleware-based tracing to framework-level OTel adoption.
| Approach | Initialization Overhead | Vendor Coupling | Span Granularity | Cost Attribution |
|---|---|---|---|---|
| Custom Middleware Wrapper | High (per-route setup) | Severe (tied to provider SDK) | Low (manual metric collection) | Manual (requires DB joins) |
| Vendor-Specific SDK | Medium (provider setup) | High (proprietary format) | Medium (provider-defined) | Semi-automated |
| Native OTel Integration | Low (single registry) | None (standard protocol) | High (framework-emitted) | Automated (metadata tags) |
This finding matters because it decouples observability from application logic. By routing AI SDK telemetry through the OpenTelemetry specification, you gain access to standardized span attributes (token counts, latency, tool calls, finish reasons) without modifying business routes. The architecture becomes provider-agnostic: swap exporters, add multi-export processors, or route traces to existing infrastructure without touching AI code.
Core Solution
Implementing production telemetry requires three architectural decisions: framework registration, span export configuration, and contextual metadata injection. The following implementation uses a modular registry pattern that isolates telemetry configuration from route handlers.
Step 1: Install Required Packages
The Vercel AI SDK requires a bridge to translate internal telemetry events into OpenTelemetry spans, alongside a Next.js OTel bootstrap and a standard trace exporter.
npm install @ai-sdk/otel @vercel/otel @opentelemetry/exporter-trace-otlp-http
Step 2: Create a Telemetry Registry Module
Instead of scattering configuration across route files, centralize the OTel bootstrap in a dedicated module. This ensures consistent service naming, environment-aware sampling, and clean separation of concerns.
// src/telemetry/registry.ts
import { registerTelemetry } from 'ai';
import { LegacyOpenTelemetry } from '@ai-sdk/otel';
import { registerOTel } from '@vercel/otel';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

export function initializeAiTelemetry() {
  // Bridge AI SDK internal events to OpenTelemetry
  registerTelemetry(new LegacyOpenTelemetry());

  // Configure Next.js OTel runtime
  registerOTel({
    serviceName: process.env.OTEL_SERVICE_NAME ?? 'ai-workflow-api',
    traceExporter: new OTLPTraceExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
    }),
    // Optional: upper bound (ms) on how long a forced flush may take
    forceFlushTimeoutMillis: 5000,
  });
}
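One detail worth guarding against: the OpenTelemetry specification treats OTEL_EXPORTER_OTLP_ENDPOINT as a base URL to which OTLP/HTTP exporters append /v1/traces, whereas a value passed explicitly as url is used as-is. The sketch below normalizes either form before it reaches the exporter. The helper name resolveTracesUrl is hypothetical and not part of any library; it assumes OTLP over HTTP.

```typescript
// Hypothetical helper: normalize an OTLP endpoint into a full traces URL.
// Assumes OTLP over HTTP, where trace data is posted to the /v1/traces path.
type Env = Record<string, string | undefined>;

const DEFAULT_BASE = 'http://localhost:4318';
const TRACES_PATH = '/v1/traces';

function resolveTracesUrl(env: Env): string {
  // A signal-specific endpoint, when present, is taken verbatim.
  const explicit = (env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? '').trim();
  if (explicit !== '') return explicit;

  const configured = (env.OTEL_EXPORTER_OTLP_ENDPOINT ?? '').trim();
  // Strip trailing slashes so the path is not doubled up.
  const base = (configured !== '' ? configured : DEFAULT_BASE).replace(/\/+$/, '');

  // Accept values that already include the traces path.
  const alreadyFull = base.slice(-TRACES_PATH.length) === TRACES_PATH;
  return alreadyFull ? base : base + TRACES_PATH;
}
```

In the registry, this could replace the inline fallback, for example url: resolveTracesUrl(process.env), so that both a bare collector address and a full traces URL produce the same result.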
Step 3: Wire the Registry in Next.js Instrumentation
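Next.js loads instrumentation.ts at server startup and calls its exported register function once per server instance. This is the correct lifecycle hook for telemetry registration.
// src/instrumentation.ts (Next.js looks inside src/ when the project uses a src directory)
import { initializeAiTelemetry } from './telemetry/registry';

// Next.js invokes register() once when a server instance starts
export function register() {
  initializeAiTelemetry();
}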
Step 4: Enable Telemetry on AI Calls
Telemetry activation happens at the call site via the experimental_telemetry configuration object. This approach keeps instrumentation declarative and avoids wrapping logic in controllers.
// src/app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { NextRequest } from 'next/server';

export async function POST(request: NextRequest) {
  const payload = await request.json();

  const inferenceResult = streamText({
    model: openai('gpt-4o-mini'),
    prompt: payload.userInput,
    experimental_telemetry: {
      isEnabled: true,
      functionId: 'chat-inference-v1',
      metadata: {
        userId: payload.authContext?.id ?? 'anonymous',
        tenantId: payload.authContext?.orgId,
        featureFlag: payload.authContext?.tier,
      },
    },
  });

  // toAIStreamResponse() is no longer available in recent SDK releases.
  // This returns a plain-text stream; use the stream helper that matches your client if it expects another protocol.
  return inferenceResult.toTextStreamResponse();
}
Architecture Rationale
- Why instrumentation.ts? Next.js guarantees this file runs before request handling begins. Registering OTel here ensures all subsequent AI SDK calls inherit the configured exporter without manual initialization.
- Why LegacyOpenTelemetry? The AI SDK maintains its own telemetry pipeline. This adapter translates SDK-specific events into OTel-compliant spans, preserving token counts, latency, and tool call data while conforming to the OpenTelemetry semantic conventions.
- Why declarative experimental_telemetry? Attaching configuration directly to the AI call keeps telemetry contextually bound to the operation. It avoids global state pollution and allows per-route overrides (e.g., disabling telemetry for health checks or internal tooling).
- Why metadata injection? OpenTelemetry span attributes are queryable. By attaching userId, tenantId, and featureFlag at the call site, downstream collectors can aggregate costs, filter latency by tenant, and trigger alerts based on feature rollout status without additional database queries.
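To keep the call-site configuration declarative without repeating it in every route, the telemetry object can be built by a small helper. The sketch below is one possible shape, not an SDK API: buildTelemetry, AuthContext, and TelemetryOptions are hypothetical names, and the field names simply mirror the route example. It also drops undefined metadata values, on the assumption that every emitted attribute should carry a concrete value.

```typescript
// Hypothetical helper that builds the experimental_telemetry object in one place.
type AttributeValue = string | number | boolean;

interface TelemetryOptions {
  isEnabled: boolean;
  functionId: string;
  metadata: Record<string, AttributeValue>;
}

interface AuthContext {
  id?: string;
  orgId?: string;
  tier?: string;
}

function buildTelemetry(
  functionId: string,
  auth: AuthContext | undefined,
  enabled: boolean = true,
): TelemetryOptions {
  const candidates: Record<string, AttributeValue | undefined> = {
    userId: auth?.id ?? 'anonymous',
    tenantId: auth?.orgId,
    featureFlag: auth?.tier,
  };

  // Drop undefined entries so every emitted attribute has a concrete value.
  const metadata: Record<string, AttributeValue> = {};
  for (const key of Object.keys(candidates)) {
    const value = candidates[key];
    if (value !== undefined) metadata[key] = value;
  }

  return { isEnabled: enabled, functionId, metadata };
}
```

A route would then pass experimental_telemetry: buildTelemetry('chat-inference-v1', payload.authContext), while a health check could pass false as the third argument to opt out.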
Pitfall Guide
1. PII Leakage in Prompt/Response Spans
Explanation: The AI SDK captures full prompt and response text by default. Shipping unredacted user inputs or model outputs to observability backends violates data governance policies and increases storage costs.
Fix: Enable PII scrubbing at the exporter level or configure the SDK to truncate sensitive fields. Use environment flags to disable text capture in production while retaining token and latency metrics.
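An environment flag for text capture might look like the sketch below. It relies on the recordInputs and recordOutputs flags documented for the AI SDK's experimental_telemetry settings; confirm them against the SDK version you have installed. The helper name textCaptureFlags and the AI_TELEMETRY_CAPTURE_TEXT variable are assumptions made for illustration.

```typescript
// Hypothetical helper: decide whether prompt/response text is recorded on spans.
// recordInputs / recordOutputs are the flag names documented for the AI SDK's
// experimental_telemetry settings; confirm them against your SDK version.
type Env = Record<string, string | undefined>;

interface TextCaptureFlags {
  recordInputs: boolean;
  recordOutputs: boolean;
}

function textCaptureFlags(env: Env): TextCaptureFlags {
  // Assumed opt-in variable (not a standard one): set to "true" to capture text.
  const optedIn = env.AI_TELEMETRY_CAPTURE_TEXT === 'true';
  // Outside production, capture text by default to ease debugging.
  const capture = env.NODE_ENV === 'production' ? optedIn : true;
  return { recordInputs: capture, recordOutputs: capture };
}
```

The result can be spread into the call-site object, for example experimental_telemetry: { isEnabled: true, ...textCaptureFlags(process.env) }, so token and latency attributes are still emitted when text capture is off.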
2. Ignoring Token Caching Signals
Explanation: Cache hits and misses drastically alter cost profiles. Standard token counts (inputTokens, outputTokens) do not distinguish between cached and uncached reads, leading to inaccurate budget forecasting.
Fix: Monitor metadata.tokens.cacheReads and metadata.tokens.cacheWrites alongside standard counts. Alert when cache hit rates drop below thresholds during traffic spikes.
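To see why cached reads matter for forecasting, here is a rough sketch of a cache hit rate and a cost estimate computed from token counts. The TokenUsage shape, the field names, and the prices are illustrative assumptions rather than SDK output; in particular it assumes the input count includes cached tokens, which varies by provider.

```typescript
// Illustrative only: field names and prices are assumptions, not SDK output.
interface TokenUsage {
  inputTokens: number; // all prompt tokens, cached or not (assumed)
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
}

interface PricePerMillionUsd {
  input: number;
  cachedInput: number;
  output: number;
}

// Share of prompt tokens that were served from the provider's cache.
function cacheHitRate(usage: TokenUsage): number {
  return usage.inputTokens === 0 ? 0 : usage.cachedInputTokens / usage.inputTokens;
}

// Cost estimate that bills cached and uncached prompt tokens at different rates.
function estimateCostUsd(usage: TokenUsage, price: PricePerMillionUsd): number {
  const uncached = usage.inputTokens - usage.cachedInputTokens;
  const total =
    uncached * price.input +
    usage.cachedInputTokens * price.cachedInput +
    usage.outputTokens * price.output;
  return total / 1e6;
}
```

With 1,000 input tokens of which 800 are cached, the same request costs noticeably less than an uncached one, so a falling hit rate shows up as cost drift even when total token counts look stable.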
3. High-Cardinality Metadata Injection
Explanation: Attaching unbounded values (e.g., full request IDs, session tokens, or raw user emails) to span metadata creates cardinality explosions in time-series databases, degrading query performance and increasing ingestion costs.
Fix: Restrict metadata to low-cardinality business dimensions (userId, tenantId, planTier). Hash or truncate identifiers if necessary. Validate metadata schemas before deployment.
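A metadata schema check can be as simple as an allow-list applied before the values reach the AI call. The sketch below is a minimal version under stated assumptions: sanitizeMetadata is a hypothetical helper, the 64-character limit is arbitrary, and the email pattern is a loose heuristic rather than a full validator.

```typescript
// Hypothetical allow-list filter for span metadata.
const ALLOWED_KEYS = ['userId', 'tenantId', 'planTier'];
const MAX_VALUE_LENGTH = 64; // arbitrary cap chosen for illustration
const EMAIL_LIKE = /^[^@\s]+@[^@\s]+\.[^@\s]+$/;

type Metadata = Record<string, string>;

function sanitizeMetadata(input: Record<string, unknown>): Metadata {
  const clean: Metadata = {};
  for (const key of ALLOWED_KEYS) {
    const value = input[key];
    // Keep only non-empty strings; other types and missing keys are dropped.
    if (typeof value !== 'string' || value.length === 0) continue;
    // Raw emails are both PII and effectively unbounded, so skip them.
    if (EMAIL_LIKE.test(value)) continue;
    // Truncate long values to the configured cap.
    clean[key] = value.slice(0, MAX_VALUE_LENGTH);
  }
  return clean;
}
```

Keys outside the allow-list, such as a session token, never reach the span, which keeps the set of attribute names fixed regardless of what a route passes in.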
4. Misaligned Service Names Across Environments
Explanation: Hardcoding serviceName causes trace fragmentation when deploying to staging, QA, or production. Observability platforms cannot correlate spans across environments without consistent service identifiers.
Fix: Derive serviceName from environment variables (process.env.NODE_ENV, process.env.DEPLOYMENT_TARGET). Append environment suffixes programmatically during registration.
5. Forcing Synchronous Tracing on Streaming Endpoints
Explanation: Streaming responses emit spans asynchronously. If the OTel processor flushes before the stream completes, latency metrics and final token counts will be truncated or missing.
Fix: Configure forceFlushTimeoutMillis to accommodate stream duration. Use MultiSpanProcessor with BatchSpanProcessor tuned for streaming workloads. Verify span lifecycle matches response completion.
6. Skipping Error State Propagation
Explanation: Failed AI calls (rate limits, model errors, tool failures) often return incomplete spans. Without explicit error tagging, observability dashboards show false success rates.
Fix: Ensure the SDK's outcome: 'failed' and errorMessage attributes are captured. Map HTTP 429/500 responses to span status codes. Add retry logic with backoff and track retry counts as span attributes.
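The mapping from HTTP responses to span status, plus a retry delay, can be sketched as two small pure functions. The names classifyHttpStatus and backoffDelayMs are hypothetical, and the backoff parameters are arbitrary. The numeric status values mirror SpanStatusCode in @opentelemetry/api, and successful calls are left UNSET, following the usual OpenTelemetry HTTP convention.

```typescript
// Numeric values mirror SpanStatusCode in @opentelemetry/api.
const StatusCode = { UNSET: 0, OK: 1, ERROR: 2 } as const;
type StatusCodeValue = (typeof StatusCode)[keyof typeof StatusCode];

interface CallOutcome {
  status: StatusCodeValue;
  retryable: boolean;
}

function classifyHttpStatus(httpStatus: number): CallOutcome {
  // Successful responses conventionally leave the span status UNSET.
  if (httpStatus >= 200 && httpStatus < 400) {
    return { status: StatusCode.UNSET, retryable: false };
  }
  // Rate limits and server-side failures are worth retrying; other 4xx are not.
  const retryable = httpStatus === 429 || httpStatus >= 500;
  return { status: StatusCode.ERROR, retryable };
}

// Exponential backoff with a cap; attempt is 0-based.
function backoffDelayMs(attempt: number, baseMs: number = 250, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

A retry loop built on these would record the final attempt number as a span attribute, so dashboards can separate first-try successes from calls that only succeeded after backoff.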
7. Neglecting Sampling Strategies
Explanation: Ingesting 100% of AI spans at scale overwhelms collectors and inflates costs. Blind sampling discards valuable failure data and skews latency distributions.
Fix: Implement trace-based sampling. Use ParentBasedSampler so child spans inherit the parent span's sampling decision. Head samplers decide before a call's outcome is known, so keeping a higher share of error traces than successful streaming responses requires tail-based sampling, typically configured in the collector if it supports it.
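An environment-aware sampling ratio can be resolved with a small function like the one below. This is a sketch under assumptions: resolveSampleRatio is a hypothetical helper, the 10% production default is arbitrary, and it reads the standard OTEL_TRACES_SAMPLER_ARG variable only as a numeric ratio.

```typescript
// Hypothetical helper: pick a head-sampling ratio between 0 and 1.
type Env = Record<string, string | undefined>;

function resolveSampleRatio(env: Env): number {
  // Assumed defaults: sample everything locally, 10% in production.
  const fallback = env.NODE_ENV === 'production' ? 0.1 : 1;

  const raw = env.OTEL_TRACES_SAMPLER_ARG;
  if (raw === undefined || raw.trim() === '') return fallback;

  const parsed = Number(raw);
  // Ignore values that are not finite numbers.
  if (!isFinite(parsed)) return fallback;

  // Clamp to the valid ratio range.
  return Math.min(1, Math.max(0, parsed));
}
```

The resulting ratio would typically feed a TraceIdRatioBasedSampler wrapped in a ParentBasedSampler (both exported by @opentelemetry/sdk-trace-base), so that root spans are sampled by ratio and child spans follow their parent.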
Production Bundle
Action Checklist
- Install OTel bridge and exporter packages: @ai-sdk/otel, @vercel/otel, @opentelemetry/exporter-trace-otlp-http
- Create centralized telemetry registry module with environment-aware configuration
- Register OTel runtime in instrumentation.ts using Next.js lifecycle hooks
- Enable experimental_telemetry on all AI SDK calls with isEnabled: true
- Attach low-cardinality metadata (userId, tenantId, featureFlag) for cost attribution
- Configure PII scrubbing or text truncation for production environments
- Set forceFlushTimeoutMillis to accommodate streaming response durations
- Implement trace-based sampling to balance observability depth with ingestion costs
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Early-stage prototype | Native OTel with console exporter | Fastest setup, zero infrastructure overhead | Negligible |
| Multi-tenant SaaS | OTel + OTLP exporter + metadata tagging | Enables per-tenant cost allocation and latency filtering | Low (ingestion scales with tenants) |
| High-volume streaming API | OTel + BatchSpanProcessor + sampling | Prevents collector overload while preserving error traces | Medium (sampling reduces ingestion by 60-80%) |
| Compliance-restricted workload | OTel + PII scrubbing + text truncation | Meets data governance without sacrificing token/latency metrics | Low (storage costs decrease) |
| Vendor-agnostic architecture | OTel + MultiSpanProcessor | Swaps exporters without code changes, supports dual-collector setups | Low (exporter swap is configuration-only) |
Configuration Template
// src/telemetry/registry.ts
// Requires @opentelemetry/sdk-trace-base in addition to the packages from Step 1
import { registerTelemetry } from 'ai';
import { LegacyOpenTelemetry } from '@ai-sdk/otel';
import { registerOTel } from '@vercel/otel';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';

export function initializeAiTelemetry() {
  registerTelemetry(new LegacyOpenTelemetry());

  const apiKey = process.env.OTEL_API_KEY;
  const traceExporter = new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
    // Send the Authorization header only when a key is configured
    headers: apiKey ? { Authorization: `Bearer ${apiKey}` } : {},
  });

  registerOTel({
    serviceName: `${process.env.OTEL_SERVICE_NAME ?? 'ai-api'}-${process.env.NODE_ENV ?? 'dev'}`,
    // The exporter is attached once, through the batch processor below
    spanProcessors: [
      new BatchSpanProcessor(traceExporter, {
        maxQueueSize: 2048,
        maxExportBatchSize: 512,
        scheduledDelayMillis: 5000,
        exportTimeoutMillis: 30000,
      }),
    ],
    forceFlushTimeoutMillis: 10000,
  });
}
Quick Start Guide
- Install dependencies: Run npm install @ai-sdk/otel @vercel/otel @opentelemetry/exporter-trace-otlp-http @opentelemetry/sdk-trace-base (the last package provides the BatchSpanProcessor used by the configuration template)
- Create registry: Add src/telemetry/registry.ts with the configuration template above
- Wire instrumentation: Create instrumentation.ts at the project root (or inside src/ if the project uses a src directory) and call initializeAiTelemetry() from its exported register function
- Enable on calls: Add experimental_telemetry: { isEnabled: true, metadata: { userId: '...' } } to streamText or generateObject invocations
- Verify spans: Start the Next.js dev server, trigger an AI call, and inspect the OTLP endpoint or local collector for emitted spans containing token counts, latency, and metadata attributes
