Production-Grade Telemetry for AI Workflows Using OpenTelemetry

Current Situation Analysis

Shipping generative AI features into production without structured observability is a financial and operational liability. Teams routinely deploy streamText, generateObject, or tool-augmented workflows using the Vercel AI SDK, yet operate with zero visibility into token consumption, inference latency, or per-user cost attribution. The problem is rarely technical complexity; it is architectural misalignment.

Most engineering teams treat AI observability as an application-layer concern. They wrap LLM calls in custom middleware, manually instrument HTTP clients, or rely on vendor-specific SDKs that inject tracing logic directly into business routes. This approach fractures the abstraction layer, couples your codebase to a single observability provider, and forces developers to maintain duplicate instrumentation logic across every AI endpoint.

The oversight stems from a false assumption: that telemetry must be built manually. In reality, the Vercel AI SDK natively emits OpenTelemetry-compliant spans the moment you enable its internal telemetry flag. The instrumentation wire exists at the framework level. What is missing is simply a standardized receiver.

Untracked AI workloads in production tend to exhibit three failure patterns:

Cost drift: Token caching mechanisms (promptCacheId, cache hits/misses) are invisible without span-level metrics, causing budget overruns when retry loops or prompt variations bypass cache layers.
Latency blindness: Streaming endpoints mask backend inference time. Without duration metrics attached to spans, teams cannot distinguish between network backpressure, provider throttling, or model bottleneck.
Attribution gaps: Per-request cost allocation requires contextual tags. Without structured metadata injection, finance and engineering cannot correlate usage with user tiers, organizations, or feature flags.

The solution is not another middleware wrapper. It is leveraging the OpenTelemetry standard that the SDK already speaks.

WOW Moment: Key Findings

Traditional AI observability setups force developers to choose between vendor lock-in and manual instrumentation. Native OpenTelemetry integration flips this trade-off. The following comparison illustrates the operational shift when moving from middleware-based tracing to framework-level OTel adoption.

Approach	Initialization Overhead	Vendor Coupling	Span Granularity	Cost Attribution
Custom Middleware Wrapper	High (per-route setup)	Severe (tied to provider SDK)	Low (manual metric collection)	Manual (requires DB joins)
Vendor-Specific SDK	Medium (provider setup)	High (proprietary format)	Medium (provider-defined)	Semi-automated
Native OTel Integration	Low (single registry)	None (standard protocol)	High (framework-emitted)	Automated (metadata tags)

This finding matters because it decouples observability from application logic. By routing AI SDK telemetry through the OpenTelemetry specification, you gain access to standardized span attributes (token counts, latency, tool calls, finish reasons) without modifying business routes. The architecture becomes provider-agnostic: swap exporters, add multi-export processors, or route traces to existing infrastructure without touching AI code.

Core Solution

Implementing production telemetry requires three architectural decisions: framework registration, span export configuration, and contextual metadata injection. The following implementation uses a modular registry pattern that isolates telemetry configuration from route handlers.

Step 1: Install Required Packages

The Vercel AI SDK requires a bridge to translate internal telemetry events into OpenTelemetry spans, alongside a Next.js OTel bootstrap and a standard trace exporter.

npm install @ai-sdk/otel @vercel/otel @opentelemetry/exporter-trace-otlp-http

Step 2: Create a Telemetry Registry Module

Instead of scattering configuration across route files, centralize the OTel bootstrap in a dedicated module. This ensures consistent service naming, environment-aware sampling, and clean separation of concerns.

// src/telemetry/registry.ts
import { registerTelemetry } from 'ai';
import { LegacyOpenTelemetry } from '@ai-sdk/otel';
import { registerOTel } from '@vercel/otel';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

export function initializeAiTelemetry() {
  // Bridge AI SDK internal events to OpenTelemetry
  registerTelemetry(new LegacyOpenTelemetry());

  // Configure Next.js OTel runtime
  registerOTel({
    serviceName: process.env.OTEL_SERVICE_NAME ?? 'ai-workflow-api',
    traceExporter: new OTLPTraceExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
    }),
    // Optional: cap how long a forced flush may run before it is cancelled (ms)
    forceFlushTimeoutMillis: 5000,
  });
}

Step 3: Wire the Registry in Next.js Instrumentation

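Next.js loads instrumentation.ts at server startup and calls its exported register function once. This is the correct lifecycle hook for telemetry registration. Because this project keeps its code under src/, the file belongs inside src/ next to the app directory.

// src/instrumentation.ts
import { initializeAiTelemetry } from './telemetry/registry';

// Next.js invokes register() once when a new server instance starts
export function register() {
  initializeAiTelemetry();
}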

Step 4: Enable Telemetry on AI Calls

Telemetry activation happens at the call site via the experimental_telemetry configuration object. This approach keeps instrumentation declarative and avoids wrapping logic in controllers.

// src/app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { NextRequest } from 'next/server';

export async function POST(request: NextRequest) {
  const payload = await request.json();

  const inferenceResult = streamText({
    model: openai('gpt-4o-mini'),
    prompt: payload.userInput,
    experimental_telemetry: {
      isEnabled: true,
      functionId: 'chat-inference-v1',
      metadata: {
        userId: payload.authContext?.id ?? 'anonymous',
        tenantId: payload.authContext?.orgId,
        featureFlag: payload.authContext?.tier,
      },
    },
  });

  // toAIStreamResponse() was removed in AI SDK 4.0; stream plain text instead
  return inferenceResult.toTextStreamResponse();
}

Architecture Rationale

Why instrumentation.ts? Next.js guarantees this file runs before request handling begins. Registering OTel here ensures all subsequent AI SDK calls inherit the configured exporter without manual initialization.
Why LegacyOpenTelemetry? The AI SDK maintains its own telemetry pipeline. This adapter translates SDK-specific events into OTel-compliant spans, preserving token counts, latency, and tool call data while conforming to the OpenTelemetry semantic conventions.
Why declarative experimental_telemetry? Attaching configuration directly to the AI call keeps telemetry contextually bound to the operation. It avoids global state pollution and allows per-route overrides (e.g., disabling telemetry for health checks or internal tooling).
Why metadata injection? OpenTelemetry span attributes are queryable. By attaching userId, tenantId, and featureFlag at the call site, downstream collectors can aggregate costs, filter latency by tenant, and trigger alerts based on feature rollout status without additional database queries.
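
To keep the declarative call-site configuration consistent across routes, the telemetry object can be produced by a small helper. The following is a minimal sketch, not part of the SDK: buildTelemetry and the AuthContext type are hypothetical names, and the returned shape simply mirrors the isEnabled, functionId, and metadata fields used in the route example above.

```typescript
// Hypothetical helper: builds the object passed as `experimental_telemetry`.
type AuthContext = { id?: string; orgId?: string; tier?: string };

type TelemetrySettings = {
  isEnabled: boolean;
  functionId: string;
  metadata: Record<string, string>;
};

function buildTelemetry(
  functionId: string,
  auth: AuthContext | undefined,
  enabled = true,
): TelemetrySettings {
  const metadata: Record<string, string> = {
    userId: auth?.id ?? 'anonymous',
  };
  // Attach optional tags only when present, so spans never carry
  // undefined attribute values.
  if (auth?.orgId) metadata.tenantId = auth.orgId;
  if (auth?.tier) metadata.featureFlag = auth.tier;
  return { isEnabled: enabled, functionId, metadata };
}
```

A route could then pass experimental_telemetry: buildTelemetry('chat-inference-v1', payload.authContext), and a health-check route could pass false as the third argument to opt out.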

Pitfall Guide

1. PII Leakage in Prompt/Response Spans

Explanation: The AI SDK captures full prompt and response text by default once telemetry is enabled. Shipping unredacted user inputs or model outputs to observability backends can violate data governance policies and increases storage costs. Fix: Enable PII scrubbing at the exporter level, or turn off text capture at the call site by setting recordInputs: false and recordOutputs: false in experimental_telemetry. Drive those flags from environment configuration so production drops prompt and response text while retaining token and latency metrics.
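
One way to wire the environment flag is a helper that returns the text-capture settings. This is a sketch under two assumptions: that the SDK version in use accepts recordInputs and recordOutputs in its telemetry settings, and that AI_TELEMETRY_RECORD_TEXT (a hypothetical variable name) is how your deployment expresses an explicit override.

```typescript
// Sketch: derive text-capture flags from the environment.
// AI_TELEMETRY_RECORD_TEXT is a hypothetical variable name.
type TextCaptureFlags = { recordInputs: boolean; recordOutputs: boolean };

function textCaptureFlags(
  env: Record<string, string | undefined>,
): TextCaptureFlags {
  const isProduction = env.NODE_ENV === 'production';
  const explicit = env.AI_TELEMETRY_RECORD_TEXT;
  // An explicit setting wins; otherwise capture text everywhere except production.
  const capture = explicit !== undefined ? explicit === 'true' : !isProduction;
  return { recordInputs: capture, recordOutputs: capture };
}
```

At the call site the result can be spread into the telemetry object, for example experimental_telemetry: { isEnabled: true, ...textCaptureFlags(process.env) }.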

2. Ignoring Token Caching Signals

Explanation: Cache hits and misses drastically alter cost profiles. Standard token counts (inputTokens, outputTokens) do not distinguish between cached and uncached reads, leading to inaccurate budget forecasting. Fix: Monitor metadata.tokens.cacheReads and metadata.tokens.cacheWrites alongside standard counts. Alert when cache hit rates drop below thresholds during traffic spikes.
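
To illustrate why cached reads change the forecast, the sketch below prices cached and uncached input tokens separately and derives a cache hit rate that an alert could watch. The TokenUsage shape and every price are hypothetical placeholders, not real provider rates or SDK attribute names.

```typescript
// All field names and prices here are illustrative assumptions.
type TokenUsage = {
  inputTokens: number; // total input tokens, including cached reads
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
};

// Prices in USD per one million tokens.
type PricePerMillion = { input: number; cachedInput: number; output: number };

function estimateCostUsd(usage: TokenUsage, price: PricePerMillion): number {
  // Only the uncached remainder is billed at the full input rate.
  const uncached = Math.max(usage.inputTokens - usage.cachedInputTokens, 0);
  return (
    (uncached * price.input +
      usage.cachedInputTokens * price.cachedInput +
      usage.outputTokens * price.output) /
    1_000_000
  );
}

function cacheHitRate(usage: TokenUsage): number {
  return usage.inputTokens === 0
    ? 0
    : usage.cachedInputTokens / usage.inputTokens;
}
```

Treating all input tokens as uncached would overstate the bill in this model, which is the forecasting error the pitfall describes.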

3. High-Cardinality Metadata Injection

Explanation: Attaching unbounded values (e.g., full request IDs, session tokens, or raw user emails) to span metadata creates cardinality explosions in time-series databases, degrading query performance and increasing ingestion costs. Fix: Restrict metadata to low-cardinality business dimensions (userId, tenantId, planTier). Hash or truncate identifiers if necessary. Validate metadata schemas before deployment.
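
A metadata schema can be enforced with an allowlist applied before the values reach the telemetry object. The sketch below assumes a Node.js runtime; sanitizeMetadata, the allowlist, and the choice of which keys to hash are all hypothetical. Note that hashing hides the raw identifier and fixes its length but does not reduce the number of distinct values, so the allowlist is what actually bounds the dimensions.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical allowlist; adjust to the dimensions your dashboards query.
const ALLOWED_KEYS = ['userId', 'tenantId', 'planTier'] as const;
const HASHED_KEYS = new Set<string>(['userId']);

function hashId(value: string): string {
  // Truncated SHA-256: stable, fixed-length, and does not expose the raw id.
  return createHash('sha256').update(value).digest('hex').slice(0, 16);
}

function sanitizeMetadata(
  raw: Record<string, unknown>,
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const key of ALLOWED_KEYS) {
    const value = raw[key];
    // Drop anything that is not a non-empty string.
    if (typeof value !== 'string' || value.length === 0) continue;
    clean[key] = HASHED_KEYS.has(key) ? hashId(value) : value;
  }
  return clean;
}
```

Keys outside the allowlist, such as a session token, are silently dropped rather than forwarded to the exporter.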

4. Misaligned Service Names Across Environments

Explanation: Hardcoding serviceName causes trace fragmentation when deploying to staging, QA, or production. Observability platforms cannot correlate spans across environments without consistent service identifiers. Fix: Derive serviceName from environment variables (process.env.NODE_ENV, process.env.DEPLOYMENT_TARGET). Append environment suffixes programmatically during registration.
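
The suffixing rule can be captured in one function so that every registration call derives the name the same way. This is a sketch only: resolveServiceName is a hypothetical helper, and the fallback values are assumptions rather than defaults of any library.

```typescript
// Hypothetical helper: derive a service name with an environment suffix.
function resolveServiceName(env: Record<string, string | undefined>): string {
  const base = env.OTEL_SERVICE_NAME ?? 'ai-workflow-api';
  const target = env.DEPLOYMENT_TARGET ?? env.NODE_ENV ?? 'development';
  // Avoid double suffixes such as "ai-api-staging-staging".
  return base.endsWith(`-${target}`) ? base : `${base}-${target}`;
}
```

The result can be passed as serviceName during registration, for example serviceName: resolveServiceName(process.env).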

5. Forcing Synchronous Tracing on Streaming Endpoints

Explanation: Streaming responses emit spans asynchronously. If the OTel processor flushes before the stream completes, latency metrics and final token counts will be truncated or missing. Fix: Configure forceFlushTimeoutMillis to accommodate stream duration. Use MultiSpanProcessor with BatchSpanProcessor tuned for streaming workloads. Verify span lifecycle matches response completion.

6. Skipping Error State Propagation

Explanation: Failed AI calls (rate limits, model errors, tool failures) often return incomplete spans. Without explicit error tagging, observability dashboards show false success rates. Fix: Ensure the SDK's outcome: 'failed' and errorMessage attributes are captured. Map HTTP 429/500 responses to span status codes. Add retry logic with backoff and track retry counts as span attributes.
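
The mapping from HTTP status to a failure class, plus the retry bookkeeping, might look like the sketch below. All function names and the app.ai.* attribute keys are hypothetical; the returned record is meant to be attached to the active span by whatever tracing API the project already uses.

```typescript
type FailureClass = 'rate_limited' | 'provider_error' | 'client_error' | 'ok';

function classifyStatus(httpStatus: number): FailureClass {
  if (httpStatus === 429) return 'rate_limited';
  if (httpStatus >= 500) return 'provider_error';
  if (httpStatus >= 400) return 'client_error';
  return 'ok';
}

function isRetryable(kind: FailureClass): boolean {
  return kind === 'rate_limited' || kind === 'provider_error';
}

// Exponential backoff without jitter: baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Attributes a caller could attach to the span; key names are hypothetical.
function retryAttributes(
  httpStatus: number,
  retryCount: number,
): Record<string, string | number | boolean> {
  const kind = classifyStatus(httpStatus);
  return {
    'app.ai.failure_class': kind,
    'app.ai.retry_count': retryCount,
    'app.ai.retryable': isRetryable(kind),
  };
}
```

Recording the retry count on the span makes it possible to separate calls that succeeded on the first attempt from calls that only succeeded after backoff.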

7. Neglecting Sampling Strategies

Explanation: Ingesting 100% of AI spans at scale overwhelms collectors and inflates costs. Blind sampling discards valuable failure data and skews latency distributions. Fix: Implement trace-based sampling. Use ParentBasedSampler to inherit parent span decisions so that traces stay complete. Keeping a higher share of failed calls than successful ones requires tail-based sampling in the collector, because a head sampler makes its decision before the outcome of the call is known. Apply lower rates to successful streaming responses there, if the collector supports it.
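
For the head-sampling part, the ratio itself is usually configuration. The sketch below resolves and clamps a ratio from the environment; resolveSampleRatio and the AI_TRACE_SAMPLE_RATIO variable are hypothetical, and the 10% production fallback is an arbitrary placeholder rather than a recommendation.

```typescript
// AI_TRACE_SAMPLE_RATIO is a hypothetical variable name.
function resolveSampleRatio(env: Record<string, string | undefined>): number {
  // Placeholder defaults: sample everything outside production.
  const fallback = env.NODE_ENV === 'production' ? 0.1 : 1;
  const raw = env.AI_TRACE_SAMPLE_RATIO;
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return fallback;
  // Clamp to the [0, 1] range a ratio-based sampler expects.
  return Math.min(Math.max(parsed, 0), 1);
}
```

The resolved value could then feed a sampler such as new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(ratio) }) from @opentelemetry/sdk-trace-base, supplied to the tracing setup wherever it accepts a sampler.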

Production Bundle

Action Checklist

Install OTel bridge and exporter packages: @ai-sdk/otel, @vercel/otel, @opentelemetry/exporter-trace-otlp-http
Create centralized telemetry registry module with environment-aware configuration
Register OTel runtime in instrumentation.ts using Next.js lifecycle hooks
Enable experimental_telemetry on all AI SDK calls with isEnabled: true
Attach low-cardinality metadata (userId, tenantId, featureFlag) for cost attribution
Configure PII scrubbing or text truncation for production environments
Set forceFlushTimeoutMillis to accommodate streaming response durations
Implement trace-based sampling to balance observability depth with ingestion costs

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Early-stage prototype	Native OTel with console exporter	Fastest setup, zero infrastructure overhead	Negligible
Multi-tenant SaaS	OTel + OTLP exporter + metadata tagging	Enables per-tenant cost allocation and latency filtering	Low (ingestion scales with tenants)
High-volume streaming API	OTel + BatchSpanProcessor + sampling	Prevents collector overload while preserving error traces	Medium (sampling reduces ingestion by 60-80%)
Compliance-restricted workload	OTel + PII scrubbing + text truncation	Meets data governance without sacrificing token/latency metrics	Low (storage costs decrease)
Vendor-agnostic architecture	OTel + MultiSpanProcessor	Swaps exporters without code changes, supports dual-collector setups	Low (exporter swap is configuration-only)

Configuration Template

// src/telemetry/registry.ts
import { registerTelemetry } from 'ai';
import { LegacyOpenTelemetry } from '@ai-sdk/otel';
import { registerOTel } from '@vercel/otel';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';

export function initializeAiTelemetry() {
  registerTelemetry(new LegacyOpenTelemetry());

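  const apiKey = process.env.OTEL_API_KEY;

  const traceExporter = new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
    // Send the Authorization header only when a key is configured,
    // rather than an empty "Bearer " value
    headers: apiKey ? { Authorization: `Bearer ${apiKey}` } : {},
  });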

  registerOTel({
    serviceName: `${process.env.OTEL_SERVICE_NAME ?? 'ai-api'}-${process.env.NODE_ENV ?? 'dev'}`,
    traceExporter,
    spanProcessors: [
      new BatchSpanProcessor(traceExporter, {
        maxQueueSize: 2048,
        maxExportBatchSize: 512,
        scheduledDelayMillis: 5000,
        exportTimeoutMillis: 30000,
      }),
    ],
    forceFlushTimeoutMillis: 10000,
  });
}

Quick Start Guide

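Install dependencies: Run npm install @ai-sdk/otel @vercel/otel @opentelemetry/exporter-trace-otlp-http, and add @opentelemetry/sdk-trace-base if you use the configuration template, which imports BatchSpanProcessor from it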
Create registry: Add src/telemetry/registry.ts with the configuration template above
Wire instrumentation: Create instrumentation.ts at the project root (or inside src/ when the project uses a src directory) and export a register() function that calls initializeAiTelemetry()
Enable on calls: Add experimental_telemetry: { isEnabled: true, metadata: { userId: '...' } } to streamText or generateObject invocations
Verify spans: Start the Next.js dev server, trigger an AI call, and inspect the OTLP endpoint or local collector for emitted spans containing token counts, latency, and metadata attributes
