Your AI agent already emits OpenTelemetry. Why aren't you watching it?
Standardizing AI Agent Observability: From Vendor Lock-in to OpenTelemetry gen_ai.* Conventions
Current Situation Analysis
Generative AI agents operate on non-deterministic execution paths. Unlike traditional microservices that follow predictable request-response cycles, agents dynamically select models, construct prompts, invoke external tools, and retry on failure. Traditional observability stacks were never designed to capture this cognitive workflow.
When teams deploy LLM agents into production, they quickly encounter a visibility gap. Generic APM platforms track HTTP latency, error rates, and throughput, but they treat the AI layer as a black box. A POST /v1/chat span reveals nothing about which model was selected, how many tokens were consumed, which tools were invoked, or why the planner chose a specific action. The signal is either buried in raw request payloads or discarded entirely.
To bridge this gap, engineering teams historically reached for proprietary observability SDKs. These tools capture the right telemetry but introduce severe architectural debt. They couple your application to a specific vendor, require coordinated upgrades alongside framework releases, and multiply dependency footprints when teams run polyglot stacks. A single organization might use Spring AI for orchestration, LangChain4j for retrieval, and a Python framework for data preprocessing. Each vendor SDK demands its own initialization, configuration, and lifecycle management.
The industry is now shifting toward a standardized approach. The OpenTelemetry community finalized the gen_ai.* semantic conventions, and major AI frameworks have adopted them natively. Spring AI 1.0 emits telemetry via Micrometer Observations. LangChain4j exposes the same signals through its ChatModelListener API. Koog 0.8 includes a first-class OpenTelemetry feature for the JVM. Python's OpenLLMetry and OpenInference projects provide instrumentations for Anthropic, OpenAI, LangChain, and LlamaIndex. Go's otel-instrumentation-genai package follows the same pattern.
The telemetry is already on the wire in standard form. The bottleneck is no longer instrumentation; it's reception. Teams need an OTLP endpoint that understands gen_ai.* attributes, reconstructs agent workflows, and surfaces actionable insights without requiring application-level vendor dependencies.
WOW Moment: Key Findings
The transition from proprietary SDKs to standard OpenTelemetry conventions fundamentally changes how AI observability is architected. The table below compares the three dominant approaches currently in production.
| Approach | Signal Coverage | Framework Coupling | Implementation Effort | Backend Portability |
|---|---|---|---|---|
| Generic APM | ~15% (HTTP/infra only) | None | Low | High |
| Proprietary Vendor SDK | ~90% (LLM-specific) | High | High | Low |
| Standard OTel + LLM-Aware Backend | ~95% (Full semantic depth) | Zero | Low | High |
This finding matters because it decouples telemetry generation from telemetry consumption. Frameworks handle signal emission through native contracts. The collector or backend handles interpretation, cost mapping, graph reconstruction, and policy enforcement. Engineering teams can swap backends, upgrade frameworks, or migrate cloud providers without touching application code. The observability layer becomes infrastructure, not application logic.
Core Solution
Implementing standardized AI agent observability requires four architectural steps: validating native emission, configuring the OTLP exporter, deploying an LLM-aware receiver, and ensuring trace context propagation across agent boundaries.
Step 1: Validate Native Framework Emission
Confirm your framework emits gen_ai.* spans. Modern versions of Spring AI, LangChain4j, Koog, and Python/Go instrumentations output these attributes automatically. No custom instrumentation code is required.
Step 2: Configure Standard OTLP Exporter
Replace proprietary SDK initialization with standard OpenTelemetry exporter configuration. Use HTTP/protobuf for firewall compatibility or gRPC for high-throughput environments.
Java/Spring Boot Configuration
```properties
# application.properties
# Property names follow Spring Boot 3.x; verify against your Boot version's docs.
management.otlp.tracing.endpoint=${OTEL_COLLECTOR_ENDPOINT}/v1/traces
management.otlp.tracing.headers.authorization=Bearer ${OTEL_AUTH_TOKEN}
management.tracing.sampling.probability=1.0
# Spring AI 1.0 emits gen_ai.* observations automatically once Micrometer
# tracing is active; no separate enable flag is required.
```
Kotlin/Koog Implementation
```kotlin
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

fun configureAgentTelemetry(serviceName: String): OpenTelemetry {
    val spanExporter = OtlpHttpSpanExporter.builder()
        .setEndpoint(System.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
        .addHeader("Authorization", "Bearer ${System.getenv("OTEL_AUTH_TOKEN")}")
        .build()

    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(spanExporter).build())
        .setResource(Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), serviceName)))
        .build()

    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .buildAndRegisterGlobal()
}
```
Python Instrumentation Setup
```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

def setup_telemetry():
    provider = TracerProvider()
    otlp_exporter = OTLPSpanExporter(
        endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
        headers={"Authorization": f"Bearer {os.getenv('OTEL_AUTH_TOKEN')}"},
    )
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
    trace.set_tracer_provider(provider)
    OpenAIInstrumentor().instrument()
    return trace.get_tracer(__name__)
```
**Go OTel Integration**
```go
package main

import (
	"context"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
	// Note: WithEndpoint expects "host:port" without a scheme; recent SDK
	// versions also provide WithEndpointURL for full URLs.
	exporter, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")),
		otlptracehttp.WithHeaders(map[string]string{
			"Authorization": "Bearer " + os.Getenv("OTEL_AUTH_TOKEN"),
		}),
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("ai-agent-service"),
		)),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}
```
Step 3: Deploy LLM-Aware Receiver
Standard OTLP collectors (OpenTelemetry Collector, Jaeger, Tempo) will ingest the spans but won't interpret gen_ai.* attributes meaningfully. Deploy a receiver that understands:
- `gen_ai.client.chat` and `gen_ai.client.completion` for model routing
- `gen_ai.tool.execute` for function-calling visibility
- `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` for cost calculation
- `gen_ai.response.finish_reason` for failure classification
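Before wiring up a full backend, it can help to smoke-test exported spans for these attributes. The sketch below is a framework-agnostic validator over spans represented as plain dicts (e.g. parsed from an OTLP/JSON export); the `validate_genai_span` helper and required-attribute map are illustrative, not part of any SDK.

```python
# Minimal smoke-test helper: checks that an exported span carries the
# gen_ai.* attributes an LLM-aware backend needs. Spans are plain dicts
# here (e.g. parsed from an OTLP/JSON export).
REQUIRED_BY_SPAN = {
    "gen_ai.client.chat": [
        "gen_ai.usage.input_tokens",
        "gen_ai.usage.output_tokens",
        "gen_ai.response.finish_reason",
    ],
    "gen_ai.tool.execute": [],  # tool spans mainly need correct parentage
}

def validate_genai_span(span: dict) -> list:
    """Return the missing attribute names (empty list = span is complete)."""
    required = REQUIRED_BY_SPAN.get(span.get("name"), [])
    attrs = span.get("attributes", {})
    return [key for key in required if key not in attrs]

span = {
    "name": "gen_ai.client.chat",
    "attributes": {
        "gen_ai.usage.input_tokens": 812,
        "gen_ai.usage.output_tokens": 96,
    },
}
print(validate_genai_span(span))  # the finish_reason attribute is missing
```

Running checks like this in staging catches incomplete instrumentation before it reaches the backend.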
Step 4: Ensure Trace Context Propagation
AI agents frequently fan out across multiple services, message queues, or async workers. Propagate traceparent headers across HTTP calls and embed tracestate segments for cross-agent identity. Use OpenTelemetry Baggage to carry user/session identifiers without polluting span attributes.
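To make the propagation concrete, here is a stdlib-only Python sketch of what ends up on the wire: the W3C `traceparent` header carried through message headers. Real services should use OpenTelemetry's propagation API (`opentelemetry.propagate.inject`/`extract`) rather than hand-rolling this; the helper names here are illustrative.

```python
# Stdlib-only sketch of W3C trace context propagation through message
# headers. In a real service, use opentelemetry.propagate.inject/extract;
# this shows what those calls put on the wire.
import re

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def inject(trace_id: str, span_id: str, headers: dict, sampled: bool = True) -> None:
    """Write a traceparent header: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"

def extract(headers: dict):
    """Parse traceparent from incoming headers; None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    return {"trace_id": trace_id, "parent_span_id": span_id,
            "sampled": flags == "01"}

headers = {}
inject("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", headers)
ctx = extract(headers)  # downstream worker resumes the same trace
```

The same inject/extract pair applies to Kafka record headers or SQS message attributes: inject at the producer, extract at the consumer, and start the consumer's span with the extracted context as parent.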
Architecture Rationale
- OTLP over Vendor SDKs: Decouples instrumentation from consumption. Frameworks own signal generation; backends own interpretation.
- HTTP/protobuf Exporter: Simplifies network configuration. Most corporate firewalls allow outbound HTTPS, whereas gRPC requires explicit port allowances.
- 100% Sampling During Development: AI loops and tool-calling failures require full context. Production environments should transition to tail-based sampling to control storage costs.
- Server-Side Cost Mapping: Token counts are framework-agnostic. Pricing tables change frequently. Compute costs at the collector level using dynamic vendor rate cards, not hardcoded in application logic.
Pitfall Guide
1. Under-Sampling AI Traces
Explanation: Applying default 10% sampling to AI workloads destroys debugging capability. A single hallucination or tool loop might occur in the unsampled 90%.
Fix: Use probability=1.0 in staging. In production, implement tail-based sampling that retains traces containing gen_ai.response.finish_reason=error or high token counts.
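The production half of this fix can be expressed directly in collector configuration. A sketch using the contrib distribution's `tail_sampling` processor follows; the policy names and thresholds are illustrative.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Always keep traces where the model run ended in error.
      - name: keep-genai-errors
        type: string_attribute
        string_attribute:
          key: gen_ai.response.finish_reason
          values: [error]
      # Keep unusually expensive requests (threshold is illustrative).
      - name: keep-high-token-usage
        type: numeric_attribute
        numeric_attribute:
          key: gen_ai.usage.output_tokens
          min_value: 4000
      # Sample a baseline of everything else.
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```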
2. Leaking PII in Prompt Attributes
Explanation: gen_ai.prompt and gen_ai.completion attributes often contain user data, API keys, or internal documentation. Exporting them to observability backends violates GDPR/CCPA and creates compliance risk.
Fix: Implement a SpanProcessor that hashes or redacts sensitive fields before export. Use regex patterns to detect and mask emails, SSNs, or credential formats.
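As a sketch of what that redaction looks like, the function below masks common PII and credential shapes. In a real OpenTelemetry pipeline this logic would live in a custom `SpanProcessor` or an exporter wrapper; the regex set shown is a starting point, not an exhaustive detector.

```python
# Sketch of pre-export redaction over a plain attribute dict; in an
# OTel pipeline this would run inside a SpanProcessor/exporter wrapper.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "bearer": re.compile(r"Bearer\s+[A-Za-z0-9._-]+"),
}
SENSITIVE_KEYS = ("gen_ai.prompt", "gen_ai.completion")

def redact_attributes(attrs: dict) -> dict:
    """Mask known PII/credential shapes inside prompt/completion attributes."""
    out = dict(attrs)
    for key in SENSITIVE_KEYS:
        value = out.get(key)
        if isinstance(value, str):
            for name, pattern in PATTERNS.items():
                value = pattern.sub(f"[{name.upper()}_REDACTED]", value)
            out[key] = value
    return out

attrs = {"gen_ai.prompt": "Contact alice@example.com with token Bearer abc123"}
print(redact_attributes(attrs)["gen_ai.prompt"])
# prints: Contact [EMAIL_REDACTED] with token [BEARER_REDACTED]
```

Redacting before export (rather than at the backend) keeps raw PII off the wire entirely, which is what GDPR/CCPA auditors will ask about.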
3. Mixing Vendor SDKs with Native OTel
Explanation: Teams often keep a legacy observability SDK installed while enabling native framework telemetry. This creates duplicate spans, inflated costs, and conflicting trace IDs.
Fix: Audit pom.xml, build.gradle, requirements.txt, and go.mod. Remove proprietary instrumentation libraries. Rely exclusively on framework-native gen_ai.* emission.
4. Breaking Trace Context Across Async Boundaries
Explanation: Agents frequently delegate work to message queues (Kafka, SQS) or async task runners. If traceparent isn't propagated, the agent workflow fragments into disconnected spans.
Fix: Inject traceparent into message headers. Use OpenTelemetry context propagation libraries to extract and inject context at producer/consumer boundaries.
5. Assuming Token Counts Equal Cost
Explanation: gen_ai.usage.input_tokens and output_tokens are raw counts. Pricing varies by model, region, and tier. Hardcoding rates in application code creates stale data and billing discrepancies.
Fix: Map tokens to costs at the collector/backend level. Maintain a versioned pricing registry that updates automatically when vendors adjust rates.
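A minimal sketch of such a registry, with placeholder model names and rates; a real deployment would load a versioned rate card rather than hardcode one.

```python
# Sketch of collector-side token-to-cost mapping. Model names and rates
# below are placeholders, not real vendor pricing.
RATE_CARD = {
    # model: (USD per 1M input tokens, USD per 1M output tokens)
    "example-model-small": (0.25, 1.25),
    "example-model-large": (3.00, 15.00),
}

def span_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert gen_ai.usage.* token counts into USD using the rate card."""
    in_rate, out_rate = RATE_CARD[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

cost = span_cost_usd("example-model-large", input_tokens=2_000, output_tokens=500)
# 2000 * 3.00/1M + 500 * 15.00/1M = 0.0135 USD for this span
```

Because the rate card lives with the collector, a vendor price change is a config update, not an application redeploy.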
6. Misconfigured OTLP Authentication
Explanation: Hardcoding bearer tokens or API keys in configuration files exposes credentials in version control and container images.
Fix: Use environment variables, secret managers (HashiCorp Vault, AWS Secrets Manager), or workload identity federation. Rotate credentials automatically.
7. Ignoring Tool-Call Span Hierarchy
Explanation: Frameworks emit gen_ai.tool.execute as child spans of gen_ai.client.chat. If parent-child relationships are broken, you lose visibility into which tool was called during which model decision.
Fix: Verify span hierarchy in your backend. Ensure parent_span_id is correctly set during tool invocation. Use span links for cross-agent calls instead of forcing parent-child relationships.
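Hierarchy verification can be automated. The sketch below flags tool spans whose parent is missing or is not a chat span, operating on exported spans as plain dicts; the helper name is illustrative.

```python
# Sketch: verify gen_ai.tool.execute spans are parented under a
# gen_ai.client.chat span, using exported spans as plain dicts.
def orphaned_tool_spans(spans: list) -> list:
    """Return span_ids of tool spans with a missing or wrong-typed parent."""
    by_id = {s["span_id"]: s for s in spans}
    orphans = []
    for s in spans:
        if s["name"] == "gen_ai.tool.execute":
            parent = by_id.get(s.get("parent_span_id"))
            if parent is None or parent["name"] != "gen_ai.client.chat":
                orphans.append(s["span_id"])
    return orphans

spans = [
    {"span_id": "a1", "parent_span_id": None, "name": "gen_ai.client.chat"},
    {"span_id": "b2", "parent_span_id": "a1", "name": "gen_ai.tool.execute"},
    {"span_id": "c3", "parent_span_id": None, "name": "gen_ai.tool.execute"},
]
print(orphaned_tool_spans(spans))  # the c3 tool span lost its parent
```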
Production Bundle
Action Checklist
- Verify framework version supports native `gen_ai.*` OTel emission
- Remove proprietary observability SDKs from dependency manifests
- Configure standard OTLP exporter with HTTP/protobuf endpoint
- Implement PII redaction processor before span export
- Propagate `traceparent` across all async and HTTP boundaries
- Deploy LLM-aware collector/backend with token-to-cost mapping
- Enable tail-based sampling for production environments
- Validate span hierarchy and attribute completeness in staging
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single framework, strict compliance | Native OTel + LLM-aware backend | Eliminates SDK debt, satisfies audit requirements | Low (infrastructure only) |
| Multi-framework polyglot stack | Standard OTLP + unified collector | Prevents SDK sprawl, normalizes telemetry across languages | Medium (collector scaling) |
| High-volume production (>10k req/min) | Tail-based sampling + gRPC exporter | Reduces storage costs while preserving error context | Low (savings on ingestion) |
| On-prem / air-gapped deployment | Self-hosted OTel Collector + local backend | Maintains data sovereignty, avoids cloud egress fees | High (infrastructure ownership) |
Configuration Template
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:
    timeout: 5s
    send_batch_max_size: 1000
  # Illustrative redaction config: the contrib collector's `redaction`
  # processor uses `allowed_keys` / `blocked_values` rather than this
  # shape; adapt to the processor you actually deploy.
  redact:
    patterns:
      - "gen_ai.prompt": "REDACTED"
      - "gen_ai.completion": "REDACTED"
    attributes:
      - "user.email"
      - "payment.card_number"

exporters:
  otlp/llm-backend:
    endpoint: "${LLM_OBSERVABILITY_ENDPOINT}"
    headers:
      authorization: "Bearer ${LLM_OBSERVABILITY_TOKEN}"
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, redact]
      exporters: [otlp/llm-backend]
```
Quick Start Guide
1. Upgrade Frameworks: Ensure your AI framework is on a version that natively emits `gen_ai.*` spans (Spring AI 1.0+, LangChain4j 0.35+, Koog 0.8+, OpenLLMetry 0.3+).
2. Set Environment Variables: Export `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_AUTH_TOKEN` to your runtime environment.
3. Initialize Exporter: Add standard OTLP exporter configuration to your application. Remove any proprietary observability SDKs.
4. Validate Spans: Run a test agent workflow. Query your LLM-aware backend for `gen_ai.client.chat` spans. Verify token counts, tool calls, and finish reasons are present.
5. Enable Sampling Policy: Configure tail-based sampling rules to retain error traces and high-cost requests. Deploy to production.
