
Your AI agent already emits OpenTelemetry. Why aren't you watching it?

By Codcompass Team · 8 min read

Standardizing AI Agent Observability: From Vendor Lock-in to OpenTelemetry gen_ai.* Conventions

Current Situation Analysis

Generative AI agents operate on non-deterministic execution paths. Unlike traditional microservices that follow predictable request-response cycles, agents dynamically select models, construct prompts, invoke external tools, and retry on failure. Traditional observability stacks were never designed to capture this cognitive workflow.

When teams deploy LLM agents into production, they quickly encounter a visibility gap. Generic APM platforms track HTTP latency, error rates, and throughput, but they treat the AI layer as a black box. A POST /v1/chat span reveals nothing about which model was selected, how many tokens were consumed, which tools were invoked, or why the planner chose a specific action. The signal is either buried in raw request payloads or discarded entirely.

To bridge this gap, engineering teams historically reached for proprietary observability SDKs. These tools capture the right telemetry but introduce severe architectural debt. They couple your application to a specific vendor, require coordinated upgrades alongside framework releases, and multiply dependency footprints when teams run polyglot stacks. A single organization might use Spring AI for orchestration, LangChain4j for retrieval, and a Python framework for data preprocessing. Each vendor SDK demands its own initialization, configuration, and lifecycle management.

The industry is now shifting toward a standardized approach. The OpenTelemetry community has published the gen_ai.* semantic conventions, and major AI frameworks have adopted them natively. Spring AI 1.0 emits telemetry via Micrometer Observations. LangChain4j exposes the same signals through its ChatModelListener API. Koog 0.8 includes a first-class OpenTelemetry feature for the JVM. Python's OpenLLMetry and OpenInference projects provide instrumentations for Anthropic, OpenAI, LangChain, and LlamaIndex. Go's otel-instrumentation-genai package follows the same pattern.

The telemetry is already on the wire in standard form. The bottleneck is no longer instrumentation; it's reception. Teams need an OTLP endpoint that understands gen_ai.* attributes, reconstructs agent workflows, and surfaces actionable insights without requiring application-level vendor dependencies.

WOW Moment: Key Findings

The transition from proprietary SDKs to standard OpenTelemetry conventions fundamentally changes how AI observability is architected. The table below compares the three dominant approaches currently in production.

| Approach | Signal Coverage | Framework Coupling | Implementation Effort | Backend Portability |
| --- | --- | --- | --- | --- |
| Generic APM | ~15% (HTTP/infra only) | None | Low | High |
| Proprietary Vendor SDK | ~90% (LLM-specific) | High | High | Low |
| Standard OTel + LLM-Aware Backend | ~95% (full semantic depth) | Zero | Low | High |

This finding matters because it decouples telemetry generation from telemetry consumption. Frameworks handle signal emission through native contracts. The collector or backend handles interpretation, cost mapping, graph reconstruction, and policy enforcement. Engineering teams can swap backends, upgrade frameworks, or migrate cloud providers without touching application code. The observability layer becomes infrastructure, not application logic.

Core Solution

Implementing standardized AI agent observability requires four architectural steps: validating native emission, configuring the OTLP exporter, deploying an LLM-aware receiver, and ensuring trace context propagation across agent boundaries.

Step 1: Validate Native Framework Emission

Confirm your framework emits gen_ai.* spans. Modern versions of Spring AI, LangChain4j, Koog, and Python/Go instrumentations output these attributes automatically. No custom instrumentation code is required.
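
Before pointing anything at a collector, it helps to eyeball the spans locally. The sketch below (Python, assuming the OpenLLMetry opentelemetry-instrumentation-openai package and an OpenAI API key in the environment; the model name is illustrative) prints finished spans to the console so you can confirm the gen_ai.* attributes are actually there:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

# Print every finished span to stdout so gen_ai.* attributes are easy to inspect
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
OpenAIInstrumentor().instrument()

client = OpenAI()  # reads OPENAI_API_KEY from the environment
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
# Expect attributes such as gen_ai.request.model and
# gen_ai.usage.input_tokens / gen_ai.usage.output_tokens in the console output.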

Step 2: Configure Standard OTLP Exporter

Replace proprietary SDK initialization with standard OpenTelemetry exporter configuration. Use HTTP/protobuf for firewall compatibility or gRPC for high-throughput environments.

Java/Spring Boot Configuration

# application.properties
management.otlp.tracing.endpoint=${OTEL_COLLECTOR_ENDPOINT}/v1/traces
management.otlp.tracing.headers.Authorization=Bearer ${OTEL_AUTH_TOKEN}
management.tracing.sampling.probability=1.0
# Spring AI publishes chat and tool-call observations through Micrometer automatically
# once Actuator and the OTel tracing bridge are on the classpath; version-specific
# spring.ai.*.observations.* properties control whether prompt/completion content is recorded.

Kotlin/Koog Implementation

import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

fun configureAgentTelemetry(serviceName: String): OpenTelemetry {
    // The HTTP exporter expects the full traces URL, e.g. https://collector:4318/v1/traces
    val spanExporter = OtlpHttpSpanExporter.builder()
        .setEndpoint(System.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
        .addHeader("Authorization", "Bearer ${System.getenv("OTEL_AUTH_TOKEN")}")
        .build()

    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(BatchSpanProcessor.builder(spanExporter).build())
        .setResource(Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), serviceName)))
        .build()

    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .buildAndRegisterGlobal()
}

Python Instrumentation Setup

import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor


def setup_telemetry():
    provider = TracerProvider()
    otlp_exporter = OTLPSpanExporter(
        endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
        headers={"Authorization": f"Bearer {os.getenv('OTEL_AUTH_TOKEN')}"},
    )
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
    trace.set_tracer_provider(provider)

    # Auto-instrument OpenAI client calls so gen_ai.* spans are emitted
    OpenAIInstrumentor().instrument()
    return trace.get_tracer(__name__)

Go OTel Integration

package main

import (
    "context"
    "os"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    // WithEndpoint expects host:port without a scheme (e.g. collector.example.com:4318);
    // use WithEndpointURL if the environment variable carries a full URL.
    exporter, err := otlptracehttp.New(ctx,
        otlptracehttp.WithEndpoint(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")),
        otlptracehttp.WithHeaders(map[string]string{
            "Authorization": "Bearer " + os.Getenv("OTEL_AUTH_TOKEN"),
        }),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("ai-agent-service"),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}

Step 3: Deploy LLM-Aware Receiver

A stock OpenTelemetry Collector or a generic trace backend (Jaeger, Tempo) will ingest the spans but won't interpret gen_ai.* attributes meaningfully. Deploy a receiver that understands:

  • gen_ai.client.chat and gen_ai.client.completion for model routing
  • gen_ai.tool.execute for function calling visibility
  • gen_ai.usage.input_tokens and gen_ai.usage.output_tokens for cost calculation
  • gen_ai.response.finish_reason for failure classification

Step 4: Ensure Trace Context Propagation

AI agents frequently fan out across multiple services, message queues, or async workers. Propagate traceparent headers across HTTP calls and embed tracestate segments for cross-agent identity. Use OpenTelemetry Baggage to carry user/session identifiers without polluting span attributes.
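
A minimal sketch of that handoff (Python; the queue client and message shape are hypothetical stand-ins): the producer injects traceparent, tracestate, and baggage into outgoing headers, and the consumer extracts them so its spans join the original trace.

from opentelemetry import baggage, context, trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

def publish_task(queue_client, payload: dict, session_id: str):
    # Carry session identity as baggage instead of polluting span attributes
    ctx = baggage.set_baggage("session.id", session_id)
    token = context.attach(ctx)
    try:
        with tracer.start_as_current_span("agent.delegate"):
            headers: dict[str, str] = {}
            inject(headers)  # writes traceparent/tracestate/baggage into the carrier
            queue_client.send(payload, headers=headers)  # hypothetical queue client
    finally:
        context.detach(token)

def handle_task(message):
    # Rebuild the upstream context so this span joins the original trace
    ctx = extract(message.headers)
    with tracer.start_as_current_span("agent.worker", context=ctx):
        session_id = baggage.get_baggage("session.id", context=ctx)
        ...  # run the tool or sub-agent with the propagated identity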

Architecture Rationale

  • OTLP over Vendor SDKs: Decouples instrumentation from consumption. Frameworks own signal generation; backends own interpretation.
  • HTTP/protobuf Exporter: Simplifies network configuration. Most corporate firewalls allow outbound HTTPS, whereas gRPC requires explicit port allowances.
  • 100% Sampling During Development: AI loops and tool-calling failures require full context. Production environments should transition to tail-based sampling to control storage costs.
  • Server-Side Cost Mapping: Token counts are framework-agnostic. Pricing tables change frequently. Compute costs at the collector level using dynamic vendor rate cards, not hardcoded in application logic.

Pitfall Guide

1. Under-Sampling AI Traces

Explanation: Applying default 10% sampling to AI workloads destroys debugging capability. A single hallucination or tool loop might occur in the unsampled 90%. Fix: Use probability=1.0 in staging. In production, implement tail-based sampling that retains traces containing gen_ai.response.finish_reason=error or high token counts.
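
To make that retention policy concrete, here is a sketch of the decision logic in plain Python. The threshold and baseline rate are illustrative; in practice the rule lives in your collector's tail-based sampling configuration or backend, not in application code.

import random

# Keep a whole trace if any span signals an error finish reason or unusually
# high token usage; otherwise sample at a low baseline rate.
ERROR_FINISH = "error"
TOKEN_THRESHOLD = 8_000   # illustrative value
BASELINE_RATE = 0.05      # illustrative value

def keep_trace(spans: list[dict]) -> bool:
    for attrs in spans:
        if attrs.get("gen_ai.response.finish_reason") == ERROR_FINISH:
            return True
        total = attrs.get("gen_ai.usage.input_tokens", 0) + attrs.get("gen_ai.usage.output_tokens", 0)
        if total >= TOKEN_THRESHOLD:
            return True
    return random.random() < BASELINE_RATE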

2. Leaking PII in Prompt Attributes

Explanation: gen_ai.prompt and gen_ai.completion attributes often contain user data, API keys, or internal documentation. Exporting them to observability backends violates GDPR/CCPA and creates compliance risk. Fix: Implement a SpanProcessor that hashes or redacts sensitive fields before export. Use regex patterns to detect and mask emails, SSNs, or credential formats.
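
As a minimal sketch of the scrubbing step (Python; the regex patterns are illustrative and intentionally narrow, and where you hook this in, a wrapping exporter in the application or an attribute-rewriting processor in the collector, depends on your stack):

import re

# Illustrative patterns only; extend for your own data classes (SSNs, IBANs, ...)
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._-]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scrub(text: str) -> str:
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text

def scrub_attributes(attributes: dict) -> dict:
    # Apply to content-bearing gen_ai attributes before they leave the process
    sensitive_keys = ("gen_ai.prompt", "gen_ai.completion")
    return {
        key: scrub(value) if key in sensitive_keys and isinstance(value, str) else value
        for key, value in attributes.items()
    }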

3. Mixing Vendor SDKs with Native OTel

Explanation: Teams often keep a legacy observability SDK installed while enabling native framework telemetry. This creates duplicate spans, inflated costs, and conflicting trace IDs. Fix: Audit pom.xml, build.gradle, requirements.txt, and go.mod. Remove proprietary instrumentation libraries. Rely exclusively on framework-native gen_ai.* emission.

4. Breaking Trace Context Across Async Boundaries

Explanation: Agents frequently delegate work to message queues (Kafka, SQS) or async task runners. If traceparent isn't propagated, the agent workflow fragments into disconnected spans. Fix: Inject traceparent into message headers. Use OpenTelemetry context propagation libraries to extract and inject context at producer/consumer boundaries.

5. Assuming Token Counts Equal Cost

Explanation: gen_ai.usage.input_tokens and output_tokens are raw counts. Pricing varies by model, region, and tier. Hardcoding rates in application code creates stale data and billing discrepancies. Fix: Map tokens to costs at the collector/backend level. Maintain a versioned pricing registry that updates automatically when vendors adjust rates.
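
A sketch of that server-side mapping (Python; the model names and per-million-token rates are placeholders, not real prices):

from dataclasses import dataclass

@dataclass(frozen=True)
class Rate:
    input_per_million: float   # USD per 1M input tokens (placeholder)
    output_per_million: float  # USD per 1M output tokens (placeholder)

# Versioned registry; in production this is refreshed from vendor rate cards
PRICING_V2024_06 = {
    "example-model-small": Rate(0.50, 1.50),
    "example-model-large": Rate(5.00, 15.00),
}

def span_cost_usd(attrs: dict, pricing: dict[str, Rate]) -> float | None:
    rate = pricing.get(attrs.get("gen_ai.request.model", ""))
    if rate is None:
        return None  # unknown model: surface for manual mapping instead of guessing
    input_tokens = attrs.get("gen_ai.usage.input_tokens", 0)
    output_tokens = attrs.get("gen_ai.usage.output_tokens", 0)
    return (input_tokens * rate.input_per_million + output_tokens * rate.output_per_million) / 1_000_000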

6. Misconfigured OTLP Authentication

Explanation: Hardcoding bearer tokens or API keys in configuration files exposes credentials in version control and container images. Fix: Use environment variables, secret managers (HashiCorp Vault, AWS Secrets Manager), or workload identity federation. Rotate credentials automatically.

7. Ignoring Tool-Call Span Hierarchy

Explanation: Frameworks emit gen_ai.tool.execute as child spans of gen_ai.client.chat. If parent-child relationships are broken, you lose visibility into which tool was called during which model decision. Fix: Verify span hierarchy in your backend. Ensure parent_span_id is correctly set during tool invocation. Use span links for cross-agent calls instead of forcing parent-child relationships.
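
One way to validate the hierarchy in staging is a small assertion over exported spans. The sketch below (Python, using the SDK's in-memory exporter; it assumes your framework names spans with the gen_ai.tool / gen_ai.client.chat prefixes discussed above) flags any tool span whose parent is not a chat span.

from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

def assert_tool_spans_have_chat_parents(exporter: InMemorySpanExporter) -> None:
    # Register the exporter via a SimpleSpanProcessor in your test setup,
    # run one agent workflow, then call this check.
    spans = exporter.get_finished_spans()
    by_id = {span.context.span_id: span for span in spans}
    for span in spans:
        if span.name.startswith("gen_ai.tool"):
            parent = by_id.get(span.parent.span_id) if span.parent else None
            # Every tool execution should hang off the model call that requested it
            assert parent is not None and "chat" in parent.name, f"orphaned tool span: {span.name}"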

Production Bundle

Action Checklist

  • Verify framework version supports native gen_ai.* OTel emission
  • Remove proprietary observability SDKs from dependency manifests
  • Configure standard OTLP exporter with HTTP/protobuf endpoint
  • Implement PII redaction processor before span export
  • Propagate traceparent across all async and HTTP boundaries
  • Deploy LLM-aware collector/backend with token-to-cost mapping
  • Enable tail-based sampling for production environments
  • Validate span hierarchy and attribute completeness in staging

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Single framework, strict compliance | Native OTel + LLM-aware backend | Eliminates SDK debt, satisfies audit requirements | Low (infrastructure only) |
| Multi-framework polyglot stack | Standard OTLP + unified collector | Prevents SDK sprawl, normalizes telemetry across languages | Medium (collector scaling) |
| High-volume production (>10k req/min) | Tail-based sampling + gRPC exporter | Reduces storage costs while preserving error context | Low (savings on ingestion) |
| On-prem / air-gapped deployment | Self-hosted OTel Collector + local backend | Maintains data sovereignty, avoids cloud egress fees | High (infrastructure ownership) |

Configuration Template

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:
    timeout: 5s
    send_batch_max_size: 1000
  # Attribute redaction via the contrib transform processor (OTTL); requires the
  # opentelemetry-collector-contrib distribution.
  transform:
    trace_statements:
      - context: span
        statements:
          - set(attributes["gen_ai.prompt"], "REDACTED") where attributes["gen_ai.prompt"] != nil
          - set(attributes["gen_ai.completion"], "REDACTED") where attributes["gen_ai.completion"] != nil
          - delete_key(attributes, "user.email")
          - delete_key(attributes, "payment.card_number")

exporters:
  otlp/llm-backend:
    endpoint: "${LLM_OBSERVABILITY_ENDPOINT}"
    headers:
      authorization: "Bearer ${LLM_OBSERVABILITY_TOKEN}"
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, transform]
      exporters: [otlp/llm-backend]

Quick Start Guide

  1. Upgrade Frameworks: Ensure your AI framework is on a version that natively emits gen_ai.* spans (Spring AI 1.0+, LangChain4j 0.35+, Koog 0.8+, OpenLLMetry 0.3+).
  2. Set Environment Variables: Export OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_AUTH_TOKEN to your runtime environment.
  3. Initialize Exporter: Add standard OTLP exporter configuration to your application. Remove any proprietary observability SDKs.
  4. Validate Spans: Run a test agent workflow. Query your LLM-aware backend for gen_ai.client.chat spans. Verify token counts, tool calls, and finish reasons are present.
  5. Enable Sampling Policy: Configure tail-based sampling rules to retain error traces and high-cost requests. Deploy to production.