Difficulty: Intermediate

Monitoring LLM API Calls in Python: Latency, Token Usage, and Cost Tracking With OpenTelemetry

By Codcompass Team · 11 min read

LLM API calls are unlike any other external dependency in your Python application.

A database query takes milliseconds. A Redis call takes microseconds. An LLM call takes anywhere from half a second to thirty seconds, consumes a variable number of tokens on every invocation, costs real money on every request, and can fail in ways that have nothing to do with network connectivity: token limits, content filters, model refusals, context window exhaustion.

Standard application monitoring was not built for this. Your existing latency dashboards will show LLM calls as outliers. Your error rate alerts will fire on model refusals that aren't actually errors. Your cost monitoring won't exist at all unless you build it.

This article builds it. We'll instrument LLM API calls in Python with OpenTelemetry, capturing latency, token consumption, estimated cost, and finish reasons as structured telemetry that you can query, dashboard, and alert on.
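
To make that target concrete, here is a minimal sketch of the per-call attributes such instrumentation might attach to a span. The `gen_ai.*` keys follow OpenTelemetry's GenAI semantic conventions, which are still marked as in development and may change. `LLMCallRecord`, `to_span_attributes`, and the `llm.cost.estimated_usd` key are hypothetical names chosen for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMCallRecord:
    """Hypothetical container for the fields worth capturing on each LLM call."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    finish_reason: str
    estimated_cost_usd: Optional[float] = None  # None when no price is known


def to_span_attributes(record: LLMCallRecord) -> dict:
    """Flatten a record into key/value pairs suitable for span attributes."""
    attributes = {
        "gen_ai.request.model": record.model,
        "gen_ai.usage.input_tokens": record.prompt_tokens,
        "gen_ai.usage.output_tokens": record.completion_tokens,
        # The convention models finish reasons as a list, one entry per choice.
        "gen_ai.response.finish_reasons": [record.finish_reason],
    }
    # None is not a valid span attribute value, so add cost only when known.
    if record.estimated_cost_usd is not None:
        # Assumed attribute name; the conventions do not define a cost key.
        attributes["llm.cost.estimated_usd"] = record.estimated_cost_usd
    return attributes


record = LLMCallRecord("gpt-4o-mini", 1200, 350, "stop", 0.00039)
attrs = to_span_attributes(record)
print(attrs)
```

Latency needs no attribute of its own, because the span's start and end timestamps already record it. On a real span, a dict like this can be passed to `span.set_attributes(...)`.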


The Monitoring Gap in LLM Applications

When you add an LLM to a Python application, you typically get visibility into two things: whether the call succeeded, and how long it took. Everything else (how many tokens it consumed, what the model decided to do, how much it cost, whether it hit a limit) is invisible unless you instrument it explicitly.

This creates real operational problems:

  • A feature that works in testing starts timing out in production because prompts grew longer than expected and token counts climbed
  • Costs spike unexpectedly because one endpoint is generating unusually long completions
  • Users report bad responses but you can't tell whether the model refused, truncated, or hallucinated because finish_reason is never captured
  • You can't tell which of your ten LLM-powered features is responsible for 80% of your API spend

Structured telemetry on LLM calls fixes all of these. Let's build it.
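
As a small illustration of the finish-reason problem in the list above, the sketch below groups the reason strings reported by OpenAI (`finish_reason`) and Anthropic (`stop_reason`) into a few coarse outcomes that a dashboard or alert rule could key on. The function name and the outcome labels are assumptions made for this example, and the value sets may not be exhaustive for every model or API version.

```python
from typing import Optional

# Reason strings grouped by what they mean operationally.
# OpenAI reports `finish_reason` per choice; Anthropic reports `stop_reason`.
COMPLETED = {"stop", "end_turn", "stop_sequence", "tool_calls", "tool_use"}
TRUNCATED = {"length", "max_tokens"}
FILTERED = {"content_filter", "refusal"}


def classify_finish_reason(reason: Optional[str]) -> str:
    """Map a provider-specific reason onto a coarse, provider-neutral outcome."""
    if reason in COMPLETED:
        return "completed"
    if reason in TRUNCATED:
        return "truncated"   # output hit a token limit and was cut off
    if reason in FILTERED:
        return "filtered"    # the model or a safety filter declined to answer
    return "unknown"         # missing or unrecognized value; worth logging


for reason in ["stop", "length", "content_filter", None]:
    print(reason, "->", classify_finish_reason(reason))
```

Recording the outcome alongside the raw reason lets an alert treat "truncated" and "filtered" as quality signals without counting them as transport errors.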


Prerequisites

  • Python 3.10+
  • An OpenAI or Anthropic API key
  • A running OpenTelemetry Collector or observability backend

Installing Dependencies

pip install opentelemetry-sdk
pip install opentelemetry-api
pip install opentelemetry-exporter-otlp-proto-grpc
pip install openai
pip install anthropic
pip install fastapi uvicorn



Project Structure

llm-monitoring/
├── tracing.py          # OpenTelemetry setup
├── llm_tracer.py       # LLM instrumentation layer
├── cost_estimator.py   # Token cost calculation
├── main.py             # FastAPI application
└── services.py         # LLM-powered features



Step 1: OpenTelemetry Setup

tracing.py

import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME


def init_tracer(service_name: str) -> trace.Tracer:
    resource = Resource.create({
        SERVICE_NAME: service_name,
        "service.version": "1.0.0",
    })

    exporter = OTLPSpanExporter(
        endpoint=os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "localhost:4317"),
        insecure=True,
    )

    provider = TracerProvider(resource=resource)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    return trace.get_tracer(service_name)



Step 2: Cost Estimation

Before building the instrumentation layer, we need a way to estimate costs. LLM providers charge per token, with different rates for input and output tokens.
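
The arithmetic is simple enough to work through by hand. The sketch below uses illustrative rates quoted per million tokens, the way providers usually list them; the rates and token counts are assumptions picked for the example, not a quote for any particular model.

```python
# Illustrative rates only, in USD per million tokens.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00

prompt_tokens = 1_200
completion_tokens = 350

# Each side is priced separately, then summed.
input_cost = prompt_tokens * INPUT_USD_PER_MILLION / 1_000_000
output_cost = completion_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
total_cost = input_cost + output_cost

print(f"input:  ${input_cost:.5f}")
print(f"output: ${output_cost:.5f}")
print(f"total:  ${total_cost:.5f}")
```

With these numbers the completion is under a quarter of the tokens but more than half of the cost, which is why input and output tokens need to be tracked as separate values.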

cost_estimator.py

from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelPricing:
    input_cost_per_token: float   # USD per token
    output_cost_per_token: float  # USD per token


# Pricing as of early 2026; verify against provider pricing pages
# before building cost dashboards on these numbers
MODEL_PRICING: dict[str, ModelPricing] = {
    # OpenAI
    "gpt-4o": ModelPricing(
        input_cost_per_token=0.000005,
        output_cost_per_token=0.000015,
    ),
    "gpt-4o-mini": ModelPricing(
        input_cost_per_token=0.00000015,
        output_cost_per_token=0.0000006,
    ),
    "gpt-3.5-turbo": ModelPricing(
        input_cost_per_token=0.0000005,
        output_cost_per_token=0.0000015,
    ),
    # Anthropic
    "claude-sonnet-4-6": ModelPricing(
        input_cost_per_token=0.000003,
        output_cost_per_token=0.000015,
    ),
    "claude-haiku-4-5": ModelPricing(
        input_cost_per_token=0.00000025,
        output_cost_per_token=0.00000125,
    ),
}


def estimate_cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
) -> Optional[float]:
    """
    Estimate the cost of an LLM call in USD.
    Returns None if the model is not in the pricing table.
    """
    pricing = MODEL_PRICING.get(model)
    if not pricing:
        return None

    input_cost = prompt_tokens * pricing.input_cost_per_token
    output_cost = completion_tokens * pricing.output_cost_per_token
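    # Round to 8 decimal places (an assumed precision) so sub-cent costs survive.
    return round(input_cost + output_cost, 8)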
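
Because `estimate_cost` is documented to return `None` for models missing from the pricing table, anything that aggregates cost has to handle that case explicitly instead of silently counting it as zero. The sketch below shows one way to total spend per feature while keeping unpriced calls visible. The feature names, the sample costs, and `spend_by_feature` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-call records: (feature name, estimated cost in USD or None).
calls = [
    ("summarize", 0.0120),
    ("summarize", 0.0095),
    ("chat", 0.0031),
    ("classify", None),  # model missing from the pricing table
    ("chat", 0.0027),
]


def spend_by_feature(records):
    """Sum known costs per feature and count calls that could not be priced."""
    totals = defaultdict(float)
    unpriced = defaultdict(int)
    for feature, cost in records:
        if cost is None:
            unpriced[feature] += 1
        else:
            totals[feature] += cost
    return dict(totals), dict(unpriced)


totals, unpriced = spend_by_feature(calls)

# Most expensive feature first.
for feature, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{feature}: ${total:.4f}")
print("unpriced calls:", unpriced)
```

Counting unpriced calls separately makes a stale pricing table show up as a number on a dashboard instead of as a quiet undercount of spend.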

