Setting Up Agent Observability in 30 Minutes

By Codcompass Team·2026-05-26·9 min read

Autonomous Agent Telemetry: A Modular Observability Stack for Production Diagnostics

Current Situation Analysis

Autonomous agents introduce a fundamental debugging challenge: non-determinism combined with opacity. When a traditional microservice fails, you get a stack trace and an HTTP status code. When an agent fails, the user often receives a generic "it didn't work" response, or worse, a plausible but incorrect answer. The failure mode is silent because the agent's internal state—tool selection, reasoning chains, and cost accumulation—is invisible to the operator.

This problem is frequently overlooked during development because engineering teams prioritize prompt engineering and tool definition over runtime telemetry. Agents are often treated as extensions of the LLM API rather than complex software systems with state, loops, and side effects. Consequently, production debugging relies on reproducing the user's input and hoping the error surfaces, which is inefficient and often impossible for intermittent failures.

The barrier to observability is often perceived as high because it seems to require distributed tracing infrastructure. In practice, a useful diagnostic layer can be implemented in roughly 30 minutes using modular, file-based telemetry. By composing four observability primitives, teams can turn a black-box agent into an auditable system with little code overhead.

WOW Moment: Key Findings

The following comparison illustrates the efficiency gain of a modular telemetry stack versus traditional approaches. The modular approach delivers near-APM fidelity for agent-specific metrics without the operational burden of full distributed tracing.

Approach	Mean Time to Resolution (MTTR)	Cost Attribution	Tool Call Fidelity	Implementation Effort
Console Logging	High (Manual grep required)	None	Low (String dumps)	Low
Full APM Integration	Low (Rich dashboards)	High	High	High (Agent/Collector setup)
Modular Telemetry Stack	Medium-Low	High	High	Low (30 mins)

Why this matters: The modular stack bridges the gap between ad-hoc debugging and enterprise observability. It provides structured data for tool calls, precise cost tracking per session, and decoupled event distribution, enabling rapid iteration without waiting for infrastructure provisioning.

Core Solution

The solution is a four-pillar architecture. Each pillar addresses a specific blind spot in agent execution. The components are designed to be composable; you can deploy them incrementally based on immediate needs.

Pillar 1: Tool Instrumentation

Agents interact with the world via tools. Failures often occur at the tool boundary (bad args, malformed responses, timeouts). This pillar captures the exact invocation details for every tool call.

Implementation: Use a higher-order function to wrap tool implementations. This avoids decorator compatibility issues across TypeScript versions and provides explicit control over instrumentation.

import { createWriteStream } from 'fs';

interface ToolRecord {
  ts: string;
  toolName: string;
  args: unknown;
  resultPreview: string;
  latencyMs: number;
  status: 'success' | 'error';
}

class ToolRecorder {
  private stream: ReturnType<typeof createWriteStream>;

  constructor(config: { storePath: string }) {
    this.stream = createWriteStream(config.storePath, { flags: 'a' });
  }

  instrument<T extends (...args: any[]) => Promise<any>>(
    toolName: string,
    fn: T
  ): T {
    return (async (...args: any[]) => {
      const start = Date.now();
      let status: 'success' | 'error' = 'success';
      let resultPreview = '';

      try {
        const result = await fn(...args);
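        // JSON.stringify returns undefined for an undefined result, so fall back to String().
        resultPreview = (JSON.stringify(result) ?? String(result)).slice(0, 200);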
        return result;
      } catch (err) {
        status = 'error';
        resultPreview = String(err);
        throw err;
      } finally {
        const record: ToolRecord = {
          ts: new Date().toISOString(),
          toolName,
          args: args[0], // Assuming single arg object for brevity
          resultPreview,
          latencyMs: Date.now() - start,
          status,
        };
        this.stream.write(JSON.stringify(record) + '\n');
      }
    }) as T;
  }
}
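
Once records accumulate in the JSONL store, they can be queried with a few lines of code. The following is a minimal sketch, assuming the ToolRecord shape shown above; `parseToolLog` and `errorCountsByTool` are hypothetical helper names introduced only for illustration.

```typescript
// Mirrors the ToolRecord shape used by the recorder.
interface ToolRecord {
  ts: string;
  toolName: string;
  args: unknown;
  resultPreview: string;
  latencyMs: number;
  status: 'success' | 'error';
}

// Parse JSONL text into records, skipping blank or malformed lines
// (a partially written last line is possible if the process stopped mid-write).
function parseToolLog(jsonl: string): ToolRecord[] {
  const records: ToolRecord[] = [];
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    try {
      records.push(JSON.parse(line) as ToolRecord);
    } catch {
      // Ignore lines that are not valid JSON.
    }
  }
  return records;
}

// Count failed invocations per tool name.
function errorCountsByTool(records: ToolRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of records) {
    if (r.status !== 'error') continue;
    counts.set(r.toolName, (counts.get(r.toolName) ?? 0) + 1);
  }
  return counts;
}

// Two inline records stand in for the file contents; in practice the text
// would come from reading ./logs/tools.jsonl.
const sample = [
  '{"ts":"2026-01-01T00:00:00.000Z","toolName":"search_web","args":{"query":"a"},"resultPreview":"ok","latencyMs":120,"status":"success"}',
  '{"ts":"2026-01-01T00:00:01.000Z","toolName":"search_web","args":{"query":"b"},"resultPreview":"Error: timeout","latencyMs":5000,"status":"error"}',
].join('\n');

const failures = errorCountsByTool(parseToolLog(sample));
console.log(failures.get('search_web')); // 1
```

Working on the parsed text rather than on the file handle keeps the helpers easy to test and independent of where the log is stored.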


Pillar 2: Session Profiling

Cost and latency are aggregate metrics. They must be accumulated across multiple LLM calls within a single agent session. This pillar tracks token usage, estimated cost, and wall-clock duration per run.
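
The per-call cost that feeds such an accumulator can be derived from a small pricing table. This is a sketch under stated assumptions: `ModelPricing` and `estimateCostUsd` are hypothetical names, and the rates are placeholders that should be replaced with the current prices for the model in use.

```typescript
interface ModelPricing {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Placeholder rates for illustration only; not authoritative pricing.
const examplePricing: ModelPricing = { inputUsdPerMillion: 3, outputUsdPerMillion: 15 };

// Cost of a single LLM call, given its token usage.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  return (
    (inputTokens * pricing.inputUsdPerMillion +
      outputTokens * pricing.outputUsdPerMillion) /
    1_000_000
  );
}

// Accumulate across every call made within one session.
const calls = [
  { inputTokens: 1000, outputTokens: 200 },
  { inputTokens: 2000, outputTokens: 400 },
];
let sessionCostUsd = 0;
for (const c of calls) {
  sessionCostUsd += estimateCostUsd(c.inputTokens, c.outputTokens, examplePricing);
}
console.log(sessionCostUsd.toFixed(4)); // "0.0180"
```

Keeping the rates in one structure means a price change touches a single place instead of every call site.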

Implementation: A session manager that accumulates metrics and flushes a summary upon completion.

```typescript
interface SessionSpan {
  recordTokens(input: number, output: number, cost: number): void;
  close(): void;
}

class SessionProfiler {
  private records: any[] = [];

  startSession(sessionId: string): SessionSpan {
    const startTime = Date.now();
    let totalInput = 0;
    let totalOutput = 0;
    let totalCost = 0;

    return {
      recordTokens: (input: number, output: number, cost: number) => {
        totalInput += input;
        totalOutput += output;
        totalCost += cost;
      },
      close: () => {
        const duration = Date.now() - startTime;
        const summary = {
          sessionId,
          durationMs: duration,
          totalInputTokens: totalInput,
          totalOutputTokens: totalOutput,
          totalCostUsd: totalCost,
          timestamp: new Date().toISOString(),
        };
        this.records.push(summary);
        // In production, emit to file or external sink here
        console.log(`[Profiler] Session ${sessionId} complete. Cost: $${totalCost.toFixed(4)}`);
      },
    };
  }
}
```

Pillar 3: Event Distribution

Monitoring systems require events, not logs. This pillar provides an in-process pub/sub mechanism to decouple the agent loop from external subscribers like metrics collectors or alerting services.

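Implementation: A small event emitter with optional asynchronous dispatch, so subscribers do not run inline with the agent loop. Payloads are typed as any here, so event names and payload shapes are not checked at compile time.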

type EventHandler = (payload: any) => void;

class EventDispatcher {
  private listeners: Map<string, Set<EventHandler>> = new Map();
  private asyncMode: boolean;

  constructor(config: { asyncDispatch?: boolean } = {}) {
    this.asyncMode = config.asyncDispatch ?? false;
  }

  subscribe(event: string, handler: EventHandler): void {
    if (!this.listeners.has(event)) {
      this.listeners.set(event, new Set());
    }
    this.listeners.get(event)!.add(handler);
  }

  emit(event: string, payload: any): void {
    const handlers = this.listeners.get(event);
    if (!handlers) return;

    const dispatch = (h: EventHandler) => {
      if (this.asyncMode) {
        setImmediate(() => h(payload));
      } else {
        h(payload);
      }
    };

    handlers.forEach(dispatch);
  }
}
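
The dispatcher above accepts any payload for any event name. One possible refinement, sketched here under the assumption of a fixed set of events, ties each event name to a payload type and isolates subscriber failures. `AgentEvents` and `TypedDispatcher` are hypothetical names, not part of the original stack.

```typescript
// Hypothetical event map: event name -> payload shape.
interface AgentEvents {
  'agent.tool_invoked': { sessionId: string; tool: string; args: unknown };
  'agent.session_complete': { sessionId: string; cost: number; steps: number };
}

type Handler<P> = (payload: P) => void;

class TypedDispatcher<E> {
  private listeners = new Map<keyof E, Set<Handler<any>>>();

  subscribe<K extends keyof E>(event: K, handler: Handler<E[K]>): void {
    let set = this.listeners.get(event);
    if (!set) {
      set = new Set<Handler<any>>();
      this.listeners.set(event, set);
    }
    set.add(handler);
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    const set = this.listeners.get(event);
    if (!set) return;
    set.forEach((handler) => {
      try {
        handler(payload);
      } catch (err) {
        // A failing subscriber should not stop the others or the agent loop.
        console.error(`subscriber for ${String(event)} failed:`, err);
      }
    });
  }
}

const bus = new TypedDispatcher<AgentEvents>();
let invocations = 0;
bus.subscribe('agent.tool_invoked', () => {
  throw new Error('sink unavailable');
});
bus.subscribe('agent.tool_invoked', () => {
  invocations += 1;
});
bus.emit('agent.tool_invoked', { sessionId: 's1', tool: 'search_web', args: { query: 'x' } });
console.log(invocations); // 1
```

With this shape, emitting `agent.session_complete` without a `cost` field fails at compile time rather than surfacing later in a dashboard.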

Pillar 4: Reasoning Audit

Understanding what the agent did is insufficient; you need to know why. This pillar records decision points, tool selections, and model reasoning blocks to create an auditable trail of the agent's logic.

Implementation: A ledger that logs structured decisions at each step of the agent loop.

interface DecisionEntry {
  sessionId: string;
  step: number;
  type: 'tool_selection' | 'final_response' | 'reasoning';
  detail: string;
  metadata?: Record<string, any>;
}

class AuditLedger {
  private entries: DecisionEntry[] = [];

  log(sessionId: string, entry: Omit<DecisionEntry, 'sessionId'>): void {
    const fullEntry: DecisionEntry = { sessionId, ...entry };
    this.entries.push(fullEntry);
    // Append to JSONL file in production
  }

  getTrace(sessionId: string): DecisionEntry[] {
    return this.entries.filter(e => e.sessionId === sessionId);
  }
}
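
The ledger above keeps entries in memory and leaves file persistence as a comment. A minimal sketch of that step follows, assuming one JSON object per line and Node's synchronous file APIs; `appendDecision` and `readTrace` are hypothetical helper names.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

interface DecisionEntry {
  sessionId: string;
  step: number;
  type: 'tool_selection' | 'final_response' | 'reasoning';
  detail: string;
  metadata?: Record<string, unknown>;
}

// Append one decision as a single JSON line.
function appendDecision(path: string, entry: DecisionEntry): void {
  appendFileSync(path, JSON.stringify(entry) + '\n');
}

// Rebuild one session's trace from the file, ordered by step.
function readTrace(path: string, sessionId: string): DecisionEntry[] {
  return readFileSync(path, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as DecisionEntry)
    .filter((entry) => entry.sessionId === sessionId)
    .sort((a, b) => a.step - b.step);
}

// Demo against a throwaway temp directory.
const ledgerPath = join(mkdtempSync(join(tmpdir(), 'ledger-')), 'decisions.jsonl');
appendDecision(ledgerPath, { sessionId: 's1', step: 0, type: 'tool_selection', detail: 'Selected search_web' });
appendDecision(ledgerPath, { sessionId: 's2', step: 0, type: 'reasoning', detail: 'Other session' });
appendDecision(ledgerPath, { sessionId: 's1', step: 1, type: 'final_response', detail: 'Agent completed task' });

const trace = readTrace(ledgerPath, 's1');
console.log(trace.map((e) => e.type)); // [ 'tool_selection', 'final_response' ]
```

Synchronous appends are simple but block the event loop briefly on each write; a write stream, as used by the tool recorder, is an alternative for busier agents.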

Composition: The Agent Loop

The following example demonstrates how the four pillars compose within a standard agent loop using the Anthropic API.

import Anthropic from '@anthropic-ai/sdk';
import { v4 as uuidv4 } from 'uuid';

// Initialize components
const toolRecorder = new ToolRecorder({ storePath: './logs/tools.jsonl' });
const profiler = new SessionProfiler();
const dispatcher = new EventDispatcher({ asyncDispatch: true });
const ledger = new AuditLedger();

// Instrumented tools
const searchWeb = toolRecorder.instrument('search_web', async (query: string) => {
  // Mock implementation
  return `Results for ${query}`;
});

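// Typed as Anthropic.Tool[] so that `type: 'object'` keeps its literal type.
const tools: Anthropic.Tool[] = [
  {
    name: 'search_web',
    description: 'Search the web',
    input_schema: {
      type: 'object',
      properties: { query: { type: 'string' } },
      required: ['query'],
    },
  },
];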

async function runAgent(userInput: string): Promise<string> {
  const sessionId = uuidv4();
  const client = new Anthropic();
  const span = profiler.startSession(sessionId);
  
  let messages: Anthropic.MessageParam[] = [{ role: 'user', content: userInput }];
  let step = 0;
  let totalCost = 0;

  try {
    while (true) {
      const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 4096,
        tools,
        messages,
      });

      // Record tokens and cost
      const cost = (response.usage.input_tokens * 0.003 + response.usage.output_tokens * 0.015) / 1000;
      totalCost += cost;
      span.recordTokens(response.usage.input_tokens, response.usage.output_tokens, cost);

      // Audit reasoning
      ledger.log(sessionId, {
        step,
        type: 'reasoning',
        detail: 'Model generated response',
        metadata: { stopReason: response.stop_reason },
      });

      if (response.stop_reason === 'tool_use') {
        const toolResults: Anthropic.ToolResultBlockParam[] = [];
        
        for (const block of response.content) {
          if (block.type === 'tool_use') {
            // Audit decision
            ledger.log(sessionId, {
              step,
              type: 'tool_selection',
              detail: `Selected ${block.name}`,
              metadata: { args: block.input },
            });

            // Emit event
            dispatcher.emit('agent.tool_invoked', {
              sessionId,
              tool: block.name,
              args: block.input,
            });

            // Execute tool (block.input is untyped, so narrow it before use)
            const { query } = block.input as { query: string };
            const result = await searchWeb(query);
            toolResults.push({
              type: 'tool_result',
              tool_use_id: block.id,
              content: result,
            });
          }
        }

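        // The assistant turn must be appended as a message object, not as raw content blocks.
        messages.push({ role: 'assistant', content: response.content });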
        messages.push({ role: 'user', content: toolResults });
        step++;
      } else {
        // Final response
        ledger.log(sessionId, {
          step,
          type: 'final_response',
          detail: 'Agent completed task',
        });

        dispatcher.emit('agent.session_complete', {
          sessionId,
          cost: totalCost,
          steps: step,
        });

        return response.content.find(b => b.type === 'text')?.text || '';
      }
    }
  } finally {
    span.close();
  }
}
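
The loop above always calls searchWeb, which is fine for a single tool. With several tools, dispatch by name becomes necessary. The following is a sketch only: `toolRegistry`, `executeTool`, and `ToolOutcome` are hypothetical names, and the handler is a mock.

```typescript
type ToolHandler = (input: unknown) => Promise<string>;

interface ToolOutcome {
  content: string;
  isError: boolean;
}

// Hypothetical registry keyed by the tool name declared to the model.
const toolRegistry = new Map<string, ToolHandler>([
  [
    'search_web',
    async (input) => {
      const { query } = input as { query?: unknown };
      if (typeof query !== 'string') {
        throw new Error('search_web expects { query: string }');
      }
      return `Results for ${query}`; // Mock implementation
    },
  ],
]);

// Look up the handler by name and convert failures into a result
// instead of letting them abort the agent loop.
async function executeTool(name: string, input: unknown): Promise<ToolOutcome> {
  const handler = toolRegistry.get(name);
  if (!handler) return { content: `Unknown tool: ${name}`, isError: true };
  try {
    return { content: await handler(input), isError: false };
  } catch (err) {
    return { content: String(err), isError: true };
  }
}

executeTool('search_web', { query: 'cats' }).then((outcome) => {
  console.log(outcome); // { content: 'Results for cats', isError: false }
});
```

Returning the failure as data lets the loop pass it back to the model in the tool result (the Messages API tool_result block has an optional is_error field for this), so the model can retry or choose another tool.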

Pitfall Guide

Synchronous Event Blocking
- Explanation: If the event dispatcher runs synchronously and a subscriber performs slow work (e.g., writing to a remote database), the agent loop blocks, increasing latency for the end user.
- Fix: Enable asynchronous dispatch (asyncDispatch: true) for production event buses. Note that setImmediate only defers a handler; it still runs on the main thread, so CPU-heavy or synchronous subscribers should hand their work to a queue or worker thread.

Token Accumulation Drift
- Explanation: Agents make multiple LLM calls per session. If token counts are not accumulated correctly across turns, cost reporting will be inaccurate and budget overruns can go unnoticed.
- Fix: Use a span or context manager that maintains an accumulator. Ensure every LLM response updates the span before the next iteration.

PII Leakage in Snapshots
- Explanation: Tool instrumentation captures arguments verbatim. If tools receive sensitive user data (emails, PII), this data is persisted in logs, creating compliance risks.
- Fix: Implement a redaction middleware in the ToolRecorder that scrubs sensitive fields before writing to the store. Use allowlists for logged fields.

Decision Log vs. Ground Truth
- Explanation: The audit ledger records the agent's actions and metadata, but this is not a ground-truth explanation of the model's internal state. Without extended thinking, the "why" is inferred, not observed.
- Fix: Enable extended thinking modes in the LLM provider and capture thinking blocks in the audit ledger. Treat the decision log as a proxy for reasoning unless thinking blocks are available.

Unbounded File Growth
- Explanation: JSONL logs append indefinitely. In high-volume production, log files can consume disk space rapidly, causing storage exhaustion.
- Fix: Implement log rotation based on size or time. Compress rotated files and set up a retention policy to purge data older than the required audit window.

Context Loss in Events
- Explanation: Events emitted without a session identifier cannot be correlated with specific runs, which makes debugging very difficult when multiple agents run concurrently.
- Fix: Enforce a schema for events that requires sessionId and timestamp. Validate payloads at the emit boundary.

Ignoring Tool Latency Spikes
- Explanation: Focusing only on LLM latency can mask performance issues in tool integrations. A slow database query can degrade the entire agent experience.
- Fix: Monitor tool latency metrics separately. Alert when a tool's p95 latency exceeds its threshold.
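
For the latency pitfall, a p95 check over recorded tool latencies can be sketched as follows. This assumes the nearest-rank percentile definition and a single threshold shared by all tools; `percentile` and `toolsOverThreshold` are hypothetical helper names.

```typescript
interface LatencySample {
  toolName: string;
  latencyMs: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of
// all samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Names of tools whose p95 latency exceeds the threshold.
function toolsOverThreshold(samples: LatencySample[], thresholdMs: number): string[] {
  const byTool = new Map<string, number[]>();
  for (const s of samples) {
    const list = byTool.get(s.toolName) ?? [];
    list.push(s.latencyMs);
    byTool.set(s.toolName, list);
  }
  const slow: string[] = [];
  byTool.forEach((latencies, tool) => {
    if (percentile(latencies, 95) > thresholdMs) slow.push(tool);
  });
  return slow;
}

const samples: LatencySample[] = [
  { toolName: 'search_web', latencyMs: 100 },
  { toolName: 'search_web', latencyMs: 150 },
  { toolName: 'search_web', latencyMs: 130 },
  { toolName: 'db_query', latencyMs: 100 },
  { toolName: 'db_query', latencyMs: 120 },
  { toolName: 'db_query', latencyMs: 110 },
  { toolName: 'db_query', latencyMs: 2500 },
  { toolName: 'db_query', latencyMs: 2600 },
];
console.log(toolsOverThreshold(samples, 2000)); // [ 'db_query' ]
```

With only a handful of samples per tool, p95 is effectively the maximum, so a real alert would likely also require a minimum sample count.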

Production Bundle

Action Checklist

Initialize Telemetry Components: Instantiate ToolRecorder, SessionProfiler, EventDispatcher, and AuditLedger with appropriate storage paths.
Instrument Tools: Wrap all tool functions with toolRecorder.instrument() to capture invocation data.
Wrap Agent Loop: Integrate profiler.startSession() and span.close() around the main execution loop.
Configure Event Subscribers: Register handlers for agent.tool_invoked and agent.session_complete to push metrics to monitoring systems.
Enable Async Dispatch: Set asyncDispatch: true on the event dispatcher to prevent blocking.
Implement Log Rotation: Configure file rotation for JSONL stores to manage disk usage.
Add PII Redaction: Apply redaction rules to tool arguments before logging.
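
The redaction item can be approached with an allowlist applied before arguments reach the log. This is a sketch under stated assumptions: the field names in `LOGGABLE_FIELDS` are invented for illustration, `redactArgs` is a hypothetical helper, and the email pattern is deliberately simple rather than exhaustive.

```typescript
// Hypothetical allowlist: only these argument fields are logged.
const LOGGABLE_FIELDS = new Set(['query', 'limit', 'locale']);

// Simple email matcher; it will miss unusual address formats.
const EMAIL = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

function redactArgs(args: unknown): unknown {
  if (args === null || typeof args !== 'object' || Array.isArray(args)) {
    // Non-object arguments cannot be vetted by field name, so drop them entirely.
    return '[REDACTED]';
  }
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args as Record<string, unknown>)) {
    if (!LOGGABLE_FIELDS.has(key)) {
      safe[key] = '[REDACTED]';
    } else if (typeof value === 'string') {
      // Allowlisted strings can still embed personal data, so mask emails.
      safe[key] = value.replace(EMAIL, '[EMAIL]');
    } else {
      // Nested objects pass through unchanged in this sketch.
      safe[key] = value;
    }
  }
  return safe;
}

const redacted = redactArgs({ query: 'contact jane@example.com', ssn: '123-45-6789' });
console.log(JSON.stringify(redacted));
// {"query":"contact [EMAIL]","ssn":"[REDACTED]"}
```

An allowlist fails closed: a newly added tool argument stays out of the logs until someone decides it is safe to record.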

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Local Development	File-based JSONL + Sync Events	Simplicity; no external dependencies required.	None
Production Alerting	Async Event Bus + External Sink	Non-blocking; enables real-time alerts via Slack/PagerDuty.	Low (Network egress)
High-Volume Agent	Batched File Writes + Rotation	Reduces I/O overhead; prevents file bloat.	Low (Storage management)
Compliance Audit	Extended Thinking + Audit Ledger	Captures model reasoning for regulatory review.	Medium (Higher token cost)

Configuration Template

// telemetry.config.ts
export const telemetryConfig = {
  storage: {
    toolsPath: './logs/tools.jsonl',
    tracesPath: './logs/traces.jsonl',
    decisionsPath: './logs/decisions.jsonl',
    rotation: {
      maxSizeMB: 100,
      maxFiles: 10,
    },
  },
  eventBus: {
    asyncDispatch: true,
    subscribers: {
      'agent.session_complete': ['metricsCollector', 'slackNotifier'],
    },
  },
  redaction: {
    enabled: true,
    patterns: ['email', 'phone', 'ssn'],
  },
};
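
The rotation settings in this template still need code that acts on them. One way to apply size-based rotation is sketched below, assuming maxFiles means the number of rotated copies to keep (at least 1) and that rotated files use a numeric suffix; `rotateIfNeeded` is a hypothetical helper.

```typescript
import { existsSync, mkdtempSync, renameSync, statSync, unlinkSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Rotate `path` once it reaches maxBytes: path -> path.1 -> path.2 ...,
// keeping at most `maxFiles` rotated copies. Returns true if a rotation happened.
// Any open write stream on `path` must be closed and reopened afterwards,
// otherwise it keeps writing to the renamed file.
function rotateIfNeeded(path: string, maxBytes: number, maxFiles: number): boolean {
  if (!existsSync(path) || statSync(path).size < maxBytes) return false;

  // Drop the oldest copy, then shift the remaining ones up by one.
  const oldest = `${path}.${maxFiles}`;
  if (existsSync(oldest)) unlinkSync(oldest);
  for (let i = maxFiles - 1; i >= 1; i--) {
    const from = `${path}.${i}`;
    if (existsSync(from)) renameSync(from, `${path}.${i + 1}`);
  }
  renameSync(path, `${path}.1`);
  return true;
}

// Demo in a throwaway temp directory with a 2 KiB file and a 1 KiB limit.
const logPath = join(mkdtempSync(join(tmpdir(), 'rotate-')), 'tools.jsonl');
writeFileSync(logPath, 'x'.repeat(2048));
const rotated = rotateIfNeeded(logPath, 1024, 3);
console.log(rotated, existsSync(`${logPath}.1`), existsSync(logPath)); // true true false
```

With the template above, the call would be along the lines of `rotateIfNeeded(toolsPath, maxSizeMB * 1024 * 1024, maxFiles)`, run before opening the stream or on a timer.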

Quick Start Guide

Install Dependencies:
```
npm install @anthropic-ai/sdk uuid
```
Create Telemetry Module: Copy the ToolRecorder, SessionProfiler, EventDispatcher, and AuditLedger classes into a telemetry/ directory.
Wrap Your Tools: Replace raw tool functions with instrumented versions using toolRecorder.instrument().
Integrate Agent Loop: Add session profiling and event emission to your agent's main loop as shown in the composition example.
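Run and Verify: Create the logs/ directory if it does not exist (createWriteStream does not create missing parent directories), execute the agent, and inspect logs/tools.jsonl to confirm data capture. As written, the profiler and the ledger keep their records in memory, so they produce no files until a file sink is added.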
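
A common first-run failure is a missing log directory. The sketch below shows one way to guard against it, assuming Node's fs module; `openJsonlStream` is a hypothetical helper and not part of the classes above.

```typescript
import { createWriteStream, existsSync, mkdirSync, mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

// Open an append-mode stream, creating parent directories first.
function openJsonlStream(path: string) {
  mkdirSync(dirname(path), { recursive: true });
  const stream = createWriteStream(path, { flags: 'a' });
  // Without an 'error' listener, a failed write would surface as an
  // uncaught exception and take the agent process down with it.
  stream.on('error', (err) => {
    console.error(`telemetry write failed for ${path}:`, err);
  });
  return stream;
}

// Demo: the nested logs/ directory does not exist before the call.
const demoPath = join(mkdtempSync(join(tmpdir(), 'telemetry-')), 'logs', 'tools.jsonl');
const stream = openJsonlStream(demoPath);
stream.write(JSON.stringify({ toolName: 'search_web', status: 'success' }) + '\n');
stream.end();
console.log(existsSync(dirname(demoPath))); // true
```

Logging the error instead of throwing reflects a design choice: telemetry failures are reported but do not interrupt the agent itself.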
