tool-loop-guard-rs: Break Agent Loops Before They Drain Your Budget

By Codcompass Team·2026-05-26·7 min read

Engineering Resilience: Deterministic Loop Detection for Autonomous Tool-Using Agents

Current Situation Analysis

Autonomous agents built on Large Language Models (LLMs) introduce a fundamental control flow mismatch. The LLM acts as a probabilistic reasoning engine with a sliding context window, while the tools it invokes are deterministic, stateless functions. When an agent's exit condition relies on receiving new information from a tool, but the tool returns identical data, the agent can enter a recursive state where it repeatedly requests the same resource, hoping for a different outcome.

This pattern is frequently overlooked during development because testing often focuses on the "happy path" where tools return diverse results. However, in production, external APIs can return cached data, rate-limit responses, or provide static content that fails to advance the agent's reasoning. Without explicit safeguards, the agent continues to consume tokens, extend latency, and burn API quota until a hard limit terminates the session.

Production telemetry reveals the severity of unguarded loops. In a documented incident involving a research agent, the system invoked a data retrieval tool 47 consecutive times with identical parameters. The session consumed $2.00 in compute costs for a task budgeted at $0.15, representing a 13x cost inflation. The user experienced an 8-minute latency spike before the session crashed due to token limits. This is not merely a cost issue; it is a reliability failure that degrades user trust and exhausts rate-limited external services.

WOW Moment: Key Findings

Implementing a deterministic loop guard transforms a catastrophic failure mode into a controlled, recoverable event. The guard introduces negligible overhead while capping the blast radius of logic errors.

Strategy	Max Cost per Incident	Latency Spike	Token Efficiency	Recovery Time
Unprotected Agent	$2.00+	8m+	13x overhead	Session Crash
Guarded Agent (N=3, W=10)	$0.18	<5s	1.2x overhead	Immediate Fallback

Why this matters: The guarded approach detects the recursion after three identical calls, aborting the loop before significant resources are wasted. The agent can immediately inject a recovery message into the context, allowing the LLM to pivot to an alternative strategy. This shifts the failure mode from "silent resource drain" to "explicit state recovery," enabling agents to self-correct rather than crash.

Core Solution

The solution requires a stateful sentinel that monitors tool invocations within a sliding window. The sentinel must identify exact repetitions of tool name and arguments, regardless of JSON key ordering, and enforce a threshold-based block.

Implementation Architecture

We implement a ToolRecursionSentinel that uses a fixed-size ring buffer to track recent calls. Each entry stores a SHA-256 hash of the canonicalized tool name and arguments. This design ensures constant memory usage and fast comparison.

Key Design Decisions:

1. Canonical JSON Serialization: LLMs may output JSON objects with keys in varying orders across turns (e.g., {"query": "A", "limit": 10} vs {"limit": 10, "query": "A"}). The sentinel must normalize these to a canonical form before hashing to ensure logical equivalence is detected.
2. SHA-256 Hashing: Storing full argument objects in the buffer is memory-inefficient for large payloads. Hashing reduces each entry to 32 bytes. A window of 100 calls consumes only 3.2 KB of RAM.
3. Linear Scan vs. HashMap: For typical window sizes (10–50 calls), a linear scan over the ring buffer is faster than a HashMap. The small size ensures cache locality, and the overhead of hash map allocation and collision handling outweighs the benefits of O(1) lookup.
4. History Integrity: When a loop is detected, the sentinel must not record the blocked call in the history. Recording it would artificially inflate the count and potentially mask legitimate subsequent calls if the agent retries with modified arguments.
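Decisions 2 and 3 can be sketched together as a fixed byte array of raw 32-byte digests that is scanned linearly. This is a minimal sketch under stated assumptions: DigestRing, sha256, push, and countMatches are hypothetical names chosen for illustration, and storing raw digest bytes (rather than hex strings) is what makes the 32-bytes-per-entry figure hold.

```typescript
import { createHash } from "node:crypto";

const DIGEST_BYTES = 32; // size of a raw SHA-256 digest

// Hypothetical helper: raw (not hex-encoded) SHA-256 digest of a string.
const sha256 = (text: string): Uint8Array =>
  createHash("sha256").update(text).digest();

// Hypothetical fixed-size ring of digests backed by one contiguous buffer.
class DigestRing {
  private readonly slots: Uint8Array;
  private readonly capacity: number;
  private head = 0;  // next slot to overwrite
  private count = 0; // number of filled slots

  constructor(capacity: number) {
    this.capacity = capacity;
    this.slots = new Uint8Array(capacity * DIGEST_BYTES);
  }

  // Overwrites the oldest slot once the ring is full (O(1) eviction).
  push(digest: Uint8Array): void {
    if (digest.length !== DIGEST_BYTES) throw new RangeError("expected a 32-byte digest");
    this.slots.set(digest, this.head * DIGEST_BYTES);
    this.head = (this.head + 1) % this.capacity;
    if (this.count < this.capacity) this.count++;
  }

  // Linear scan over the filled slots, comparing byte by byte.
  countMatches(digest: Uint8Array): number {
    let matches = 0;
    for (let slot = 0; slot < this.count; slot++) {
      const base = slot * DIGEST_BYTES;
      let equal = true;
      for (let i = 0; i < DIGEST_BYTES; i++) {
        if (this.slots[base + i] !== digest[i]) { equal = false; break; }
      }
      if (equal) matches++;
    }
    return matches;
  }

  get byteLength(): number {
    return this.slots.byteLength;
  }
}

const ring = new DigestRing(100);
const target = sha256('search_web::{"limit":10,"query":"A"}');
for (let i = 0; i < 3; i++) ring.push(target);

console.log(ring.byteLength);           // 3200 bytes for a window of 100
console.log(ring.countMatches(target)); // 3
```

The buffer is allocated once and never grows, so the hash storage stays at windowSize × 32 bytes no matter how long the agent runs.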

Code Example

The following TypeScript implementation demonstrates the sentinel pattern. This example uses distinct naming and structure from reference implementations to illustrate the architecture.

import { createHash } from 'crypto';

interface SentinelConfig {
  threshold: number;
  windowSize: number;
}

interface SentinelEntry {
  hash: string;
  toolName: string;
}

class ToolRecursionSentinel {
  private buffer: SentinelEntry[];
  private head: number = 0;
  private count: number = 0;
  private config: SentinelConfig;

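  constructor(config: SentinelConfig) {
    // Reject configs that would break the ring-buffer index math or never trigger.
    if (!Number.isInteger(config.windowSize) || config.windowSize < 1) {
      throw new RangeError('windowSize must be a positive integer');
    }
    if (!Number.isInteger(config.threshold) || config.threshold < 1 || config.threshold > config.windowSize) {
      throw new RangeError('threshold must be an integer between 1 and windowSize');
    }
    this.config = config;
    this.buffer = new Array<SentinelEntry>(config.windowSize);
  }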

  /**
   * Evaluates a tool call against the recursion history.
   * Returns true if the call is safe to proceed.
   * Returns false when the window already holds `threshold` identical calls;
   * the blocked call is not recorded.
   */
  evaluate(toolName: string, args: Record<string, unknown>): boolean {
    const canonicalArgs = this.toCanonicalJSON(args);
    const hash = this.computeHash(toolName, canonicalArgs);

    // Check for threshold violation
    let matchCount = 0;
    for (let i = 0; i < this.count; i++) {
      const index = (this.head - 1 - i + this.buffer.length) % this.buffer.length;
      if (this.buffer[index].hash === hash) {
        matchCount++;
        if (matchCount >= this.config.threshold) {
          return false; // Loop detected
        }
      }
    }

    // Record the call
    const entry: SentinelEntry = { hash, toolName };
    this.buffer[this.head] = entry;
    this.head = (this.head + 1) % this.buffer.length;
    if (this.count < this.config.windowSize) {
      this.count++;
    }

    return true;
  }

  /**
   * Resets the sentinel state. Essential for session boundaries.
   */
  reset(): void {
    this.head = 0;
    this.count = 0;
    this.buffer = new Array<SentinelEntry>(this.config.windowSize);
  }

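  // Serializes a value with object keys sorted at every nesting level,
  // so nested objects with reordered keys also hash identically.
  private toCanonicalJSON(value: unknown): string {
    if (Array.isArray(value)) {
      // Array order is meaningful, so elements keep their positions.
      return `[${value.map((item) => this.toCanonicalJSON(item)).join(',')}]`;
    }
    if (value !== null && typeof value === 'object') {
      const obj = value as Record<string, unknown>;
      const members = Object.keys(obj)
        .sort()
        .filter((key) => obj[key] !== undefined) // JSON omits undefined members
        .map((key) => `${JSON.stringify(key)}:${this.toCanonicalJSON(obj[key])}`);
      return `{${members.join(',')}}`;
    }
    return JSON.stringify(value ?? null);
  }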

  private computeHash(toolName: string, args: string): string {
    const payload = `${toolName}::${args}`;
    return createHash('sha256').update(payload).digest('hex');
  }
}

Integration Pattern

Integrate the sentinel into the agent's dispatch loop. The sentinel must be checked before tool execution.

const sentinel = new ToolRecursionSentinel({ threshold: 3, windowSize: 10 });

async function runAgentLoop(agent: Agent) {
  while (agent.isActive()) {
    const action = await agent.planNextStep();
    
    if (!sentinel.evaluate(action.tool, action.args)) {
      // Loop detected: Inject recovery feedback and break
      await agent.injectFeedback(
        "Warning: Repeated action detected. Switching strategy."
      );
      break;
    }

    const result = await executeTool(action);
    await agent.consumeResult(result);
  }
}

Pitfall Guide

1. The Polling Trap

Explanation: Guarding tools that are designed to be called repeatedly, such as get_job_status or poll_webhook, causes false positives. These tools legitimately return identical results until a state change occurs. Fix: Maintain an allowlist of polling tools that bypass the sentinel, or configure a separate sentinel with a much higher threshold for specific tool categories.
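
The allowlist fix might look like the following wrapper. It is a sketch, not a definitive implementation: the LoopGuard interface and withPollingAllowlist are hypothetical names, and alwaysBlock is a stand-in guard used only to make the bypass visible.

```typescript
// Minimal shape a loop guard is assumed to expose (hypothetical interface).
interface LoopGuard {
  evaluate(toolName: string, args: Record<string, unknown>): boolean;
}

// Hypothetical wrapper: tools named in `allowlist` skip the guard entirely,
// so their calls are neither counted nor blocked.
function withPollingAllowlist(guard: LoopGuard, allowlist: ReadonlySet<string>): LoopGuard {
  return {
    evaluate(toolName, args) {
      if (allowlist.has(toolName)) return true;
      return guard.evaluate(toolName, args);
    },
  };
}

// Stand-in guard for demonstration only: it blocks every call it sees.
const alwaysBlock: LoopGuard = { evaluate: () => false };

const guarded = withPollingAllowlist(
  alwaysBlock,
  new Set(["get_job_status", "poll_webhook"]),
);

console.log(guarded.evaluate("get_job_status", { jobId: "42" })); // true
console.log(guarded.evaluate("search_web", { query: "A" }));      // false
```

Because allowlisted calls never enter the history, they also cannot evict entries for other tools from the window.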

2. Session Bleed

Explanation: Failing to reset the sentinel between user turns or conversation sessions causes history from previous interactions to trigger false loops. Fix: Call sentinel.reset() immediately upon receiving a new user message or starting a new session. Ensure the sentinel lifecycle is bound to the session scope.
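
One way to bind the sentinel lifecycle to the session is a small registry that creates one guard per session ID and drops it when the session ends. The sketch below assumes string session IDs; SessionGuards, forSession, and endSession are hypothetical names, and the factory returns a stand-in object rather than a real sentinel.

```typescript
// Hypothetical registry: one guard instance per session, created on demand.
class SessionGuards<T> {
  private readonly bySession = new Map<string, T>();
  private readonly create: () => T;

  constructor(create: () => T) {
    this.create = create;
  }

  // Returns the guard for this session, creating a fresh one on first use.
  forSession(sessionId: string): T {
    let guard = this.bySession.get(sessionId);
    if (guard === undefined) {
      guard = this.create();
      this.bySession.set(sessionId, guard);
    }
    return guard;
  }

  // Drops the guard so that a later session with the same ID starts clean.
  endSession(sessionId: string): void {
    this.bySession.delete(sessionId);
  }
}

// Stand-in for a sentinel: just a history array, enough to show isolation.
const registry = new SessionGuards(() => ({ history: [] as string[] }));

const a1 = registry.forSession("session-a");
a1.history.push("search_web");
const b1 = registry.forSession("session-b");

registry.endSession("session-a");
const a2 = registry.forSession("session-a");

console.log(b1.history.length); // 0: session-b never sees session-a's calls
console.log(a2.history.length); // 0: session-a restarted with empty history
```

Resetting between user turns within one session would still need an explicit reset() call on the guard returned by forSession.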

3. Semantic Blindness

Explanation: The sentinel detects exact argument matches only. It cannot identify semantically similar calls, such as search("apple") vs search("fruit"). If the agent varies arguments slightly to avoid detection, the sentinel will not block the loop. Fix: Accept this limitation as a trade-off for performance. For semantic deduplication, implement a result cache with embedding-based similarity upstream of the sentinel. The sentinel handles exact loops; the cache handles semantic redundancy.

4. Thread Contention

Explanation: The sentinel is not thread-safe. In environments with parallel tool dispatch, concurrent calls may corrupt the ring buffer or produce inconsistent counts. Fix: In a Rust implementation, wrap the sentinel in Arc<Mutex<Sentinel>> for shared access; in other runtimes, guard it with the equivalent lock. Alternatively, instantiate a separate sentinel per dispatch thread. Per-thread sentinels only see their own calls, so a loop spread across threads can go unnoticed; for parallel execution, consider a distributed detection strategy that aggregates counts across threads.

5. Aggressive Thresholds

Explanation: Setting threshold: 1 blocks any repeated call, including legitimate retries after transient errors. This can cause the agent to fail on recoverable failures. Fix: Set threshold >= 3 to allow for retry logic. The threshold should reflect the maximum number of identical calls expected before a loop is suspected.
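
The effect of the threshold can be checked with a small simulation of the same counting rule used by evaluate(): a call is blocked when the window already holds threshold identical entries, and blocked calls are not recorded. The function name firstBlockedCall and the attempts cap are hypothetical, chosen only for this sketch.

```typescript
// Simulates an agent repeating one identical call and reports which attempt
// (1-based) is the first to be blocked, or null if none is blocked.
function firstBlockedCall(
  threshold: number,
  windowSize: number,
  attempts: number = 20,
): number | null {
  const recent: string[] = []; // sliding window of recorded calls
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const matches = recent.filter((key) => key === "same-call").length;
    if (matches >= threshold) return attempt; // blocked and not recorded
    recent.push("same-call");
    if (recent.length > windowSize) recent.shift(); // evict the oldest entry
  }
  return null;
}

console.log(firstBlockedCall(1, 10)); // 2: the first retry is already blocked
console.log(firstBlockedCall(3, 10)); // 4: three identical calls are allowed
console.log(firstBlockedCall(5, 3));  // null: a threshold above the window size never fires
```

The last case is worth noting when tuning: a threshold larger than the window silently disables the guard.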

6. Silent Aborts

Explanation: Breaking the loop without providing feedback to the LLM leaves the agent in an undefined state. The LLM may continue reasoning based on stale context, leading to further errors. Fix: Always inject a descriptive message into the conversation history when a loop is detected. This informs the LLM of the constraint and encourages it to select an alternative tool or strategy.
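
A descriptive recovery message could be built as follows. This is a sketch under assumptions: the ChatMessage shape, the choice of the "system" role, and the buildLoopFeedback helper are all hypothetical and should be adapted to whatever message format the agent framework actually uses.

```typescript
// Assumed message shape; real frameworks may use different roles or fields.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical helper: states what was blocked, why, and what to do next.
function buildLoopFeedback(
  toolName: string,
  args: Record<string, unknown>,
  threshold: number,
): ChatMessage {
  return {
    role: "system",
    content:
      `Loop guard: the call ${toolName}(${JSON.stringify(args)}) was blocked because ` +
      `${threshold} identical calls are already in the recent history. ` +
      `Repeating it will not return new information. ` +
      `Choose a different tool or different arguments, or answer with what you have.`,
  };
}

const history: ChatMessage[] = [];
history.push(buildLoopFeedback("search_web", { query: "A" }, 3));
console.log(history[0].content);
```

Naming the exact call in the message gives the model something concrete to avoid, which a generic warning does not.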

7. Memory Leaks in History

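Explanation: Using an unbounded list instead of a ring buffer causes memory usage to grow linearly with the number of turns. In long-running batch agents, this can lead to OOM errors. Fix: Ensure the implementation uses a fixed-size ring buffer with O(1) eviction. When raw digests are stored, the hash data is strictly bounded by windowSize * 32 bytes; hex-encoded hashes, as in the TypeScript example above, take 64 characters per entry plus string overhead, but the footprint is still fixed by windowSize.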

Production Bundle

Action Checklist

Initialize Sentinel: Create a ToolRecursionSentinel instance with tuned threshold and windowSize parameters based on agent behavior analysis.
Session Binding: Implement reset() calls on session boundaries to prevent history bleed between user turns.
Tool Allowlist: Identify polling tools and idempotent operations; exclude them from the sentinel or assign elevated thresholds.
Recovery Strategy: Define a standard recovery message to inject when a loop is detected, ensuring the LLM can pivot gracefully.
Telemetry Integration: Instrument evaluate() failures to log loop events. Monitor frequency to adjust thresholds and identify problematic agent behaviors.
Thread Safety Review: Audit the dispatch loop for concurrency. Wrap the sentinel in synchronization primitives if parallel tool execution is used.
Fallback Handling: Ensure the agent has a fallback path when a loop is blocked, such as returning a degraded response or escalating to a human.
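
The telemetry item in the checklist could be implemented as a thin wrapper that reports every blocked call to a callback. The sketch below assumes a guard exposing evaluate(); LoopGuard, LoopEvent, and withLoopTelemetry are hypothetical names, and blockAfterTwo is a stand-in guard used only for the demonstration.

```typescript
// Minimal shape a loop guard is assumed to expose (hypothetical interface).
interface LoopGuard {
  evaluate(toolName: string, args: Record<string, unknown>): boolean;
}

// Hypothetical event record emitted for each blocked call.
interface LoopEvent {
  toolName: string;
  args: Record<string, unknown>;
  blockedAt: string; // ISO 8601 timestamp
}

// Wraps a guard and invokes `onBlocked` whenever the guard rejects a call.
function withLoopTelemetry(
  guard: LoopGuard,
  onBlocked: (event: LoopEvent) => void,
): LoopGuard {
  return {
    evaluate(toolName, args) {
      const allowed = guard.evaluate(toolName, args);
      if (!allowed) {
        onBlocked({ toolName, args, blockedAt: new Date().toISOString() });
      }
      return allowed;
    },
  };
}

// Stand-in guard for demonstration only: allows two calls, then blocks.
let seen = 0;
const blockAfterTwo: LoopGuard = { evaluate: () => ++seen <= 2 };

const events: LoopEvent[] = [];
const observed = withLoopTelemetry(blockAfterTwo, (event) => events.push(event));

const outcomes = [1, 2, 3].map(() => observed.evaluate("search_web", { query: "A" }));
console.log(outcomes);      // [ true, true, false ]
console.log(events.length); // 1
```

In production the callback would forward to the existing logging or metrics pipeline rather than an in-memory array.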

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Exact Argument Repetition	Loop Sentinel	Detects identical tool+args patterns with minimal overhead.	Prevents 13x cost inflation.
Semantic Redundancy	Result Cache	Caches results for similar queries using embeddings.	Reduces API calls by ~30%.
Provider Outage	Circuit Breaker	Stops calls on consistent 5xx errors.	Prevents wasted retries during downtime.
Runaway Token Usage	Budget Guard	Halts execution when token/cost limits are reached.	Caps maximum session cost.
Long-Running Batch	Per-Item Sentinel	Isolates loops to individual batch items.	Prevents one loop from blocking the entire job.

Configuration Template

For production systems, per-tool thresholds provide finer control. The following configuration pattern allows dynamic threshold assignment:

# agent-sentinel-config.yaml
sentinel:
  default:
    threshold: 3
    window_size: 10
  
  overrides:
    search_web:
      threshold: 2
      window_size: 5
      reason: "Search should converge quickly; tight limit."
    
    poll_job_status:
      threshold: 20
      window_size: 50
      reason: "Polling requires repeated calls; loose limit."
      
    get_user_profile:
      threshold: 5
      window_size: 15
      reason: "Moderate repetition allowed for caching."

Implement a wrapper that loads this configuration and routes calls to the appropriate sentinel instance. This pattern addresses the limitation of single-threshold guards and enables granular control over agent behavior.
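
Such a wrapper might be sketched as below. Several assumptions apply: the YAML is taken as already parsed into a plain object (no YAML library is used here), PerToolGuards and makeGuard are hypothetical names, every tool gets its own window (including tools that fall back to the default limits), and makeGuard is a compact stand-in that compares JSON strings without hashing or key sorting.

```typescript
interface Limits {
  threshold: number;
  windowSize: number;
}

interface LoopGuard {
  evaluate(toolName: string, args: Record<string, unknown>): boolean;
}

// Compact stand-in guard for this sketch only (no hashing, no key sorting).
function makeGuard(limits: Limits): LoopGuard {
  const recent: string[] = [];
  return {
    evaluate(toolName, args) {
      const key = toolName + "::" + JSON.stringify(args);
      if (recent.filter((k) => k === key).length >= limits.threshold) return false;
      recent.push(key);
      if (recent.length > limits.windowSize) recent.shift();
      return true;
    },
  };
}

// Hypothetical router: picks per-tool limits, falling back to the defaults.
class PerToolGuards implements LoopGuard {
  private readonly guards = new Map<string, LoopGuard>();
  private readonly defaults: Limits;
  private readonly overrides: Map<string, Limits>;

  constructor(defaults: Limits, overrides: Record<string, Limits>) {
    this.defaults = defaults;
    this.overrides = new Map(Object.entries(overrides));
  }

  evaluate(toolName: string, args: Record<string, unknown>): boolean {
    let guard = this.guards.get(toolName);
    if (guard === undefined) {
      // Lazily create one guard per tool on its first call.
      guard = makeGuard(this.overrides.get(toolName) ?? this.defaults);
      this.guards.set(toolName, guard);
    }
    return guard.evaluate(toolName, args);
  }
}

// Values copied from agent-sentinel-config.yaml above.
const guards = new PerToolGuards(
  { threshold: 3, windowSize: 10 },
  {
    search_web: { threshold: 2, windowSize: 5 },
    poll_job_status: { threshold: 20, windowSize: 50 },
    get_user_profile: { threshold: 5, windowSize: 15 },
  },
);

const searchOutcomes = [1, 2, 3].map(() => guards.evaluate("search_web", { query: "A" }));
console.log(searchOutcomes); // [ true, true, false ]
```

Giving each tool its own window means heavy use of one tool cannot evict another tool's history, at the cost of one small buffer per tool.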

Quick Start Guide

Add Dependency: Include the sentinel library in your project dependencies.
Instantiate: Create a sentinel with threshold: 3 and windowSize: 10 as a starting point.
Wrap Dispatch: Insert sentinel.evaluate(tool, args) before every tool execution.
Handle Failure: On false return, inject a recovery message and break the loop.
Reset: Call sentinel.reset() at the start of each new user session.

By integrating deterministic loop detection, you transform agent reliability from a probabilistic hope into an engineered guarantee. The sentinel acts as a circuit breaker for logic errors, ensuring that agents remain responsive, cost-effective, and recoverable in production environments.
