e → reasoning → next action. We intercept the cycle before tool execution and after tool response. This requires a hook registry that exposes beforeToolCall and afterToolResponse events.
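The hook registry itself is not shown in this section. A minimal sketch of what one could look like follows; `HookRegistry`, `onBeforeToolCall`, `onAfterToolResponse`, and `runTool` are hypothetical names chosen for illustration, not the API of any particular agent framework.

```typescript
// Hypothetical hook registry; all names here are illustrative assumptions.
type ToolInput = Record<string, unknown>;
type BeforeHandler = (toolName: string, input: ToolInput) => void | Promise<void>;
type AfterHandler = (toolName: string, rawOutput: string) => string | Promise<string>;

class HookRegistry {
  private beforeHandlers: BeforeHandler[] = [];
  private afterHandlers: AfterHandler[] = [];

  onBeforeToolCall(handler: BeforeHandler): void {
    this.beforeHandlers.push(handler);
  }

  onAfterToolResponse(handler: AfterHandler): void {
    this.afterHandlers.push(handler);
  }

  // Runs one tool call through both interception points.
  // A handler that throws in the "before" phase prevents execution.
  async runTool(
    toolName: string,
    input: ToolInput,
    execute: (input: ToolInput) => Promise<string>
  ): Promise<string> {
    for (const handler of this.beforeHandlers) {
      await handler(toolName, input);
    }
    let output = await execute(input);
    for (const handler of this.afterHandlers) {
      output = await handler(toolName, output);
    }
    return output;
  }
}
```

In this sketch a guard blocks a call simply by throwing from a "before" handler, which is why the tool's `execute` function is only reached after every handler has returned.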
Step 2: Build a Duplicate Call Filter
A sliding window tracks recent tool invocations. If the same tool with identical parameters appears twice within the window, the third attempt is blocked. This prevents immediate retry loops caused by transient ambiguity.
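// Identifies one tool invocation: tool name plus a canonical serialization of its input.
interface ToolCallSignature {
  name: string;
  fingerprint: string;
}
class DuplicateCallFilter {
  private history: ToolCallSignature[] = [];
  private windowSize: number;
  private blockedCount: number = 0;
  constructor(windowSize: number = 3) {
    this.windowSize = windowSize;
  }
  // Plain JSON.stringify depends on key insertion order, so sort object keys
  // at every nesting level: { a: 1, b: 2 } and { b: 2, a: 1 } then match.
  private fingerprintOf(input: Record<string, unknown>): string {
    return JSON.stringify(input, (_key, value) =>
      value !== null && typeof value === 'object' && !Array.isArray(value)
        ? Object.fromEntries(
            Object.entries(value as Record<string, unknown>).sort(([a], [b]) =>
              a < b ? -1 : a > b ? 1 : 0
            )
          )
        : value
    );
  }
  evaluate(toolName: string, input: Record<string, unknown>): boolean {
    const fingerprint = this.fingerprintOf(input);
    const recentWindow = this.history.slice(-this.windowSize);
    const duplicateCount = recentWindow.filter(
      (entry) => entry.name === toolName && entry.fingerprint === fingerprint
    ).length;
    if (duplicateCount >= 2) {
      this.blockedCount++;
      return false; // Block execution
    }
    this.history.push({ name: toolName, fingerprint });
    // Drop entries the window can no longer see so history stays bounded.
    if (this.history.length > this.windowSize) this.history.shift();
    return true; // Allow execution
  }
  // Exposes the number of blocked calls, e.g. for telemetry.
  getBlockedCount(): number {
    return this.blockedCount;
  }
  reset(): void {
    this.history = [];
    this.blockedCount = 0;
  }
}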
Architecture Rationale: We use a sliding window instead of a global counter to allow legitimate retries across different task phases. The fingerprint is generated via deterministic JSON serialization to ensure parameter order does not affect deduplication. The filter resets per invocation to prevent cross-task pollution.
Step 3: Enforce Deterministic Response Schemas
Tools must return structured, unambiguous states. The agent's reasoning loop can exit cleanly only when it receives an explicit closure signal, so we validate every response against a strict schema before passing it to the LLM.
type TerminalState = 'SUCCESS' | 'FAILED' | 'PENDING';
interface ToolResponseEnvelope {
  state: TerminalState;
  payload: unknown;
  metadata?: { reason?: string; retryAfter?: number };
}
class ResponseStateValidator {
  validate(rawOutput: string): ToolResponseEnvelope {
    // [\s\S]* also accepts an empty payload, so a bare "SUCCESS:" is not
    // misclassified as PENDING (which would invite another retry).
    const successMatch = /^SUCCESS:\s*([\s\S]*)$/.exec(rawOutput);
    if (successMatch) {
      return {
        state: 'SUCCESS',
        payload: successMatch[1],
      };
    }
    const failedMatch = /^FAILED:\s*([\s\S]*)$/.exec(rawOutput);
    if (failedMatch) {
      return {
        state: 'FAILED',
        payload: failedMatch[1],
        metadata: { reason: 'Terminal failure detected' },
      };
    }
    // Fallback for ambiguous outputs
    return {
      state: 'PENDING',
      payload: rawOutput,
      metadata: { reason: 'Non-terminal response detected' },
    };
  }
}
Architecture Rationale: Regex-based parsing ensures compatibility with legacy tools that return plain text. The PENDING state acts as a circuit breaker trigger. By normalizing all tool outputs into a unified envelope, the agent's decision engine can route based on state rather than parsing natural language.
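To illustrate what routing on state rather than on natural language might look like, here is a small sketch. The `NextAction` type, the `routeOnState` function, and the default delay are assumptions made for this example; the sketch also assumes `retryAfter` is expressed in milliseconds, which the envelope definition does not specify.

```typescript
// Envelope types repeated so the sketch is self-contained.
type TerminalState = 'SUCCESS' | 'FAILED' | 'PENDING';

interface ToolResponseEnvelope {
  state: TerminalState;
  payload: unknown;
  metadata?: { reason?: string; retryAfter?: number };
}

// Hypothetical decisions a decision engine could take after a tool response.
type NextAction =
  | { kind: 'finish'; result: unknown }
  | { kind: 'abort'; reason: string }
  | { kind: 'wait'; delayMs: number };

function routeOnState(envelope: ToolResponseEnvelope, defaultDelayMs = 1000): NextAction {
  switch (envelope.state) {
    case 'SUCCESS':
      return { kind: 'finish', result: envelope.payload };
    case 'FAILED':
      return { kind: 'abort', reason: envelope.metadata?.reason ?? 'Tool reported failure' };
    case 'PENDING':
      // Assumption: retryAfter is in milliseconds.
      return { kind: 'wait', delayMs: envelope.metadata?.retryAfter ?? defaultDelayMs };
  }
}
```

Because the switch is exhaustive over `TerminalState`, adding a new state later produces a compile-time error here rather than a silent fall-through.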
Step 4: Apply Invocation Budgets
Hard limits prevent runaway loops when filters fail or tools return valid but non-terminal data. We track call counts per tool per invocation and enforce a ceiling.
class InvocationBudgetGuard {
  private budgets: Record<string, number>;
  private counters: Record<string, number> = {};
  private lock: boolean = false;
  constructor(budgets: Record<string, number>) {
    this.budgets = budgets;
  }
  async acquire(toolName: string): Promise<{ allowed: boolean; message?: string }> {
    // Wait for any in-flight acquire to finish before touching the counters.
    while (this.lock) await new Promise((r) => setTimeout(r, 10));
    this.lock = true;
    const current = (this.counters[toolName] || 0) + 1;
    this.counters[toolName] = current;
    const limit = this.budgets[toolName];
    this.lock = false;
    // Compare against undefined so that a budget of 0 blocks every call
    // instead of being treated as "no limit". Tools with no entry are unlimited.
    if (limit !== undefined && current > limit) {
      return {
        allowed: false,
        message: `Budget exceeded for ${toolName}. Maximum allowed: ${limit}.`,
      };
    }
    return { allowed: true };
  }
  reset(): void {
    this.counters = {};
  }
}
Architecture Rationale: The guard uses a lightweight mutex pattern to prevent race conditions in concurrent tool execution environments. Budgets are defined per-tool to allow flexible allocation (e.g., search tools get higher limits than booking tools). The reset method ensures budgets apply per task, not per agent lifetime.
Integration Architecture
The three components compose into a single termination guard:
class ReasoningLoopGuard {
  private filter: DuplicateCallFilter;
  private validator: ResponseStateValidator;
  private budget: InvocationBudgetGuard;
  constructor(config: { windowSize: number; budgets: Record<string, number> }) {
    this.filter = new DuplicateCallFilter(config.windowSize);
    this.validator = new ResponseStateValidator();
    this.budget = new InvocationBudgetGuard(config.budgets);
  }
  async beforeToolCall(toolName: string, input: Record<string, unknown>) {
    const duplicateAllowed = this.filter.evaluate(toolName, input);
    if (!duplicateAllowed) {
      throw new Error('BLOCKED: Duplicate call detected within evaluation window.');
    }
    const budgetCheck = await this.budget.acquire(toolName);
    if (!budgetCheck.allowed) {
      throw new Error(budgetCheck.message || 'Budget limit reached.');
    }
  }
  afterToolResponse(rawOutput: string): ToolResponseEnvelope {
    return this.validator.validate(rawOutput);
  }
  reset(): void {
    this.filter.reset();
    this.budget.reset();
  }
}
Why this architecture works: It decouples detection from enforcement. The filter handles immediate repetition, the validator handles semantic ambiguity, and the budget handles systemic overflow. Each layer operates independently, allowing teams to tune thresholds without rewriting core logic. The guard integrates cleanly into existing hook systems by exposing beforeToolCall and afterToolResponse interfaces.
Pitfall Guide
1. Treating Parameter Variance as Unique Calls
Explanation: Agents often modify parameters slightly between retries (e.g., changing max_price from 300 to 301). Fingerprinting only exact matches allows these variants to bypass the filter.
Fix: Implement semantic fingerprinting. Hash normalized inputs, ignore non-critical fields, or use a tolerance threshold for numeric parameters.
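One way this fix could be sketched is shown below. The `semanticFingerprint` function and its `FingerprintOptions` are hypothetical, and the numeric handling is an assumption: it rounds numbers to a fixed absolute step (bucketing), which is simpler than a relative tolerance and only handles top-level fields.

```typescript
// Hypothetical options for this sketch.
interface FingerprintOptions {
  ignoreFields: string[]; // fields that never affect deduplication
  numericStep: number;    // bucket width for numeric values; 0 disables bucketing
}

function semanticFingerprint(
  input: Record<string, unknown>,
  options: FingerprintOptions
): string {
  const ignored = new Set(options.ignoreFields);
  const normalized: Record<string, unknown> = {};
  // Sorted keys make the result independent of parameter order.
  for (const key of Object.keys(input).sort()) {
    if (ignored.has(key)) continue;
    const value = input[key];
    if (typeof value === 'number' && options.numericStep > 0) {
      // Round to the nearest bucket so near-identical numbers collide.
      normalized[key] = Math.round(value / options.numericStep) * options.numericStep;
    } else if (typeof value === 'string') {
      normalized[key] = value.trim().toLowerCase();
    } else {
      normalized[key] = value;
    }
  }
  return JSON.stringify(normalized);
}
```

With `numericStep: 10`, `max_price` values of 300 and 301 fall into the same bucket. Values on either side of a bucket boundary (for example 304 and 305) still produce different fingerprints, so bucketing narrows the gap rather than closing it.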
2. Blocking Legitimate Retry Patterns
Explanation: Overly aggressive deduplication can interrupt valid workflows that require multiple calls with identical parameters (e.g., polling an async job).
Fix: Introduce a cooldown window or require a state change between allowed calls. Track PENDING responses separately from SUCCESS/FAILED.
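A cooldown window could be sketched as follows. `PollingCooldown` is a hypothetical helper; the injectable clock is an assumption added so the behavior can be tested without real delays.

```typescript
// Hypothetical helper: allows an identical call again only after a cooldown.
class PollingCooldown {
  private lastAllowedAt = new Map<string, number>();
  private cooldownMs: number;
  private now: () => number;

  constructor(cooldownMs: number, now: () => number = () => Date.now()) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // callKey identifies a tool plus its parameters, e.g. name + fingerprint.
  allow(callKey: string): boolean {
    const last = this.lastAllowedAt.get(callKey);
    const current = this.now();
    if (last !== undefined && current - last < this.cooldownMs) {
      return false; // Still cooling down: treat as a tight polling loop
    }
    this.lastAllowedAt.set(callKey, current);
    return true;
  }
}
```

A filter built this way lets an agent poll an async job at a controlled rate while still rejecting back-to-back identical calls.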
3. Relying on LLM Self-Awareness for Termination
Explanation: Prompting the model with "stop when done" fails because LLMs lack internal state tracking. They optimize for continuation, not completion.
Fix: Externalize termination logic. Never trust the model to self-regulate. Use system-level guards and explicit response schemas.
4. Unsynchronized Budget Counters
Explanation: Modern agents spawn parallel tool calls. Counters that are updated without synchronization are prone to race conditions, which let concurrent calls overrun the budget.
Fix: Use atomic operations or mutex locks for counter increments. Validate budgets synchronously before dispatching concurrent calls.
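As one possible reading of "validate budgets synchronously before dispatching", the sketch below reserves budget for a whole batch in a single synchronous pass. `reserveBatch` is a hypothetical helper, and treating tools without a configured budget as unlimited is an assumption.

```typescript
// Hypothetical helper: reserve budget for a batch of parallel calls up front.
function reserveBatch(
  counters: Map<string, number>,
  budgets: Record<string, number>,
  requested: string[]
): { approved: string[]; rejected: string[] } {
  const approved: string[] = [];
  const rejected: string[] = [];
  // No await inside this loop, so no other task can interleave with it.
  for (const toolName of requested) {
    const used = counters.get(toolName) ?? 0;
    const limit = budgets[toolName];
    if (limit !== undefined && used >= limit) {
      rejected.push(toolName);
      continue;
    }
    counters.set(toolName, used + 1);
    approved.push(toolName);
  }
  return { approved, rejected };
}
```

Only the `approved` calls would then be dispatched concurrently, for example with `Promise.all`, so the budget cannot be overrun by calls that are already in flight.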
5. Vague Cancellation Messages
Explanation: Returning generic errors like "Tool blocked" confuses the agent, causing it to retry with different tools or rephrase the request.
Fix: Provide structured cancellation payloads that include the reason, remaining budget, and suggested next action. Format: {"status": "blocked", "reason": "duplicate", "action": "proceed_to_next_step"}.
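A structured payload along these lines could be produced by a small builder. The `buildCancellation` function, the `remainingBudget` field name, and every action string other than `proceed_to_next_step` are hypothetical choices for this sketch.

```typescript
type BlockReason = 'duplicate' | 'budget_exceeded' | 'timeout';

interface CancellationPayload {
  status: 'blocked';
  reason: BlockReason;
  remainingBudget: number;
  action: string;
}

// Hypothetical mapping from block reason to a suggested next step.
const SUGGESTED_ACTION: Record<BlockReason, string> = {
  duplicate: 'proceed_to_next_step',
  budget_exceeded: 'summarize_and_finish',
  timeout: 'report_partial_result',
};

function buildCancellation(reason: BlockReason, used: number, limit: number): string {
  const payload: CancellationPayload = {
    status: 'blocked',
    reason,
    remainingBudget: Math.max(0, limit - used),
    action: SUGGESTED_ACTION[reason],
  };
  // Returned as JSON text so it can be passed back to the model as a tool result.
  return JSON.stringify(payload);
}
```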
6. Missing Timeout Boundaries
Explanation: Call limits prevent infinite loops but do not address latency spikes. An agent can make 2 calls that each take 5 minutes.
Fix: Pair invocation budgets with wall-clock timeouts. Implement a global task timer that triggers graceful degradation if execution exceeds SLA thresholds.
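A per-call wall-clock limit can be sketched with `Promise.race`. The `withTimeout` helper and its `TIMEOUT:` message prefix are assumptions for this example. Note that this only stops waiting; it does not cancel the underlying work, which would need cooperative cancellation such as an `AbortSignal` in the tool itself.

```typescript
// Hypothetical helper: reject if `work` does not settle within limitMs.
async function withTimeout<T>(work: Promise<T>, limitMs: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`TIMEOUT: ${label} exceeded ${limitMs} ms`)),
      limitMs
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

The same helper could wrap the whole task to act as the global timer, with the caller catching the timeout error and returning a partial result.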
7. Over-Optimizing for Cost at the Expense of Completion
Explanation: Aggressive limits may terminate complex tasks prematurely, forcing users to restart or manually intervene.
Fix: Implement tiered budgets. Start with conservative limits, but allow dynamic escalation based on task complexity signals or user confirmation.
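Tiered budgets could be modeled as an ordered list of cumulative limits. In this sketch `TieredBudget`, `BudgetTier`, and the `confirmed` flag (standing in for a user confirmation or a task-complexity signal) are all hypothetical.

```typescript
interface BudgetTier {
  limit: number;                 // cumulative call ceiling for this tier
  requiresConfirmation: boolean; // whether entering the tier needs sign-off
}

class TieredBudget {
  private tiers: BudgetTier[];
  private tierIndex = 0;
  private used = 0;

  constructor(tiers: BudgetTier[]) {
    if (tiers.length === 0) throw new Error('At least one tier is required');
    this.tiers = tiers;
  }

  // Counts one call; returns false once the current tier's limit is reached.
  tryConsume(): boolean {
    if (this.used >= this.tiers[this.tierIndex].limit) return false;
    this.used++;
    return true;
  }

  // Moves to the next tier if one exists and its precondition is met.
  escalate(confirmed: boolean): boolean {
    const next = this.tiers[this.tierIndex + 1];
    if (!next) return false;
    if (next.requiresConfirmation && !confirmed) return false;
    this.tierIndex++;
    return true;
  }
}
```

A caller would start on the conservative first tier and call `escalate` only when `tryConsume` returns false and the task still appears to be making progress.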
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple lookup tasks (weather, status) | Hard Invocation Budget (limit: 1-2) | Tasks are deterministic; retries indicate failure | Low baseline, predictable cap |
| Search & discovery workflows | Duplicate Call Filter + Ambiguous Response Rewriting | Allows exploration but blocks identical retries | Moderate reduction, preserves discovery quality |
| Multi-step booking/reservation | Explicit Terminal States + Tiered Budgets | Requires clear success/failure signals; complex state | High efficiency, prevents runaway costs |
| Async/long-running operations | PENDING State + Polling Cooldown + Timeout | Prevents tight loops while waiting for external systems | Controlled latency, avoids token waste |
Configuration Template
const terminationGuardConfig = {
  duplicateFilter: {
    windowSize: 3,
    ignoreFields: ['timestamp', 'requestId'],
    numericTolerance: 0.05,
  },
  responseSchema: {
    successPrefix: 'SUCCESS:',
    failurePrefix: 'FAILED:',
    pendingPrefix: 'PENDING:',
    fallbackState: 'PENDING',
  },
  invocationBudgets: {
    search_inventory: 4,
    check_availability: 3,
    confirm_reservation: 1,
    send_notification: 2,
  },
  timeouts: {
    globalTaskLimitMs: 30000,
    perToolLimitMs: 5000,
  },
  observability: {
    emitBlockedCalls: true,
    emitLoopDetected: true,
    logLevel: 'warn',
  },
};
Quick Start Guide
- Wrap your tool execution layer with a beforeToolCall interceptor backed by a ReasoningLoopGuard instance created once per task.
- Normalize all tool return values to match the SUCCESS/FAILED/PENDING schema. Update legacy tools to prepend explicit state markers.
- Define initial budgets based on your workflow graph. Start conservative (e.g., 2-3 calls per tool) and adjust using telemetry.
- Deploy with observability hooks that log blocked calls, state transitions, and timeout triggers. Review logs after 24 hours to tune thresholds.
- Validate with edge cases: Run agents against tools that return partial data, simulate network delays, and verify that guards terminate loops without breaking valid multi-step flows.
Termination-first design transforms AI agents from speculative explorers into deterministic executors. By externalizing stop conditions, enforcing response contracts, and applying bounded execution limits, you eliminate reasoning loops before they consume resources. The architecture is lightweight, framework-agnostic, and immediately deployable. Treat termination as a first-class system concern, and your agents will operate within predictable cost and latency boundaries.