ion gates verify schema completeness and constraint satisfaction. This prevents runaway loops and ensures data integrity across hops.
4. OpenTelemetry-First Tracing: Every decision, tool call, and validation check emits a span. Traces are structured as directed acyclic graphs, enabling precise failure localization. Metrics like token consumption, latency, and retry counts are aggregated per workflow.
Implementation Blueprint
The following TypeScript implementation demonstrates a hardened execution engine. It replaces ad-hoc function chaining with explicit guards, idempotent tool wrappers, and trace-aware orchestration.
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';
import { z } from 'zod';

type BudgetConfig = { steps: number; tokens: number; timeMs: number };

// 1. Budget Guard: Enforces hard limits before execution
export class ExecutionBudget {
  private readonly maxSteps: number;
  private readonly maxTokens: number;
  private readonly maxTimeMs: number;
  private readonly startTime: number;
  private stepCount = 0;
  private tokenCount = 0;

  constructor(config: BudgetConfig) {
    this.maxSteps = config.steps;
    this.maxTokens = config.tokens;
    this.maxTimeMs = config.timeMs;
    this.startTime = Date.now();
  }

  check(): { allowed: boolean; reason?: string } {
    if (this.stepCount >= this.maxSteps) return { allowed: false, reason: 'Step limit exceeded' };
    if (this.tokenCount >= this.maxTokens) return { allowed: false, reason: 'Token budget exhausted' };
    if (Date.now() - this.startTime >= this.maxTimeMs) return { allowed: false, reason: 'Execution timeout' };
    return { allowed: true };
  }

  consume(stepTokens: number): void {
    this.stepCount++;
    this.tokenCount += stepTokens;
  }
}

// 2. Idempotent Tool Wrapper: Prevents duplicate side effects
export class IdempotentTool {
  // Stores the promise rather than the resolved value, so concurrent calls
  // with the same key share a single in-flight execution.
  private executedKeys: Map<string, Promise<unknown>> = new Map();

  constructor(private toolName: string) {}

  async execute<T>(key: string, payload: unknown, fn: () => Promise<T>): Promise<T> {
    const cached = this.executedKeys.get(key);
    if (cached) return cached as Promise<T>;

    // Drop the key on failure so a later retry can run the tool again.
    const pending = fn().catch((err) => {
      this.executedKeys.delete(key);
      throw err;
    });
    this.executedKeys.set(key, pending);
    return pending;
  }
}

// 3. Schema Validator: Enforces data contracts between agents
export class SchemaGate {
  constructor(private schema: z.ZodTypeAny) {}

  validate(data: unknown): { valid: boolean; errors?: string[] } {
    const result = this.schema.safeParse(data);
    if (!result.success) {
      // `issues` is the canonical list of validation problems on a ZodError.
      return { valid: false, errors: result.error.issues.map(e => `${e.path.join('.')}: ${e.message}`) };
    }
    return { valid: true };
  }
}

// 4. Orchestrator: Wires routing, validation, budgeting, and tracing
export class AgentOrchestrator {
  private tracer = trace.getTracer('agent-workflow');
  // One wrapper per specialist, kept for the orchestrator's lifetime so that
  // cached results are still there when a call is retried.
  private tools = new Map<string, IdempotentTool>();

  constructor(private budgetConfig: BudgetConfig) {}

  async runWorkflow(requestId: string, input: Record<string, unknown>): Promise<Record<string, unknown>> {
    return this.tracer.startActiveSpan(`workflow:${requestId}`, async (span) => {
      // Fresh budget per workflow: step, token, and wall-clock limits are not shared across requests.
      const budget = new ExecutionBudget(this.budgetConfig);
      try {
        span.setAttribute('request.id', requestId);

        // Step 1: Route to specialist
        const specialist = this.routeRequest(input);

        // Step 2: Execute with budget & validation
        const intermediate = await this.executeWithGuards(specialist, input, budget);

        // Step 3: Supervisor validation
        if (!this.supervisorCheck(intermediate)) {
          throw new Error('Workflow halted: supervisor validation failed');
        }

        span.setStatus({ code: SpanStatusCode.OK });
        return intermediate;
      } catch (err) {
        const error = err instanceof Error ? err : new Error(String(err));
        span.recordException(error);
        span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
        throw err;
      } finally {
        span.end();
      }
    });
  }

  private routeRequest(input: Record<string, unknown>): string {
    // Simplified routing logic; production uses classifier model or rule engine
    return input['priority'] === 'high' ? 'specialist-fast' : 'specialist-standard';
  }

  private async executeWithGuards(
    specialist: string,
    input: Record<string, unknown>,
    budget: ExecutionBudget,
  ): Promise<Record<string, unknown>> {
    const budgetCheck = budget.check();
    if (!budgetCheck.allowed) throw new Error(budgetCheck.reason);

    // Simulate LLM call + tool invocation
    const estimatedTokens = 1200;
    budget.consume(estimatedTokens);

    // Apply idempotency key based on request fingerprint.
    // Note: JSON.stringify depends on property order; canonicalize the input if callers may reorder keys.
    const idempotencyKey = `${specialist}:${JSON.stringify(input)}`;
    let tool = this.tools.get(specialist);
    if (!tool) {
      tool = new IdempotentTool(specialist);
      this.tools.set(specialist, tool);
    }

    return tool.execute(idempotencyKey, input, async () => {
      // In production, this calls MCP server via standardized protocol
      return { status: 'completed', specialist, data: input };
    });
  }

  private supervisorCheck(output: Record<string, unknown>): boolean {
    // Production: validates against business rules, budget alignment, and safety constraints
    return output['status'] === 'completed' && Object.keys(output).length > 0;
  }
}
```
Why This Structure Works
- Budget enforcement happens before execution, not after. This prevents token burn and infinite loops from consuming resources.
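- Idempotency is keyed deterministically, so a retried call with the same key returns the cached result instead of repeating the side effect. The in-memory map shown here does not survive an orchestrator restart; covering restarts requires a persistent or distributed store.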
- Validation gates are schema-driven, making contracts explicit and machine-verifiable. Missing fields fail fast instead of propagating corrupted state.
- Tracing wraps the entire lifecycle. The sample opens one root span per workflow; production code should add child spans for routing, execution, validation, and supervisor checks. OpenTelemetry exporters can route these to Jaeger, Datadog, or New Relic without framework lock-in.
Pitfall Guide
1. Silent Retries Without Idempotency
Explanation: When a tool call times out, the orchestrator retries. Without a deduplication mechanism, the external service processes the request twice, causing duplicate charges, bookings, or data mutations.
Fix: Generate a deterministic idempotency key from request parameters. Cache results in-memory or in a distributed store. Return the cached result on duplicate keys instead of re-invoking the tool.
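A key built with plain JSON.stringify changes when the same payload arrives with its properties in a different order. The sketch below shows one way to make the key order-independent and to expire cached results. The names canonicalize, idempotencyKey, and TtlResultCache are hypothetical, and the helper assumes plain JSON-like payloads (no Dates, Maps, or BigInts).

```typescript
// Hypothetical helper: serialize a JSON-like value with object keys sorted,
// so payloads that differ only in property order produce the same string.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  // JSON.stringify yields undefined for undefined and functions; encode those as null.
  const primitive: string | undefined = JSON.stringify(value);
  return primitive === undefined ? 'null' : primitive;
}

function idempotencyKey(toolName: string, payload: unknown): string {
  return `${toolName}:${canonicalize(payload)}`;
}

type CacheLookup = { hit: true; value: unknown } | { hit: false };

// In-memory result cache with a time-to-live. A distributed store would
// expose the same get/set shape; that mapping is an assumption of this sketch.
class TtlResultCache {
  private readonly entries = new Map<string, { value: unknown; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): CacheLookup {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return { hit: false };
    }
    return { hit: true, value: entry.value };
  }

  set(key: string, value: unknown): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

For long payloads the canonical string can be hashed before use as a key; the string is left unhashed here to keep the sketch dependency-free.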
2. Unbounded Execution Loops
Explanation: Agents can enter recursive planning cycles when intermediate outputs fail to converge. Without hard limits, token consumption and latency grow without bound.
Fix: Implement a multi-dimensional budget: maximum steps, maximum tokens, maximum wall-clock time, and maximum tool calls. Reject execution immediately when any threshold is breached.
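A minimal sketch of a budget that tracks all four dimensions named above, including tool calls. MultiBudget and its field names are illustrative, not part of the implementation shown earlier, and the clock is injectable only to make the behaviour easy to test.

```typescript
type BudgetLimits = {
  maxSteps: number;
  maxTokens: number;
  maxTimeMs: number;
  maxToolCalls: number;
};

type BudgetVerdict = { allowed: true } | { allowed: false; reason: string };

class MultiBudget {
  private steps = 0;
  private tokens = 0;
  private toolCalls = 0;
  private readonly startedAt: number;
  private readonly limits: BudgetLimits;
  private readonly now: () => number;

  constructor(limits: BudgetLimits, now: () => number = Date.now) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Call before every step; any single exhausted dimension blocks execution.
  check(): BudgetVerdict {
    const l = this.limits;
    if (this.steps >= l.maxSteps) return { allowed: false, reason: 'Step limit reached' };
    if (this.tokens >= l.maxTokens) return { allowed: false, reason: 'Token budget exhausted' };
    if (this.toolCalls >= l.maxToolCalls) return { allowed: false, reason: 'Tool-call limit reached' };
    if (this.now() - this.startedAt >= l.maxTimeMs) return { allowed: false, reason: 'Execution timeout' };
    return { allowed: true };
  }

  // Call after every step with what that step actually used.
  record(usage: { tokens?: number; toolCalls?: number }): void {
    this.steps += 1;
    this.tokens += usage.tokens ?? 0;
    this.toolCalls += usage.toolCalls ?? 0;
  }
}
```

A planning loop would call check() at the top of each iteration and stop with the returned reason as soon as allowed is false.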
3. Schema Drift Between Agents
Explanation: Specialist agents output loosely structured JSON. Downstream agents or tools assume fields exist, causing runtime crashes or silent data corruption.
Fix: Define strict Zod/JSON Schema contracts for every agent output. Validate before passing data to the next hop. Reject and request regeneration if validation fails.
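The "reject and request regeneration" step can be sketched as a bounded loop. The function below is a hypothetical helper: generate stands in for the agent call and validate for any gate that returns the same { valid, errors } shape as SchemaGate. It trusts the validator to guarantee the output type, which is an assumption the caller must uphold.

```typescript
type GateResult = { valid: boolean; errors?: string[] };

// Calls the generator until its output passes validation or attempts run out.
// Validation errors from the previous attempt are passed back as feedback.
async function generateWithValidation<T>(
  generate: (feedback: string[]) => Promise<unknown>,
  validate: (candidate: unknown) => GateResult,
  maxAttempts: number = 3,
): Promise<T> {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await generate(feedback);
    const result = validate(candidate);
    if (result.valid) {
      // Cast relies on the validator having checked the full shape of T.
      return candidate as T;
    }
    feedback = result.errors ?? ['unknown validation error'];
  }
  throw new Error(
    `Output failed validation after ${maxAttempts} attempts: ${feedback.join('; ')}`,
  );
}
```

Bounding the loop matters: an unbounded regenerate-until-valid loop reintroduces the runaway execution described in the previous pitfall.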
4. Shallow Observability
Explanation: Logging only final outputs or error messages makes debugging multi-step workflows impossible. You cannot determine which agent, tool, or decision caused a failure.
Fix: Emit OpenTelemetry spans for every LLM invocation, tool call, validation gate, and routing decision. Attach metadata like token counts, latency, and retry attempts. Export traces to a centralized backend.
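The sketch below shows the shape of a per-step wrapper that records a name, attributes, duration, and outcome for each unit of work. To stay runnable without the OpenTelemetry SDK it writes to a hypothetical in-memory sink instead of a real span; in a real service the same wrapper would presumably delegate to tracer.startActiveSpan and span.setAttribute as in the orchestrator above. StepRecord and tracedStep are illustrative names.

```typescript
type AttributeValue = string | number | boolean;

type StepRecord = {
  name: string;
  attributes: Record<string, AttributeValue>;
  durationMs: number;
  status: 'ok' | 'error';
  error?: string;
};

// Wraps one unit of work (LLM call, tool call, validation gate, routing decision)
// and reports its metadata whether it succeeds or throws.
async function tracedStep<T>(
  name: string,
  attributes: Record<string, AttributeValue>,
  fn: (setAttribute: (key: string, value: AttributeValue) => void) => Promise<T>,
  sink: (record: StepRecord) => void,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  const attrs: Record<string, AttributeValue> = { ...attributes };
  try {
    const result = await fn((key, value) => {
      attrs[key] = value;
    });
    sink({ name, attributes: attrs, durationMs: now() - startedAt, status: 'ok' });
    return result;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    sink({ name, attributes: attrs, durationMs: now() - startedAt, status: 'error', error: message });
    throw err; // Observability must not swallow the failure.
  }
}
```

Passing a setAttribute callback lets the step attach values that are only known afterwards, such as the token count returned by the model.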
5. Framework Lock-In
Explanation: Tying agent logic directly to LangChain, LlamaIndex, or Semantic Kernel internals makes migration painful and obscures failure modes when the framework updates.
Fix: Abstract tool interfaces behind protocol contracts (e.g., MCP). Keep orchestrator logic framework-agnostic. Use dependency injection for model clients and routing engines.
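One way to keep the orchestrator framework-agnostic is to have it depend only on small interfaces and receive concrete clients through its constructor. The interfaces and the PortableAgent class below are hypothetical; an MCP client or a framework adapter would implement ToolClient behind the same contract.

```typescript
// Contracts the orchestrator depends on. Nothing here names a framework.
interface ModelClient {
  complete(prompt: string): Promise<{ text: string; tokens: number }>;
}

interface ToolClient {
  call(tool: string, args: Record<string, unknown>): Promise<unknown>;
}

interface Router {
  route(input: Record<string, unknown>): string;
}

class PortableAgent {
  private readonly model: ModelClient;
  private readonly tools: ToolClient;
  private readonly router: Router;

  // Dependencies are injected, so swapping a model provider or tool transport
  // means writing a new adapter, not editing this class.
  constructor(deps: { model: ModelClient; tools: ToolClient; router: Router }) {
    this.model = deps.model;
    this.tools = deps.tools;
    this.router = deps.router;
  }

  async handle(input: Record<string, unknown>): Promise<unknown> {
    const specialist = this.router.route(input);
    const plan = await this.model.complete(`[${specialist}] ${JSON.stringify(input)}`);
    return this.tools.call(specialist, { plan: plan.text, input });
  }
}
```

The same seams make the agent testable with in-memory fakes, without network access or a model key.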
6. Ignoring Rate Limits & Backpressure
Explanation: External APIs enforce rate limits. Bursting requests without backoff triggers 429 errors, cascading failures, and degraded user experience.
Fix: Implement token bucket rate limiting per tool. Use exponential backoff with jitter on 429/5xx responses. Queue requests when upstream capacity is saturated.
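A minimal sketch of the two mechanisms named in the fix: a per-tool token bucket and exponential backoff with jitter. All names are illustrative. The jitter variant is "full jitter" (a uniform delay between zero and the exponential ceiling), and the 500 ms base and three attempts are assumed defaults, not measured values.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and spends one token if a request may go out now.
  tryAcquire(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Full jitter: uniform in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retries fn on retryable errors (e.g. 429/5xx), sleeping between attempts.
async function callWithRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  delayFor: (attempt: number) => number = backoffDelayMs,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(delayFor(attempt));
    }
  }
}
```

Requests that fail tryAcquire() would go to a queue rather than being dropped; the queue itself is omitted here.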
7. Missing Human-in-the-Loop Escalation
Explanation: Agents operate in binary mode: succeed or fail. Complex, ambiguous, or high-stakes requests require human judgment, but no escalation path exists.
Fix: Define confidence thresholds and risk categories. When outputs fall below thresholds or exceed risk limits, pause execution and route to a human review queue. Log the pause reason and context for auditability.
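As a rough illustration of the escalation rule, the sketch below maps an agent output to either "proceed" or "escalate" and records why. The 0.8 confidence threshold, the three risk categories, and every name are assumptions chosen for the example; real values depend on the domain.

```typescript
type RiskCategory = 'low' | 'medium' | 'high';
type AgentOutput = { confidence: number; risk: RiskCategory; payload: unknown };
type Decision = { action: 'proceed' } | { action: 'escalate'; reason: string };
type EscalationPolicy = { minConfidence: number; maxAutoRisk: RiskCategory };

const RISK_ORDER: Record<RiskCategory, number> = { low: 0, medium: 1, high: 2 };

function decideEscalation(
  output: AgentOutput,
  policy: EscalationPolicy = { minConfidence: 0.8, maxAutoRisk: 'medium' },
): Decision {
  if (RISK_ORDER[output.risk] > RISK_ORDER[policy.maxAutoRisk]) {
    return { action: 'escalate', reason: `risk ${output.risk} exceeds ${policy.maxAutoRisk}` };
  }
  if (output.confidence < policy.minConfidence) {
    return {
      action: 'escalate',
      reason: `confidence ${output.confidence} below ${policy.minConfidence}`,
    };
  }
  return { action: 'proceed' };
}

type ReviewItem = { requestId: string; reason: string; context: unknown };

// Pauses the workflow by handing the item to a review queue (here, an array).
// Returns true when the caller may continue automatically.
function gate(requestId: string, output: AgentOutput, queue: ReviewItem[]): boolean {
  const decision = decideEscalation(output);
  if (decision.action === 'escalate') {
    queue.push({ requestId, reason: decision.reason, context: output.payload });
    return false;
  }
  return true;
}
```

Storing the reason and context with each queued item gives reviewers and auditors the same information the agent had at the pause point.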
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-throughput, low-complexity routing | Rule-based router + lightweight specialist | Reduces LLM calls, minimizes latency | Low (mostly compute) |
| Complex reasoning, multi-step planning | Supervisor loop + strict budget gates | Prevents runaway token consumption | Medium-High (controlled) |
| Financial/booking transactions | Idempotent tools + synchronous validation | Eliminates duplicate charges/mutations | Low (infrastructure) |
| Unpredictable external APIs | Circuit breaker + queue + backoff | Prevents cascading failures | Medium (queue overhead) |
| Compliance/audit requirements | Full trace export + human escalation | Ensures reproducibility and oversight | Medium (storage + review) |
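The matrix recommends a circuit breaker for unpredictable external APIs. A minimal sketch of that pattern follows, assuming a consecutive-failure threshold and a fixed cooldown; CircuitBreaker and its parameters are illustrative, and the half-open handling is only safe for one call at a time.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = 'closed';
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) {
        // Fail fast without touching the upstream service.
        throw new Error('Circuit open: call rejected');
      }
      this.state = 'half-open'; // Cooldown elapsed: allow one trial call.
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Rejected calls can be queued or retried with backoff once the cooldown has passed, which is how the three techniques in that table row fit together.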
Configuration Template
```yaml
agent:
  id: travel-planner-v2
  version: 1.4.0

execution:
  budget:
    max_steps: 12
    max_tokens: 8000
    max_time_ms: 30000
    max_tool_calls: 8
  validation:
    strict_mode: true
    schemas:
      - path: ./schemas/itinerary.json
      - path: ./schemas/budget.json

observability:
  tracing:
    enabled: true
    exporter: otlp
    endpoint: https://otel-collector.internal:4318
    sample_rate: 1.0
  metrics:
    - token_consumption
    - tool_latency
    - retry_count
    - validation_failures

tools:
  idempotency:
    strategy: deterministic_key
    ttl_seconds: 3600
  retry:
    max_attempts: 3
    backoff: exponential
    jitter: true
    base_delay_ms: 500
```
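Once the template is parsed (with whichever YAML parser the project already uses), the budget block still has to be checked and mapped onto the constructor shape that ExecutionBudget expects. The helper below is a hypothetical sketch of that mapping; it takes the already-parsed budget object and makes no assumption about the parser.

```typescript
type BudgetConfig = { steps: number; tokens: number; timeMs: number };

// Reads one field and insists on a positive integer, so a typo such as
// max_steps: "12" or max_tokens: 0 fails at startup instead of at runtime.
function positiveInt(source: Record<string, unknown>, key: string): number {
  const value = source[key];
  if (typeof value !== 'number' || !Number.isInteger(value) || value <= 0) {
    throw new Error(`budget.${key} must be a positive integer`);
  }
  return value;
}

// Maps the snake_case config keys onto the { steps, tokens, timeMs } shape.
function toBudgetConfig(budget: unknown): BudgetConfig {
  if (budget === null || typeof budget !== 'object') {
    throw new Error('budget must be a mapping');
  }
  const b = budget as Record<string, unknown>;
  return {
    steps: positiveInt(b, 'max_steps'),
    tokens: positiveInt(b, 'max_tokens'),
    timeMs: positiveInt(b, 'max_time_ms'),
  };
}
```

The max_tool_calls key is not mapped because the ExecutionBudget class shown earlier has no tool-call dimension.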
Quick Start Guide
- Initialize the orchestrator skeleton: Create a new TypeScript project, install @opentelemetry/api, zod, and your preferred HTTP client. Copy the AgentOrchestrator class and adapt the routing logic to your domain.
- Define execution budgets: Instantiate ExecutionBudget with conservative limits (e.g., 10 steps, 5000 tokens, 20s timeout). Adjust based on load testing results.
- Wire validation gates: Create Zod schemas for every agent output. Attach SchemaGate instances to each handoff point. Fail fast on invalid payloads.
- Instrument tracing: Configure OpenTelemetry SDK with OTLP exporter. Wrap LLM calls and tool invocations in spans. Attach metadata like request_id, specialist_name, and token_count.
- Deploy with idempotency: Implement the IdempotentTool wrapper for all state-mutating endpoints. Generate keys from request fingerprints. Test retry scenarios to verify duplicate suppression.
Shipping agents at scale is not about bigger models or cleverer prompts. It's about treating autonomous workflows as distributed systems that require contracts, boundaries, and visibility. Apply these patterns consistently, and your agents will survive production traffic instead of collapsing under it.