ts. Identify whether the task requires:
- Multiple distinct knowledge domains
- Parallel execution paths
- Cross-validation from independent reasoning paths
- Different model capabilities (vision, code, text)
If none of these apply, a single-agent architecture with explicit tool routing is sufficient.
Step 2: Single-Agent Baseline Implementation
The following TypeScript implementation demonstrates a unified agent controller with tool routing, context management, output validation, and escalation fallback. The architecture prioritizes deterministic routing, bounded context windows, and explicit validation before response delivery.
```typescript
import { z } from 'zod';

type Role = 'user' | 'assistant';
interface Message {
  role: Role;
  content: string;
}

// Tool definitions with explicit schemas
interface ToolDefinition {
  name: string;
  description: string;
  parameters: z.ZodTypeAny;
  execute: (params: z.infer<z.ZodTypeAny>) => Promise<string>;
}

// Context window manager with sliding retention (one turn = one user + one assistant message)
class ContextWindow {
  private history: Message[] = [];
  private readonly maxTurns: number;

  constructor(maxTurns: number = 5) {
    this.maxTurns = maxTurns;
  }

  add(role: Role, content: string): void {
    this.history.push({ role, content });
    if (this.history.length > this.maxTurns * 2) {
      this.history = this.history.slice(-this.maxTurns * 2);
    }
  }

  getRecent(): Message[] {
    return [...this.history]; // return a copy so callers cannot mutate retained history
  }
}

// Output validator middleware
class ResponseValidator {
  async verify(originalQuery: string, generatedResponse: string): Promise<boolean> {
    // In production, this calls a lightweight verification model
    // or runs rule-based hallucination checks against retrieved context
    const hasContextReference =
      generatedResponse.includes('[source:') || generatedResponse.includes('According to');
    const isDirectlyAnswering =
      generatedResponse.length > 20 && !generatedResponse.includes('I cannot');
    return hasContextReference && isDirectlyAnswering;
  }
}

// Unified agent controller
export class UnifiedAgentController {
  private tools: Map<string, ToolDefinition>;
  private context: ContextWindow;
  private validator: ResponseValidator;
  private readonly maxToolCalls: number;

  constructor(tools: ToolDefinition[], maxTurns: number = 5, maxToolCalls: number = 3) {
    // Explicit tuple type so the Map constructor receives [key, value] pairs
    this.tools = new Map(tools.map((t): [string, ToolDefinition] => [t.name, t]));
    this.context = new ContextWindow(maxTurns);
    this.validator = new ResponseValidator();
    this.maxToolCalls = maxToolCalls;
  }

  async processQuery(
    userQuery: string,
  ): Promise<{ response: string; fallback: boolean; costEstimate: number }> {
    this.context.add('user', userQuery);

    // Step 1: Tool routing decision, capped at maxToolCalls (loop protection)
    const selectedTools = this.routeTools(userQuery).slice(0, this.maxToolCalls);
    const toolResults: string[] = [];

    // Step 2: Validate parameters against each tool's schema, then execute
    for (const tool of selectedTools) {
      try {
        const params = tool.parameters.parse({ query: userQuery });
        const result = await tool.execute(params);
        toolResults.push(`[${tool.name}]: ${result}`);
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        toolResults.push(`[${tool.name}]: error (${message})`);
      }
    }

    // Step 3: Generate response with tool context
    const systemPrompt = this.buildSystemPrompt(toolResults);
    const rawResponse = await this.callLLM(systemPrompt, this.context.getRecent());
    // Cost counts only the tools actually executed, plus two LLM passes (generation + verification)
    const costEstimate = this.calculateCost(selectedTools.length, 2);

    // Step 4: Validation gate
    const isValid = await this.validator.verify(userQuery, rawResponse);
    if (!isValid) {
      return {
        response: 'I need to connect you with a specialist to ensure accuracy.',
        fallback: true,
        costEstimate,
      };
    }

    this.context.add('assistant', rawResponse);
    return { response: rawResponse, fallback: false, costEstimate };
  }

  private routeTools(query: string): ToolDefinition[] {
    // Production: Use lightweight classifier or keyword routing
    // Here we simulate deterministic routing based on query patterns
    const q = query.toLowerCase();
    const allTools = Array.from(this.tools.values());
    if (q.includes('billing') || q.includes('invoice')) {
      return allTools.filter(t => t.name === 'TransactionLedgerClient');
    }
    if (q.includes('account') || q.includes('profile')) {
      return allTools.filter(t => t.name === 'AccountProfileFetcher');
    }
    return allTools.filter(t => t.name === 'KnowledgeBaseRetriever');
  }

  private buildSystemPrompt(toolResults: string[]): string {
    return `You are a Technical Resolution Orchestrator. Use the following retrieved data to answer accurately.
If data is insufficient, acknowledge the gap and trigger escalation. Do not hallucinate.
Retrieved Context:
${toolResults.join('\n')}`;
  }

  private async callLLM(systemPrompt: string, history: Message[]): Promise<string> {
    // Placeholder for LLM API call
    // In production: stream response, track tokens, apply temperature=0.2 for consistency
    // The simulated text contains "According to" so it passes ResponseValidator
    return 'According to the retrieved context, here is the resolution: [Simulated Response]';
  }

  private calculateCost(toolCalls: number, llmPasses: number): number {
    // $0.002 (routing) + $0.001 per tool call + $0.003 per LLM pass (generation, verification)
    return 0.002 + toolCalls * 0.001 + llmPasses * 0.003;
  }
}
```
Step 3: Architecture Decisions and Rationale
- Single Controller vs. Distributed Agents: A unified controller eliminates routing overhead and inter-agent serialization latency, and centralizes error handling. Distribution is only justified when sub-tasks require independent model capabilities or parallel execution.
- Explicit Tool Routing: Instead of letting agents discover tools through trial-and-error ReAct loops, deterministic routing based on query classification reduces token consumption and prevents cascading failures.
- Sliding Window Context: Retaining only the last 5 conversation turns prevents context window bloat, which directly impacts cost and hallucination rates. Production systems should implement semantic compression for longer histories.
- Validation Gate: Running a lightweight verification step before response delivery catches hallucinations and incomplete reasoning. In low-stakes workflows, this removes the need for a dedicated critic agent.
- Escalation Fallback: Explicit human handoff for low-confidence responses preserves trust and prevents the system from generating plausible but incorrect answers.
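One way to implement the semantic compression recommended above is to fold evicted turns into a running summary rather than discarding them. The sketch below assumes a hypothetical `Summarizer` function; the `naiveSummarize` default only truncates messages and stands in for an LLM-based summarizer. Exposing the summary as a `system` message is also an assumption about your chat format.

```typescript
type Role = 'user' | 'assistant';
interface Message {
  role: Role;
  content: string;
}

// Hypothetical summarizer: merges the existing summary with newly evicted messages.
type Summarizer = (previousSummary: string, evicted: Message[]) => string;

// Naive placeholder: keeps the first 40 characters of each evicted message.
const naiveSummarize: Summarizer = (prev, evicted) => {
  const digest = evicted.map(m => `${m.role}: ${m.content.slice(0, 40)}`).join(' | ');
  return prev ? `${prev} | ${digest}` : digest;
};

class CompressingContextWindow {
  private history: Message[] = [];
  private summary = '';

  constructor(
    private readonly maxTurns: number = 5,
    private readonly summarize: Summarizer = naiveSummarize,
  ) {}

  add(role: Role, content: string): void {
    this.history.push({ role, content });
    const limit = this.maxTurns * 2; // one turn = user + assistant message
    if (this.history.length > limit) {
      // Fold the overflow into the summary instead of dropping it
      const evicted = this.history.slice(0, this.history.length - limit);
      this.summary = this.summarize(this.summary, evicted);
      this.history = this.history.slice(-limit);
    }
  }

  // Recent turns, prefixed with the compressed summary when one exists
  getRecent(): Array<{ role: string; content: string }> {
    const recent = [...this.history];
    return this.summary
      ? [{ role: 'system', content: `Earlier conversation summary: ${this.summary}` }, ...recent]
      : recent;
  }
}
```

Because the summary grows with every eviction, a production version would also re-summarize it once it passes a token budget such as the `compressionThreshold` in the configuration template.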
Step 4: Conditional Escalation to Multi-Agent
Reserve multi-agent orchestration for workloads that meet at least three of the following criteria:
- Three or more distinct knowledge domains with non-overlapping toolsets
- Parallel execution paths that reduce wall-clock time
- High-stakes outputs requiring cross-validation
- Sub-tasks requiring different model families (e.g., vision + code + text)
- Team capacity to absorb 3–5x the debugging complexity of a single-agent system
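The Step 4 threshold amounts to counting criteria. In this illustrative sketch, `WorkloadProfile` and its field names are hypothetical; the flags would come from an architecture review, not from runtime data.

```typescript
// One field per Step 4 criterion (hypothetical names).
interface WorkloadProfile {
  distinctDomains: number; // knowledge domains with non-overlapping toolsets
  parallelizableBranches: boolean; // branches that can run concurrently
  highStakesOutput: boolean; // outputs that require cross-validation
  needsMultipleModelFamilies: boolean; // e.g. vision + code + text
  teamCanAbsorbDebugCost: boolean; // capacity for 3-5x debugging complexity
}

function countMultiAgentCriteria(p: WorkloadProfile): number {
  const met = [
    p.distinctDomains >= 3,
    p.parallelizableBranches,
    p.highStakesOutput,
    p.needsMultipleModelFamilies,
    p.teamCanAbsorbDebugCost,
  ];
  return met.filter(Boolean).length;
}

// Step 4 rule: escalate to multi-agent only when at least three criteria hold.
function recommendArchitecture(p: WorkloadProfile): 'single-agent' | 'multi-agent' {
  return countMultiAgentCriteria(p) >= 3 ? 'multi-agent' : 'single-agent';
}
```

A uniform billing-support workload typically meets at most one criterion and stays single-agent.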
Pitfall Guide
1. Prompt Fragmentation
Explanation: Teams split a single coherent task into multiple agents by varying prompts rather than separating tools, knowledge, or reasoning patterns. This creates duplication, not specialization.
Fix: Consolidate into a unified system prompt with explicit tool definitions. Use routing logic, not prompt variation, to direct behavior.
2. Sequential Orchestration Fallacy
Explanation: Running agents in a strict A→B→C sequence adds routing and synthesis latency without parallelism benefits. The multi-agent pattern only reduces wall-clock time when branches execute concurrently.
Fix: Convert sequential workflows into single-agent step execution, or identify independent branches that can run in parallel and merge results deterministically.
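A minimal sketch of the parallel alternative, assuming each branch is an independent async function (the `Branch` interface is hypothetical). `Promise.all` preserves input order, so the merged output is deterministic no matter which branch finishes first, and per-branch error handling keeps one failure from discarding the other results.

```typescript
// Hypothetical independent sub-task.
interface Branch {
  name: string;
  run: () => Promise<string>;
}

// Wall-clock time is roughly that of the slowest branch, not the sum of all branches.
async function runParallelBranches(branches: Branch[]): Promise<string> {
  const results = await Promise.all(
    branches.map(b =>
      b.run().then(
        value => `[${b.name}]: ${value}`,
        // Convert a failure into a labeled entry instead of rejecting the whole merge
        err => `[${b.name}]: failed (${err instanceof Error ? err.message : String(err)})`,
      ),
    ),
  );
  // Promise.all keeps input order, so the merge is deterministic
  return results.join('\n');
}
```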
3. Model Monolith Assumption
Explanation: Routing all sub-tasks through a single general-purpose LLM ignores capability mismatches. Vision tasks, code generation, and structured data extraction perform significantly better on specialized models.
Fix: Implement model-aware routing. Assign sub-tasks to the optimal model family and aggregate results at the synthesis layer.
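Model-aware routing can start as a static lookup from sub-task type to model family. The task kinds and model identifiers below are placeholders, not real model names; substitute the models your provider offers.

```typescript
// Hypothetical task kinds and model-family identifiers.
type TaskKind = 'vision' | 'code' | 'extraction' | 'text';

interface SubTask {
  kind: TaskKind;
  input: string;
}

const MODEL_FOR_TASK: Record<TaskKind, string> = {
  vision: 'vision-model',
  code: 'code-model',
  extraction: 'structured-output-model',
  text: 'general-text-model',
};

// Group sub-tasks by target model so each family receives only the work it suits.
function routeByModel(tasks: SubTask[]): Map<string, SubTask[]> {
  const plan = new Map<string, SubTask[]>();
  for (const task of tasks) {
    const model = MODEL_FOR_TASK[task.kind];
    const bucket = plan.get(model) ?? [];
    bucket.push(task);
    plan.set(model, bucket);
  }
  return plan;
}
```

Each bucket is then dispatched to its model, and the results are combined at the synthesis layer.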
4. Cascade Loop Amplification
Explanation: Agents that trigger each other recursively without depth limits cause exponential token consumption and unbounded latency. Ambiguous production queries frequently trigger this behavior.
Fix: Implement max-depth counters, circuit breakers, and loop detection heuristics. Terminate branches that exceed iteration thresholds and fall back to human escalation.
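A per-query guard can enforce both the depth limit and a simple loop-detection heuristic. This is a sketch under assumptions: `LoopGuard` is hypothetical, its default depth of 2 mirrors `loopLimit` in the configuration template, and a repeated (agent, normalized input) pair is treated as a loop.

```typescript
type GuardDecision =
  | { allowed: true }
  | { allowed: false; reason: 'max-depth' | 'repeated-call' };

// Tracks one query's agent-to-agent calls; create a fresh guard per user query.
class LoopGuard {
  private depth = 0;
  private readonly seen = new Set<string>();

  constructor(private readonly maxDepth: number = 2) {}

  // Call before an agent invokes another agent.
  enter(agent: string, input: string): GuardDecision {
    if (this.depth >= this.maxDepth) return { allowed: false, reason: 'max-depth' };
    // Heuristic: the same agent receiving the same input again means no progress.
    const signature = `${agent}::${input.trim().toLowerCase()}`;
    if (this.seen.has(signature)) return { allowed: false, reason: 'repeated-call' };
    this.seen.add(signature);
    this.depth += 1;
    return { allowed: true };
  }

  // Call when the invoked agent returns.
  exit(): void {
    this.depth = Math.max(0, this.depth - 1);
  }
}
```

When `enter` returns `allowed: false`, terminate the branch and raise the `loop-limit-reached` escalation trigger instead of retrying.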
5. Debug Blindspots
Explanation: Multi-agent systems obscure failure modes. When an output is incorrect, it is unclear whether the router misclassified, a tool failed, a synthesizer merged poorly, or a critic missed an error.
Fix: Implement span-based tracing with unique execution IDs per agent. Log token counts, tool outputs, and decision points. Enable replayable state snapshots for post-mortem analysis.
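The tracing fix can be prototyped with an in-memory tracer before wiring up a real backend. In this sketch the `Tracer` class and `Span` shape are hypothetical; each span ID combines the `agent-exec` prefix, the query's execution ID, and a counter, and `snapshot()` serializes the spans for post-mortem replay.

```typescript
interface Span {
  spanId: string;
  executionId: string;
  agent: string;
  step: string; // e.g. 'route', 'tool', 'synthesize', 'critic'
  startedAt: number;
  durationMs?: number;
  tokens?: number;
  output?: string;
  error?: string;
}

// Minimal in-memory tracer; production systems would export spans to a tracing backend.
class Tracer {
  private counter = 0;
  readonly spans: Span[] = [];

  constructor(private readonly executionId: string, private readonly prefix = 'agent-exec') {}

  // Wraps one decision point; fn receives its span so it can record token counts.
  async span<T>(agent: string, step: string, fn: (span: Span) => Promise<T>): Promise<T> {
    const span: Span = {
      spanId: `${this.prefix}-${this.executionId}-${++this.counter}`,
      executionId: this.executionId,
      agent,
      step,
      startedAt: Date.now(),
    };
    this.spans.push(span);
    try {
      const result = await fn(span);
      span.output = String(result).slice(0, 200); // truncated to bound log size
      return result;
    } catch (err) {
      span.error = err instanceof Error ? err.message : String(err);
      throw err; // record the failure, then let the caller handle it
    } finally {
      span.durationMs = Date.now() - span.startedAt;
    }
  }

  snapshot(): string {
    return JSON.stringify(this.spans);
  }
}
```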
6. Over-Validation Overhead
Explanation: Adding critic or verifier agents to low-stakes workflows inflates cost without meaningful accuracy improvement. Validation should be proportional to risk.
Fix: Apply rigorous cross-validation only to high-cost or high-risk decision nodes. Use lightweight rule-based checks or confidence thresholds for routine outputs.
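Proportional validation reduces to choosing a method per decision node. The sketch assumes a hypothetical three-level risk label; the 0.75 default mirrors `confidenceThreshold` in the configuration template, and `confidence-only` means the output ships without an extra verification pass.

```typescript
type RiskLevel = 'low' | 'medium' | 'high';
type ValidationMethod = 'critic-agent' | 'lightweight-check' | 'confidence-only';

// Scale validation effort with risk instead of applying a critic everywhere.
function chooseValidation(risk: RiskLevel, confidence: number, threshold = 0.75): ValidationMethod {
  if (risk === 'high') return 'critic-agent'; // cross-validate costly or risky decisions
  if (risk === 'medium' || confidence < threshold) return 'lightweight-check'; // cheap rule-based check
  return 'confidence-only'; // routine, confident output: no extra pass
}
```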
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Uniform support queries (billing, account, FAQ) | Single-Agent with Tool Routing | No genuine specialization; multi-agent adds routing/synthesis overhead | Reduces cost by 60–85% |
| Cross-domain research synthesis | Multi-Agent with Parallel Branches | Distinct knowledge domains, independent toolsets, acceptable latency | Increases cost 5–8x, justified by accuracy |
| Code review pipeline | Multi-Agent with Model-Aware Routing | Security, performance, and structure require different model capabilities | Moderate cost increase, high accuracy lift |
| High-stakes financial/legal decisions | Multi-Agent with Debate Pattern | Cross-validation and auditability outweigh latency/cost | 2–3x cost, risk mitigation justifies expense |
| Low-resource team (<3 engineers) | Single-Agent with Escalation | Multi-agent debug complexity exceeds maintenance capacity | Prevents operational debt and incident escalation |
Configuration Template
```typescript
// agent-config.production.ts
export const AgentArchitectureConfig = {
  mode: 'single-agent', // 'single-agent' | 'multi-agent'
  routing: {
    strategy: 'deterministic', // 'deterministic' | 'llm-classifier'
    maxToolCalls: 3,
    loopLimit: 2,
  },
  context: {
    windowSize: 5,
    compressionThreshold: 4000, // tokens
  },
  validation: {
    enabled: true,
    method: 'lightweight-check', // 'lightweight-check' | 'critic-agent'
    confidenceThreshold: 0.75,
  },
  escalation: {
    fallbackToHuman: true,
    triggerConditions: ['validation-failed', 'loop-limit-reached', 'confidence-low'],
  },
  costGuardrails: {
    maxCostPerQuery: 0.025,
    circuitBreakerThreshold: 3, // consecutive high-cost queries
  },
  tracing: {
    enabled: true,
    logLevel: 'debug',
    spanIdPrefix: 'agent-exec',
  },
};
```
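The `costGuardrails` block does not say how the breaker behaves, so the sketch below assumes one interpretation: the breaker opens after `circuitBreakerThreshold` consecutive queries whose cost exceeds `maxCostPerQuery`, and any query under the cap resets the count. `CostCircuitBreaker` is hypothetical.

```typescript
// Hypothetical consumer of the costGuardrails settings.
class CostCircuitBreaker {
  private consecutiveHighCost = 0;

  constructor(
    private readonly maxCostPerQuery = 0.025,
    private readonly threshold = 3,
  ) {}

  // Record a completed query's cost; returns true when the breaker is open.
  record(costUsd: number): boolean {
    this.consecutiveHighCost = costUsd > this.maxCostPerQuery ? this.consecutiveHighCost + 1 : 0;
    return this.isOpen();
  }

  isOpen(): boolean {
    return this.consecutiveHighCost >= this.threshold;
  }
}
```

While the breaker is open, a caller might route new queries straight to the human handoff queue or to a cheaper tool set until costs are audited.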
Quick Start Guide
- Initialize the controller: Import UnifiedAgentController, define your tool implementations, and set the context window size to 5 turns.
- Configure routing and validation: Enable deterministic tool routing, set max tool calls to 3, and activate the lightweight validation gate.
- Deploy with tracing: Enable span-based logging, assign execution IDs, and route validation failures to a human handoff queue.
- Monitor production metrics: Track cost per query, p95 latency, validation pass rate, and escalation frequency. Adjust routing thresholds based on observed query complexity.
- Evaluate escalation criteria: If cost exceeds $0.025/query or p95 latency surpasses 5 seconds, audit tool usage and context window efficiency before considering multi-agent migration.
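The monitoring and escalation checks above can be computed from per-query records. This is a sketch: the `QueryRecord` shape is hypothetical, p95 uses the nearest-rank method, and the $0.025 trigger is applied to the average cost per query, which is one reading of the guideline.

```typescript
// Hypothetical per-query record emitted by the controller.
interface QueryRecord {
  costUsd: number;
  latencyMs: number;
  validationPassed: boolean;
  escalated: boolean;
}

// Nearest-rank percentile (p in 0..100).
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarizeMetrics(records: QueryRecord[]) {
  const n = records.length || 1; // avoid division by zero on an empty window
  const avgCost = records.reduce((sum, r) => sum + r.costUsd, 0) / n;
  const p95LatencyMs = percentile(records.map(r => r.latencyMs), 95);
  const validationPassRate = records.filter(r => r.validationPassed).length / n;
  const escalationRate = records.filter(r => r.escalated).length / n;
  // Audit triggers from the checklist: $0.025/query or 5 s p95 latency
  const needsAudit = avgCost > 0.025 || p95LatencyMs > 5000;
  return { avgCost, p95LatencyMs, validationPassRate, escalationRate, needsAudit };
}
```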