col negotiation, shell spawning, and sandbox management.
2. Token Budget Awareness: The router evaluates task complexity and available context window before committing to a strategy. Simple, deterministic operations bypass protocol overhead entirely.
3. Security Boundaries: CLI execution runs in restricted environments with explicit capability allowlists. MCP connections maintain persistent sessions to avoid repeated handshakes. Code execution sandboxes enforce network isolation and resource limits.
Implementation
import { spawn } from 'child_process';
import { Client as MCPClient } from '@modelcontextprotocol/sdk/client/index.js';
interface ToolRequest {
taskId: string;
operation: 'local_command' | 'external_api' | 'multi_step_pipeline';
payload: Record<string, unknown>;
contextBudget: number;
}
interface ToolResponse {
taskId: string;
output: string;
tokensConsumed: number;
strategy: 'cli' | 'mcp' | 'code_execution';
}
abstract class ExecutionStrategy {
abstract execute(req: ToolRequest): Promise<ToolResponse>;
abstract isEligible(req: ToolRequest): boolean;
}
class ShellStrategy extends ExecutionStrategy {
isEligible(req: ToolRequest): boolean {
return req.operation === 'local_command' && req.contextBudget > 5000;
}
async execute(req: ToolRequest): Promise<ToolResponse> {
const command = String(req.payload.command);
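// Payload values are untyped, so coerce every argument to a string.
const args = Array.isArray(req.payload.args) ? req.payload.args.map(String) : [];
return new Promise((resolve, reject) => {
// spawn() has no maxBuffer option (that belongs to exec/execFile), and shell: true
// would let shell metacharacters in the payload run arbitrary commands.
const proc = spawn(command, args, { shell: false });
// Without this handler, a missing binary raises an unhandled 'error' event.
proc.on('error', reject);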
let stdout = '';
let stderr = '';
proc.stdout.on('data', (d) => stdout += d.toString());
proc.stderr.on('data', (d) => stderr += d.toString());
proc.on('close', (code) => {
if (code !== 0) return reject(new Error(stderr.trim()));
resolve({
taskId: req.taskId,
output: stdout.trim(),
tokensConsumed: 0,
strategy: 'cli'
});
});
});
}
}
class ProtocolBridgeStrategy extends ExecutionStrategy {
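private client: MCPClient;
private ready?: Promise<void>;
constructor(private serverUrl: string) {
super();
// The SDK client takes identifying metadata; 'task-router' and the version are placeholders.
this.client = new MCPClient({ name: 'task-router', version: '1.0.0' });
}
isEligible(req: ToolRequest): boolean {
return req.operation === 'external_api' && req.contextBudget > 20000;
}
// Connect once and reuse the session for every later call.
// Streamable HTTP is an assumption; swap in the transport your server uses.
private connect(): Promise<void> {
this.ready ??= (async () => {
const { StreamableHTTPClientTransport } = await import('@modelcontextprotocol/sdk/client/streamableHttp.js');
await this.client.connect(new StreamableHTTPClientTransport(new URL(this.serverUrl)));
})();
return this.ready;
}
async execute(req: ToolRequest): Promise<ToolResponse> {
await this.connect();
const toolName = String(req.payload.toolName);
const params = (req.payload.params ?? {}) as Record<string, unknown>;
const result = await this.client.callTool({ name: toolName, arguments: params });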
return {
taskId: req.taskId,
output: JSON.stringify(result),
tokensConsumed: 45000,
strategy: 'mcp'
};
}
}
class SandboxCoderStrategy extends ExecutionStrategy {
isEligible(req: ToolRequest): boolean {
return req.operation === 'multi_step_pipeline' && req.contextBudget > 10000;
}
async execute(req: ToolRequest): Promise<ToolResponse> {
const script = String(req.payload.script);
// In production, this routes to a containerized runtime (e.g., gVisor, Firecracker)
// with explicit network and filesystem restrictions.
const executionResult = await this.runInSandbox(script);
return {
taskId: req.taskId,
output: executionResult.stdout,
tokensConsumed: 1800,
strategy: 'code_execution'
};
}
private async runInSandbox(script: string): Promise<{ stdout: string }> {
// Placeholder for sandbox orchestration
return { stdout: `Executed: ${script.slice(0, 40)}...` };
}
}
class TaskRouter {
private strategies: ExecutionStrategy[];
constructor() {
this.strategies = [
new ShellStrategy(),
new ProtocolBridgeStrategy('https://mcp.internal.example.com'),
new SandboxCoderStrategy()
];
}
async route(req: ToolRequest): Promise<ToolResponse> {
const eligible = this.strategies.find(s => s.isEligible(req));
if (!eligible) {
throw new Error('No eligible execution strategy for current context budget');
}
return eligible.execute(req);
}
}
Why These Choices Matter
- Explicit Eligibility Checks: Each strategy validates context budget and operation type before execution. This prevents accidental schema injection when token reserves are low.
- Persistent MCP Client: The ProtocolBridgeStrategy maintains a single client instance. Repeated connection handshakes multiply latency and token overhead.
- Sandbox Isolation: The SandboxCoderStrategy abstracts execution environment management. Production deployments must enforce capability restrictions (e.g., seccomp profiles, read-only filesystems) to prevent privilege escalation.
- Deterministic Routing: The router does not guess. It matches task metadata to pre-validated execution paths, ensuring predictable cost and latency profiles.
Pitfall Guide
1. Blind Schema Injection
Explanation: Connecting to MCP servers without filtering tool definitions forces the model to parse hundreds of unused schemas. This wastes context window space and increases inference cost.
Fix: Implement selective tool registration. Query server capabilities, filter by relevance to the current task, and inject only required schemas. Cache manifests to avoid repeated discovery calls.
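A minimal sketch of selective registration, assuming a hypothetical ToolDefinition shape and plain keyword matching as the relevance measure. The names manifestCache, cacheManifest, and selectRelevantTools are illustrative and not part of any SDK.

```typescript
// Hypothetical shape of a tool definition as advertised by a server.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Full manifests are cached per server so discovery runs once, not per request.
const manifestCache = new Map<string, ToolDefinition[]>();

function cacheManifest(serverUrl: string, tools: ToolDefinition[]): void {
  manifestCache.set(serverUrl, tools);
}

// Keep only tools whose name or description mentions a task keyword.
// Keyword matching is a stand-in for whatever relevance measure you prefer.
function selectRelevantTools(
  serverUrl: string,
  taskKeywords: string[],
  maxTools: number = 5
): ToolDefinition[] {
  const manifest = manifestCache.get(serverUrl) ?? [];
  const keywords = taskKeywords.map((k) => k.toLowerCase());
  return manifest
    .filter((tool) => {
      const haystack = `${tool.name} ${tool.description}`.toLowerCase();
      return keywords.some((k) => haystack.indexOf(k) !== -1);
    })
    .slice(0, maxTools);
}
```

Only the schemas returned by selectRelevantTools would be injected into the prompt; the rest of the manifest stays in the cache.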
2. Unbounded CLI Output
Explanation: Shell commands like find / or docker logs can return megabytes of text. Feeding raw output directly into the context window triggers token overflow and degrades reasoning quality.
Fix: Enforce output truncation at the execution layer. Use stream processors to cap output at 4KB. Require agents to specify pagination flags (--limit, --page) or pipe through head/tail/jq before returning results.
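One way to cap output at the execution layer is a small accumulator that stops keeping data once a limit is reached. This is a sketch: OutputCap is a hypothetical helper, and it counts characters rather than bytes for simplicity.

```typescript
interface CappedOutput {
  text: string;
  truncated: boolean;
}

// Accumulates streamed chunks but never keeps more than `limit` characters.
class OutputCap {
  private kept = '';
  private truncated = false;

  constructor(private readonly limit: number = 4096) {}

  push(chunk: string): void {
    const remaining = this.limit - this.kept.length;
    if (chunk.length > remaining) {
      this.truncated = true;
    }
    if (remaining > 0) {
      this.kept += chunk.slice(0, remaining);
    }
  }

  // Appends a marker so the model can tell the output was cut short.
  result(): CappedOutput {
    const marker = this.truncated ? '\n[output truncated]' : '';
    return { text: this.kept + marker, truncated: this.truncated };
  }
}
```

In the ShellStrategy above, the stdout handler could call cap.push(d.toString()) instead of concatenating onto an unbounded string.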
3. Ignoring Protocol Handshake Latency
Explanation: Establishing MCP connections involves capability negotiation, schema transfer, and authentication. Doing this per-request adds 200-800ms of latency and consumes additional tokens.
Fix: Maintain persistent connections. Use connection pooling or long-lived WebSocket sessions. Pre-warm MCP clients during agent initialization and reuse them across task batches.
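The pooling idea can be sketched by memoizing the connection promise per server URL, so concurrent callers share one handshake. Session, Connector, and SessionPool are hypothetical names; a real MCP client object would take the place of Session.

```typescript
// Hypothetical session type; a real MCP client would take its place.
interface Session {
  url: string;
}

type Connector = (url: string) => Promise<Session>;

// Memoizes the connection promise per URL so that concurrent callers
// share a single handshake instead of each starting their own.
class SessionPool {
  private sessions = new Map<string, Promise<Session>>();

  constructor(private readonly connect: Connector) {}

  get(url: string): Promise<Session> {
    let session = this.sessions.get(url);
    if (!session) {
      session = this.connect(url).catch((err: unknown) => {
        // Drop failed attempts so the next call can retry.
        this.sessions.delete(url);
        throw err;
      });
      this.sessions.set(url, session);
    }
    return session;
  }

  // Pre-warm connections during agent initialization.
  warm(urls: string[]): Promise<Session[]> {
    return Promise.all(urls.map((u) => this.get(u)));
  }
}
```

Calling warm() at startup moves the handshake cost out of the first task's critical path.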
4. Security Through Obscurity in Sandboxes
Explanation: Running code execution scripts without explicit capability boundaries exposes the host to privilege escalation, network exfiltration, or filesystem corruption. Historical incidents (e.g., CVE-2026-25253) demonstrate that open sandbox models are frequently exploited in production.
Fix: Apply defense-in-depth. Run sandboxes as non-root users, enforce seccomp/AppArmor profiles, restrict network access to whitelisted endpoints, and mount filesystems as read-only unless explicitly required.
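As an illustration, the restrictions above map onto standard `docker run` flags. Using Docker as the sandbox launcher is an assumption here (the text also mentions gVisor and Firecracker), and SandboxPolicy and buildDockerArgs are hypothetical names. Note that `--network none` gives full isolation only; restricting traffic to specific endpoints needs an egress proxy or firewall rules outside this sketch.

```typescript
interface SandboxPolicy {
  image: string;
  allowNetwork: boolean;
  memoryMb: number;
  seccompProfilePath?: string;
}

// Builds an argument list for `docker run` that applies the restrictions above.
function buildDockerArgs(policy: SandboxPolicy, command: string[]): string[] {
  const args: string[] = [
    'run', '--rm',
    '--user', '65534:65534',               // non-root (conventionally "nobody")
    '--read-only',                         // root filesystem mounted read-only
    '--cap-drop', 'ALL',                   // drop all Linux capabilities
    '--security-opt', 'no-new-privileges', // block setuid-style escalation
    '--memory', `${policy.memoryMb}m`,
    '--pids-limit', '64',
  ];
  if (!policy.allowNetwork) {
    args.push('--network', 'none');
  }
  if (policy.seccompProfilePath) {
    args.push('--security-opt', `seccomp=${policy.seccompProfilePath}`);
  }
  return [...args, policy.image, ...command];
}
```

Returning an argument array rather than a command string keeps the script path and image name from being reinterpreted by a shell.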
5. Per-Request Tool Discovery
Explanation: Building real-time tool discovery loops that query external registries on every request introduces unnecessary latency and token overhead. Most agent workflows operate against a stable set of integrations.
Fix: Maintain a static routing table with versioned tool manifests. Update registries asynchronously via background sync jobs. Fall back to dynamic discovery only when static routes fail.
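A static table with a discovery fallback might look like the following sketch. ToolRoute, Discover, and RouteTable are hypothetical names, and discovery is modeled as a synchronous callback to keep the example short.

```typescript
interface ToolRoute {
  server: string;
  manifestVersion: string;
}

type Discover = (toolName: string) => ToolRoute | undefined;

class RouteTable {
  private routes = new Map<string, ToolRoute>();

  constructor(initial: Record<string, ToolRoute>, private readonly discover: Discover) {
    Object.keys(initial).forEach((name) => {
      const route = initial[name];
      if (route) this.routes.set(name, route);
    });
  }

  // Static lookup first; dynamic discovery only on a miss, and the result is remembered.
  resolve(toolName: string): ToolRoute | undefined {
    const known = this.routes.get(toolName);
    if (known) return known;
    const found = this.discover(toolName);
    if (found) this.routes.set(toolName, found);
    return found;
  }

  // Called by a background sync job to swap in a refreshed manifest version.
  update(toolName: string, route: ToolRoute): void {
    this.routes.set(toolName, route);
  }
}
```

Requests for known tools never touch the registry; only a miss pays the discovery cost, and only once.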
6. Intermediate Result Accumulation
Explanation: Multi-step workflows that return full JSON payloads after each tool call accumulate token debt rapidly. The model must re-parse previous results in subsequent turns.
Fix: Adopt the Code Execution pattern for pipelines. Generate a single script that chains operations, executes in a sandbox, and returns only the final aggregated result. This reduces intermediate context bloat by up to 98%.
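The difference can be illustrated with an in-process stand-in for the sandbox: the steps are chained inside one function, and only the final value is serialized for the model. Step, PipelineResult, and runPipeline are hypothetical names, and character counts serve as a rough proxy for tokens.

```typescript
type Step = (input: unknown) => unknown;

interface PipelineResult {
  finalOutput: string;
  intermediateChars: number; // what sequential tool calls would have returned to the model
  returnedChars: number;     // what the single-script approach returns
}

// Runs every step locally and serializes only the last result.
function runPipeline(steps: Step[], input: unknown): PipelineResult {
  let current = input;
  let intermediateChars = 0;
  for (const step of steps) {
    current = step(current);
    // Tally the size each step would have added to the context if returned.
    intermediateChars += (JSON.stringify(current) ?? '').length;
  }
  const finalOutput = JSON.stringify(current) ?? '';
  return { finalOutput, intermediateChars, returnedChars: finalOutput.length };
}
```

Comparing returnedChars against intermediateChars on a representative workload gives a quick estimate of how much context a pipeline would save.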
7. Assuming LLMs Understand All CLI Flags
Explanation: While models are trained on extensive shell documentation, they frequently hallucinate obscure flags or misuse version-specific syntax. Raw shell access without constraints leads to failed executions and retry loops.
Fix: Provide constrained command templates or wrapper scripts. Validate arguments against a schema before execution. Log failed commands to refine prompt templates and reduce retry overhead.
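A constrained template can be as simple as an allowlist of subcommands and flags checked before anything is spawned. In this sketch, CommandTemplate, templates, and validateCommand are hypothetical, and the git entry is only an example; positional arguments such as paths pass through unchecked.

```typescript
interface CommandTemplate {
  binary: string;
  allowedSubcommands: string[];
  allowedFlags: string[];
}

// A Map avoids accidental hits on inherited object keys such as "constructor".
const templates = new Map<string, CommandTemplate>([
  ['git', {
    binary: 'git',
    allowedSubcommands: ['status', 'log', 'diff'],
    allowedFlags: ['--oneline', '--stat', '--short'],
  }],
]);

type Validation = { ok: true; argv: string[] } | { ok: false; reason: string };

function validateCommand(binary: string, args: string[]): Validation {
  const template = templates.get(binary);
  if (!template) return { ok: false, reason: `binary not allowed: ${binary}` };
  const [subcommand, ...rest] = args;
  if (!subcommand || template.allowedSubcommands.indexOf(subcommand) === -1) {
    return { ok: false, reason: `subcommand not allowed: ${subcommand ?? '(none)'}` };
  }
  for (const arg of rest) {
    // Only flags are checked against the allowlist.
    if (arg.charAt(0) === '-' && template.allowedFlags.indexOf(arg) === -1) {
      return { ok: false, reason: `flag not allowed: ${arg}` };
    }
  }
  return { ok: true, argv: [template.binary, subcommand, ...rest] };
}
```

The reason string on a rejection is worth logging, since it shows which flags the model tends to hallucinate.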
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local system administration (git, docker, kubectl) | CLI / Shell Execution | Models possess pre-trained knowledge; zero schema overhead; direct process spawning | Minimal token cost; low latency |
| Real-time external data retrieval (Salesforce, Notion, Stripe) | Standard MCP | Native OAuth support; structured API contracts; multi-tenant governance | High upfront token cost; moderate latency |
| Multi-step data transformation pipelines | MCP + Code Execution | Batches operations in sandbox; reduces intermediate result bloat; preserves auth | ~98% token reduction vs sequential calls |
| Prototyping or rapid iteration | CLI / Shell Execution | Immediate feedback loop; no protocol setup; leverages existing man pages | Low infrastructure cost; high developer velocity |
| Enterprise multi-agent orchestration | Standard MCP | Centralized capability negotiation; granular permission boundaries; audit trails | Higher operational overhead; predictable scaling |
Configuration Template
// agent-tooling.config.ts
export const toolingConfig = {
routing: {
contextThresholds: {
cli: 5000,
mcp: 20000,
codeExecution: 10000
},
fallback: 'cli',
maxRetries: 2
},
execution: {
shell: {
allowedCommands: ['git', 'docker', 'kubectl', 'jq', 'head', 'tail'],
outputLimit: 4096,
timeoutMs: 5000
},
mcp: {
persistentSessions: true,
sessionTtlMinutes: 30,
schemaFilter: 'task_relevant'
},
sandbox: {
runtime: 'gvisor',
networkPolicy: 'whitelist',
filesystem: 'read-only',
maxExecutionTimeMs: 15000
}
},
telemetry: {
trackTokenConsumption: true,
logStrategyDecisions: true,
alertOnContextOverflow: true
}
};
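One way the contextThresholds and fallback values could be consumed is sketched below. The fallback semantics are an assumption, since the template does not define them: here the router tries the strategy that matches the operation, then the fallback, and otherwise declines. The thresholds are copied inline so the example is self-contained.

```typescript
type Operation = 'local_command' | 'external_api' | 'multi_step_pipeline';
type Strategy = 'cli' | 'mcp' | 'codeExecution';

// Values mirror routing.contextThresholds in the template above.
const contextThresholds: Record<Strategy, number> = {
  cli: 5000,
  mcp: 20000,
  codeExecution: 10000,
};

const strategyFor: Record<Operation, Strategy> = {
  local_command: 'cli',
  external_api: 'mcp',
  multi_step_pipeline: 'codeExecution',
};

// Assumed semantics: preferred strategy first, then the configured fallback.
function pickStrategy(
  operation: Operation,
  contextBudget: number,
  fallback: Strategy = 'cli'
): Strategy | undefined {
  const preferred = strategyFor[operation];
  if (contextBudget > contextThresholds[preferred]) return preferred;
  if (contextBudget > contextThresholds[fallback]) return fallback;
  return undefined;
}
```

Returning undefined lets the caller decide whether to retry (up to maxRetries) or surface an error.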
Quick Start Guide
- Initialize the router: Import the TaskRouter class and instantiate it with your preferred execution strategies. Configure context thresholds based on your target model's window size.
- Define task payloads: Structure agent requests with explicit operation types (local_command, external_api, multi_step_pipeline) and attach current context budget metrics.
- Execute and monitor: Route tasks through the router. Capture tokensConsumed and strategy fields in your telemetry pipeline. Set alerts for context overflow or repeated fallback events.
- Iterate on constraints: Review execution logs weekly. Tighten shell allowlists, adjust output limits, and refine sandbox policies based on actual usage patterns. Update static routing tables as integrations evolve.
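The monitoring step can be sketched as a small ledger that sums tokensConsumed per strategy and fires a callback when the running total passes a limit. TokenLedger and its callback are hypothetical; a real deployment would forward these values to its own telemetry system.

```typescript
type StrategyName = 'cli' | 'mcp' | 'code_execution';

interface UsageRecord {
  taskId: string;
  strategy: StrategyName;
  tokensConsumed: number;
}

class TokenLedger {
  private totals = new Map<StrategyName, number>();

  constructor(
    private readonly contextLimit: number,
    private readonly onOverflow: (total: number) => void
  ) {}

  // Adds one response to the ledger and alerts if the running total exceeds the limit.
  record(entry: UsageRecord): void {
    const previous = this.totals.get(entry.strategy) ?? 0;
    this.totals.set(entry.strategy, previous + entry.tokensConsumed);
    const total = this.total();
    if (total > this.contextLimit) this.onOverflow(total);
  }

  total(): number {
    let sum = 0;
    this.totals.forEach((value) => {
      sum += value;
    });
    return sum;
  }

  byStrategy(strategy: StrategyName): number {
    return this.totals.get(strategy) ?? 0;
  }
}
```

Each ToolResponse returned by the router can be passed to record() as-is, since it already carries taskId, strategy, and tokensConsumed.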