CLI vs MCP: A Guide for Agents in Production

By Codcompass Team·2026-05-28·8 min read

Beyond the Protocol War: Architecting Efficient Agent Tooling in Production

Current Situation Analysis

The rapid adoption of AI agents in production environments has exposed a critical architectural blind spot: context window economics. Early agent frameworks treated tool integration as a simple connectivity problem, leading teams to adopt standardized protocols like the Model Context Protocol (MCP) without accounting for token budgeting. The industry narrative positioned MCP as a universal bridge, but production telemetry revealed a different reality. Tool schema injection, capability negotiation, and result wrapping consume disproportionate context resources before an agent executes a single meaningful operation.

This problem is frequently misunderstood because protocol standardization is conflated with operational efficiency. Engineering teams assume that a unified interface automatically reduces complexity. In practice, the upfront metadata payload of MCP servers directly competes with the model's reasoning capacity. When an agent connects to a single GitHub MCP server, it ingests approximately 55,000 tokens representing 93 distinct tool definitions. Adding two more servers (e.g., Slack and a database connector) can consume over 70% of a 200K token context window purely in protocol metadata. This leaves insufficient space for user prompts, conversation history, and intermediate reasoning steps.
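The arithmetic behind that claim can be sketched in a few lines. Only the ~55,000-token GitHub figure and the 200K window come from the text above; the function name is illustrative, and per-server costs for other servers would need to be measured rather than assumed.

```typescript
// Fraction of a context window consumed by tool metadata before any user input.
// schemaTokens holds the token cost of each connected server's tool definitions.
function metadataShare(schemaTokens: number[], contextWindow: number): number {
  if (contextWindow <= 0) throw new RangeError('contextWindow must be positive');
  const total = schemaTokens.reduce((sum, tokens) => sum + tokens, 0);
  return total / contextWindow;
}

// One GitHub MCP server (~55,000 tokens) in a 200K window.
const githubOnly = metadataShare([55_000], 200_000); // 0.275
```

A single server already takes more than a quarter of the window, which is why each additional server has to be justified against the reasoning space it displaces.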

The severity of this issue was formally acknowledged by Anthropic's engineering team in November 2025. Their internal benchmarks confirmed that schema overload and intermediate result serialization directly inflate API costs and increase latency. The acknowledgment shifted the conversation from protocol preference to resource allocation. Teams now recognize that tooling architecture must be treated as a constrained optimization problem, balancing token efficiency, execution speed, authentication requirements, and composability.

WOW Moment: Key Findings

The following comparison isolates the operational trade-offs between three dominant tooling strategies. The data reflects production telemetry and vendor-published benchmarks.

Approach	Context Overhead	Execution Latency	Enterprise Auth Readiness
CLI / Shell Execution	~0 tokens (pre-trained knowledge)	Low (direct process spawn)	Limited (relies on host credentials)
Standard MCP	High (~55k+ tokens per server)	Medium-High (handshake + schema parse)	Native (OAuth, dynamic client registration)
MCP + Code Execution	Low (~98% reduction vs standard)	Medium (sandbox compilation + batch calls)	Native (inherits MCP auth layer)

Why this matters: The table demonstrates that there is no universally optimal interface. CLI execution leverages the model's pre-existing training data, eliminating schema injection entirely. Standard MCP provides robust authentication and multi-tenant governance but imposes a heavy context tax. The Code Execution pattern, validated by Anthropic's engineering team, decouples tool discovery from execution, allowing agents to write short scripts that invoke MCP tools within a sandbox. This hybrid approach preserves enterprise security while reducing token consumption from ~150,000 to ~2,000 in benchmark scenarios. Architecting for production requires routing tasks to the interface that aligns with the current token budget and security boundary.

Core Solution

Building a production-ready agent tooling layer requires a routing architecture that evaluates task characteristics before selecting an execution strategy. The following implementation demonstrates a TypeScript-based router that dynamically delegates to CLI, MCP, or Code Execution sandboxes.

Architecture Rationale

1. Separation of Concerns: Tool execution should not be tightly coupled to the agent's reasoning loop. A dedicated routing layer isolates protocol negotiation, shell spawning, and sandbox management.
2. Token Budget Awareness: The router evaluates task complexity and available context window before committing to a strategy. Simple, deterministic operations bypass protocol overhead entirely.
3. Security Boundaries: CLI execution runs in restricted environments with explicit capability allowlists. MCP connections maintain persistent sessions to avoid repeated handshakes. Code execution sandboxes enforce network isolation and resource limits.

Implementation

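import { spawn } from 'node:child_process';
// Import paths and client calls follow the MCP TypeScript SDK's documented usage;
// verify them against the SDK version you have installed.
import { Client as MCPClient } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

interface ToolRequest {
  taskId: string;
  operation: 'local_command' | 'external_api' | 'multi_step_pipeline';
  payload: Record<string, unknown>;
  contextBudget: number;
}

interface ToolResponse {
  taskId: string;
  output: string;
  tokensConsumed: number;
  strategy: 'cli' | 'mcp' | 'code_execution';
}

abstract class ExecutionStrategy {
  abstract execute(req: ToolRequest): Promise<ToolResponse>;
  abstract isEligible(req: ToolRequest): boolean;
}

class ShellStrategy extends ExecutionStrategy {
  // Cap on captured characters so a noisy command cannot flood the context window.
  private static readonly MAX_OUTPUT_CHARS = 1024 * 512;

  isEligible(req: ToolRequest): boolean {
    return req.operation === 'local_command' && req.contextBudget > 5000;
  }

  async execute(req: ToolRequest): Promise<ToolResponse> {
    const command = String(req.payload.command);
    const args = Array.isArray(req.payload.args) ? req.payload.args.map(String) : [];

    return new Promise((resolve, reject) => {
      // shell: false passes args straight to the process, so they are never
      // interpreted by a shell. maxBuffer is not a spawn() option, so the
      // output cap is enforced manually in the data handlers below.
      const proc = spawn(command, args, { shell: false });
      let stdout = '';
      let stderr = '';

      proc.stdout.on('data', (d: Buffer) => {
        if (stdout.length < ShellStrategy.MAX_OUTPUT_CHARS) stdout += d.toString();
      });
      proc.stderr.on('data', (d: Buffer) => {
        if (stderr.length < ShellStrategy.MAX_OUTPUT_CHARS) stderr += d.toString();
      });

      // Emitted when the process cannot be spawned at all (e.g., command not found).
      proc.on('error', reject);

      proc.on('close', (code) => {
        if (code !== 0) {
          return reject(new Error(stderr.trim() || `Command exited with code ${code}`));
        }
        resolve({
          taskId: req.taskId,
          output: stdout.slice(0, ShellStrategy.MAX_OUTPUT_CHARS).trim(),
          tokensConsumed: 0, // no schema injection; output tokens are not counted here
          strategy: 'cli'
        });
      });
    });
  }
}

class ProtocolBridgeStrategy extends ExecutionStrategy {
  private client: MCPClient;
  private transport: StreamableHTTPClientTransport;
  private connected?: Promise<void>;

  constructor(serverUrl: string) {
    super();
    // Client name and version are placeholders.
    this.client = new MCPClient({ name: 'agent-task-router', version: '1.0.0' });
    this.transport = new StreamableHTTPClientTransport(new URL(serverUrl));
  }

  isEligible(req: ToolRequest): boolean {
    return req.operation === 'external_api' && req.contextBudget > 20000;
  }

  async execute(req: ToolRequest): Promise<ToolResponse> {
    const toolName = String(req.payload.toolName);
    const params = (req.payload.params ?? {}) as Record<string, unknown>;

    // Connect on first use and reuse the same session for later calls.
    this.connected ??= this.client.connect(this.transport);
    await this.connected;

    const result = await this.client.callTool({ name: toolName, arguments: params });
    return {
      taskId: req.taskId,
      output: JSON.stringify(result),
      tokensConsumed: 45000, // static estimate of schema overhead, not a measured value
      strategy: 'mcp'
    };
  }
}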

class SandboxCoderStrategy extends ExecutionStrategy {
  isEligible(req: ToolRequest): boolean {
    return req.operation === 'multi_step_pipeline' && req.contextBudget > 10000;
  }

  async execute(req: ToolRequest): Promise<ToolResponse> {
    const script = String(req.payload.script);
    // In production, this routes to a containerized runtime (e.g., gVisor, Firecracker)
    // with explicit network and filesystem restrictions.
    const executionResult = await this.runInSandbox(script);
    
    return {
      taskId: req.taskId,
      output: executionResult.stdout,
      tokensConsumed: 1800,
      strategy: 'code_execution'
    };
  }

  private async runInSandbox(script: string): Promise<{ stdout: string }> {
    // Placeholder for sandbox orchestration
    return { stdout: `Executed: ${script.slice(0, 40)}...` };
  }
}

class TaskRouter {
  private strategies: ExecutionStrategy[];

  constructor() {
    this.strategies = [
      new ShellStrategy(),
      new ProtocolBridgeStrategy('https://mcp.internal.example.com'),
      new SandboxCoderStrategy()
    ];
  }

  async route(req: ToolRequest): Promise<ToolResponse> {
    const eligible = this.strategies.find(s => s.isEligible(req));
    if (!eligible) {
      throw new Error('No eligible execution strategy for current context budget');
    }
    return eligible.execute(req);
  }
}

Why These Choices Matter

Explicit Eligibility Checks: Each strategy validates context budget and operation type before execution. This prevents accidental schema injection when token reserves are low.
Persistent MCP Client: The ProtocolBridgeStrategy maintains a single client instance. Repeated connection handshakes multiply latency and token overhead.
Sandbox Isolation: The SandboxCoderStrategy abstracts execution environment management. Production deployments must enforce capability restrictions (e.g., seccomp profiles, read-only filesystems) to prevent privilege escalation.
Deterministic Routing: The router does not guess. It matches task metadata to pre-validated execution paths, ensuring predictable cost and latency profiles.

Pitfall Guide

1. Schema Overload from Unfiltered Tool Registration

Explanation: Connecting to MCP servers without filtering tool definitions forces the model to parse hundreds of unused schemas. This wastes context window space and increases inference cost. Fix: Implement selective tool registration. Query server capabilities, filter by relevance to the current task, and inject only required schemas. Cache manifests to avoid repeated discovery calls.
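A minimal sketch of the filtering step, assuming the tool manifest has already been fetched from the server. The `ToolDefinition` shape mirrors the name, description, and input schema that MCP tools advertise; `selectTools` and the tool names in the usage lines are hypothetical.

```typescript
// Minimal shape of a tool definition as advertised by a server.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Keeps only the tools the current task needs, so unused schemas never reach the prompt.
function selectTools(
  manifest: ToolDefinition[],
  required: ReadonlySet<string>,
): ToolDefinition[] {
  return manifest.filter((tool) => required.has(tool.name));
}

// Usage: a two-tool manifest reduced to the one tool the task calls for.
const manifest: ToolDefinition[] = [
  { name: 'create_issue', description: 'Open an issue', inputSchema: {} },
  { name: 'list_repos', description: 'List repositories', inputSchema: {} },
];
const injected = selectTools(manifest, new Set(['create_issue']));
```

How the required set is chosen (static per task type, or derived from the request) is a design decision this sketch leaves open.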

2. Unbounded CLI Output

Explanation: Shell commands like find / or docker logs can return megabytes of text. Feeding raw output directly into the context window triggers token overflow and degrades reasoning quality. Fix: Enforce output truncation at the execution layer. Use stream processors to cap output at 4KB. Require agents to specify pagination flags (--limit, --page) or pipe through head/tail/jq before returning results.
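The truncation step might look like the sketch below. It caps by character count as an approximation of the 4KB limit mentioned above (close enough for ASCII-heavy CLI output); `capOutput` and its return shape are illustrative names, not part of any library.

```typescript
const DEFAULT_LIMIT = 4096;

interface CappedOutput {
  text: string;
  truncated: boolean;
  omittedChars: number;
}

// Returns the output unchanged when it fits, otherwise the first `limit`
// characters followed by a marker telling the agent how much was dropped.
function capOutput(raw: string, limit: number = DEFAULT_LIMIT): CappedOutput {
  if (raw.length <= limit) {
    return { text: raw, truncated: false, omittedChars: 0 };
  }
  const omittedChars = raw.length - limit;
  return {
    text: `${raw.slice(0, limit)}\n[output truncated: ${omittedChars} characters omitted]`,
    truncated: true,
    omittedChars,
  };
}
```

Including the omitted count in the marker gives the agent a signal to retry with a narrower query instead of silently reasoning over partial data.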

3. Ignoring Protocol Handshake Latency

Explanation: Establishing MCP connections involves capability negotiation, schema transfer, and authentication. Doing this per-request adds 200-800ms of latency and consumes additional tokens. Fix: Maintain persistent connections. Use connection pooling or long-lived WebSocket sessions. Pre-warm MCP clients during agent initialization and reuse them across task batches.
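One way to guarantee a single handshake is to memoize the connection promise, as in this sketch. `SharedConnection` is a hypothetical helper that works with any async `connect` function; it does not depend on a particular MCP client API.

```typescript
// Memoizes one in-flight or established connection so that concurrent
// callers share a single handshake instead of each opening their own.
class SharedConnection<T> {
  private pending: Promise<T> | undefined;
  private readonly connect: () => Promise<T>;

  constructor(connect: () => Promise<T>) {
    this.connect = connect;
  }

  get(): Promise<T> {
    if (this.pending === undefined) {
      // Drop the cached promise on failure so the next caller can retry.
      this.pending = this.connect().catch((err: unknown) => {
        this.pending = undefined;
        throw err;
      });
    }
    return this.pending;
  }
}
```

Calling `get()` during agent initialization pre-warms the session; later calls return the same promise without repeating negotiation.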

4. Security Through Obscurity in Sandboxes

Explanation: Running code execution scripts without explicit capability boundaries exposes the host to privilege escalation, network exfiltration, or filesystem corruption. Historical incidents (e.g., CVE-2026-25253) demonstrate that open sandbox models are frequently exploited in production. Fix: Apply defense-in-depth. Run sandboxes as non-root users, enforce seccomp/AppArmor profiles, restrict network access to whitelisted endpoints, and mount filesystems as read-only unless explicitly required.
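As one illustration of these controls, the sketch below builds the argument list for a `docker run` invocation. The flags are standard Docker CLI options; the image name, seccomp profile path, and resource limits are placeholders, and `--network none` is used here as a stricter stand-in for an endpoint allowlist, which would need a proxy or network policy outside this snippet.

```typescript
interface SandboxPolicy {
  image: string;          // hypothetical sandbox image name
  seccompProfile: string; // host path to a seccomp profile JSON file
  allowNetwork: boolean;
  writable: boolean;
}

// Builds arguments for `docker run` that apply the hardening steps above.
function dockerRunArgs(policy: SandboxPolicy, command: string[]): string[] {
  const args = [
    'run', '--rm',
    '--user', '65534:65534',               // non-root (conventionally "nobody")
    '--cap-drop', 'ALL',                   // drop all Linux capabilities
    '--security-opt', 'no-new-privileges', // block setuid-style escalation
    '--security-opt', `seccomp=${policy.seccompProfile}`,
    '--pids-limit', '64',
    '--memory', '256m',
  ];
  if (!policy.writable) args.push('--read-only');
  if (!policy.allowNetwork) args.push('--network', 'none');
  return [...args, policy.image, ...command];
}

const sandboxArgs = dockerRunArgs(
  { image: 'agent-sandbox:latest', seccompProfile: '/etc/agent/seccomp.json', allowNetwork: false, writable: false },
  ['node', '/work/script.js'],
);
```

Passing the result to a process spawner as an argument array (rather than a shell string) keeps the generated script path from being interpreted by a shell.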

5. Over-Engineering Dynamic Tool Discovery

Explanation: Building real-time tool discovery loops that query external registries on every request introduces unnecessary latency and token overhead. Most agent workflows operate against a stable set of integrations. Fix: Maintain a static routing table with versioned tool manifests. Update registries asynchronously via background sync jobs. Fall back to dynamic discovery only when static routes fail.
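A static table with a discovery fallback can be sketched as follows. The tool names, server names, and manifest version strings are made-up examples, and `discover` stands in for whatever registry lookup a deployment actually uses.

```typescript
interface ToolRoute {
  server: string;          // logical server name (hypothetical)
  manifestVersion: string; // manifest version this route was validated against
}

// Static table, intended to be refreshed by a background job rather than during inference.
const staticRoutes = new Map<string, ToolRoute>([
  ['create_issue', { server: 'github', manifestVersion: '2026-05-01' }],
  ['post_message', { server: 'slack', manifestVersion: '2026-04-18' }],
]);

// Resolves a tool to its route; `discover` runs only when the static table misses.
function resolveRoute(
  tool: string,
  discover: (tool: string) => ToolRoute | undefined,
): ToolRoute {
  const known = staticRoutes.get(tool);
  if (known !== undefined) return known;

  const found = discover(tool);
  if (found === undefined) throw new Error(`No route for tool: ${tool}`);
  staticRoutes.set(tool, found); // remember the result for subsequent requests
  return found;
}
```

Because hits never touch the registry, the common path adds no latency; the fallback only pays the discovery cost once per unknown tool.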

6. Neglecting Intermediate Result Costs

Explanation: Multi-step workflows that return full JSON payloads after each tool call accumulate token debt rapidly. The model must re-parse previous results in subsequent turns. Fix: Adopt the Code Execution pattern for pipelines. Generate a single script that chains operations, executes in a sandbox, and returns only the final aggregated result. This reduces intermediate context bloat by up to 98%.
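To make the pattern concrete, here is a sketch of the kind of script an agent might generate. `fetchOrders` is a hypothetical stand-in for an MCP tool wrapper exposed inside the sandbox, and the `Order` shape is invented for illustration; the point is that every page of results stays inside the sandbox and only the summary object returns to the model.

```typescript
interface Order {
  id: number;
  region: string;
  total: number;
}

// Stand-in for a sandbox-side wrapper around a paginated MCP tool.
type FetchOrders = (page: number) => Promise<Order[]>;

// Pages through results and returns only a compact per-region total,
// so intermediate payloads never enter the model's context.
async function summarizeByRegion(
  fetchOrders: FetchOrders,
  pages: number,
): Promise<Record<string, number>> {
  const totals: Record<string, number> = {};
  for (let page = 1; page <= pages; page++) {
    const orders = await fetchOrders(page);
    for (const order of orders) {
      totals[order.region] = (totals[order.region] ?? 0) + order.total;
    }
  }
  return totals;
}
```

With sequential tool calls, each page would be serialized into the conversation; here the model sees one small object regardless of how many pages were processed.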

7. Assuming LLMs Understand All CLI Flags

Explanation: While models are trained on extensive shell documentation, they frequently hallucinate obscure flags or misuse version-specific syntax. Raw shell access without constraints leads to failed executions and retry loops. Fix: Provide constrained command templates or wrapper scripts. Validate arguments against a schema before execution. Log failed commands to refine prompt templates and reduce retry overhead.
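Argument validation against a template could be sketched like this. The allowlisted `git` subcommands and flags are illustrative choices, not a recommended policy, and `validateCommand` is a hypothetical helper.

```typescript
// Per-command allowlist of subcommands and flags (illustrative values).
interface CommandTemplate {
  subcommands: ReadonlySet<string>;
  flags: ReadonlySet<string>;
}

const templates = new Map<string, CommandTemplate>([
  ['git', {
    subcommands: new Set(['status', 'log', 'diff']),
    flags: new Set(['--oneline', '--stat', '-n']),
  }],
]);

// Returns a list of problems; an empty list means the command may run.
function validateCommand(command: string, args: string[]): string[] {
  const template = templates.get(command);
  if (template === undefined) return [`command not allowed: ${command}`];

  const errors: string[] = [];
  const [sub, ...rest] = args;
  if (sub === undefined || !template.subcommands.has(sub)) {
    errors.push(`subcommand not allowed: ${sub ?? '(none)'}`);
  }
  for (const arg of rest) {
    // Only dash-prefixed arguments are treated as flags; values pass through.
    if (arg.startsWith('-') && !template.flags.has(arg)) {
      errors.push(`flag not allowed: ${arg}`);
    }
  }
  return errors;
}
```

Returning the error list to the agent, rather than just failing, lets it correct a hallucinated flag in one turn instead of looping on retries.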

Production Bundle

Action Checklist

Audit current tooling architecture: Map all active MCP servers and CLI dependencies to their token overhead and latency profiles.
Implement context budget guards: Add pre-execution checks that reject high-overhead strategies when available tokens fall below threshold.
Enforce output sanitization: Configure execution layers to truncate, paginate, or structure raw command output before returning to the agent.
Establish persistent connections: Replace per-request MCP handshakes with long-lived client sessions or connection pools.
Harden sandbox environments: Apply capability restrictions, network whitelists, and read-only filesystem mounts to all code execution runtimes.
Adopt batch execution for pipelines: Replace sequential tool calls with Code Execution scripts that chain operations and return aggregated results.
Maintain static routing tables: Cache tool manifests and update asynchronously. Avoid real-time registry queries during inference.

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Local system administration (git, docker, kubectl)	CLI / Shell Execution	Models possess pre-trained knowledge; zero schema overhead; direct process spawning	Minimal token cost; low latency
Real-time external data retrieval (Salesforce, Notion, Stripe)	Standard MCP	Native OAuth support; structured API contracts; multi-tenant governance	High upfront token cost; moderate latency
Multi-step data transformation pipelines	MCP + Code Execution	Batches operations in sandbox; reduces intermediate result bloat; preserves auth	~98% token reduction vs sequential calls
Prototyping or rapid iteration	CLI / Shell Execution	Immediate feedback loop; no protocol setup; leverages existing man pages	Low infrastructure cost; high developer velocity
Enterprise multi-agent orchestration	Standard MCP	Centralized capability negotiation; granular permission boundaries; audit trails	Higher operational overhead; predictable scaling

Configuration Template

// agent-tooling.config.ts
export const toolingConfig = {
  routing: {
    contextThresholds: {
      cli: 5000,
      mcp: 20000,
      codeExecution: 10000
    },
    fallback: 'cli',
    maxRetries: 2
  },
  execution: {
    shell: {
      allowedCommands: ['git', 'docker', 'kubectl', 'jq', 'head', 'tail'],
      outputLimit: 4096,
      timeoutMs: 5000
    },
    mcp: {
      persistentSessions: true,
      sessionTtlMinutes: 30,
      schemaFilter: 'task_relevant'
    },
    sandbox: {
      runtime: 'gvisor',
      networkPolicy: 'whitelist',
      filesystem: 'read-only',
      maxExecutionTimeMs: 15000
    }
  },
  telemetry: {
    trackTokenConsumption: true,
    logStrategyDecisions: true,
    alertOnContextOverflow: true
  }
};

Quick Start Guide

Initialize the router: Import the TaskRouter class and instantiate it. As written, its constructor registers the shell, MCP, and sandbox strategies itself, so edit that list to add or remove strategies. Configure context thresholds based on your target model's window size.
Define task payloads: Structure agent requests with explicit operation types (local_command, external_api, multi_step_pipeline) and attach current context budget metrics.
Execute and monitor: Route tasks through the router. Capture tokensConsumed and strategy fields in your telemetry pipeline. Set alerts for context overflow or repeated fallback events.
Iterate on constraints: Review execution logs weekly. Tighten shell allowlists, adjust output limits, and refine sandbox policies based on actual usage patterns. Update static routing tables as integrations evolve.
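For the monitoring step, a small in-memory aggregator is one possible starting point. `RoutedResult` mirrors the `taskId`, `tokensConsumed`, and `strategy` fields of `ToolResponse`; the `StrategyTelemetry` class and its threshold-based alert are assumptions for illustration, and a real deployment would forward these numbers to its own metrics system.

```typescript
type Strategy = 'cli' | 'mcp' | 'code_execution';

// Mirrors the ToolResponse fields that matter for telemetry.
interface RoutedResult {
  taskId: string;
  tokensConsumed: number;
  strategy: Strategy;
}

interface StrategyStats {
  calls: number;
  tokens: number;
}

class StrategyTelemetry {
  private readonly stats = new Map<Strategy, StrategyStats>();

  // Accumulates call count and token use per strategy.
  record(result: RoutedResult): void {
    const current = this.stats.get(result.strategy) ?? { calls: 0, tokens: 0 };
    this.stats.set(result.strategy, {
      calls: current.calls + 1,
      tokens: current.tokens + result.tokensConsumed,
    });
  }

  // Strategies whose cumulative token use exceeds the alert threshold.
  overBudget(threshold: number): Strategy[] {
    return Array.from(this.stats.entries())
      .filter(([, s]) => s.tokens > threshold)
      .map(([name]) => name);
  }
}
```

Calling `record` on every routed response and checking `overBudget` on a schedule is enough to spot a strategy that dominates token spend.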
