Fixing MCP Timeouts: The Asynchronous HandleId Pattern

By Codcompass Team·2026-05-21·8 min read

Breaking the MCP Timeout Chain: Asynchronous Job Handles for Resilient AI Agents

Current Situation Analysis

The integration of external data sources and third-party APIs into AI agent workflows has exposed a critical architectural flaw in synchronous tool execution. When agents operate through the Model Context Protocol (MCP), they expect deterministic, low-latency responses from registered tools. However, real-world external dependencies rarely behave predictably. Network latency, rate limiting, heavy computation pipelines, and downstream service degradation routinely push response times beyond acceptable thresholds.

This problem is frequently overlooked because developers treat MCP tools like standard REST endpoints, assuming the protocol will gracefully handle delays. In practice, MCP clients and hosting platforms impose implicit synchronous expectations. If a tool does not return within approximately 7 to 10 seconds, the underlying transport layer often terminates the connection and propagates an HTTP 424 (Failed Dependency) error back to the orchestrating agent. The agent receives no partial state, no progress indicator, and no recovery path. It simply halts.

Community telemetry and production logs consistently validate this failure mode. Reports from OpenAI's developer forums and independent resilience studies highlight a recurring pattern: agents freeze indefinitely when awaiting slow external calls, or crash into 424 states without fallback logic. Benchmarking synchronous versus asynchronous tool designs reveals a stark contrast. A blocking call to a 15-second external API inflates total agent response time to nearly 18 seconds, consuming excessive context window tokens and degrading user experience. Meanwhile, an asynchronous handoff pattern reduces perceived latency to under 4 seconds by decoupling execution from response.

The core misunderstanding lies in conflating tool registration with execution semantics. MCP defines how tools are discovered and invoked, but it does not mandate how long-running operations should be managed. Without an explicit async contract, agents become tightly coupled to external service health, turning transient latency into systemic workflow failure.

WOW Moment: Key Findings

The architectural shift from synchronous blocking to asynchronous job handles fundamentally changes how AI agents interact with external systems. By returning a lightweight reference token immediately and deferring heavy computation, the agent's execution loop remains unblocked, context windows stay lean, and error surfaces become explicit rather than silent.

Approach	End-to-End Latency	Agent Thread State	Error Propagation	Token Efficiency
Synchronous Blocking	17.8s	Frozen until timeout	Silent 424 or hang	High (context bloat)
Async HandleId	3.7s	Unblocked, polling-ready	Explicit status codes	Low (minimal context)
Optimistic Retry	9.2s	Intermittently blocked	Transient 5xx/429	Medium (retry overhead)

This finding matters because it transforms external dependencies from single points of failure into manageable state machines. The async handle pattern enables agents to:

Maintain responsive UI/UX loops without freezing
Preserve context window capacity by avoiding long wait states
Implement deterministic polling strategies instead of blind retries
Surface granular failure states (processing, completed, failed) rather than opaque timeouts

The pattern is framework-agnostic. Whether orchestrating through Strands Agents, LangGraph, or custom LLM loops, the underlying contract remains identical: immediate acknowledgment, background execution, and explicit status resolution.

Core Solution

Implementing the asynchronous job handle pattern requires restructuring how MCP tools are designed and how agents consume them. The solution separates tool registration into two distinct operations: task initiation and status resolution.

Step 1: Define the Task Initiation Tool

The first tool accepts input parameters, generates a unique reference identifier, queues the work, and returns immediately. It never blocks on external I/O.

import { v4 as uuidv4 } from 'uuid';
import { z } from 'zod';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

// Shape of a task record. The Map below is a development stand-in
// for a persistent state store.
interface TaskRecord {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  payload: Record<string, unknown>;
  result?: unknown;
  error?: string;
  createdAt: number; // epoch milliseconds
}

const taskRegistry = new Map<string, TaskRecord>();

const server = new McpServer({
  name: 'async-orchestrator',
  version: '1.0.0'
});

server.tool(
  'initiate_background_task',
  'Submits a long-running operation and returns a tracking reference immediately.',
  {
    // The SDK expects a Zod raw shape here, not JSON Schema literals
    operation_type: z.string().describe('Target external service or pipeline'),
    parameters: z.record(z.string(), z.unknown()).describe('Payload for the external call')
  },
  async ({ operation_type, parameters }) => {
    // Keep the full UUID: truncating it to 8 characters makes collisions far more likely
    const taskId = uuidv4();
    
    const record: TaskRecord = {
      id: taskId,
      status: 'pending',
      payload: { operation_type, parameters },
      createdAt: Date.now()
    };

    taskRegistry.set(taskId, record);

    // Fire and forget: deliberately not awaited so the tool call returns immediately.
    // processTaskAsync catches its own errors and records them on the task.
    void processTaskAsync(taskId, operation_type, parameters);

    // Report the same status value the registry holds ('pending')
    return {
      content: [{ type: 'text', text: JSON.stringify({ task_ref: taskId, status: record.status }) }]
    };
  }
);

Step 2: Implement Background Execution

The heavy lifting runs outside the MCP request lifecycle. This ensures the tool call returns within milliseconds, well below the 7-second implicit threshold.

async function processTaskAsync(taskId: string, operationType: string, params: Record<string, unknown>) {
  const record = taskRegistry.get(taskId);
  if (!record) return;

  record.status = 'processing';
  taskRegistry.set(taskId, record);

  try {
    // Simulate external API call with variable latency
    const result = await executeExternalDependency(operationType, params);
    
    record.status = 'completed';
    record.result = result;
  } catch (err) {
    record.status = 'failed';
    record.error = err instanceof Error ? err.message : 'Unknown execution error';
  } finally {
    taskRegistry.set(taskId, record);
  }
}

async function executeExternalDependency(type: string, params: Record<string, unknown>) {
  // Replace with actual HTTP client, SDK, or queue worker
  await new Promise(res => setTimeout(res, 12000)); // Simulates 12s latency
  return { data: `Processed ${type} with ${JSON.stringify(params)}` };
}

Step 3: Expose Status Resolution Tool

Agents poll this tool until the task reaches a terminal state. The tool returns structured metadata, allowing the agent to decide whether to continue, retry, or surface an error.

server.tool(
  'resolve_task_status',
  'Retrieves the current state and result of a previously submitted task.',
  // Zod raw shape; requires `import { z } from 'zod'` alongside the SDK import
  { task_ref: z.string().describe('Reference ID from initiate_background_task') },
  async ({ task_ref }) => {
    const record = taskRegistry.get(task_ref);
    
    if (!record) {
      // isError marks the result as a tool-level failure for the client
      return {
        isError: true,
        content: [{ type: 'text', text: JSON.stringify({ error: 'Task reference not found' }) }]
      };
    }

    const response = {
      task_ref: record.id,
      status: record.status,
      created_at: new Date(record.createdAt).toISOString(),
      result: record.result ?? null,
      error: record.error ?? null
    };

    return { content: [{ type: 'text', text: JSON.stringify(response) }] };
  }
);

Architecture Decisions & Rationale

Separation of Initiation and Resolution: MCP tools should be atomic. Combining submission and waiting violates the protocol's synchronous expectation and guarantees timeout failures.
Immediate Return Guarantee: The initiation tool must complete in <500ms. Any I/O, validation, or queuing must be non-blocking or deferred.
Explicit Terminal States: completed and failed are distinct. Agents need to know when to stop polling. Ambiguous states like running or active cause indefinite loops.
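Stateless Tool Design: The tools themselves hold no memory. State lives in a registry that, in production, should be external to the server process (the in-memory Map in Step 1 is a development stand-in). This enables horizontal scaling and prevents node-specific failures.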
Polling Over Webhooks: While webhooks reduce polling overhead, they require external routing, TLS termination, and firewall configuration. Polling is simpler, more reliable for internal agent loops, and aligns with MCP's request-response model.

Pitfall Guide

1. In-Memory State Volatility

Explanation: Storing task records in a local Map or dictionary works for development but fails in production. Server restarts, scaling events, or crashes erase all pending tasks, leaving agents with orphaned references. Fix: Persist task state to a distributed store (Redis, DynamoDB, PostgreSQL). Implement TTL policies to auto-expire stale records after 24-48 hours.
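One way to prepare for that migration is to put a narrow storage interface between the tools and the registry. The sketch below is illustrative only: `TaskStore`, `InMemoryTtlStore`, and the injected `now` clock are hypothetical names, and the interface is synchronous for brevity, whereas a Redis- or DynamoDB-backed implementation would expose the same methods returning promises and let the store enforce the TTL itself.

```typescript
type TaskStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface StoredTask {
  id: string;
  status: TaskStatus;
  result?: unknown;
  error?: string;
  createdAt: number; // epoch milliseconds
}

// Hypothetical storage boundary: tools talk to this interface, never to a Map directly.
interface TaskStore {
  put(task: StoredTask): void;
  get(id: string): StoredTask | undefined;
}

// Reference implementation with lazy TTL expiry. The clock is injected so
// expiry can be exercised without waiting in real time.
class InMemoryTtlStore implements TaskStore {
  private readonly entries = new Map<string, { task: StoredTask; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  put(task: StoredTask): void {
    // Store a copy so later mutation by the caller does not leak into the store
    this.entries.set(task.id, { task: { ...task }, expiresAt: this.now() + this.ttlMs });
  }

  get(id: string): StoredTask | undefined {
    const entry = this.entries.get(id);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(id); // expired: behave as if the record never existed
      return undefined;
    }
    return { ...entry.task };
  }
}
```

Because reads and writes go through `put` and `get` on copies, code written against this interface cannot rely on shared object references, which is the habit that breaks first when the Map is swapped for a networked store.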

2. Polling Storms

Explanation: Agents that poll resolve_task_status every 100ms generate excessive network traffic and can overload the MCP server or trip its rate limits, degrading performance for all concurrent workflows. Fix: Implement exponential backoff on the agent side. Start at a 1s interval, double it after each poll, and cap it at 30s. Most external APIs complete within 15-60 seconds; aggressive polling yields diminishing returns.
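The backoff schedule itself is a small pure function. This is a minimal sketch, assuming a 1s starting interval, a multiplier of 2, and a 30s cap; `backoffDelayMs` and `backoffSchedule` are illustrative names, not part of any agent framework.

```typescript
interface BackoffOptions {
  initialIntervalMs: number;
  maxIntervalMs: number;
  multiplier: number;
}

// Delay before poll number `attempt` (0-based): initial * multiplier^attempt, capped at the maximum.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  const raw = opts.initialIntervalMs * Math.pow(opts.multiplier, attempt);
  return Math.min(raw, opts.maxIntervalMs);
}

// Convenience helper that lists the first `attempts` delays.
function backoffSchedule(attempts: number, opts: BackoffOptions): number[] {
  return Array.from({ length: attempts }, (_, i) => backoffDelayMs(i, opts));
}

const schedule = backoffSchedule(7, {
  initialIntervalMs: 1000,
  maxIntervalMs: 30000,
  multiplier: 2
});
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

With these values, a task that takes 15 seconds is picked up on the fifth poll (at 1 + 2 + 4 + 8 = 15s of cumulative waiting) instead of after roughly 150 polls at a fixed 100ms interval.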

3. Orphaned Background Jobs

Explanation: If the background execution function crashes or the process terminates, tasks remain stuck in processing indefinitely. Agents poll forever, consuming tokens and blocking user sessions. Fix: Add a heartbeat mechanism or maximum execution timeout. A background sweeper should mark tasks as failed if they exceed their SLA (e.g., 5 minutes). Log alerts for operational visibility.
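A sweeper can be as simple as a function that scans the registry and fails anything non-terminal that has outlived its SLA. The following is a sketch under stated assumptions: `sweepStaleTasks` and `SweepableTask` are hypothetical names, age is measured from `createdAt`, and the registry is a local Map; against a shared store, the same logic would run as a query or a scheduled job.

```typescript
type TaskStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface SweepableTask {
  id: string;
  status: TaskStatus;
  error?: string;
  createdAt: number; // epoch milliseconds
}

// Marks non-terminal tasks older than maxAgeMs as failed and returns the ids it changed.
function sweepStaleTasks(
  registry: Map<string, SweepableTask>,
  nowMs: number,
  maxAgeMs: number
): string[] {
  const swept: string[] = [];
  registry.forEach((task) => {
    const isTerminal = task.status === 'completed' || task.status === 'failed';
    if (!isTerminal && nowMs - task.createdAt > maxAgeMs) {
      task.status = 'failed';
      task.error = `Exceeded maximum execution time of ${maxAgeMs} ms`;
      swept.push(task.id);
    }
  });
  return swept;
}
```

In a single-process server this could be scheduled with something like `setInterval(() => sweepStaleTasks(taskRegistry, Date.now(), 5 * 60 * 1000), 30000)`, with the returned ids fed into logging or alerting.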

4. Ignoring Terminal Failure States

Explanation: Returning null or empty strings on failure forces agents to guess whether the task is still running or has errored. This breaks deterministic workflow logic. Fix: Always return a structured payload with explicit status and error fields. Agents should branch logic based on status === 'failed' rather than parsing text responses.
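On the consuming side, the structured payload can be mapped to a small discriminated union so that workflow code branches on a type rather than on free text. This is an illustrative sketch: `interpretStatus` and `TaskOutcome` are hypothetical names, and the payload fields (`status`, `result`, `error`) are the ones returned by resolve_task_status in Step 3.

```typescript
type TaskOutcome =
  | { kind: 'in_progress' }
  | { kind: 'done'; result: unknown }
  | { kind: 'error'; message: string };

// Interprets the JSON text returned by resolve_task_status.
function interpretStatus(text: string): TaskOutcome {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return { kind: 'error', message: 'Malformed status payload' };
  }
  if (typeof raw !== 'object' || raw === null) {
    return { kind: 'error', message: 'Malformed status payload' };
  }

  const payload = raw as { status?: unknown; result?: unknown; error?: unknown };
  const errorText = typeof payload.error === 'string' ? payload.error : null;

  switch (payload.status) {
    case 'completed':
      return { kind: 'done', result: payload.result ?? null };
    case 'failed':
      return { kind: 'error', message: errorText ?? 'Task failed without an error message' };
    case 'pending':
    case 'processing':
      return { kind: 'in_progress' };
    default:
      // Covers the "Task reference not found" response, which carries no status field
      return { kind: 'error', message: errorText ?? `Unexpected status: ${String(payload.status)}` };
  }
}
```

Anything that is not a recognized status becomes an explicit error, so an unknown or missing value can never be mistaken for "still running".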

5. Hardcoded Timeout Assumptions

Explanation: Assuming all external calls will finish within 10 seconds leads to brittle designs. Batch jobs, ML inference pipelines, and third-party rate limits vary wildly. Fix: Make SLAs configurable per operation type. Pass max_wait_seconds in the initiation payload and enforce it in the background worker. Return structured timeout errors when exceeded.
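A per-operation SLA can be sketched as a lookup plus a timeout wrapper around the background work. Everything named here is an assumption for illustration: the operation names and SLA values in `OPERATION_SLA_SECONDS`, the `HARD_CAP_SECONDS` ceiling, and the `TaskTimeoutError` class. Note that the wrapper only stops waiting; it does not cancel the underlying call.

```typescript
// Hypothetical per-operation SLAs in seconds; tune these for your own dependencies.
const OPERATION_SLA_SECONDS: Record<string, number> = {
  quick_lookup: 10,
  ml_inference: 300
};
const DEFAULT_SLA_SECONDS = 120;
const HARD_CAP_SECONDS = 900;

// Effective timeout: the caller's max_wait_seconds when valid, otherwise the
// per-operation SLA, and never more than the hard cap.
function resolveTimeoutSeconds(operationType: string, maxWaitSeconds?: number): number {
  const sla = OPERATION_SLA_SECONDS[operationType] ?? DEFAULT_SLA_SECONDS;
  const requested =
    typeof maxWaitSeconds === 'number' && maxWaitSeconds > 0 ? maxWaitSeconds : sla;
  return Math.min(requested, HARD_CAP_SECONDS);
}

class TaskTimeoutError extends Error {
  constructor(seconds: number) {
    super(`Task exceeded max_wait_seconds=${seconds}`);
    this.name = 'TaskTimeoutError';
  }
}

// Rejects with TaskTimeoutError if `work` has not settled within `seconds`.
function withTimeout<T>(work: Promise<T>, seconds: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TaskTimeoutError(seconds)), seconds * 1000);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}
```

Inside the background worker, the external call would be wrapped as `withTimeout(executeExternalDependency(type, params), resolveTimeoutSeconds(type, maxWait))`, and the existing catch block would record the timeout message on the task as a structured failure.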

6. Lack of Idempotency

Explanation: Network retries or agent loops may call initiate_background_task multiple times for the same logical operation, spawning duplicate jobs and wasting compute. Fix: Accept an idempotency_key in the initiation payload. Check the registry before creating a new record. If the key exists, return the existing task_ref instead of spawning a duplicate.
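The check-before-create step can be sketched as a second index from idempotency key to task reference. The names `initiateOnce`, `keyToTaskRef`, and `createTask` are hypothetical, and `createTask` is a stand-in for whatever creates the record and queues the work.

```typescript
interface InitiationResult {
  task_ref: string;
  created: boolean; // false when an existing task was reused
}

// Maps each idempotency key to the task reference created for it.
const keyToTaskRef = new Map<string, string>();
let nextTaskNumber = 0;

// Hypothetical stand-in for record creation and queuing.
function createTask(): string {
  nextTaskNumber += 1;
  return `task-${nextTaskNumber}`;
}

// Returns the existing task for a known key; otherwise creates and remembers a new one.
function initiateOnce(idempotencyKey: string): InitiationResult {
  const existing = keyToTaskRef.get(idempotencyKey);
  if (existing !== undefined) {
    return { task_ref: existing, created: false };
  }
  const taskRef = createTask();
  keyToTaskRef.set(idempotencyKey, taskRef);
  return { task_ref: taskRef, created: true };
}
```

This works in a single process because the lookup and the insert cannot interleave. With a shared store, the check-and-set has to be atomic, for example a Redis `SET` with the `NX` option, or two concurrent requests with the same key can still both create a task.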

7. Overloading the LLM with Polling Logic

Explanation: Prompting the LLM to manually write polling loops wastes context tokens and introduces non-deterministic behavior. The model may forget to poll, poll too aggressively, or misinterpret status strings. Fix: Keep the polling loop in orchestrator code built around your framework's tool-calling primitives (e.g., Strands MCPClient, LangGraph ToolNode) rather than in the prompt. The tool should only return status; the orchestrator manages the retry loop.
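An orchestrator-side loop might look like the following sketch. It is not tied to any framework: `fetchStatus` is a hypothetical wrapper around however your client invokes resolve_task_status, and `sleep` is injected so the loop can be exercised without real waiting.

```typescript
type PolledStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface StatusPayload {
  status: PolledStatus;
  result: unknown;
  error: string | null;
}

interface PollOptions {
  initialIntervalMs: number;
  maxIntervalMs: number;
  backoffMultiplier: number;
  maxAttempts: number;
  sleep: (ms: number) => Promise<void>; // injected so tests can skip real waiting
}

// Polls until the task is completed or failed, backing off between attempts.
// If no terminal state is seen within maxAttempts, a synthetic failure is returned
// so callers always receive an explicit outcome.
async function pollUntilTerminal(
  fetchStatus: () => Promise<StatusPayload>,
  opts: PollOptions
): Promise<StatusPayload> {
  let delay = opts.initialIntervalMs;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const payload = await fetchStatus();
    if (payload.status === 'completed' || payload.status === 'failed') {
      return payload;
    }
    await opts.sleep(delay);
    delay = Math.min(delay * opts.backoffMultiplier, opts.maxIntervalMs);
  }
  return {
    status: 'failed',
    result: null,
    error: `No terminal state after ${opts.maxAttempts} polls`
  };
}
```

In production, `sleep` would be something like `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. The model only ever sees the final payload, so none of the intermediate polls consume context.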

Production Bundle

Action Checklist

Replace in-memory task registry with a persistent, distributed store (Redis/DynamoDB)
Implement exponential backoff polling strategy on the agent orchestrator side
Add maximum execution timeout and background sweeper for orphaned jobs
Standardize response schema with explicit status, result, and error fields
Configure per-operation SLAs instead of hardcoding global timeouts
Enforce idempotency keys on task initiation to prevent duplicate work
Delegate polling logic to agent framework primitives, not LLM prompts
Add structured logging and metrics (task duration, failure rate, queue depth)

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Fast lookup (<2s)	Synchronous direct call	Minimal latency, simpler code, no state management overhead	Lowest (no queue/registry)
Unpredictable external API (2-30s)	Async HandleId	Prevents agent freeze, enables graceful degradation, scales horizontally	Medium (state store + background workers)
Batch processing / ML inference (>30s)	Async HandleId + Webhook callback	Polling becomes inefficient; webhooks push results when ready	Higher (infrastructure + routing)
Critical path with strict SLA	Async HandleId + Circuit Breaker	Fails fast on degradation, preserves agent stability	Medium (monitoring + fallback logic)
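For the circuit breaker row, a minimal sketch of the idea follows. It is a simplified illustration, not a production implementation: the `CircuitBreaker` class, its thresholds, and the injected clock are all assumptions, and the half-open state here admits any number of trial requests once the cooldown has passed.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True when a new task may be dispatched to the dependency.
  canDispatch(): boolean {
    if (this.openedAt === null) return true; // closed: normal operation
    // Open: reject until the cooldown has elapsed, then allow trial requests
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Consecutive failures at or above the threshold open (or re-open) the circuit.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

The initiation tool would call `canDispatch()` before queuing work and return an immediate structured failure when it is false, while the background worker reports each outcome through `recordSuccess()` or `recordFailure()`.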

Configuration Template

// mcp-async-config.ts
export const MCP_ASYNC_CONFIG = {
  server: {
    name: 'production-async-tools',
    version: '2.1.0',
    transport: 'stdio' // or 'sse' for remote
  },
  state: {
    provider: 'redis',
    ttlSeconds: 86400,
    keyPrefix: 'mcp_task:'
  },
  execution: {
    maxConcurrency: 50,
    defaultTimeoutSeconds: 120,
    retryAttempts: 0 // Handled by agent orchestrator
  },
  polling: {
    initialIntervalMs: 1000,
    maxIntervalMs: 30000,
    backoffMultiplier: 2.0,
    maxAttempts: 20
  },
  observability: {
    metricsPrefix: 'mcp_async',
    logLevel: 'info',
    alertOnFailureRate: 0.05 // 5%
  }
};
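Several of these values constrain each other, so it can help to check them at startup. The sketch below is one possible consistency check, not part of any SDK: `validateAsyncConfig`, `pollingBudgetMs`, and the specific rules are assumptions chosen for illustration, and the `AsyncConfig` interface covers only the fields the checks read.

```typescript
interface AsyncConfig {
  state: { ttlSeconds: number };
  execution: { defaultTimeoutSeconds: number };
  polling: {
    initialIntervalMs: number;
    maxIntervalMs: number;
    backoffMultiplier: number;
    maxAttempts: number;
  };
  observability: { alertOnFailureRate: number };
}

// Total time spent waiting if every poll up to maxAttempts finds the task still running.
function pollingBudgetMs(p: AsyncConfig['polling']): number {
  let total = 0;
  let delay = p.initialIntervalMs;
  for (let i = 0; i < p.maxAttempts; i++) {
    total += delay;
    delay = Math.min(delay * p.backoffMultiplier, p.maxIntervalMs);
  }
  return total;
}

// Returns human-readable problems; an empty array means the values are mutually consistent.
function validateAsyncConfig(cfg: AsyncConfig): string[] {
  const problems: string[] = [];
  if (cfg.polling.backoffMultiplier <= 1) {
    problems.push('backoffMultiplier must be greater than 1');
  }
  if (cfg.polling.maxIntervalMs < cfg.polling.initialIntervalMs) {
    problems.push('maxIntervalMs is below initialIntervalMs');
  }
  if (cfg.observability.alertOnFailureRate < 0 || cfg.observability.alertOnFailureRate > 1) {
    problems.push('alertOnFailureRate must be between 0 and 1');
  }
  if (cfg.state.ttlSeconds <= cfg.execution.defaultTimeoutSeconds) {
    problems.push('task records can expire before tasks are allowed to finish');
  }
  if (pollingBudgetMs(cfg.polling) < cfg.execution.defaultTimeoutSeconds * 1000) {
    problems.push('agent stops polling before the execution timeout elapses');
  }
  return problems;
}
```

With the template values, the polling budget works out to 481 seconds (1 + 2 + 4 + 8 + 16 seconds, then fifteen polls at the 30-second cap), which comfortably exceeds the 120-second execution timeout, so the agent will not give up on a task that is still allowed to run.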

Quick Start Guide

Initialize the MCP Server: Install @modelcontextprotocol/sdk and redis. Configure the state provider in MCP_ASYNC_CONFIG. Start the server process locally or deploy to a container.
Register Tools: Register initiate_background_task and resolve_task_status with the server SDK. Ensure the initiation tool returns within the 500ms budget set under Architecture Decisions.
Configure Agent Polling: In your agent framework (Strands, LangGraph, etc.), set the polling interval to 1s with exponential backoff. Map status === 'completed' to workflow continuation and status === 'failed' to error handling.
Validate with Synthetic Load: Run a load test simulating 50 concurrent tasks with 10-15s latency. Verify no 424 errors, state persistence across restarts, and correct backoff behavior.
Deploy with Observability: Enable metrics collection for queue depth, task duration, and failure rates. Set alerts for failure rate >5% or average duration >SLA. Monitor agent token consumption to confirm context window efficiency gains.
