a precise description, as the host LLM uses this metadata to determine routing decisions.
# capability_server.py
import json

from mcp.server.fastmcp import FastMCP

server = FastMCP("enterprise-capabilities")


@server.tool()
def fetch_metrics(dataset_id: str, timeframe: str) -> str:
    """Retrieve aggregated performance metrics for a specific dataset and timeframe.
    Returns JSON-formatted statistics including mean, median, and variance."""
    # Simulated data retrieval
    payload = {
        "dataset": dataset_id,
        "period": timeframe,
        "metrics": {"mean": 42.8, "median": 41.2, "variance": 3.1}
    }
    return json.dumps(payload)


@server.tool()
def generate_summary(raw_data: str, format_type: str) -> str:
    """Transform raw input data into a structured summary.
    Supported formats: 'bullet', 'paragraph', 'json'."""
    if format_type == "json":
        return json.dumps({"status": "summarized", "length": len(raw_data)})
    return f"Processed {len(raw_data)} characters into {format_type} format."


@server.tool()
def validate_config(config_blob: str) -> str:
    """Check configuration payload against schema rules.
    Returns validation status and error details if applicable."""
    try:
        json.loads(config_blob)
        return json.dumps({"valid": True, "errors": []})
    except json.JSONDecodeError as e:
        return json.dumps({"valid": False, "errors": [str(e)]})


if __name__ == "__main__":
    server.run(transport="stdio")
Architecture Rationale: FastMCP abstracts JSON-RPC message formatting while preserving strict type boundaries: each tool's input schema comes from the function's type hints. The tools here return strings, which reach the client as TextContent. Descriptions are written primarily for LLM consumption, because the docstring is what the model sees when choosing a tool. The server runs as an independent process, which isolates its memory from the host and gives it its own deployment cycle.
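To make the "type boundaries" point concrete, here is a minimal standard-library sketch of how a typed function signature can be turned into a tool descriptor. It illustrates the idea only and is not FastMCP's actual implementation; `describe_tool` and `PY_TO_JSON` are hypothetical names introduced for this example.

```python
import inspect
import json

# Hypothetical mapping from Python annotations to JSON Schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func):
    """Build a tool descriptor (name, description, input schema) from a function."""
    signature = inspect.signature(func)
    properties = {}
    required = []
    for name, param in signature.parameters.items():
        # Unknown annotations fall back to "string" in this sketch.
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def fetch_metrics(dataset_id: str, timeframe: str = "24h") -> str:
    """Retrieve aggregated performance metrics for a dataset and timeframe."""
    return json.dumps({"dataset": dataset_id, "period": timeframe})


descriptor = describe_tool(fetch_metrics)
print(json.dumps(descriptor, indent=2))
```

The descriptor is what a host would hand to the LLM: the docstring becomes the routing hint, and the schema constrains the arguments the model may generate.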
MCP supports two primary transport layers. Local development typically uses stdio (the client spawns the server as a subprocess and talks over its stdin/stdout), while distributed environments need an HTTP-based transport such as HTTP + SSE (Server-Sent Events). The client is responsible for connection initialization, capability discovery, and graceful teardown.
# agent_host.py
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def establish_capability_catalog():
    # Launch the capability server as a subprocess speaking MCP over stdio
    server_config = StdioServerParameters(
        command="python",
        args=["capability_server.py"],
        env={"PYTHONUNBUFFERED": "1"}
    )
    async with stdio_client(server_config) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            # Protocol handshake must complete before any other request
            await session.initialize()

            # Dynamic discovery replaces hardcoded imports
            catalog = await session.list_tools()
            print(f"Available capabilities: {[t.name for t in catalog.tools]}")

            # Direct invocation example (bypassing LLM routing)
            result = await session.call_tool(
                name="fetch_metrics",
                arguments={"dataset_id": "prod-04", "timeframe": "24h"}
            )
            print(f"Direct call result: {result.content[0].text}")
            return catalog


if __name__ == "__main__":
    asyncio.run(establish_capability_catalog())
Architecture Rationale: ClientSession handles JSON-RPC request/response correlation. The example calls list_tools() once after initialize() and returns the catalog so callers can reuse it instead of re-querying the server. Direct call_tool() demonstrates protocol-level invocation without LLM mediation, which is useful for deterministic workflows. The PYTHONUNBUFFERED environment variable stops Python from buffering the server's stdout, which could otherwise stall subprocess communication.
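Request/response correlation is easier to picture with a small model. The sketch below is not ClientSession's implementation; it is a hypothetical `Correlator` class that shows the general JSON-RPC technique of matching responses to pending requests by `id`, using only the standard library.

```python
import asyncio
import json
from itertools import count


class Correlator:
    """Match JSON-RPC responses to pending requests by their id."""

    def __init__(self):
        self._ids = count(1)
        self._pending = {}

    def build_request(self, method, params):
        """Create a request and a future that resolves when its response arrives."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
        return json.dumps(message), future

    def handle_response(self, raw):
        """Resolve the future whose id matches the incoming response."""
        message = json.loads(raw)
        future = self._pending.pop(message["id"])
        if "error" in message:
            future.set_exception(RuntimeError(message["error"]["message"]))
        else:
            future.set_result(message["result"])


async def demo():
    correlator = Correlator()
    _, first = correlator.build_request("tools/list", {})
    _, second = correlator.build_request("tools/call", {"name": "fetch_metrics"})
    # Responses may arrive out of order; the id keeps them matched.
    correlator.handle_response(json.dumps({"jsonrpc": "2.0", "id": 2, "result": "metrics"}))
    correlator.handle_response(json.dumps({"jsonrpc": "2.0", "id": 1, "result": "catalog"}))
    return await first, await second


print(asyncio.run(demo()))
```

Because each caller awaits its own future, several requests can be in flight on one connection without their results getting mixed up.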
Step 3: Integrate with Agent Frameworks
LangChain provides an official adapter that converts MCP tool schemas into framework-native objects. The adapter handles schema translation, argument serialization, and async execution routing.
# langchain_integration.py
import asyncio

from langchain_core.messages import HumanMessage
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


async def deploy_agent_with_mcp():
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    mcp_client = MultiServerMCPClient(
        {
            "metrics_service": {
                "command": "python",
                "args": ["capability_server.py"],
                "transport": "stdio"
            }
        }
    )

    # Adapter translates MCP schemas to LangChain Tool objects
    mcp_tools = await mcp_client.get_tools()
    agent = create_react_agent(model=llm, tools=mcp_tools)

    query = "Fetch metrics for dataset prod-04 over 24h, then validate this config: {invalid json"
    response = await agent.ainvoke({"messages": [HumanMessage(content=query)]})
    print(response["messages"][-1].content)


if __name__ == "__main__":
    asyncio.run(deploy_agent_with_mcp())
Architecture Rationale: MultiServerMCPClient manages connections to one or more capability servers from a single configuration. The adapter maps MCP tool definitions to LangChain's BaseTool interface. Crucially, the resulting tools are async-only: a synchronous invoke() raises NotImplementedError because the adapter supplies only a coroutine implementation, and the underlying JSON-RPC transport needs a running event loop. Using ainvoke() keeps tool calls on the event loop instead of blocking it.
Pitfall Guide
1. Synchronous Invocation Trap
Explanation: Developers accustomed to traditional function calling attempt to use agent.invoke() with MCP tools. The tools are exposed as coroutines because the JSON-RPC transport requires async coordination, so synchronous invocation fails at runtime.
Fix: Call agent.ainvoke() from async code. If the application itself is synchronous, start the event loop once at the application boundary with asyncio.run() (or loop.run_until_complete()), never inside agent loops.
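A minimal sketch of the "application boundary" pattern, assuming a synchronous caller such as a CLI or a script. `run_agent` is a hypothetical stand-in for `await agent.ainvoke(...)` so the example runs without any agent framework installed.

```python
import asyncio


async def run_agent(query: str) -> str:
    """Stand-in for `await agent.ainvoke(...)`; a real agent would call MCP tools here."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"answered: {query}"


def run_agent_sync(query: str) -> str:
    """Synchronous entry point: start one event loop at the application boundary."""
    return asyncio.run(run_agent(query))


if __name__ == "__main__":
    print(run_agent_sync("metrics for prod-04"))
```

Note that asyncio.run() cannot be called from code that is already running inside an event loop, which is why the wrapper belongs at the outermost layer only.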
2. Vague Capability Descriptions
Explanation: LLMs rely on tool descriptions for routing decisions. Generic descriptions like "process data" cause incorrect tool selection or hallucinated arguments.
Fix: Engineer descriptions with explicit input/output contracts, format specifications, and usage constraints. Include examples of expected payloads and clarify what the tool does not handle.
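As an illustration of an engineered description, the sketch below rewrites the fetch_metrics docstring with an explicit contract. The timeframe grammar and the "Does not" list are assumptions made for the example, not constraints stated by the original server.

```python
import json


def fetch_metrics(dataset_id: str, timeframe: str) -> str:
    """Return aggregated metrics (mean, median, variance) for one dataset.

    Args:
        dataset_id: Dataset identifier, e.g. "prod-04".
        timeframe: Lookback window as <number><unit>, unit in {h, d}, e.g. "24h".

    Returns:
        JSON string: {"dataset": str, "period": str, "metrics": {...}}.

    Does not: accept date ranges, compare datasets, or return raw rows.
    """
    # Placeholder values; a real implementation would query a data store.
    metrics = {"mean": 42.8, "median": 41.2, "variance": 3.1}
    return json.dumps({"dataset": dataset_id, "period": timeframe, "metrics": metrics})


print(fetch_metrics.__doc__)
```

Compared with "process data", this gives the model a concrete example for each argument and an explicit statement of what to route elsewhere.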
3. Blocking stdio Streams
Explanation: Subprocess communication via stdio fails when the server process buffers output or waits for interactive input. The client hangs indefinitely waiting for JSON-RPC responses.
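Fix: Set PYTHONUNBUFFERED=1 or pass -u to the interpreter. Avoid input() calls in server code, and keep stdout reserved for protocol messages by sending logs to stderr. If you emit JSON-RPC messages yourself rather than through an SDK, call sys.stdout.flush() after each one.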
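The following sketch shows the stream discipline in isolation, using a tiny inline child process rather than a real MCP server. The child code and the `read_one_message` helper are hypothetical; the point is only that the child runs unbuffered, logs to stderr, and flushes stdout after each message.

```python
import subprocess
import sys

# Child process: diagnostics go to stderr, the protocol message goes to stdout.
CHILD_CODE = (
    "import sys, json\n"
    "print('starting up', file=sys.stderr)\n"
    "sys.stdout.write(json.dumps({'jsonrpc': '2.0', 'id': 1, 'result': 'ok'}) + '\\n')\n"
    "sys.stdout.flush()\n"
)


def read_one_message() -> str:
    """Spawn the child unbuffered (-u) and read a single line from its stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    line = proc.stdout.readline().strip()
    proc.communicate(timeout=10)  # drain remaining output and wait for exit
    return line


print(read_one_message())
```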
4. Unhandled Server Exceptions
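Explanation: Depending on the server framework, an uncaught exception in a capability function can crash the subprocess and terminate the JSON-RPC connection, or reach the client as an opaque protocol-level error. Either way, the client gets a broken pipe or a raw exception message instead of a structured failure it can act on.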
Fix: Wrap capability logic in try/except blocks. Return descriptive error strings or structured JSON error payloads. Never let exceptions propagate to the protocol layer.
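One way to apply this consistently is a decorator that sits between each capability function and the protocol layer. This is a sketch under assumptions: `safe_tool` is a hypothetical helper, and the error payload shape (`ok`, `error`, `detail`) is illustrative rather than part of MCP.

```python
import functools
import json


def safe_tool(func):
    """Convert any exception raised by a tool into a structured JSON error string."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # deliberately broad: last line of defense
            return json.dumps({"ok": False, "error": type(exc).__name__, "detail": str(exc)})
    return wrapper


@safe_tool
def parse_threshold(value: str) -> str:
    """Parse a numeric threshold supplied as text."""
    return json.dumps({"ok": True, "threshold": float(value)})


print(parse_threshold("0.75"))
print(parse_threshold("high"))
```

functools.wraps keeps the original name, docstring, and signature visible, which matters when a framework inspects the function to build the tool description.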
5. Transport Mismatch in Production
Explanation: Teams develop with stdio but deploy to distributed environments without switching to HTTP + SSE. Subprocess spawning fails in containerized or serverless environments.
Fix: Abstract transport selection behind environment configuration. Use stdio for local development and HTTP + SSE with authentication for production. Validate transport availability during startup.
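A possible shape for environment-driven transport selection is sketched below. The variable names MCP_TRANSPORT, MCP_SERVER_URL, and MCP_AUTH_TOKEN are hypothetical, as is the returned dictionary layout; adapt both to whatever your client library expects.

```python
import os


def build_server_config(environ=None) -> dict:
    """Return a connection config chosen by the (hypothetical) MCP_TRANSPORT variable."""
    environ = os.environ if environ is None else environ
    transport = environ.get("MCP_TRANSPORT", "stdio")
    if transport == "stdio":
        return {"transport": "stdio", "command": "python", "args": ["-u", "capability_server.py"]}
    if transport == "sse":
        url = environ.get("MCP_SERVER_URL")
        token = environ.get("MCP_AUTH_TOKEN")
        # Fail at startup rather than on the first tool call.
        if not url or not token:
            raise ValueError("MCP_SERVER_URL and MCP_AUTH_TOKEN are required for sse")
        return {"transport": "sse", "url": url, "headers": {"Authorization": f"Bearer {token}"}}
    raise ValueError(f"Unsupported transport: {transport!r}")


print(build_server_config({}))
```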
6. Schema Over-Engineering
Explanation: Developers create complex nested JSON schemas for capability arguments. LLMs struggle to generate valid payloads, leading to repeated validation failures and token waste.
Fix: Flatten schemas where possible. Use primitive types (string, number, boolean) and simple arrays. Delegate complex parsing to the capability implementation, not the LLM.
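The sketch below shows one way to keep the argument schema flat while still accepting structured input: the LLM supplies a simple string, and the capability parses it. `search_orders` and its `key=value` filter syntax are hypothetical and exist only to illustrate the pattern.

```python
import json


def search_orders(status: str, filters: str = "") -> str:
    """Search orders by status.

    filters: optional comma-separated key=value pairs, e.g. "region=eu,tier=gold".
    """
    parsed = {}
    for pair in filter(None, (p.strip() for p in filters.split(","))):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            # Report the problem in a form the LLM can correct on retry.
            return json.dumps({"ok": False, "error": f"Malformed filter: {pair!r}"})
        parsed[key.strip()] = value.strip()
    return json.dumps({"ok": True, "status": status, "filters": parsed})


print(search_orders("open", "region=eu, tier=gold"))
```

The model only has to produce two strings, and malformed input yields a specific error message instead of a schema validation failure.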
7. Ignoring Connection Lifecycle
Explanation: Clients open connections but never close them, causing resource leaks. Long-running agents accumulate zombie subprocesses or exhausted HTTP connections.
Fix: Implement explicit connection teardown using async with context managers or try/finally blocks. Monitor subprocess counts and connection pools in production observability dashboards.
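To show the teardown guarantee without a real server, this sketch uses a hypothetical `connect` context manager as a stand-in for opening an MCP session, and contextlib.AsyncExitStack to close every connection even when the body raises.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

open_connections = set()


@asynccontextmanager
async def connect(name: str):
    """Stand-in for opening an MCP session; tracks open connections for the demo."""
    open_connections.add(name)
    try:
        yield name
    finally:
        open_connections.discard(name)  # always runs, even if the body raises


async def run_with_servers(names):
    """Open several connections and guarantee all are closed on exit."""
    async with AsyncExitStack() as stack:
        sessions = [await stack.enter_async_context(connect(n)) for n in names]
        if "broken" in sessions:
            raise RuntimeError("simulated failure mid-run")
        return sorted(open_connections)


print(asyncio.run(run_with_servers(["metrics", "config"])))
print(open_connections)  # nothing left open after teardown
```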
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single agent, rapid prototype | Traditional function binding | Zero protocol overhead, faster iteration | Low initial, high scaling cost |
| Multi-agent team, shared capabilities | MCP with stdio transport | Centralized updates, dynamic discovery | Moderate infrastructure, low maintenance |
| Distributed microservices, cross-language | MCP with HTTP + SSE | Network isolation, language agnosticism | Higher latency, requires auth/network config |
| High-frequency deterministic calls | Direct call_tool() bypassing LLM | Eliminates routing latency and token cost | Requires explicit workflow orchestration |
| Security-sensitive operations | MCP with allowlists + human-in-the-loop | Process boundary + explicit confirmation | Adds workflow steps, reduces automation speed |
Configuration Template
# mcp-deployment.yaml
server:
  name: "capability-orchestrator"
  transport: "stdio"  # Switch to "http_sse" for production
  command: "python"
  args: ["-u", "capability_server.py"]
  env:
    PYTHONUNBUFFERED: "1"
    LOG_LEVEL: "INFO"
client:
  timeout_seconds: 30
  max_retries: 2
  retry_backoff: "exponential"
  connection_pool_size: 5
observability:
  metrics:
    - tool_invocation_count
    - response_latency_ms
    - error_rate_by_tool
  logging:
    format: "json"
    level: "INFO"
    capture_args: false  # Prevent sensitive data leakage
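The client settings in the template describe behavior that the application has to implement. A minimal sketch of how timeout_seconds, max_retries, and exponential backoff might be applied follows; `call_with_retry`, the `base_delay` parameter, and the choice of retryable exceptions are assumptions for illustration, and the config is written as a dict to avoid a YAML dependency.

```python
import asyncio

# Mirrors the client section of the template.
CLIENT_CONFIG = {"timeout_seconds": 30, "max_retries": 2, "retry_backoff": "exponential"}


async def call_with_retry(operation, config=CLIENT_CONFIG, base_delay=0.01):
    """Run `operation` with a per-attempt timeout and exponential backoff between retries."""
    attempts = config["max_retries"] + 1
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(operation(), timeout=config["timeout_seconds"])
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))


async def demo():
    calls = {"count": 0}

    async def flaky():
        # Fails twice, then succeeds, to exercise both retries.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    result = await call_with_retry(flaky)
    return result, calls["count"]


print(asyncio.run(demo()))
```

Only transient failures are retried here; errors returned by the tool itself should usually go back to the agent rather than being retried blindly.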
Quick Start Guide
- Initialize Server: Create a new Python file, import FastMCP, register 2-3 capabilities with precise descriptions, and run with stdio transport.
- Test Discovery: Write a client script using StdioServerParameters and ClientSession. Call list_tools() and verify the catalog matches server registrations.
- Validate Invocation: Execute call_tool() with sample arguments. Confirm JSON-RPC response formatting and error handling behavior.
- Integrate Agent: Install langchain-mcp-adapters, configure MultiServerMCPClient, and attach tools to a create_react_agent instance. Use ainvoke() for execution.
- Deploy & Monitor: Switch transport to HTTP + SSE for production, configure authentication headers, and instrument connection lifecycle metrics before routing live traffic.