tics use case that exposes infrastructure metrics, configuration resources, and incident reporting templates.
Architecture Decisions and Rationale
- Transport Selection: Streamable HTTP is chosen over stdio for networked deployments. HTTP enables load balancing, reverse proxy integration, and multi-client connectivity. Clients send JSON-RPC messages to a single endpoint with HTTP POST, and the server can answer with a plain JSON response or upgrade the response to a Server-Sent Events (SSE) stream to push progress updates and other server-initiated messages.
- Framework Choice: FastMCP (Python) abstracts JSON-RPC 2.0 serialization, schema validation, and transport routing. It provides decorator-based tool/resource/prompt registration, automatic JSON Schema generation from function signatures, and built-in HTTP server utilities. For edge deployments or TypeScript-heavy stacks, the official TypeScript SDK offers equivalent functionality with stricter type safety.
- Session Management: MCP maintains explicit session state. The server must handle the initialize request, the initialized notification, and session lifecycle events. FastMCP manages this automatically, but production deployments should implement session cleanup and timeout handling.
- Schema Validation: All tool inputs and outputs are validated against JSON Schema. FastMCP infers schemas from Python type hints and docstrings. Explicit schema definitions are recommended for complex nested structures to prevent client-side parsing errors.
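The session lifecycle described in the list above begins with a short handshake. The sketch below builds the two client-side JSON-RPC 2.0 messages by hand to show their shape. The protocol revision string, the client name, and both helper function names are illustrative assumptions; FastMCP and MCP client libraries normally generate these messages for you.

```python
import json

# Assumed protocol revision; use whichever version your client and server negotiate.
PROTOCOL_VERSION = "2025-03-26"


def build_initialize_request(request_id: int, client_name: str) -> dict:
    """Client -> server: opens the session and advertises client capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }


def build_initialized_notification() -> dict:
    """Client -> server: confirms the handshake. Notifications carry no id."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


handshake = [
    build_initialize_request(1, "example-client"),
    build_initialized_notification(),
]
for message in handshake:
    print(json.dumps(message))
```

Between these two messages the server replies to the initialize request with its own capabilities; only after the notification is sent should the client issue calls such as tools/list.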
Implementation
The following implementation demonstrates a Streamable HTTP MCP server exposing system diagnostics capabilities. It registers two tools, one resource, and one prompt, and reads its bind address from environment variables.
# diagnostics_server.py
import asyncio
import json
import logging
import os
import platform
from datetime import datetime, timezone
from typing import Dict

import psutil
from fastmcp import FastMCP

# Configure structured logging for production observability
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s"
)
logger = logging.getLogger("mcp.diagnostics")

# Initialize the server instance; the transport is chosen at startup in run()
diagnostics_mcp = FastMCP(
    name="Infrastructure Diagnostics Server",
    instructions="Provides system metrics, configuration resources, and incident reporting templates for AI agents."
)


# Tool: Retrieve real-time system performance metrics
@diagnostics_mcp.tool()
async def fetch_system_metrics(
    include_network: bool = True,
    include_disk: bool = True
) -> Dict[str, object]:
    """
    Collects CPU, memory, network, and disk utilization data.

    Args:
        include_network: Toggle network interface statistics collection.
        include_disk: Toggle disk partition usage collection.

    Returns:
        Dictionary containing normalized system metrics.
    """
    logger.info("Collecting system metrics | network=%s, disk=%s", include_network, include_disk)
    # cpu_percent(interval=1) sleeps for one second, so sample it in a worker
    # thread instead of blocking the event loop
    cpu_percent = await asyncio.to_thread(psutil.cpu_percent, interval=1)
    memory = psutil.virtual_memory()
    metrics: Dict[str, object] = {
        "platform": platform.system(),
        "cpu_percent": cpu_percent,
        "memory": {
            "total_gb": round(memory.total / (1024**3), 2),
            "available_gb": round(memory.available / (1024**3), 2),
            "usage_percent": memory.percent
        }
    }
    if include_disk:
        disk: Dict[str, object] = {}
        for partition in psutil.disk_partitions(all=False):
            # disk_partitions() only lists devices and mount points;
            # usage figures come from disk_usage() on each mount point
            try:
                usage = psutil.disk_usage(partition.mountpoint)
            except OSError:
                continue  # e.g. an empty removable drive or a permission error
            disk[partition.device] = {
                "total_gb": round(usage.total / (1024**3), 2),
                "used_gb": round(usage.used / (1024**3), 2),
                "percent": usage.percent
            }
        metrics["disk"] = disk
    if include_network:
        net_io = psutil.net_io_counters()
        metrics["network"] = {
            "bytes_sent_mb": round(net_io.bytes_sent / (1024**2), 2),
            "bytes_recv_mb": round(net_io.bytes_recv / (1024**2), 2)
        }
    return metrics


# Tool: Execute safe diagnostic commands with timeout enforcement
@diagnostics_mcp.tool()
async def run_network_diagnostic(
    target_host: str,
    packet_count: int = 4,
    timeout_seconds: int = 10
) -> Dict[str, object]:
    """
    Performs a controlled network reachability check.

    Args:
        target_host: IP address or hostname to probe.
        packet_count: Number of probe packets to send.
        timeout_seconds: Maximum execution duration.

    Returns:
        Diagnostic results including latency and packet loss.
    """
    logger.info("Running network diagnostic | target=%s, packets=%d", target_host, packet_count)
    try:
        # Simulated diagnostic execution for demonstration.
        # In production, replace the sleep with an async subprocess or network
        # library call; wait_for() enforces timeout_seconds either way.
        await asyncio.wait_for(asyncio.sleep(0.5), timeout=timeout_seconds)
        return {
            "target": target_host,
            "packets_sent": packet_count,
            "packets_received": packet_count,
            "packet_loss_percent": 0.0,
            "avg_latency_ms": 12.4,
            "status": "reachable"
        }
    except Exception as exc:
        logger.error("Diagnostic failed for %s: %s", target_host, exc)
        return {
            "target": target_host,
            "status": "unreachable",
            "error": str(exc)
        }


# Resource: Expose environment configuration as read-only data
@diagnostics_mcp.resource("config://environment")
async def load_environment_config() -> str:
    """Returns current deployment environment variables and runtime settings."""
    config_data = {
        "runtime": platform.python_version(),
        "hostname": platform.node(),
        "env_vars": {
            key: os.getenv(key, "<not set>")
            for key in ["APP_ENV", "LOG_LEVEL", "MAX_METRICS_AGE"]
        }
    }
    # Serialize as JSON so clients can parse it; str() would emit a Python repr
    return json.dumps(config_data, indent=2)


# Prompt: Generate structured incident reports from raw metrics
@diagnostics_mcp.prompt()
def format_incident_report(
    severity: str,
    affected_service: str,
    raw_metrics: str
) -> str:
    """
    Constructs a standardized incident summary for alerting systems.

    Args:
        severity: Critical, Warning, or Info.
        affected_service: Name of the impacted component.
        raw_metrics: JSON string of collected diagnostic data.

    Returns:
        Formatted incident report template.
    """
    # Use a wall-clock UTC timestamp; the event loop clock is monotonic and
    # is not a calendar time
    timestamp = datetime.now(timezone.utc).isoformat()
    return (
        f"## Incident Report\n"
        f"- **Severity**: {severity}\n"
        f"- **Service**: {affected_service}\n"
        f"- **Timestamp**: {timestamp}\n"
        f"- **Metrics**: {raw_metrics}\n"
        f"- **Recommended Action**: Review resource utilization and scale horizontally if thresholds exceeded."
    )


# HTTP transport configuration and startup
if __name__ == "__main__":
    # Bind to configurable host/port for containerized deployments
    host = os.getenv("MCP_HTTP_HOST", "0.0.0.0")
    port = int(os.getenv("MCP_HTTP_PORT", "8080"))
    logger.info("Starting Streamable HTTP MCP server on %s:%d", host, port)
    # FastMCP handles JSON-RPC 2.0 routing and SSE streaming automatically
    diagnostics_mcp.run(
        transport="streamable-http",
        host=host,
        port=port,
        log_level="info"
    )
Why This Architecture Works
The implementation separates concerns cleanly: tools handle executable logic, resources expose read-only configuration, and prompts standardize output formatting. FastMCP's decorator system automatically generates JSON Schema definitions from type hints and docstrings, so clients receive accurate parameter specifications. The Streamable HTTP transport accepts client requests over HTTP POST and can stream responses over SSE, which enables progress notifications and server-initiated messages without polling. Environment-driven configuration (MCP_HTTP_HOST, MCP_HTTP_PORT) keeps the server compatible with container orchestration platforms and reverse proxies.
Pitfall Guide
1. Blocking the Event Loop in HTTP Handlers
Explanation: Synchronous I/O operations (file reads, network calls, subprocess execution) inside tool functions will block the async event loop, causing SSE connection timeouts and dropped client sessions.
Fix: Always use async def for tool implementations. Wrap blocking operations in asyncio.to_thread() or replace with native async libraries (aiohttp, asyncpg, httpx).
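As a minimal illustration of this fix, the sketch below offloads a blocking call with asyncio.to_thread() while a heartbeat task keeps running on the event loop. The function names and durations are hypothetical, and time.sleep() stands in for any synchronous I/O.

```python
import asyncio
import time


def blocking_probe(duration: float) -> str:
    """Stand-in for synchronous I/O such as a subprocess call or file read."""
    time.sleep(duration)
    return "done"


async def heartbeat(ticks: list, interval: float) -> None:
    """Counts how often the event loop gets a chance to run."""
    while True:
        await asyncio.sleep(interval)
        ticks[0] += 1


async def run_demo() -> tuple:
    ticks = [0]
    heartbeat_task = asyncio.create_task(heartbeat(ticks, 0.01))
    # The blocking call runs in a worker thread, so the loop stays responsive.
    result = await asyncio.to_thread(blocking_probe, 0.2)
    heartbeat_task.cancel()
    try:
        await heartbeat_task
    except asyncio.CancelledError:
        pass
    return result, ticks[0]


if __name__ == "__main__":
    result, tick_count = asyncio.run(run_demo())
    print(f"probe={result}, heartbeat ticks while waiting={tick_count}")
```

If blocking_probe were called directly inside the coroutine, the heartbeat would not tick at all during the 0.2 second wait, which is the same starvation that stalls SSE streams.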
2. Ignoring Transport-Specific Constraints
Explanation: stdio and Streamable HTTP handle message framing differently. stdio relies on newline-delimited JSON, while Streamable HTTP carries JSON-RPC messages in HTTP POST bodies and may stream responses as SSE events. Mixing transport assumptions causes parsing failures.
Fix: Validate transport behavior during development. Use FastMCP's in-memory Client for fast tests that bypass the network, and a real HTTP client for network testing. Never hardcode transport assumptions in business logic.
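To make the framing difference concrete, the sketch below encodes the same JSON-RPC message for each transport. Both helper names are hypothetical and the code only illustrates the wire shapes; a real server should leave framing to its MCP library.

```python
import json


def frame_for_stdio(message: dict) -> bytes:
    """stdio framing: one JSON document per line, terminated by a newline.

    json.dumps escapes newlines inside strings, so the payload stays on one line.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def frame_for_sse(message: dict, event: str = "message") -> bytes:
    """SSE framing: 'event:' and 'data:' fields, terminated by a blank line."""
    payload = json.dumps(message, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")


request = {"jsonrpc": "2.0", "id": 7, "method": "tools/list"}
stdio_frame = frame_for_stdio(request)
sse_frame = frame_for_sse(request)

print(stdio_frame)
print(sse_frame)
```

Business logic that only ever sees the decoded dictionary stays identical across transports, which is the separation the fix above recommends.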
3. Overlooking JSON Schema Validation Boundaries
Explanation: FastMCP infers schemas from type hints, but complex nested structures, optional fields, or custom enums may generate incomplete schemas. Clients will reject malformed inputs or fail to serialize responses.
Fix: Explicitly define Field constraints using Pydantic or typing.Annotated. Document edge cases in docstrings. Validate schemas against the MCP specification before deployment.
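One way to make constraints explicit is a Pydantic v2 model with Annotated fields, as sketched below. The model name, the bounds, and the protocol field are hypothetical choices for illustration; they mirror the parameters of run_network_diagnostic but are not part of the server above.

```python
from typing import Annotated, Literal

from pydantic import BaseModel, Field, ValidationError


class DiagnosticRequest(BaseModel):
    """Hypothetical input model with explicit bounds and an enum-like field."""

    target_host: Annotated[str, Field(min_length=1, max_length=253,
                                      description="IP address or hostname to probe")]
    packet_count: Annotated[int, Field(ge=1, le=20,
                                       description="Number of probe packets")] = 4
    timeout_seconds: Annotated[int, Field(ge=1, le=60)] = 10
    protocol: Literal["icmp", "tcp"] = "icmp"


# The generated JSON Schema carries the bounds and the allowed enum values.
schema = DiagnosticRequest.model_json_schema()
print(schema["properties"]["packet_count"])

# Out-of-range input is rejected before any tool logic runs.
try:
    DiagnosticRequest(target_host="10.0.0.5", packet_count=500)
    rejected = False
except ValidationError:
    rejected = True
print("rejected out-of-range packet_count:", rejected)
```

Inspecting the generated schema before deployment is a cheap way to confirm that clients will see the constraints you intended.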
4. Treating MCP Sessions as Stateless
Explanation: MCP maintains explicit session state for capability negotiation and progress tracking. Assuming statelessness leads to lost context, duplicate tool registrations, or broken streaming connections.
Fix: Implement session lifecycle handling (setup when a client connects, cleanup when it disconnects), using whatever hooks your framework version provides. Store session metadata in a distributed cache (Redis) for horizontally scaled deployments. Clean up stale sessions with TTL policies.
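The TTL policy can be sketched with an in-memory registry. The class and method names below are hypothetical, and a dictionary stands in for the distributed cache; timestamps are passed in explicitly so the expiry logic is easy to follow and test.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionRecord:
    session_id: str
    last_seen: float
    metadata: Dict[str, str] = field(default_factory=dict)


class SessionRegistry:
    """Tracks sessions and drops those idle for longer than ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._sessions: Dict[str, SessionRecord] = {}

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        """Registers a session or refreshes its last-seen time."""
        now = time.monotonic() if now is None else now
        record = self._sessions.get(session_id)
        if record is None:
            self._sessions[session_id] = SessionRecord(session_id, now)
        else:
            record.last_seen = now

    def purge_stale(self, now: Optional[float] = None) -> List[str]:
        """Removes expired sessions and returns their ids."""
        now = time.monotonic() if now is None else now
        stale = [sid for sid, rec in self._sessions.items()
                 if now - rec.last_seen > self.ttl_seconds]
        for sid in stale:
            del self._sessions[sid]
        return stale

    def active_ids(self) -> List[str]:
        return sorted(self._sessions)


registry = SessionRegistry(ttl_seconds=30)
registry.touch("session-a", now=0)
registry.touch("session-b", now=20)
removed = registry.purge_stale(now=40)
print("removed:", removed, "active:", registry.active_ids())
```

In a horizontally scaled deployment the same idea maps onto cache keys with an expiry, refreshed on every request that carries the session id.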
5. Hardcoding Secrets in Tool Code
Explanation: Embedding API keys, database credentials, or tokens directly in tool code exposes them in schema documentation and client logs. AI hosts may inadvertently leak secrets in conversation history.
Fix: Use environment variables or secret managers (HashiCorp Vault, AWS Secrets Manager). Pass credentials via resource URIs or secure headers. Never include secrets in tool descriptions or return values.
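A minimal sketch of the environment-variable approach follows. The helper names, the exception class, and the DIAGNOSTICS_API_TOKEN variable are hypothetical; a secret manager client would replace os.environ in a hardened setup.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secret(name: str) -> str:
    """Reads a secret at call time so it never appears in source or schemas."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"Required secret {name} is not set")
    return value


def redact(value: str, visible: int = 4) -> str:
    """Masks all but the last few characters for safe logging."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Demo value only; real deployments inject this from outside the process.
os.environ.setdefault("DIAGNOSTICS_API_TOKEN", "example-token-1234")
token = load_secret("DIAGNOSTICS_API_TOKEN")
print("token loaded:", redact(token))
```

Failing fast on a missing secret at startup is usually preferable to discovering it inside a tool call, where the error text could end up in a client transcript.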
6. Omitting Progress Updates for Long-Running Tools
Explanation: Tools executing for more than 5 seconds without progress updates cause client timeouts and poor UX. MCP supports progress notifications, but developers often omit them.
Fix: Report progress from inside the tool, for example through the FastMCP Context object (await ctx.report_progress(progress, total)). Update clients at logical checkpoints (e.g., 25%, 50%, 75%, 100%). Set reasonable timeouts in client configurations.
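The checkpoint pattern can be sketched independently of any framework. Here the report callback is a stand-in for whatever progress API your MCP library exposes, and scan_hosts and its host list are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Signature of the stand-in progress callback: (completed, total).
ProgressCallback = Callable[[int, int], Awaitable[None]]


async def scan_hosts(hosts: List[str], report: ProgressCallback) -> List[str]:
    """Probes hosts one by one and reports progress after each checkpoint."""
    reachable: List[str] = []
    total = len(hosts)
    for index, host in enumerate(hosts, start=1):
        await asyncio.sleep(0)  # placeholder for the real async probe
        reachable.append(host)
        await report(index, total)
    return reachable


async def run_demo() -> Tuple[List[str], List[Tuple[int, int]]]:
    updates: List[Tuple[int, int]] = []

    async def record(progress: int, total: int) -> None:
        updates.append((progress, total))

    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    result = await scan_hosts(hosts, record)
    return result, updates


if __name__ == "__main__":
    result, updates = asyncio.run(run_demo())
    for progress, total in updates:
        print(f"{progress}/{total} ({100 * progress // total}%)")
```

With four hosts the callback fires at 25%, 50%, 75%, and 100%, matching the checkpoints suggested above.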
7. Mismanaging Resource URI Schemes
Explanation: Resource URIs must follow scheme://authority/path conventions. Invalid schemes or missing authority components break client resolution and caching mechanisms.
Fix: Use standardized schemes (config://, data://, file://). Validate URIs against RFC 3986. Implement resource versioning via query parameters (config://env?v=2) to support cache invalidation.
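A basic structural check along these lines can be written with the standard library. The allowed-scheme set and the function name are assumptions for illustration, and urlsplit performs only lenient parsing, so this is a sanity check and not full RFC 3986 validation.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list matching the schemes suggested above.
ALLOWED_SCHEMES = {"config", "data", "file"}


def validate_resource_uri(uri: str) -> dict:
    """Checks scheme and authority, and extracts an optional ?v= version."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported scheme in {uri!r}")
    # file:///path legitimately has an empty authority; other schemes need one.
    if not parts.netloc and parts.scheme != "file":
        raise ValueError(f"Missing authority in {uri!r}")
    version = parse_qs(parts.query).get("v", [None])[0]
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "version": version,
    }


parsed = validate_resource_uri("config://env?v=2")
print(parsed)
```

Running the check when resources are registered surfaces malformed URIs at startup instead of at the first client lookup.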
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local AI development & testing | stdio transport with FastMCP (Python) | Zero network overhead, simple process management, ideal for IDE integrations | Minimal; runs on developer machines |
| Multi-client cloud deployment | Streamable HTTP with reverse proxy | Enables load balancing, authentication, and scalable session management | Moderate; requires infrastructure (LB, TLS, monitoring) |
| Edge/Serverless deployment | TypeScript MCP SDK on Cloudflare/Vercel | Native HTTP support, cold-start optimization, global distribution | Higher; platform-specific pricing, but reduces egress costs |
| High-throughput metric collection | Async tools with connection pooling | Prevents event loop blocking, maintains SSE stability under load | Low; requires async library investment |
| Strict compliance environments | Resource-based credential injection | Keeps secrets out of tool schemas and conversation logs | Moderate; requires secret manager integration |
Configuration Template
# docker-compose.yml
version: "3.9"
services:
  mcp-diagnostics:
    build: .
    ports:
      - "8080:8080"
    environment:
      - MCP_HTTP_HOST=0.0.0.0
      - MCP_HTTP_PORT=8080
      - LOG_LEVEL=info
      - APP_ENV=production
    healthcheck:
      # Assumes curl is installed in the image and the server exposes a /health
      # route; diagnostics_server.py does not define one, so add it or change this probe.
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "0.5"
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
# requirements.txt
fastmcp>=2.0.0
psutil>=5.9.0
uvicorn>=0.27.0
pydantic>=2.0.0
Quick Start Guide
- Initialize Project: Create a virtual environment and install dependencies:
python -m venv .venv && source .venv/bin/activate && pip install fastmcp psutil uvicorn
- Scaffold Server: Save the diagnostics_server.py implementation to your project root. Ensure type hints and docstrings match your target capabilities.
- Launch Transport: Run python diagnostics_server.py. The server binds to 0.0.0.0:8080 and exposes Streamable HTTP endpoints with automatic SSE streaming.
- Connect Client: Configure your AI host (Claude Desktop, Cursor, or custom MCP client) to point to http://localhost:8080/mcp. Verify tool discovery by invoking tools/list and testing metric collection.
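For manual verification without an AI host, the tools/list call can be assembled with the standard library. The sketch below only builds the request and does not send it; the helper name and the example session id are hypothetical, and a real exchange must complete the initialize handshake first and reuse the session id the server returns.

```python
import json
import urllib.request
from typing import Optional


def build_tools_list_request(
    endpoint: str, session_id: Optional[str] = None
) -> urllib.request.Request:
    """Builds (but does not send) a JSON-RPC tools/list POST request."""
    body = json.dumps({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept both plain JSON and SSE responses.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


# "example-session-id" is a placeholder for the id issued during initialize.
request = build_tools_list_request("http://localhost:8080/mcp", "example-session-id")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen() against a running server should return the registered tools, either as a JSON body or as an SSE stream depending on how the server chooses to respond.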