Architecting Infrastructure Control Planes with the Model Context Protocol

Current Situation Analysis

Infrastructure management has historically been constrained by rigid, single-purpose interfaces. Hosting dashboards, CLI utilities, and REST endpoints were designed for human operators or deterministic scripts, not for autonomous reasoning engines. As teams scale to hundreds of domains, mailboxes, and hosting instances, the operational friction compounds. A routine task like auditing DNS zones, verifying SSL expiration windows, and provisioning redirect rules requires chaining three to five distinct API calls. Each call demands specific authentication headers, payload shapes, and error handling logic.

This problem is frequently misunderstood because developers assume LLMs can simply "read the API documentation" and execute HTTP requests. In practice, raw endpoints lack the semantic boundaries required for safe composition. LLMs operate on probabilistic token generation, not deterministic state machines. Without explicit tool definitions, side-effect boundaries, and structured return schemas, autonomous infrastructure operations become unreliable and dangerous.

The Model Context Protocol (MCP) addresses this gap by translating infrastructure capabilities into a typed, discoverable catalog. Instead of exposing raw HTTP routes, MCP servers expose tools with explicit input schemas, side-effect annotations, and normalized outputs. This shifts the cognitive load from the LLM to the tool author, who enforces safety, pagination, rate limiting, and error consistency. The result is a control plane where natural language commands are safely translated into verified, composable infrastructure actions.

WOW Moment: Key Findings

The transition from raw API scripting to MCP tool composition fundamentally changes how infrastructure automation behaves under load and ambiguity. The following comparison highlights the operational shift:

Approach	Composition Overhead	Safety Guarantees	Error Normalization	Rate Limit Handling
Raw API Scripts	High (manual chaining)	None (LLM guesses)	Fragmented per endpoint	Manual backoff
MCP Tool Layer	Low (semantic catalog)	Explicit (hints + guards)	Unified schema	Built-in throttling

This finding matters because it decouples infrastructure complexity from AI reasoning. When tools are properly bounded, LLMs can safely orchestrate multi-step workflows across dozens of resources without hallucinating endpoints or violating provider limits. The confirmation dance, transparent pagination, and unified error shapes transform fragile scripts into production-grade automation pipelines.

Core Solution

Building an MCP server for infrastructure management requires a disciplined architecture that prioritizes safety, predictability, and transport compatibility. The implementation follows a five-layer design: schema definition, transport routing, request orchestration, safety enforcement, and error normalization.

1. Schema Definition with Zod

Every tool begins with a strict input schema. Zod provides runtime validation and compiles cleanly to JSON Schema, which MCP clients consume.

import { z } from 'zod';

const DeprovisionHostingInput = z.object({
  hostingId: z.number().int().positive(),
  reason: z.string().min(3).max(255),
  confirmationToken: z.string().optional()
});

type DeprovisionHostingInput = z.infer<typeof DeprovisionHostingInput>;

2. Transport Layer Configuration

Local MCP servers typically communicate over stdio, where stdout carries the JSON-RPC stream. The transport layer must therefore keep JSON-RPC messages strictly separate from diagnostic output.

import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

const server = new McpServer({
  name: 'infra-control-plane',
  version: '1.0.0'
});

const transport = new StdioServerTransport();
await server.connect(transport);

3. Request Orchestration & Throttling

Provider APIs enforce strict rate limits. A sliding window limiter prevents 429 responses during bulk operations.

class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  async acquire(): Promise<void> {
    const now = Date.now();
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
    
    if (this.timestamps.length >= this.maxRequests) {
      const oldest = this.timestamps[0];
      const waitTime = this.windowMs - (now - oldest) + 50;
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.acquire();
    }
    
    this.timestamps.push(now);
  }
}

const apiLimiter = new SlidingWindowLimiter(60, 60_000);
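
To make the wait-time arithmetic inside acquire() concrete, the sketch below isolates it as a pure function. The name computeWaitMs and the example numbers are illustrative assumptions, not part of the implementation above; the 50 ms padding mirrors the constant used in acquire().

```typescript
// Hypothetical helper: mirrors the wait computation inside acquire().
// Assumes timestamps are in ascending order, as they are when pushed over time.
function computeWaitMs(
  timestamps: number[],
  now: number,
  maxRequests: number,
  windowMs: number,
  paddingMs = 50
): number {
  // Keep only the requests that are still inside the window.
  const live = timestamps.filter(t => now - t < windowMs);
  if (live.length < maxRequests) return 0; // a slot is free right now
  // Wait until the oldest live request leaves the window, plus padding.
  const oldest = live[0];
  return windowMs - (now - oldest) + paddingMs;
}

// 3 requests allowed per 1000 ms; three were made at t = 0, 100, 200.
const waitAt500 = computeWaitMs([0, 100, 200], 500, 3, 1000);   // 1000 - 500 + 50 = 550
const waitAt1000 = computeWaitMs([0, 100, 200], 1000, 3, 1000); // t = 0 has expired, so 0
```

The padding trades a little latency for a margin against clock skew between the client and the provider.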

4. Safety Enforcement: Two-Step Confirmation

Destructive operations require explicit human or LLM verification. The first invocation returns a token and impact summary; the second executes.

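import { randomUUID } from 'node:crypto';

class ConfirmationGuard {
  // token -> expiry timestamp and the payload the token was issued for
  private tokens = new Map<string, { expires: number; payload: unknown }>();

  generate(payload: unknown): string {
    // Imported explicitly: the global crypto object is not available by default on Node 18.
    const token = randomUUID();
    this.tokens.set(token, { expires: Date.now() + 60_000, payload });
    return token;
  }

  // Tokens are single-use: the entry is removed whether or not validation succeeds.
  validate(token: string): boolean {
    const entry = this.tokens.get(token);
    this.tokens.delete(token);
    return entry !== undefined && Date.now() <= entry.expires;
  }
}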

const safetyGuard = new ConfirmationGuard();
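
One gap worth noting: validate() checks only that a token exists and has not expired, not that it was issued for the same operation. A token issued for one hosting could therefore confirm the deletion of another. The variant below is a minimal sketch of one way to close that gap by binding each token to its payload. The class name BoundConfirmationGuard and the JSON.stringify fingerprint are assumptions for illustration; JSON.stringify is sensitive to key order, so callers would need to build the payload the same way on both calls.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical variant of ConfirmationGuard: a token is only valid for
// the exact payload it was issued for.
class BoundConfirmationGuard {
  private tokens = new Map<string, { expires: number; fingerprint: string }>();
  private readonly ttlMs: number;

  constructor(ttlMs = 60_000) {
    this.ttlMs = ttlMs;
  }

  generate(payload: unknown): string {
    const token = randomUUID();
    this.tokens.set(token, {
      expires: Date.now() + this.ttlMs,
      fingerprint: JSON.stringify(payload),
    });
    return token;
  }

  validate(token: string, payload: unknown): boolean {
    const entry = this.tokens.get(token);
    this.tokens.delete(token); // single use, whatever the outcome
    if (!entry || Date.now() > entry.expires) return false;
    return entry.fingerprint === JSON.stringify(payload);
  }
}

const boundGuard = new BoundConfirmationGuard();
const issued = boundGuard.generate({ hostingId: 42 });
const wrongTarget = boundGuard.validate(issued, { hostingId: 7 }); // false: payload differs
const replay = boundGuard.validate(issued, { hostingId: 42 });     // false: token already consumed
const fresh = boundGuard.generate({ hostingId: 42 });
const accepted = boundGuard.validate(fresh, { hostingId: 42 });    // true
```

Consuming the token on a mismatched attempt is a deliberate choice in this sketch: a wrong guess forces the caller back to the first step and a fresh impact summary.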

5. Error Normalization

Provider APIs return inconsistent error shapes. A unified adapter ensures the LLM receives predictable failure contexts.

class InfraError extends Error {
  constructor(
    public readonly kind: 'auth' | 'rate_limit' | 'not_found' | 'validation' | 'unknown',
    public readonly code: string,
    public readonly message: string,
    public readonly raw: unknown
  ) {
    super(message);
    this.name = 'InfraError';
  }
}

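function normalizeProviderResponse(raw: unknown): never {
  if (typeof raw === 'object' && raw !== null) {
    const obj = raw as Record<string, unknown>;
    // Public API shape: { error: { code, message } }
    if ('error' in obj) {
      // Fall back to an empty object so a null error field cannot crash the adapter.
      const err = (obj.error ?? {}) as Record<string, unknown>;
      throw new InfraError('unknown', String(err.code ?? 'ERR_001'), String(err.message ?? 'Unknown error'), raw);
    }
    // Private manager shape: { errors: [{ code, description }] }
    if ('errors' in obj && Array.isArray(obj.errors)) {
      // An empty errors array would otherwise throw a TypeError on first.code.
      const first = (obj.errors[0] ?? {}) as Record<string, unknown>;
      throw new InfraError('validation', String(first.code ?? 'ERR_002'), String(first.description ?? 'Validation failed'), raw);
    }
  }
  // Anything else (null, plain strings, unparsed HTML) is reported as a parse failure.
  throw new InfraError('unknown', 'ERR_PARSE', 'Unexpected response shape', raw);
}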
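
As written, normalizeProviderResponse labels every { error: ... } payload as 'unknown', so the 'auth', 'rate_limit', and 'not_found' kinds are never produced. If the HTTP status is available at the call site, the kind can be derived from it. The helper kindFromStatus is hypothetical, and the mapping reflects common HTTP conventions, not any specific provider's documented behavior.

```typescript
type ErrorKind = 'auth' | 'rate_limit' | 'not_found' | 'validation' | 'unknown';

// Hypothetical helper: derive the error kind from the HTTP status code.
function kindFromStatus(status: number): ErrorKind {
  if (status === 401 || status === 403) return 'auth';
  if (status === 404) return 'not_found';
  if (status === 429) return 'rate_limit';
  if (status === 400 || status === 422) return 'validation';
  return 'unknown';
}
```

A caller could pass kindFromStatus(response.status) into the adapter so the LLM can tell a retryable rate limit apart from a permanent authentication failure.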

Architecture Rationale

Zod over raw JSON Schema: Runtime validation catches malformed inputs before they reach the provider API, reducing wasted requests and improving LLM feedback loops.
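Sliding window over token bucket: A token bucket lets unused capacity accumulate and then be spent in a single burst, which can still trip a provider's per-window cap after an idle period. A sliding window bounds the number of requests in any trailing window, which maps directly onto "N requests per minute" limits during long-running audits.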
Two-step confirmation: MCP tool annotations (destructiveHint) are advisory only. Explicit token validation enforces safety regardless of client implementation.
Stdio transport: Eliminates network complexity for local AI assistants. Diagnostic output is strictly routed to stderr to preserve JSON-RPC integrity.

Pitfall Guide

1. Logging to Standard Output

Explanation: The stdio transport reserves stdout for JSON-RPC messages. Any console.log call, or a dependency that prints to stdout, corrupts the protocol stream and causes client disconnections. Fix: Route all diagnostics to stderr using a structured logger configured with destination: process.stderr. Never write to stdout outside the MCP SDK.
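
Where a full logging library is not wanted, a dependency-free logger is enough to keep stdout clean. The sketch below is one possible shape; the function names logDiag and formatLogLine are illustrative assumptions.

```typescript
// Hypothetical diagnostic logger: one JSON object per line, always on stderr.
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

function formatLogLine(
  level: LogLevel,
  msg: string,
  fields: Record<string, unknown> = {}
): string {
  return JSON.stringify({ level, msg, ...fields });
}

function logDiag(
  level: LogLevel,
  msg: string,
  fields: Record<string, unknown> = {}
): void {
  // process.stderr never touches the stdout stream that carries JSON-RPC.
  process.stderr.write(formatLogLine(level, msg, fields) + '\n');
}

logDiag('info', 'tool invoked', { tool: 'list_hostings' });
```

console.error also writes to stderr in Node, so it is safe for quick debugging; console.log is the call to avoid.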

2. Exposing Pagination to the LLM

Explanation: Provider APIs paginate inconsistently (25, 50, or 100 items). If the tool returns a single page, the LLM reasons over incomplete data and misses resources. Fix: Implement transparent auto-pagination within the tool. Fetch all pages sequentially, merge results, and return a single normalized array.
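
A minimal sketch of transparent auto-pagination follows. The Page shape (items plus nextPage) is an assumption; real providers use cursors, offsets, or Link headers, so the field names would need adapting. The page fetcher is injected so the loop stays independent of any HTTP client.

```typescript
// Hypothetical page shape: real providers differ, so adapt the field names.
interface Page<T> {
  items: T[];
  nextPage: number | null; // null when there are no further pages
}

type FetchPage<T> = (page: number) => Promise<Page<T>>;

// Fetch pages sequentially and merge them into one array.
// maxPages is a safety cap against a provider that never signals the end.
async function fetchAllPages<T>(fetchPage: FetchPage<T>, maxPages = 100): Promise<T[]> {
  const all: T[] = [];
  let page: number | null = 1;
  let fetched = 0;
  while (page !== null) {
    if (fetched >= maxPages) throw new Error(`Pagination exceeded ${maxPages} pages`);
    const result: Page<T> = await fetchPage(page);
    all.push(...result.items);
    page = result.nextPage;
    fetched++;
  }
  return all;
}

// Usage with an in-memory stand-in for the provider API.
const fakePages: Record<number, Page<string>> = {
  1: { items: ['a.example', 'b.example'], nextPage: 2 },
  2: { items: ['c.example'], nextPage: null },
};
const allDomains = fetchAllPages(async page => fakePages[page]);
```

In a real tool, each fetchPage call would also pass through the rate limiter, since a long pagination chain is exactly where provider caps are hit.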

3. Assuming LLMs Respect Destructive Hints

Explanation: MCP annotations like destructiveHint or idempotentHint are metadata. Clients may ignore them, and LLMs will execute destructive calls if context suggests cleanup. Fix: Implement a two-step confirmation flow. Return a temporary token and impact summary on the first call. Require the token on the second call to execute.

4. JSON Schema Version Drift

Explanation: zod-to-json-schema emits Draft 7 by default, but other generator targets and hand-written schemas can produce older constructs such as the Draft 4 boolean form exclusiveMinimum: true. Clients that validate against Draft 2020-12 reject that syntax. Fix: Pin the generator's target explicitly (for example jsonSchema7) instead of relying on defaults, and validate the generated schemas against the strictest client you support.
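
For schemas that already contain the Draft 4 form, a post-processing pass can rewrite the boolean bounds into the numeric form used by Draft 6 and later. The function below is a naive sketch under that assumption, and its name upgradeExclusiveBounds is hypothetical. It walks every object in the tree, so it would also rewrite matching keys inside enum, const, or default values; a production version would need to be schema-aware.

```typescript
type JsonSchema = { [key: string]: unknown };

// Hypothetical post-processing step: rewrite Draft 4 boolean
// exclusiveMinimum/exclusiveMaximum into the numeric form.
function upgradeExclusiveBounds(node: unknown): unknown {
  if (Array.isArray(node)) return node.map(upgradeExclusiveBounds);
  if (typeof node !== 'object' || node === null) return node;

  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(node)) {
    out[key] = upgradeExclusiveBounds(value);
  }
  const pairs = [
    ['exclusiveMinimum', 'minimum'],
    ['exclusiveMaximum', 'maximum'],
  ];
  for (const [flag, bound] of pairs) {
    if (out[flag] === true && typeof out[bound] === 'number') {
      out[flag] = out[bound]; // the bound itself becomes the exclusive limit
      delete out[bound];
    } else if (typeof out[flag] === 'boolean') {
      delete out[flag]; // false, or true without a bound, adds no constraint
    }
  }
  return out;
}

const draft4Schema = {
  type: 'object',
  properties: {
    hostingId: { type: 'integer', minimum: 0, exclusiveMinimum: true },
  },
};
const upgradedSchema = upgradeExclusiveBounds(draft4Schema);
// hostingId becomes { type: 'integer', exclusiveMinimum: 0 }
```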

5. Ignoring API Response Shape Variance

Explanation: Public APIs return {error: {code, message}}, while private manager endpoints return {"errors": [{code, description}]} or raw HTML on auth failure. Fix: Build a normalization layer that catches all response shapes and throws a unified error class. Include the raw payload for debugging without leaking it to the LLM.

6. Session Decay in Cookie-Based Auth

Explanation: Manager-private APIs often rely on session cookies that expire every few hours. Hardcoded cookies cause silent failures during long-running workflows. Fix: Implement a refresh hook that detects 401/403 responses, triggers a cookie re-extraction routine, and retries the request. Cache tokens in memory only.
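
The refresh hook can be sketched as a small wrapper that retries once after re-authenticating. The login and send functions are injected and purely hypothetical, so the example assumes nothing about how a given provider issues cookies.

```typescript
interface HttpResult {
  status: number;
  body: string;
}

type Login = () => Promise<string>; // resolves to a fresh session cookie
type Send = (path: string, cookie: string) => Promise<HttpResult>;

// Hypothetical session wrapper: refresh the cookie once on 401/403, then retry.
class SessionClient {
  private cookie: string | null = null; // held in memory only
  private readonly login: Login;
  private readonly send: Send;

  constructor(login: Login, send: Send) {
    this.login = login;
    this.send = send;
  }

  async request(path: string): Promise<HttpResult> {
    const cookie = this.cookie ?? (await this.login());
    this.cookie = cookie;
    let result = await this.send(path, cookie);
    if (result.status === 401 || result.status === 403) {
      // The session has probably expired: refresh once and retry once.
      const refreshed = await this.login();
      this.cookie = refreshed;
      result = await this.send(path, refreshed);
    }
    return result;
  }
}
```

The single retry is intentional. A 403 can also be a genuine permission denial, and retrying in a loop would hammer the login endpoint without ever succeeding.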

7. Over-Fetching Without Idempotency Checks

Explanation: LLMs may retry operations on locked or already-deleted resources if the tool doesn't surface state metadata. Fix: Include resource state flags (is_locked, status, expires_at) in list responses. Let the LLM filter before calling mutation tools.
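
The same state flags can back a defensive check inside mutation tools, so a retry on a locked or deleted resource fails fast with a clear reason. The resource shape and status values below are assumptions for illustration; only the flag names come from the text above.

```typescript
// Hypothetical resource shape; the status values are assumed.
interface HostingSummary {
  id: number;
  status: 'active' | 'suspended' | 'deleted';
  is_locked: boolean;
  expires_at: string; // ISO 8601 timestamp
}

// Returns a reason when a mutation should be skipped, or null when it may proceed.
function mutationBlocker(h: HostingSummary): string | null {
  if (h.status === 'deleted') return 'already deleted';
  if (h.is_locked) return 'resource is locked';
  return null;
}

const hostings: HostingSummary[] = [
  { id: 1, status: 'active', is_locked: false, expires_at: '2030-01-01T00:00:00Z' },
  { id: 2, status: 'active', is_locked: true, expires_at: '2030-01-01T00:00:00Z' },
  { id: 3, status: 'deleted', is_locked: false, expires_at: '2020-01-01T00:00:00Z' },
];
const actionableIds = hostings.filter(h => mutationBlocker(h) === null).map(h => h.id);
```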

Production Bundle

Action Checklist

Define strict Zod schemas for every tool input and output
Route all logging to stderr; never pollute stdout
Implement transparent pagination inside list tools
Add two-step confirmation guards for destructive operations
Normalize all provider error shapes into a unified class
Configure explicit JSON Schema version targeting
Add sliding window rate limiting aligned with provider caps
Include resource state metadata in read operations

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Local AI assistant integration	Stdio transport	Zero network overhead, simpler debugging	None
Remote or multi-client access	Streamable HTTP (or legacy SSE)	Enables CORS, authentication, and scaling	Requires reverse proxy
Read-only auditing workflows	Bearer token auth	Provider-native, long-lived, auditable	Low
Write operations on legacy dashboards	Cookie extraction + refresh hooks	Reaches undocumented private endpoints that lack token auth	Medium (maintenance)
High-frequency bulk operations	Sliding window limiter	Prevents 429s during pagination chains	Slight latency increase
Destructive infrastructure changes	Two-step confirmation	Prevents accidental production deletion	None

Configuration Template

// server.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
import { SlidingWindowLimiter } from './throttle.js';
import { ConfirmationGuard } from './safety.js';
import { InfraError, normalizeProviderResponse } from './errors.js';

const server = new McpServer({ name: 'infra-control-plane', version: '1.0.0' });
const transport = new StdioServerTransport();
const limiter = new SlidingWindowLimiter(60, 60_000);
const guard = new ConfirmationGuard();

server.tool(
  'list_hostings',
  'Retrieve all hosting instances with pagination handled internally',
  async () => {
    await limiter.acquire();
    try {
      // Single request shown for brevity; a full implementation loops over
      // every page here so the client never reasons over partial results.
      const response = await fetch('https://api.provider.com/v3/hostings', {
        headers: { Authorization: `Bearer ${process.env.PROVIDER_TOKEN}` }
      });
      // A non-JSON body (e.g. an HTML login page) becomes null and is
      // normalized into an InfraError instead of surfacing a raw SyntaxError.
      if (!response.ok) normalizeProviderResponse(await response.json().catch(() => null));
      const data = await response.json();
      return { content: [{ type: 'text', text: JSON.stringify(data) }] };
    } catch (err) {
      if (err instanceof InfraError) {
        return { content: [{ type: 'text', text: `Error: ${err.message}` }], isError: true };
      }
      throw err;
    }
  }
);

server.tool(
  'deprovision_hosting',
  'Permanently remove a hosting instance (requires confirmation)',
  { hostingId: z.number().int().positive(), reason: z.string().min(3).max(255), confirmationToken: z.string().optional() },
  async ({ hostingId, reason, confirmationToken }) => {
    if (!confirmationToken) {
      const token = guard.generate({ hostingId, reason });
      return {
        content: [{
          type: 'text',
          text: `Confirmation required. Token: ${token}. Expires in 60s. Hosting ${hostingId} will be deleted.`
        }]
      };
    }
    if (!guard.validate(confirmationToken)) {
      return { content: [{ type: 'text', text: 'Invalid or expired confirmation token.' }], isError: true };
    }
    await limiter.acquire();
    // Execute deletion...
    return { content: [{ type: 'text', text: `Hosting ${hostingId} deprovisioned.` }] };
  }
);

await server.connect(transport);

Quick Start Guide

Initialize a Node 18+ project with ESM support and install @modelcontextprotocol/sdk, zod, and your HTTP client.
Define your tool schemas using Zod. Compile them to JSON Schema with explicit version targeting.
Implement the transport layer using StdioServerTransport. Route all diagnostics to stderr.
Add rate limiting, pagination handling, and confirmation guards before wiring tool handlers.
Compile with tsc and run the emitted server.js (or run server.ts through a TypeScript runner such as tsx), then connect your MCP client. Verify tool discovery, test a read operation, then validate the confirmation flow on a destructive call.
