Building an MCP server for a Swiss hosting provider (and what reverse-engineering its manager taught me)
Architecting Infrastructure Control Planes with the Model Context Protocol
Current Situation Analysis
Infrastructure management has historically been constrained by rigid, single-purpose interfaces. Hosting dashboards, CLI utilities, and REST endpoints were designed for human operators or deterministic scripts, not for autonomous reasoning engines. As teams scale to hundreds of domains, mailboxes, and hosting instances, the operational friction compounds. A routine task like auditing DNS zones, verifying SSL expiration windows, and provisioning redirect rules requires chaining three to five distinct API calls. Each call demands specific authentication headers, payload shapes, and error handling logic.
This problem is frequently misunderstood because developers assume LLMs can simply "read the API documentation" and execute HTTP requests. In practice, raw endpoints lack the semantic boundaries required for safe composition. LLMs operate on probabilistic token generation, not deterministic state machines. Without explicit tool definitions, side-effect boundaries, and structured return schemas, autonomous infrastructure operations become unreliable and dangerous.
The Model Context Protocol (MCP) addresses this gap by translating infrastructure capabilities into a typed, discoverable catalog. Instead of exposing raw HTTP routes, MCP servers expose tools with explicit input schemas, side-effect annotations, and normalized outputs. This shifts the cognitive load from the LLM to the tool author, who enforces safety, pagination, rate limiting, and error consistency. The result is a control plane where natural language commands are safely translated into verified, composable infrastructure actions.
WOW Moment: Key Findings
The transition from raw API scripting to MCP tool composition fundamentally changes how infrastructure automation behaves under load and ambiguity. The following comparison highlights the operational shift:
| Approach | Composition Overhead | Safety Guarantees | Error Normalization | Rate Limit Handling |
|---|---|---|---|---|
| Raw API Scripts | High (manual chaining) | None (LLM guesses) | Fragmented per endpoint | Manual backoff |
| MCP Tool Layer | Low (semantic catalog) | Explicit (hints + guards) | Unified schema | Built-in throttling |
This finding matters because it decouples infrastructure complexity from AI reasoning. When tools are properly bounded, LLMs can safely orchestrate multi-step workflows across dozens of resources without hallucinating endpoints or violating provider limits. The confirmation dance, transparent pagination, and unified error shapes transform fragile scripts into production-grade automation pipelines.
Core Solution
Building an MCP server for infrastructure management requires a disciplined architecture that prioritizes safety, predictability, and transport compatibility. The implementation follows a five-layer design: schema definition, transport routing, request orchestration, safety enforcement, and error normalization.
1. Schema Definition with Zod
Every tool begins with a strict input schema. Zod provides runtime validation and compiles cleanly to JSON Schema, which MCP clients consume.
import { z } from 'zod';
const DeprovisionHostingInput = z.object({
hostingId: z.number().int().positive(),
reason: z.string().min(3).max(255),
confirmationToken: z.string().optional()
});
type DeprovisionHostingInput = z.infer<typeof DeprovisionHostingInput>;
2. Transport Layer Configuration
Local MCP servers typically communicate over stdio. The transport reserves stdout for JSON-RPC messages, so diagnostic output must be kept strictly separate.
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
const server = new McpServer({
name: 'infra-control-plane',
version: '1.0.0'
});
const transport = new StdioServerTransport();
await server.connect(transport);
3. Request Orchestration & Throttling
Provider APIs enforce strict rate limits. A sliding window limiter prevents 429 responses during bulk operations.
class SlidingWindowLimiter {
private timestamps: number[] = [];
private readonly maxRequests: number;
private readonly windowMs: number;
constructor(maxRequests: number, windowMs: number) {
this.maxRequests = maxRequests;
this.windowMs = windowMs;
}
async acquire(): Promise<void> {
const now = Date.now();
this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
if (this.timestamps.length >= this.maxRequests) {
const oldest = this.timestamps[0];
const waitTime = this.windowMs - (now - oldest) + 50;
await new Promise(resolve => setTimeout(resolve, waitTime));
return this.acquire();
}
this.timestamps.push(now);
}
}
const apiLimiter = new SlidingWindowLimiter(60, 60_000);
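To make the wait calculation inside acquire() easier to reason about, here is a minimal sketch that isolates it as a pure function. The name computeWaitMs and the explicit now parameter are assumptions introduced for illustration; the 50 ms margin mirrors the class above.

```typescript
// Hypothetical helper: how long a caller must wait before a slot frees up
// in a sliding window. Returns 0 when a request can proceed immediately.
function computeWaitMs(
  timestamps: number[], // send times of earlier requests, oldest first
  now: number,
  maxRequests: number,
  windowMs: number,
  marginMs = 50
): number {
  // Keep only requests that still fall inside the window.
  const active = timestamps.filter((t) => now - t < windowMs);
  if (active.length < maxRequests) return 0;
  // The oldest active request leaves the window windowMs after it was sent.
  const oldest = active[0];
  return windowMs - (now - oldest) + marginMs;
}

// Three requests at t = 0 s, 10 s, and 20 s against a cap of 3 per 60 s.
const waitAtCap = computeWaitMs([0, 10_000, 20_000], 30_000, 3, 60_000);
const waitWhenFree = computeWaitMs([0, 10_000, 20_000], 61_000, 3, 60_000);
```

At t = 30 s the window is full, so the caller waits until the request sent at t = 0 ages out, plus the margin. At t = 61 s that request has already left the window and no wait is needed.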
4. Safety Enforcement: Two-Step Confirmation
Destructive operations require explicit human or LLM verification. The first invocation returns a token and impact summary; the second executes.
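// Explicit import avoids relying on the global crypto object, which older Node versions do not expose by default.
import { randomUUID } from 'node:crypto';
class ConfirmationGuard {
  private tokens = new Map<string, { expires: number; payload: unknown }>();
  // First call: issue a single-use token that expires after 60 seconds.
  generate(payload: unknown): string {
    const token = randomUUID();
    this.tokens.set(token, { expires: Date.now() + 60_000, payload });
    return token;
  }
  // Second call: consume the token. It is deleted whether or not it was valid.
  validate(token: string): boolean {
    const entry = this.tokens.get(token);
    this.tokens.delete(token);
    return entry !== undefined && Date.now() <= entry.expires;
  }
}
const safetyGuard = new ConfirmationGuard();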
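One gap worth noting: the guard above stores the payload but validate() never compares it, so a token issued for one resource would also confirm a call against a different one. The following is a sketch of one possible hardening, not the article's implementation. The class name PayloadBoundGuard and the fingerprint scheme are assumptions made for illustration.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical variant of the guard: a token only confirms the exact
// operation it was issued for.
class PayloadBoundGuard {
  private tokens = new Map<string, { expires: number; fingerprint: string }>();
  private readonly ttlMs: number;

  constructor(ttlMs = 60_000) {
    this.ttlMs = ttlMs;
  }

  // Serialize with sorted keys so property order does not affect the match.
  private fingerprint(payload: Record<string, unknown>): string {
    return JSON.stringify(Object.keys(payload).sort().map((k) => [k, payload[k]]));
  }

  generate(payload: Record<string, unknown>): string {
    const token = randomUUID();
    this.tokens.set(token, {
      expires: Date.now() + this.ttlMs,
      fingerprint: this.fingerprint(payload),
    });
    return token;
  }

  // Tokens are single-use: the entry is removed on every validation attempt.
  validate(token: string, payload: Record<string, unknown>): boolean {
    const entry = this.tokens.get(token);
    this.tokens.delete(token);
    if (!entry || Date.now() > entry.expires) return false;
    return entry.fingerprint === this.fingerprint(payload);
  }
}

const boundGuard = new PayloadBoundGuard();
const issued = boundGuard.generate({ hostingId: 42, reason: 'decommissioned' });
// A token issued for hosting 42 does not confirm a call against hosting 7.
const wrongTarget = boundGuard.validate(issued, { hostingId: 7, reason: 'decommissioned' });
const fresh = boundGuard.generate({ hostingId: 42, reason: 'decommissioned' });
const rightTarget = boundGuard.validate(fresh, { reason: 'decommissioned', hostingId: 42 });
// Replaying an already consumed token fails.
const replay = boundGuard.validate(fresh, { hostingId: 42, reason: 'decommissioned' });
```

Burning the token on a failed attempt is a deliberate choice here: a mismatched confirmation forces the caller back through the first step and a fresh impact summary.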
5. Error Normalization
Provider APIs return inconsistent error shapes. A unified adapter ensures the LLM receives predictable failure contexts.
class InfraError extends Error {
  constructor(
    public readonly kind: 'auth' | 'rate_limit' | 'not_found' | 'validation' | 'unknown',
    public readonly code: string,
    message: string, // passed to Error, which already defines the message property
    public readonly raw: unknown
  ) {
    super(message);
    this.name = 'InfraError';
  }
}
// Always throws: converts any provider error payload into an InfraError.
function normalizeProviderResponse(raw: unknown): never {
  if (typeof raw === 'object' && raw !== null) {
    const obj = raw as Record<string, unknown>;
    // Public API shape: { error: { code, message } }
    if ('error' in obj) {
      const err = (obj.error ?? {}) as Record<string, unknown>;
      throw new InfraError('unknown', String(err.code ?? 'ERR_001'), String(err.message ?? 'Unknown error'), raw);
    }
    // Manager shape: { errors: [{ code, description }] }. Guard against an empty array.
    if ('errors' in obj && Array.isArray(obj.errors)) {
      const first = (obj.errors[0] ?? {}) as Record<string, unknown>;
      throw new InfraError('validation', String(first.code ?? 'ERR_002'), String(first.description ?? 'Validation failed'), raw);
    }
  }
  // Anything else (for example an HTML error page) ends up here.
  throw new InfraError('unknown', 'ERR_PARSE', 'Unexpected response shape', raw);
}
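The normalizer above labels every {error: ...} payload as 'unknown', so the 'auth', 'rate_limit', and 'not_found' kinds are never produced. One way to populate them is to classify on the HTTP status before falling back to the body. The sketch below assumes conventional status semantics; the helper name kindFromStatus is hypothetical, and a real provider may signal these conditions differently.

```typescript
type ErrorKind = 'auth' | 'rate_limit' | 'not_found' | 'validation' | 'unknown';

// Hypothetical mapping from HTTP status to the InfraError kinds above.
// Adjust per provider: some APIs return 200 with an error body, for example.
function kindFromStatus(status: number): ErrorKind {
  if (status === 401 || status === 403) return 'auth';
  if (status === 404) return 'not_found';
  if (status === 429) return 'rate_limit';
  if (status === 400 || status === 422) return 'validation';
  return 'unknown';
}

const kinds = [401, 404, 429, 422, 500].map(kindFromStatus);
```

A tool handler could pass response.status through this mapping and use the result as the first argument to InfraError, giving the LLM a hint about whether a retry, a different resource ID, or new credentials is the sensible next step.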
Architecture Rationale
- Zod over raw JSON Schema: Runtime validation catches malformed inputs before they reach the provider API, reducing wasted requests and improving LLM feedback loops.
- Sliding window over token bucket: Sliding windows align better with HTTP rate limit headers and prevent burst accumulation during long-running audits.
- Two-step confirmation: MCP tool annotations (destructiveHint) are advisory only. Explicit token validation enforces safety regardless of client implementation.
- Stdio transport: Eliminates network complexity for local AI assistants. Diagnostic output is strictly routed to stderr to preserve JSON-RPC integrity.
Pitfall Guide
1. Logging to Standard Output
Explanation: The stdio transport reserves stdout for JSON-RPC messages. Any console.log call, or any other stray write to stdout, is interleaved with the protocol stream and corrupts it, causing parse errors or client disconnections.
Fix: Route all diagnostics to stderr using a structured logger configured with destination: process.stderr. Never write to stdout outside the MCP SDK.
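As a dependency-free illustration of the fix, the sketch below writes one JSON object per line to stderr. The function names formatLogLine and log are assumptions for this example; a structured logging library pointed at process.stderr serves the same purpose.

```typescript
// Hypothetical minimal logger: one JSON object per line, written to stderr.
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

function formatLogLine(
  level: LogLevel,
  msg: string,
  fields: Record<string, unknown> = {}
): string {
  return JSON.stringify({ level, msg, ...fields });
}

function log(level: LogLevel, msg: string, fields?: Record<string, unknown>): void {
  // Writing to process.stderr keeps diagnostics off the stdout JSON-RPC stream.
  process.stderr.write(formatLogLine(level, msg, fields) + '\n');
}

log('info', 'tool invoked', { tool: 'list_hostings' });
const sample = formatLogLine('warn', 'rate limit near cap', { remaining: 3 });
```

In Node, console.error also writes to stderr, so it is a safe stand-in during early development, whereas console.log is not.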
2. Exposing Pagination to the LLM
Explanation: Provider APIs paginate inconsistently (25, 50, or 100 items). If the tool returns a single page, the LLM reasons over incomplete data and misses resources.
Fix: Implement transparent auto-pagination within the tool. Fetch all pages sequentially, merge results, and return a single normalized array.
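A minimal sketch of transparent auto-pagination follows. The Page shape (page and totalPages fields) and the fetchPage callback are assumptions, since the provider's real pagination parameters are not shown here; cursor-based APIs would need a different loop condition.

```typescript
// Hypothetical page shape: assumes the API reports a total page count.
interface Page<T> {
  items: T[];
  page: number;
  totalPages: number;
}

// Fetch every page sequentially and merge the results into one array.
// maxPages is a safety cap so a misbehaving API cannot loop forever.
async function fetchAllPages<T>(
  fetchPage: (page: number) => Promise<Page<T>>,
  maxPages = 100
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const result = await fetchPage(page);
    all.push(...result.items);
    if (page >= result.totalPages) return all;
  }
  // Fail loudly rather than hand the LLM a silently truncated list.
  throw new Error(`Pagination exceeded ${maxPages} pages`);
}

// Stand-in for a provider call: five items served two per page.
const fakeData = ['a', 'b', 'c', 'd', 'e'];
async function fakeFetchPage(page: number): Promise<Page<string>> {
  const start = (page - 1) * 2;
  return { items: fakeData.slice(start, start + 2), page, totalPages: 3 };
}
```

In a real tool handler, fetchPage would call the rate limiter before each request so that long pagination chains stay under the provider cap.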
3. Assuming LLMs Respect Destructive Hints
Explanation: MCP annotations like destructiveHint or idempotentHint are metadata. Clients may ignore them, and LLMs will execute destructive calls if context suggests cleanup.
Fix: Implement a two-step confirmation flow. Return a temporary token and impact summary on the first call. Require the token on the second call to execute.
4. JSON Schema Version Drift
Explanation: zod-to-json-schema defaults to Draft 7. Some MCP clients enforce Draft 2020-12 and reject Draft 4 syntax like exclusiveMinimum: true.
Fix: Explicitly target jsonSchema7 or draft2020-12 in the schema generator configuration. Validate output against the strictest client you support.
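To make the exclusiveMinimum difference concrete: Draft 4 pairs a numeric minimum with a boolean exclusiveMinimum flag, while Draft 6 and later (including Draft 7 and 2020-12) make exclusiveMinimum itself the number. The sketch below shows the rewrite on a single flat schema object. The helper upgradeExclusiveBounds is hypothetical and is not part of zod-to-json-schema.

```typescript
type Schema = Record<string, unknown>;

// Hypothetical helper: rewrite Draft 4 boolean exclusive bounds into the
// numeric form used from Draft 6 onward. Nested schemas are not traversed.
function upgradeExclusiveBounds(schema: Schema): Schema {
  const out: Schema = { ...schema };
  const pairs = [
    ['minimum', 'exclusiveMinimum'],
    ['maximum', 'exclusiveMaximum'],
  ] as const;
  for (const [bound, flag] of pairs) {
    if (out[flag] === true && typeof out[bound] === 'number') {
      // Draft 4: { minimum: 0, exclusiveMinimum: true } becomes { exclusiveMinimum: 0 }.
      out[flag] = out[bound];
      delete out[bound];
    } else if (typeof out[flag] === 'boolean') {
      // A false flag (or a flag with no bound) carries no constraint in the newer drafts.
      delete out[flag];
    }
  }
  return out;
}

const draft4 = { type: 'integer', minimum: 0, exclusiveMinimum: true };
const upgraded = upgradeExclusiveBounds(draft4);
```

Running a check like this over generated schemas in a test suite is one way to catch version drift before a strict client rejects the tool catalog.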
5. Ignoring API Response Shape Variance
Explanation: Public APIs return {error: {code, message}}, while private manager endpoints return {"errors": [{code, description}]} or raw HTML on auth failure.
Fix: Build a normalization layer that catches all response shapes and throws a unified error class. Include the raw payload for debugging without leaking it to the LLM.
6. Session Decay in Cookie-Based Auth
Explanation: Manager-private APIs often rely on session cookies that expire every few hours. Hardcoded cookies cause silent failures during long-running workflows.
Fix: Implement a refresh hook that detects 401/403 responses, triggers a cookie re-extraction routine, and retries the request. Cache tokens in memory only.
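The refresh hook can be sketched as a small wrapper around the request function. Everything named here (HttpResult, withSessionRefresh, the request and refreshCookie callbacks) is an assumption for illustration; the actual cookie re-extraction routine depends on the provider's login flow and is not shown.

```typescript
// Minimal response shape so the sketch does not depend on a specific HTTP client.
interface HttpResult {
  status: number;
  body: string;
}

// Hypothetical wrapper: run a request and, if the session looks expired
// (401 or 403), refresh the cookie once and retry.
async function withSessionRefresh(
  request: (cookie: string) => Promise<HttpResult>,
  refreshCookie: () => Promise<string>,
  initialCookie: string
): Promise<HttpResult> {
  const first = await request(initialCookie);
  if (first.status !== 401 && first.status !== 403) return first;
  // Retry only once: a second auth failure is returned to the caller as-is.
  const renewed = await refreshCookie();
  return request(renewed);
}

// Stand-ins: the fake server only accepts the cookie 'fresh'.
let refreshCalls = 0;
async function fakeRequest(cookie: string): Promise<HttpResult> {
  return cookie === 'fresh'
    ? { status: 200, body: 'ok' }
    : { status: 401, body: 'expired' };
}
async function fakeRefresh(): Promise<string> {
  refreshCalls += 1;
  return 'fresh';
}
```

Capping the retry at one attempt keeps a genuinely revoked session from turning into an endless login loop.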
7. Over-Fetching Without Idempotency Checks
Explanation: LLMs may retry operations on locked or already-deleted resources if the tool doesn't surface state metadata.
Fix: Include resource state flags (is_locked, status, expires_at) in list responses. Let the LLM filter before calling mutation tools.
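As an illustration of how those flags get used, the sketch below filters a listing down to resources that are safe to mutate. The HostingSummary shape and its status values are assumptions; only the field names is_locked, status, and expires_at come from the fix above.

```typescript
// Hypothetical resource shape using the state flags named above.
interface HostingSummary {
  id: number;
  status: 'active' | 'suspended' | 'deleted';
  is_locked: boolean;
  expires_at: string; // ISO 8601 timestamp
}

// Return only resources that a mutation tool can meaningfully act on.
function mutableHostings(list: HostingSummary[]): HostingSummary[] {
  return list.filter((h) => !h.is_locked && h.status !== 'deleted');
}

const listing: HostingSummary[] = [
  { id: 1, status: 'active', is_locked: false, expires_at: '2026-01-01T00:00:00Z' },
  { id: 2, status: 'active', is_locked: true, expires_at: '2026-01-01T00:00:00Z' },
  { id: 3, status: 'deleted', is_locked: false, expires_at: '2025-01-01T00:00:00Z' },
  { id: 4, status: 'suspended', is_locked: false, expires_at: '2025-06-01T00:00:00Z' },
];
const candidateIds = mutableHostings(listing).map((h) => h.id);
```

Because the flags travel with the list response, the same filter can run inside the LLM's reasoning or be applied server-side as a second line of defence in the mutation tool itself.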
Production Bundle
Action Checklist
- Define strict Zod schemas for every tool input and output
- Route all logging to stderr; never pollute stdout
- Implement transparent pagination inside list tools
- Add two-step confirmation guards for destructive operations
- Normalize all provider error shapes into a unified class
- Configure explicit JSON Schema version targeting
- Add sliding window rate limiting aligned with provider caps
- Include resource state metadata in read operations
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Local AI assistant integration | Stdio transport | Zero network overhead, simpler debugging | None |
| Remote or multi-client access | Streamable HTTP (the older HTTP+SSE transport is deprecated in the MCP specification) | Enables CORS, authentication, and scaling | Requires reverse proxy |
| Read-only auditing workflows | Bearer token auth | Provider-native, long-lived, auditable | Low |
| Write operations on legacy dashboards | Cookie extraction + refresh hooks | Reaches undocumented private endpoints that have no public API equivalent | Medium (maintenance) |
| High-frequency bulk operations | Sliding window limiter | Prevents 429s during pagination chains | Slight latency increase |
| Destructive infrastructure changes | Two-step confirmation | Prevents accidental production deletion | None |
Configuration Template
// server.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
import { SlidingWindowLimiter } from './throttle.js';
import { ConfirmationGuard } from './safety.js';
import { InfraError, normalizeProviderResponse } from './errors.js';
const server = new McpServer({ name: 'infra-control-plane', version: '1.0.0' });
const transport = new StdioServerTransport();
const limiter = new SlidingWindowLimiter(60, 60_000);
const guard = new ConfirmationGuard();
server.tool(
  'list_hostings',
  'Retrieve all hosting instances with pagination handled internally',
  // No page parameter is exposed to the LLM. This template fetches a single page;
  // wrap the request in an auto-pagination loop before relying on the description above.
  async () => {
    await limiter.acquire();
    try {
      const response = await fetch('https://api.provider.com/v3/hostings', {
        headers: { Authorization: `Bearer ${process.env.PROVIDER_TOKEN}` }
      });
      // Read the body once as text: auth failures may return HTML, which response.json() would reject.
      const body = await response.text();
      let parsed: unknown;
      try { parsed = JSON.parse(body); } catch { parsed = body; }
      if (!response.ok) normalizeProviderResponse(parsed);
      return { content: [{ type: 'text', text: JSON.stringify(parsed) }] };
    } catch (err) {
      if (err instanceof InfraError) {
        return { content: [{ type: 'text', text: `Error: ${err.message}` }], isError: true };
      }
      throw err;
    }
  }
);
server.tool(
'deprovision_hosting',
'Permanently remove a hosting instance (requires confirmation)',
{ hostingId: z.number().int().positive(), reason: z.string().min(3).max(255), confirmationToken: z.string().optional() },
async ({ hostingId, reason, confirmationToken }) => {
if (!confirmationToken) {
const token = guard.generate({ hostingId, reason });
return {
content: [{
type: 'text',
text: `Confirmation required. Token: ${token}. Expires in 60s. Hosting ${hostingId} will be deleted.`
}]
};
}
if (!guard.validate(confirmationToken)) {
return { content: [{ type: 'text', text: 'Invalid or expired confirmation token.' }], isError: true };
}
await limiter.acquire();
// Execute deletion...
return { content: [{ type: 'text', text: `Hosting ${hostingId} deprovisioned.` }] };
}
);
await server.connect(transport);
Quick Start Guide
- Initialize a Node 18+ project with ESM support and install @modelcontextprotocol/sdk, zod, and your HTTP client.
- Define your tool schemas using Zod. Compile them to JSON Schema with explicit version targeting.
- Implement the transport layer using StdioServerTransport. Route all diagnostics to stderr.
- Add rate limiting, pagination handling, and confirmation guards before wiring tool handlers.
- Compile with tsc and run node server.js (or run server.ts through a TypeScript runner such as tsx), then connect your MCP client. Verify tool discovery, test a read operation, then validate the confirmation flow on a destructive call.
