How to test MCP servers in TypeScript before they break in production

By Codcompass Team·2026-06-01·11 min read

Engineering Resilient MCP Servers: A Multi-Layer Testing Strategy for TypeScript

Current Situation Analysis

The Model Context Protocol (MCP) has rapidly become a common interface for connecting LLM clients to external tools, resources, and prompts. The official TypeScript SDK abstracts away much of the protocol overhead, allowing developers to register handlers and spin up a server in minutes. However, this abstraction creates a dangerous illusion: a server that passes local validation can still collapse under production conditions.

The core pain point is the divergence between demo environments and production workloads. In a local setup, developers typically exercise a single happy path using stdio or a single-instance HTTP server. Real clients, however, introduce network instability, concurrent tool invocations, malformed payloads, and transport renegotiation. The SDK does not prevent these failures; it merely defers them until deployment.

Three architectural realities explain why MCP servers break in production:

1. Transport Contract Complexity: Since version 1.10.0, the SDK supports Streamable HTTP. This transport exposes a single endpoint that multiplexes POST requests (for RPC-style tool calls) and GET requests (for Server-Sent Events streaming). Tests that only validate stdio or mock the HTTP layer miss this dual-mode behavior entirely.
2. Stateful Session Management: StreamableHTTPServerTransport maintains session state in memory by default. When a client reconnects after a restart, or when traffic is load-balanced across multiple instances, a session held in one process's memory is unknown to the others. Without an explicit session persistence strategy, state loss is guaranteed.
3. Schema Drift: Tool registrations declare input schemas and expected output shapes. Manual testing rarely exercises boundary conditions, so type mismatches, missing required fields, and downstream API contract violations go unnoticed.

The gap is not a flaw in the SDK; it is a consequence of testing at the wrong abstraction layer. Closing it requires a structured, multi-tier testing strategy that validates handlers in isolation, verifies protocol contracts, exercises the full HTTP transport, and simulates production-scale concurrency.
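To make the dual-mode contract from the first point concrete, the sketch below builds the two request shapes a test client sends to the same endpoint. The helper names (`buildPost`, `buildSseGet`) are hypothetical and not part of the SDK. The `Accept` values and the `Mcp-Session-Id` header follow the MCP Streamable HTTP specification, and should be checked against the spec revision your SDK version targets.

```typescript
// Hypothetical helpers for transport tests; not part of the MCP SDK.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

type HttpRequestShape = {
  method: "POST" | "GET";
  headers: Record<string, string>;
  body?: string;
};

// POST carries a JSON-RPC message. The client advertises that it accepts
// either a plain JSON reply or an SSE stream.
function buildPost(message: JsonRpcRequest, sessionId?: string): HttpRequestShape {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(message) };
}

// GET opens the server-to-client SSE stream for an existing session.
function buildSseGet(sessionId: string): HttpRequestShape {
  return {
    method: "GET",
    headers: { Accept: "text/event-stream", "Mcp-Session-Id": sessionId },
  };
}
```

Both return values can be passed directly as the second argument to `fetch`, which keeps header handling in one place across the transport tests.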

WOW Moment: Key Findings

Testing an MCP server is not a single activity; it is a layered validation pipeline. Each layer catches a distinct class of failure at a different computational cost. The table below contrasts the four primary testing tiers used in production-grade MCP deployments.

| Testing Tier | Execution Time | Failure Coverage | Environment Fidelity | CI Resource Cost |
| --- | --- | --- | --- | --- |
| Handler Isolation (Unit) | < 50ms | Business logic, downstream stubs, schema validation | None (pure functions) | Negligible |
| Protocol Contract (InMemory) | < 200ms | RPC routing, tool registration, response shapes | Simulated transport | Low |
| HTTP Transport (Streamable) | 500ms–2s | POST/GET routing, SSE lifecycle, auth middleware, session headers | Full network stack | Medium |
| Cluster/Concurrency | 2s–5s | Session persistence, race conditions, load balancing | Multi-process/external store | High |

Why this matters: Relying on a single testing tier leaves blind spots. Unit tests catch logic errors but ignore transport semantics. In-memory contract tests validate protocol compliance but miss network-level failures like SSE stream drops or header parsing. HTTP transport tests expose middleware and routing issues but run slower. Cluster tests validate horizontal scaling but are expensive to run frequently. A tiered approach catches logic bugs and protocol violations on every commit, and transport and scaling failures on pull requests before merge, which protects both developer velocity and production reliability.
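One way to write that policy down is as data, so that the CI script and the documentation cannot drift apart. The sketch below follows the split used in the action checklist (unit and contract tests on every commit, HTTP and cluster tests on pull requests). The trigger names and the file globs are assumptions for illustration.

```typescript
// Hypothetical tier names, CI triggers, and paths; adapt them to your pipeline.
type Tier = "unit" | "contract" | "http" | "cluster";
type Trigger = "commit" | "pull_request";

const TIERS_BY_TRIGGER: Record<Trigger, readonly Tier[]> = {
  commit: ["unit", "contract"],
  pull_request: ["unit", "contract", "http", "cluster"],
};

// Glob each tier hands to the test runner.
const GLOB_BY_TIER: Record<Tier, string> = {
  unit: "src/**/*.test.ts",
  contract: "tests/contract.test.ts",
  http: "tests/http-transport.test.ts",
  cluster: "tests/session-persistence.test.ts",
};

function tiersFor(trigger: Trigger): readonly Tier[] {
  return TIERS_BY_TRIGGER[trigger];
}

function globsFor(trigger: Trigger): string[] {
  return tiersFor(trigger).map((tier) => GLOB_BY_TIER[tier]);
}
```

A CI step could then pass `globsFor("commit")` to the test runner as positional file filters, keeping the expensive tiers out of the per-commit loop.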

Core Solution

Building a resilient test suite for an MCP server requires separating concerns: handler logic, protocol routing, transport behavior, and session management. Below is a step-by-step implementation strategy using modern TypeScript tooling.

Step 1: Extract Handler Logic for Unit Testing

The most common anti-pattern is embedding business logic directly inside server.tool() callbacks. This couples your domain code to the SDK, so the logic cannot be exercised without connecting a server and a client. Instead, extract handlers into pure async functions that accept validated input and return MCP-compliant results.

```typescript
// catalog/handlers.ts
import { z } from "zod";
import { fetchProductFromDB } from "../infrastructure/db";

// Exported so the tool registration can reuse the same schema.
export const ProductQuerySchema = z.object({
  sku: z.string().min(3).max(20),
  includeInventory: z.boolean().default(false),
});

export type ProductQueryInput = z.infer<typeof ProductQuerySchema>;

export async function resolveProductQuery(input: ProductQueryInput) {
  const product = await fetchProductFromDB(input.sku);
  if (!product) {
    return {
      // `as const` keeps the literal type "text" instead of widening to string.
      content: [{ type: "text" as const, text: `Product ${input.sku} not found` }],
      isError: true,
    };
  }

  // Expose the inventory count as `stock` only when the caller asks for it.
  const payload = input.includeInventory
    ? { ...product, stock: product.inventoryCount }
    : product;

  return {
    content: [{ type: "text" as const, text: JSON.stringify(payload) }],
  };
}
```


By isolating the handler, you can test it with standard unit testing frameworks without network overhead. Mock the database layer, validate schema coercion, and assert on error shapes.

```typescript
// catalog/handlers.test.ts
import { describe, it, expect, vi } from "vitest";
import { resolveProductQuery } from "./handlers";
import * as db from "../infrastructure/db";

vi.mock("../infrastructure/db", () => ({
  fetchProductFromDB: vi.fn(),
}));

describe("resolveProductQuery", () => {
  it("returns product payload when SKU exists", async () => {
    vi.mocked(db.fetchProductFromDB).mockResolvedValue({
      sku: "WIDGET-01",
      name: "Standard Widget",
      price: 29.99,
      inventoryCount: 150,
    });

    const result = await resolveProductQuery({ sku: "WIDGET-01", includeInventory: true });
    expect(result.isError).toBeUndefined();
    const parsed = JSON.parse(result.content[0].text);
    expect(parsed.stock).toBe(150);
  });

  it("returns MCP error shape when SKU is missing", async () => {
    vi.mocked(db.fetchProductFromDB).mockResolvedValue(null);
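    // The handler takes already-parsed input, so the schema default must be supplied here.
    const result = await resolveProductQuery({ sku: "GHOST-99", includeInventory: false });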
    expect(result.isError).toBe(true);
    expect(result.content[0].text).toContain("not found");
  });
});
```

Architecture Rationale: Separating handlers from transport registration enables parallel test execution, deterministic mocking, and schema validation via libraries like Zod. It also allows you to swap transport layers (stdio, HTTP, WebSocket) without rewriting business logic.
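An alternative to module-level mocking is to pass the data-access function into the handler. The sketch below is a variation on the article's handler, not a replacement for it: `makeResolveProductQuery`, `FetchProduct`, and the in-memory `fakeDb` are illustrative names, and the product shape mirrors the fixture used in the unit test above.

```typescript
// Sketch of function-level dependency injection; all names are illustrative.
type Product = { sku: string; name: string; price: number; inventoryCount: number };
type FetchProduct = (sku: string) => Promise<Product | null>;

type TextResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

function makeResolveProductQuery(fetchProduct: FetchProduct) {
  return async (input: { sku: string; includeInventory: boolean }): Promise<TextResult> => {
    const product = await fetchProduct(input.sku);
    if (!product) {
      return {
        content: [{ type: "text", text: `Product ${input.sku} not found` }],
        isError: true,
      };
    }
    const payload = input.includeInventory
      ? { ...product, stock: product.inventoryCount }
      : product;
    return { content: [{ type: "text", text: JSON.stringify(payload) }] };
  };
}

// In a test, the "database" is just a Map.
const fakeDb = new Map<string, Product>([
  ["WIDGET-01", { sku: "WIDGET-01", name: "Standard Widget", price: 29.99, inventoryCount: 150 }],
]);
const resolveWithFake = makeResolveProductQuery(async (sku) => fakeDb.get(sku) ?? null);
```

The trade-off is one extra wiring step at server startup in exchange for tests that need no mocking framework at all.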

Step 2: Validate Protocol Contracts with InMemoryTransport

Once handlers are verified, you must ensure the server correctly registers tools and responds to MCP protocol requests. The SDK provides InMemoryTransport, which creates a bidirectional channel between a client and server within the same process. This eliminates network latency while preserving protocol semantics.

```typescript
// tests/contract.test.ts
import { describe, it, expect } from "vitest";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
import { createCatalogServer } from "../server";

describe("MCP Protocol Contract", () => {
  it("exposes registered tools with correct schemas", async () => {
    const server = createCatalogServer();
    const [clientChannel, serverChannel] = InMemoryTransport.createLinkedPair();

    await server.connect(serverChannel);

    const testClient = new Client(
      { name: "contract-tester", version: "0.1.0" },
      { capabilities: {} }
    );
    await testClient.connect(clientChannel);

    const tools = await testClient.listTools();
    const catalogTool = tools.tools.find((t) => t.name === "query-product");

    expect(catalogTool).toBeDefined();
    expect(catalogTool?.inputSchema.required).toContain("sku");
    expect(catalogTool?.inputSchema.properties).toHaveProperty("includeInventory");
  });
});
```

Architecture Rationale: InMemoryTransport bypasses HTTP parsing, TLS, and socket management, making it ideal for fast feedback loops. It validates that your server correctly serializes tool definitions, handles JSON-RPC routing, and returns structurally valid responses. This layer catches schema drift and registration errors before they reach the network stack.
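Contract tests also benefit from a reusable check on response shapes. The guard below is a minimal sketch that only recognizes text content; real MCP results can carry other content types (images, embedded resources), which this helper deliberately treats as a mismatch. `isTextToolResult` is a hypothetical name, not an SDK export.

```typescript
// Narrow guard for text-only tool results; other MCP content types are
// intentionally out of scope for this sketch.
type TextBlock = { type: "text"; text: string };
type TextToolResult = { content: TextBlock[]; isError?: boolean };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isTextToolResult(value: unknown): value is TextToolResult {
  if (!isRecord(value) || !Array.isArray(value.content)) return false;
  if (value.isError !== undefined && typeof value.isError !== "boolean") return false;
  return value.content.every(
    (block) => isRecord(block) && block.type === "text" && typeof block.text === "string",
  );
}
```

In the contract suite this could wrap the result of `testClient.callTool({ name: "query-product", arguments: { sku: "WIDGET-01" } })`, so a handler that starts returning a bare string or a malformed content array fails immediately.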

Step 3: Exercise Streamable HTTP and SSE Lifecycles

The official SDK's Streamable HTTP transport (introduced in v1.10.0) requires testing the actual HTTP server. This layer validates middleware, header parsing, session ID generation, and the dual POST/GET routing contract.

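```typescript
// tests/http-transport.test.ts
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { randomUUID } from "node:crypto";
import { createServer as createHttpServer } from "node:http";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { createCatalogServer } from "../server";

let httpServer: ReturnType<typeof createHttpServer>;
let endpointUrl: string;
let sessionId: string;

// Streamable HTTP clients must declare that they accept both JSON and SSE.
const baseHeaders = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream",
};

beforeAll(async () => {
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: () => randomUUID(),
    enableJsonResponse: true, // reply with plain JSON instead of an SSE stream
  });

  const mcpServer = createCatalogServer();
  await mcpServer.connect(transport);

  httpServer = createHttpServer((req, res) => transport.handleRequest(req, res));
  await new Promise<void>((resolve) => httpServer.listen(0, resolve));
  const address = httpServer.address() as { port: number };
  endpointUrl = `http://127.0.0.1:${address.port}/mcp`;

  // A stateful transport rejects tool calls until the client has initialized.
  const init = await fetch(endpointUrl, {
    method: "POST",
    headers: baseHeaders,
    body: JSON.stringify({
      jsonrpc: "2.0",
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26",
        capabilities: {},
        clientInfo: { name: "http-tester", version: "0.1.0" },
      },
      id: 1,
    }),
  });
  sessionId = init.headers.get("mcp-session-id") ?? "";
  await fetch(endpointUrl, {
    method: "POST",
    headers: { ...baseHeaders, "Mcp-Session-Id": sessionId },
    body: JSON.stringify({ jsonrpc: "2.0", method: "notifications/initialized" }),
  });
});

afterAll(async () => {
  await new Promise<void>((resolve) => {
    httpServer.close(() => resolve());
    httpServer.closeAllConnections(); // drop keep-alive sockets left by fetch (Node 18.2+)
  });
});

describe("Streamable HTTP Transport", () => {
  it("processes POST tool invocation and returns JSON-RPC response", async () => {
    const response = await fetch(endpointUrl, {
      method: "POST",
      headers: { ...baseHeaders, "Mcp-Session-Id": sessionId },
      body: JSON.stringify({
        jsonrpc: "2.0",
        method: "tools/call",
        params: { name: "query-product", arguments: { sku: "WIDGET-01" } },
        id: 10,
      }),
    });

    expect(response.status).toBe(200);
    const payload = await response.json();
    expect(payload.result).toBeDefined();
    expect(payload.id).toBe(10);
  });

  it("rejects an unknown session ID without crashing", async () => {
    const response = await fetch(endpointUrl, {
      method: "POST",
      headers: { ...baseHeaders, "Mcp-Session-Id": "invalid-format-!!!" },
      body: JSON.stringify({ jsonrpc: "2.0", method: "ping", params: {}, id: 2 }),
    });

    // The exact status and JSON-RPC code depend on the SDK version (commonly
    // 404 for an unknown session, 400 for a missing one), so assert the class.
    expect(response.status).toBeGreaterThanOrEqual(400);
    expect(response.status).toBeLessThan(500);
    const error = await response.json();
    expect(error.error.code).toBeLessThan(0);
  });
});
```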

Architecture Rationale: Running a real HTTP server on an ephemeral port (listen(0)) ensures tests interact with the actual transport implementation. This catches middleware misconfigurations, CORS issues, and SSE stream initialization failures. Always test both valid and invalid session headers, as the transport enforces strict session validation.
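When the server answers a POST with `text/event-stream` rather than plain JSON, `response.json()` will throw, and the test has to read the SSE body itself. The helper below is a minimal sketch for that case: it extracts only `data:` fields and assumes each event's data is a single JSON document. `parseSseData` is a hypothetical name.

```typescript
// Minimal SSE body parser for tests. Handles `data:` fields only and ignores
// `event:`, `id:`, `retry:`, and comment lines.
function parseSseData(body: string): unknown[] {
  const messages: unknown[] = [];
  // Events are separated by a blank line; normalize CRLF first.
  for (const rawEvent of body.replace(/\r\n/g, "\n").split("\n\n")) {
    const dataLines = rawEvent
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // strip one optional space
    if (dataLines.length === 0) continue;
    // Multiple data lines in one event are joined with newlines per the SSE format.
    messages.push(JSON.parse(dataLines.join("\n")));
  }
  return messages;
}
```

A test can then call `parseSseData(await response.text())` and assert on the JSON-RPC message with the expected `id`. This only works for streams the server closes after responding; a long-lived GET stream needs an incremental reader instead.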

Step 4: Validate Session Persistence and Concurrency

Production deployments rarely run single instances. When scaling horizontally or handling network retries, session state must survive process boundaries. By default the transport keeps session state in process memory, which does not survive these conditions. Persist the session data your application depends on in an external store (Redis, DynamoDB, or PostgreSQL) and test the handoff.

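```typescript
// tests/session-persistence.test.ts
import { describe, it, expect } from "vitest";
import { randomUUID } from "node:crypto";
import { createServer as createHttpServer } from "node:http";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
// RedisSessionStore is this project's own class, not an SDK export. Its
// saveSession/getSession/disconnect methods are assumed for this example.
import { RedisSessionStore } from "../infrastructure/session";
import { createCatalogServer } from "../server";

describe("Cross-Instance Session Resumption", () => {
  it("persists session state where a second instance can read it", async () => {
    const store = new RedisSessionStore({ host: "127.0.0.1", port: 6379 });

    // Instance A: persistence is wired through the onsessioninitialized
    // callback, because the transport's documented options do not include a
    // session store of their own.
    const transportA = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      enableJsonResponse: true,
      onsessioninitialized: (id) =>
        store.saveSession(id, { metadata: { lastToolCall: null } }),
    });
    await createCatalogServer().connect(transportA);
    const httpA = createHttpServer((req, res) => transportA.handleRequest(req, res));
    await new Promise<void>((resolve) => httpA.listen(0, resolve));
    const { port } = httpA.address() as { port: number };

    // Simulate client interaction on Instance A: initialize creates the session.
    const init = await fetch(`http://127.0.0.1:${port}/mcp`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json, text/event-stream",
      },
      body: JSON.stringify({
        jsonrpc: "2.0",
        method: "initialize",
        params: {
          protocolVersion: "2025-03-26",
          capabilities: {},
          clientInfo: { name: "session-tester", version: "0.1.0" },
        },
        id: 1,
      }),
    });
    const sessionId = init.headers.get("mcp-session-id");
    expect(sessionId).toBeTruthy();

    // Instance B would receive this session ID from the client and look it up.
    const restored = await store.getSession(sessionId!);
    expect(restored).not.toBeNull();
    expect(restored?.metadata).toHaveProperty("lastToolCall");

    httpA.close();
    httpA.closeAllConnections(); // Node 18.2+
    await store.disconnect();
  });
});
```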

Architecture Rationale: External session stores decouple state from process lifecycle. Testing this layer means creating a session through one instance and reading it back the way a second instance would, through the shared store. This validates that session IDs are correctly serialized, metadata is preserved, and reconnection logic does not duplicate or lose state.
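The handoff logic itself can be exercised without Redis by coding against a small store interface and substituting an in-memory double. Everything in the sketch below is hypothetical application code: the MCP SDK does not define `SessionStore`, and the method names simply mirror the ones the Redis-backed store is assumed to have.

```typescript
// Hypothetical application-level contract; the MCP SDK does not define it.
type SessionRecord = { metadata: Record<string, unknown>; updatedAt: number };

interface SessionStore {
  saveSession(id: string, record: SessionRecord): Promise<void>;
  getSession(id: string): Promise<SessionRecord | null>;
}

// Test double backed by a Map. Records are serialized on the way in and out so
// callers cannot share object references, just as with a networked store.
class InMemorySessionStore implements SessionStore {
  private readonly records = new Map<string, string>();

  async saveSession(id: string, record: SessionRecord): Promise<void> {
    this.records.set(id, JSON.stringify(record));
  }

  async getSession(id: string): Promise<SessionRecord | null> {
    const raw = this.records.get(id);
    return raw === undefined ? null : (JSON.parse(raw) as SessionRecord);
  }
}

// Two "instances" that communicate only through the shared store.
async function instanceAHandlesCall(store: SessionStore, sessionId: string): Promise<void> {
  await store.saveSession(sessionId, {
    metadata: { lastToolCall: "query-product" },
    updatedAt: Date.now(),
  });
}

async function instanceBResumes(store: SessionStore, sessionId: string): Promise<string | null> {
  const record = await store.getSession(sessionId);
  const last = record?.metadata.lastToolCall;
  return typeof last === "string" ? last : null;
}
```

The in-memory double keeps the fast tiers free of infrastructure, while the Redis-backed test remains the place where serialization and connectivity are verified for real.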

Pitfall Guide

1. In-Memory Session Leakage

Explanation: Developers assume StreamableHTTPServerTransport handles session persistence automatically. It does not. Default behavior stores sessions in process memory, causing state loss on restart or scale-out. Fix: Persist the session data your application needs in a store backed by Redis, PostgreSQL, or a distributed cache, for example from the transport's onsessioninitialized callback. Always test session handoff between instances.

2. Ignoring SSE Stream Lifecycle

Explanation: GET requests for SSE streams require proper header handling (Content-Type: text/event-stream, Cache-Control: no-cache, Connection: keep-alive). Tests that only send POST requests miss stream initialization failures. Fix: Add explicit GET tests that verify stream headers, heartbeat intervals, and graceful closure on client disconnect.

3. Hardcoding Transport Assumptions

Explanation: Writing tests that assume stdio or HTTP exclusively breaks when clients switch transports. The SDK supports multiple transports, and handlers should remain transport-agnostic. Fix: Abstract transport initialization behind a factory. Test handlers independently of transport, and validate transport behavior in dedicated integration suites.

4. Skipping Strict Schema Validation

Explanation: MCP clients may send partial payloads, extra fields, or type mismatches. Without strict validation, handlers receive malformed data, causing runtime crashes or silent data corruption. Fix: Use Zod or Joi to validate inputs before handler execution. Assert that invalid payloads return MCP-compliant error responses (isError: true, correct JSON-RPC error codes).
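The sketch below shows what strict validation means in practice, written without dependencies so the rules are visible. It rejects unknown keys, enforces the same bounds as the catalog schema, and returns an MCP-style error result instead of throwing. In the article's setup the equivalent behavior would come from the Zod schema (for instance with a strict object); `validateQuery` and its result type are illustrative names.

```typescript
// Dependency-free stand-in for strict schema validation; names are illustrative.
type QueryInput = { sku: string; includeInventory: boolean };
type ErrorResult = { content: { type: "text"; text: string }[]; isError: true };
type ValidationResult =
  | { ok: true; value: QueryInput }
  | { ok: false; error: ErrorResult };

const ALLOWED_KEYS = new Set(["sku", "includeInventory"]);

function fail(message: string): ValidationResult {
  return { ok: false, error: { content: [{ type: "text", text: message }], isError: true } };
}

function validateQuery(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return fail("arguments must be an object");
  }
  const input = raw as Record<string, unknown>;

  // Strict mode: unknown keys are an error rather than silently dropped.
  const extra = Object.keys(input).filter((key) => !ALLOWED_KEYS.has(key));
  if (extra.length > 0) return fail(`unexpected field(s): ${extra.join(", ")}`);

  const sku = input.sku;
  if (typeof sku !== "string" || sku.length < 3 || sku.length > 20) {
    return fail("sku must be a string of 3 to 20 characters");
  }

  // Apply the default only when the field is absent, never to coerce bad types.
  const include = input.includeInventory ?? false;
  if (typeof include !== "boolean") return fail("includeInventory must be a boolean");

  return { ok: true, value: { sku, includeInventory: include } };
}
```

Tests for this layer should include at least one case per rule: a non-object payload, an extra field, an out-of-range `sku`, and a wrongly typed flag.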

5. Untested Concurrency and Race Conditions

Explanation: Tool handlers often mutate shared state or make non-idempotent downstream calls. Concurrent requests can cause race conditions, duplicate writes, or corrupted session metadata. Fix: Test handlers under simulated concurrency using Promise.all() or load-testing utilities. Ensure downstream calls are idempotent or protected by distributed locks.
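A concurrency test needs something observable to assert on. The sketch below collapses concurrent calls that share an idempotency key into a single downstream write and counts how many writes actually happened. It is a single-process illustration only: `reserveStock`, `reserveOnce`, and the key scheme are hypothetical, entries are never evicted, a rejected promise stays cached, and a multi-instance deployment would still need a distributed lock or a store-side uniqueness guarantee.

```typescript
// Sketch: deduplicate concurrent calls by idempotency key (single process only).
const inFlight = new Map<string, Promise<number>>();
let downstreamWrites = 0;

// Stand-in for a non-idempotent downstream call.
async function reserveStock(sku: string): Promise<number> {
  downstreamWrites += 1;
  const writeNumber = downstreamWrites;
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated I/O latency
  return writeNumber;
}

function reserveOnce(idempotencyKey: string, sku: string): Promise<number> {
  const existing = inFlight.get(idempotencyKey);
  if (existing) return existing; // a call with this key is already running or done
  const pending = reserveStock(sku);
  inFlight.set(idempotencyKey, pending);
  return pending;
}
```

Firing twenty calls with the same key through `Promise.all()` and asserting that `downstreamWrites` is 1 turns "no duplicate writes" into a checkable property.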

6. Mocking the SDK Instead of the Handler

Explanation: Developers sometimes mock server.tool() or InMemoryTransport to avoid setup complexity. This tests the mock, not the actual routing logic. Fix: Mock only external dependencies (databases, APIs, caches). Let the SDK handle routing and serialization. Assert on real protocol responses.

7. CI Test Flakiness from Port Collisions

Explanation: Running multiple HTTP transport tests in parallel can cause EADDRINUSE errors if ports are hardcoded or not properly released. Fix: Always use listen(0) to request an ephemeral port. Ensure afterAll hooks properly close servers and drain connections. If collisions persist, run transport tests serially (Jest's --runInBand or Vitest's --no-file-parallelism).
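Both halves of that fix can live in two small helpers built on Node's standard library. This is a sketch with hypothetical helper names (`listenEphemeral`, `closeServer`); `closeIdleConnections()` requires Node 18.2 or later.

```typescript
import { createServer, type Server } from "node:http";
import type { AddressInfo } from "node:net";

// Start listening on an OS-assigned port and resolve with that port.
function listenEphemeral(server: Server): Promise<number> {
  return new Promise((resolve, reject) => {
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => {
      server.off("error", reject);
      resolve((server.address() as AddressInfo).port);
    });
  });
}

// Resolve once the server has stopped listening and its connections have ended.
function closeServer(server: Server): Promise<void> {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
    // Drop idle keep-alive sockets so close() can finish promptly.
    server.closeIdleConnections();
  });
}

// Usage: two servers started side by side never compete for a port.
async function startTwo(): Promise<{ ports: number[]; stop: () => Promise<void> }> {
  const a = createServer((_req, res) => res.end("a"));
  const b = createServer((_req, res) => res.end("b"));
  const ports = [await listenEphemeral(a), await listenEphemeral(b)];
  return { ports, stop: async () => { await closeServer(a); await closeServer(b); } };
}
```

Awaiting `closeServer` in `afterAll` matters as much as `listen(0)`: a hook that returns before the socket is released can leave the process hanging after the suite finishes.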

Production Bundle

Action Checklist

Extract all tool/resource handlers into pure async functions with explicit input/output types
Implement Zod or equivalent schema validation before handler execution
Write unit tests for handlers with mocked downstream dependencies
Add contract tests using InMemoryTransport to verify tool registration and response shapes
Create HTTP transport tests using ephemeral ports to validate POST/GET routing and SSE headers
Replace in-memory session storage with an external store for multi-instance deployments
Test session resumption across separate transport instances
Configure CI to run unit/contract tests on every commit and HTTP/cluster tests on pull requests

Decision Matrix

| Scenario | Recommended Approach | Why | Cost Impact |
| --- | --- | --- | --- |
| Local development & rapid iteration | Handler unit tests + InMemory contract tests | Fast feedback, no network overhead, catches logic/schema errors early | Minimal CI minutes |
| Pre-merge validation | Full HTTP transport tests + SSE lifecycle checks | Validates real network stack, middleware, and session headers before deployment | Moderate CI cost, runs once per PR |
| Horizontal scaling / multi-region | External session store + cross-instance resumption tests | Ensures state survives process boundaries and load balancer routing | Higher infrastructure cost, requires Redis/DB |
| High-concurrency AI agent workloads | Concurrency simulation + idempotency guards | Prevents race conditions, duplicate tool calls, and corrupted session state | Requires distributed locking or queue architecture |

Configuration Template

```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    globals: true,
    environment: "node",
    include: ["src/**/*.test.ts", "tests/**/*.test.ts"],
    coverage: {
      provider: "v8",
      reporter: ["text", "lcov"],
      exclude: ["src/infrastructure/mocks/**", "tests/fixtures/**"],
    },
    // Run test files one at a time. With listen(0) this is not needed for
    // ports, but it avoids contention on the shared Redis instance.
    fileParallelism: false,
  },
});
```

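```typescript
// tests/harness.ts
import { randomUUID } from "node:crypto";
import { createServer as createHttpServer } from "node:http";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
// Project-level store (not part of the SDK); saveSession/disconnect are assumed.
import { RedisSessionStore } from "../src/infrastructure/session";

type ConnectableServer = {
  connect(transport: StreamableHTTPServerTransport): Promise<void>;
};

export async function createTestHarness(serverFactory: () => ConnectableServer) {
  const sessionStore = new RedisSessionStore({ host: "127.0.0.1", port: 6379 });
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: () => randomUUID(),
    // Persist each new session through the SDK's initialization callback.
    onsessioninitialized: (id) => sessionStore.saveSession(id, { metadata: {} }),
  });

  const server = serverFactory();
  await server.connect(transport);

  const httpServer = createHttpServer((req, res) => transport.handleRequest(req, res));
  await new Promise<void>((resolve) => httpServer.listen(0, resolve));
  const addr = httpServer.address() as { port: number };

  return {
    baseUrl: `http://127.0.0.1:${addr.port}/mcp`,
    // The ID only exists after a client has sent `initialize`, so read it lazily.
    getSessionId: () => transport.sessionId,
    sessionStore,
    teardown: async () => {
      await new Promise<void>((resolve) => {
        httpServer.close(() => resolve());
        httpServer.closeAllConnections(); // Node 18.2+
      });
      await sessionStore.disconnect();
    },
  };
}
```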

Quick Start Guide

1. Initialize test runner: Install vitest or jest, configure vitest.config.ts so transport tests run serially, and set up coverage reporting.
2. Extract handlers: Move all server.tool() callbacks into separate modules. Validate inputs with Zod schemas and return MCP-compliant result objects.
3. Add contract tests: Use InMemoryTransport.createLinkedPair() to connect a test client to your server. Assert on listTools() output and individual tool invocation shapes.
4. Spin up HTTP harness: Use the provided createTestHarness template to start an ephemeral HTTP server. Write POST and GET tests that validate routing, headers, and error handling.
5. Wire CI pipeline: Configure your CI to run unit and contract tests on every push. Run HTTP and session persistence tests on pull requests. Merge only when all tiers pass.
