Building Scalable MCP Servers with Spring AI 2.0: Annotation-Driven Architecture

Current Situation Analysis

The Model Context Protocol (MCP) has rapidly become the standard interface for connecting AI agents to external data and execution environments. While Python and TypeScript ecosystems adopted MCP server patterns early, Java developers faced a steep integration curve. Building an MCP server in Java traditionally required manual protocol wiring: constructing ToolCallback descriptors, hand-crafting JSON Schema definitions, managing transport lifecycles, and explicitly registering every endpoint with the runtime. This boilerplate consumed development cycles that should have been spent on domain logic.

The problem is often misunderstood as a framework limitation rather than an architectural mismatch. MCP expects a clean, schema-driven contract between client and server. When developers manually serialize parameters, manage transport state, and inject progress channels, they introduce fragility. Schema drift, transport timeouts, and context leakage become routine production issues. The mental model shifts from "implementing business capabilities" to "maintaining protocol compliance."

Spring AI 2.0.0-M6 (released May 2026) addresses this friction by introducing a native annotation layer. The framework now exposes @McpTool, @McpResource, @McpPrompt, and @McpComplete as first-class constructs. Spring Boot's auto-configuration scans these annotations, generates compliant JSON schemas from method signatures, and wires transport layers automatically. Framework-specific parameters like McpSyncRequestContext and McpAsyncRequestContext are recognized internally and stripped from client-facing schemas. This shifts the development paradigm from imperative protocol management to declarative capability exposure.

WOW Moment: Key Findings

The transition from callback-based wiring to annotation-driven registration fundamentally changes how MCP servers are architected in Java. The following comparison highlights the operational impact:

Approach	Schema Generation	Transport Wiring	Progress Integration	Code Footprint
Legacy `ToolCallback` API	Manual JSON construction via `JsonSchemaObject`	Explicit bean registration & transport config	Custom event emitter wiring	High (~120-180 lines per capability)
Spring AI 2.0 Annotations	Automatic from method signatures & `@McpToolParam`	Auto-configured via starters	Context injection (framework-hidden)	Low (~10-20 lines per capability)

This reduction in boilerplate is not merely cosmetic. Automatic schema generation eliminates serialization mismatches that cause client-side validation failures. Context injection ensures progress and logging channels remain invisible to the MCP contract, preventing schema pollution. The annotation layer also enforces consistent error boundaries, making it easier to standardize how tools report failures to AI agents. Teams can now iterate on business logic without rewriting protocol adapters for every new endpoint.
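To make the idea of signature-driven schema generation concrete, here is a minimal sketch in plain Java reflection. It is not Spring AI's implementation: `RequestContext`, `ToolParam`, and `SchemaSketch` are hypothetical stand-ins invented for this illustration. It only shows the general technique of walking a method's parameters, skipping framework-only types, and emitting a JSON-Schema-shaped map.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a framework context type (not the Spring AI class).
final class RequestContext {}

// Hypothetical parameter annotation (not Spring AI's @McpToolParam).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ToolParam {
    String name();
    String description();
    boolean required() default true;
}

final class SchemaSketch {

    // Sample tool method whose signature is inspected below.
    static String auditStock(
            RequestContext ctx,
            @ToolParam(name = "warehouseId", description = "Target warehouse identifier") String warehouseId,
            @ToolParam(name = "maxAisles", description = "Optional aisle limit", required = false) int maxAisles) {
        return "audited " + warehouseId;
    }

    // Builds a JSON-Schema-shaped map from a method signature.
    static Map<String, Object> inputSchema(Method method) {
        Map<String, Object> properties = new LinkedHashMap<>();
        List<String> required = new ArrayList<>();
        for (Parameter p : method.getParameters()) {
            if (p.getType() == RequestContext.class) {
                continue; // framework-only parameter: never exposed to the client
            }
            ToolParam meta = p.getAnnotation(ToolParam.class);
            if (meta == null) {
                continue; // this sketch ignores unannotated parameters
            }
            Map<String, Object> property = new LinkedHashMap<>();
            property.put("type", jsonType(p.getType()));
            property.put("description", meta.description());
            properties.put(meta.name(), property);
            if (meta.required()) {
                required.add(meta.name());
            }
        }
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("properties", properties);
        schema.put("required", required);
        return schema;
    }

    // Maps a small set of Java types to JSON Schema type names.
    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == Integer.class
                || type == long.class || type == Long.class) return "integer";
        if (type == double.class || type == Double.class
                || type == float.class || type == Float.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        return "object";
    }

    // Looks up the sample method, converting the checked exception.
    static Method sampleMethod() {
        try {
            return SchemaSketch.class.getDeclaredMethod(
                    "auditStock", RequestContext.class, String.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inputSchema(sampleMethod()));
    }
}
```

The context parameter never appears in the resulting map, which is the property the annotation layer is described as guaranteeing for the real framework types.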

Core Solution

Building a production-ready MCP server with Spring AI 2.0 requires aligning domain capabilities with the annotation model while making deliberate transport and serialization choices. The following implementation demonstrates a warehouse inventory system exposing tools, resources, and prompts.

Step 1: Project Initialization

Spring AI 2.0 requires Java 21 and Spring Boot 3.5+. Milestone artifacts reside in Spring's milestone repository, which must be explicitly declared. For synchronous I/O workloads (typical for relational database interactions), the WebMVC starter provides the most straightforward deployment path.

<repositories>
    <repository>
        <id>spring-milestones</id>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
        <version>2.0.0-M6</version>
    </dependency>
</dependencies>

Step 2: Exposing Tools with Context-Aware Progress

Tools execute actions. In production, long-running operations require progress feedback to prevent client timeouts. Spring AI injects McpSyncRequestContext automatically, keeping it out of the JSON schema while exposing logging and progress channels.

package com.example.warehouse.mcp;

import org.springframework.ai.mcp.server.annotation.McpTool;
import org.springframework.ai.mcp.server.annotation.McpToolParam;
import org.springframework.ai.mcp.server.context.McpSyncRequestContext;
import org.springframework.stereotype.Component;

@Component
public class InventoryOperations {

    private final InventoryRepository inventoryRepo;

    public InventoryOperations(InventoryRepository inventoryRepo) {
        this.inventoryRepo = inventoryRepo;
    }

    @McpTool(
        name = "audit_stock_levels",
        description = "Perform a full warehouse stock audit. Returns aggregated metrics per aisle."
    )
    public AuditReport auditStock(
        McpSyncRequestContext ctx,
        @McpToolParam(description = "Target warehouse identifier", required = true) String warehouseId
    ) {
        ctx.logging().info("Initiating stock audit for warehouse: {}", warehouseId);
        ctx.progress().report(0.1, "Fetching inventory manifests");

        var manifests = inventoryRepo.fetchManifests(warehouseId);
        ctx.progress().report(0.4, "Validating SKU counts");

        var validated = inventoryRepo.validateQuantities(manifests);
        ctx.progress().report(0.8, "Computing discrepancy metrics");

        return inventoryRepo.generateAuditReport(validated);
    }
}

Architecture Rationale:

Record-based DTOs: AuditReport should be a Java record. Records provide immutable state, predictable Jackson serialization, and implicit schema mapping. This eliminates the ambiguity of Map<String, Object> returns while avoiding verbose getter/setter chains.
Context Injection: McpSyncRequestContext is recognized by the framework and excluded from the MCP JSON schema. This keeps the client contract clean while providing access to logging(), progress(), sampling(), and elicitation() channels.
Progress vs Streaming: Progress reporting is metadata for UI/agent feedback. It does not stream incremental results; the method still returns a single payload upon completion. Sampling is a separate MCP capability in which the server asks the client's model for a completion, so it is not a way to stream a tool's own result token by token.
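As an illustration of the record-based DTO rationale above, the following is one possible shape for AuditReport. The source does not define its fields, so `AisleMetrics` and every component name here are assumptions made for the example.

```java
import java.util.List;

// Hypothetical per-aisle metric; component names are illustrative only.
record AisleMetrics(String aisle, int expectedUnits, int countedUnits) {
    // Positive means surplus, negative means shortage.
    int discrepancy() {
        return countedUnits - expectedUnits;
    }
}

// Hypothetical report shape; the real AuditReport may differ.
record AuditReport(String warehouseId, List<AisleMetrics> aisles) {
    // Compact constructor: validate input and take a defensive copy.
    AuditReport {
        if (warehouseId == null || warehouseId.isBlank()) {
            throw new IllegalArgumentException("warehouseId must not be blank");
        }
        aisles = List.copyOf(aisles); // unmodifiable copy keeps the record immutable
    }

    int totalDiscrepancy() {
        return aisles.stream().mapToInt(AisleMetrics::discrepancy).sum();
    }
}

final class AuditReportDemo {
    public static void main(String[] args) {
        var report = new AuditReport("WH-7", List.of(
                new AisleMetrics("A1", 120, 118),
                new AisleMetrics("B4", 60, 63)));
        System.out.println(report.totalDiscrepancy()); // prints 1
    }
}
```

The compact constructor is where invariants belong: once the record exists, neither its identifier nor its aisle list can change underneath a serializer.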

Step 3: Exposing Resources and Prompts

Resources expose readable state via URI templates. Prompts provide pre-structured text templates that clients can parameterize.

package com.example.warehouse.mcp;

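import org.springframework.ai.mcp.server.annotation.McpResource;
import org.springframework.ai.mcp.server.annotation.McpPrompt;
import org.springframework.ai.mcp.server.annotation.McpToolParam; // required by generateReorderPrompt
import org.springframework.stereotype.Component;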

@Component
public class WarehouseSurface {

    @McpResource(
        uri = "warehouse://{warehouseId}/layout",
        description = "Returns the physical layout configuration for a specific warehouse."
    )
    public LayoutConfig getLayout(String warehouseId) {
        return LayoutConfig.fromId(warehouseId);
    }

    @McpPrompt(
        name = "reorder_analysis",
        description = "Generates a prompt template for analyzing low-stock reorder thresholds."
    )
    public String generateReorderPrompt(
        @McpToolParam(description = "SKU code to analyze", required = true) String skuCode
    ) {
        return """
            Analyze current stock levels for SKU %s.
            Evaluate historical demand velocity, supplier lead times, and safety stock policies.
            Recommend a reorder quantity and trigger threshold.
            """.formatted(skuCode);
    }
}

Architecture Rationale:

URI Template Matching: The {warehouseId} placeholder is resolved automatically. Method parameter names must align with template variables, or Spring AI will fail to bind the path segment.
Prompt Rendering: Prompts return plain strings. The client receives the rendered template and injects it into its model context. This decouples prompt engineering from tool execution.
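The URI template binding described above can be pictured with a small, framework-free sketch. This is not how Spring AI resolves templates internally; `UriTemplateSketch` is a hypothetical helper that only demonstrates why the placeholder name and the name used for lookup have to agree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UriTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Returns the extracted variables, or empty if the URI does not fit the template.
    static Optional<Map<String, String>> match(String template, String uri) {
        StringBuilder regex = new StringBuilder();
        List<String> names = new ArrayList<>();
        Matcher placeholders = PLACEHOLDER.matcher(template);
        int last = 0;
        while (placeholders.find()) {
            // Literal text between placeholders is quoted so it matches verbatim.
            regex.append(Pattern.quote(template.substring(last, placeholders.start())));
            regex.append("([^/]+)"); // one path segment per placeholder
            names.add(placeholders.group(1));
            last = placeholders.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Matcher candidate = Pattern.compile(regex.toString()).matcher(uri);
        if (!candidate.matches()) {
            return Optional.empty();
        }
        Map<String, String> variables = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            variables.put(names.get(i), candidate.group(i + 1));
        }
        return Optional.of(variables);
    }

    public static void main(String[] args) {
        var vars = match("warehouse://{warehouseId}/layout", "warehouse://WH-7/layout").orElseThrow();
        System.out.println(vars.get("warehouseId")); // prints WH-7
        System.out.println(vars.get("id"));          // prints null: the name does not match the placeholder
    }
}
```

A lookup under any name other than the placeholder's finds nothing, which mirrors the binding failure that a mismatched method parameter name would cause.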

Pitfall Guide

1. Context Parameter Schema Leakage

Explanation: Developers sometimes place McpSyncRequestContext or McpAsyncRequestContext after user-facing parameters, or attempt to serialize them manually. While Spring AI typically filters framework types, inconsistent ordering or custom serializers can cause context objects to leak into the JSON schema, breaking client validation. Fix: Always declare context parameters first in the method signature. Before deployment, inspect the generated schema by sending a tools/list request to the MCP endpoint (tools/list is a JSON-RPC method, not a REST path). Never annotate context parameters with @McpToolParam.

2. Blocking I/O in Reactive Contexts

Explanation: Using McpAsyncRequestContext with blocking database calls or synchronous HTTP clients defeats the purpose of reactive transport. This causes thread pool exhaustion under concurrent tool invocations. Fix: Pair McpAsyncRequestContext with reactive repositories (ReactiveCrudRepository) or offload blocking work to Schedulers.boundedElastic(). Ensure the return type is Mono<T> or Flux<T>.
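The offloading idea can be sketched with JDK types alone. Reactor is not part of the JDK, so this example swaps in CompletableFuture with a bounded thread pool as a stand-in; in a WebFlux application the comparable pattern would be Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()). The `countSkusBlocking` method is a hypothetical placeholder for a blocking repository call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BlockingOffloadSketch {

    // Hypothetical blocking call standing in for a JDBC query.
    static int countSkusBlocking(String warehouseId) {
        try {
            Thread.sleep(50); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return warehouseId.length() * 100;
    }

    // Runs the blocking call on a dedicated pool so the caller's thread stays free.
    static CompletableFuture<Integer> countSkusAsync(String warehouseId, ExecutorService blockingPool) {
        return CompletableFuture.supplyAsync(() -> countSkusBlocking(warehouseId), blockingPool);
    }

    public static void main(String[] args) {
        // A fixed-size pool bounds how many blocking calls run at once.
        ExecutorService blockingPool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(countSkusAsync("WH-7", blockingPool).join()); // prints 400
        } finally {
            blockingPool.shutdown();
        }
    }
}
```

The important design point is the separate, bounded pool: blocking work is isolated from the threads that serve requests, and the bound keeps a burst of tool calls from creating unlimited threads.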

3. Misinterpreting Progress as Result Streaming

Explanation: Teams expect ctx.progress().report() to stream partial JSON payloads or token chunks. Progress is strictly metadata for agent UI feedback. The actual tool result remains a single serialized object. Fix: Use progress for long-running batch operations and return the complete result once at the end. Do not rely on sampling for incremental output either: a sampling request asks the client's model for a completion and receives it back as a single message.
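The distinction is easier to see in a framework-free sketch. Here a plain BiConsumer plays the role of the progress channel; it is a hypothetical stand-in, not the Spring AI context API. Notifications flow through the callback, while the result travels only through the return value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

final class ProgressSketch {

    // Sums unit counts per aisle, emitting progress as a side channel.
    static int auditWithProgress(List<Integer> aisleCounts, BiConsumer<Double, String> progress) {
        int total = 0;
        for (int i = 0; i < aisleCounts.size(); i++) {
            total += aisleCounts.get(i);
            double fraction = (i + 1) / (double) aisleCounts.size();
            progress.accept(fraction, "Counted aisle " + (i + 1)); // metadata only, no partial result
        }
        return total; // the single payload, delivered once at the end
    }

    public static void main(String[] args) {
        List<String> notifications = new ArrayList<>();
        int total = auditWithProgress(
                List.of(10, 20, 30),
                (fraction, message) -> notifications.add(fraction + " " + message));
        System.out.println(notifications.size()); // prints 3
        System.out.println(total);                // prints 60
    }
}
```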

4. Stateful Transport in Serverless Environments

Explanation: Deploying streamable-http to stateless platforms (Cloud Run, Vercel, AWS Lambda) without session management causes request routing failures. The protocol expects session affinity for bidirectional notifications. Fix: Use stateless-streamable-http for serverless deployments. Pass session tokens explicitly in request headers, or implement a lightweight session registry backed by Redis if stateful behavior is mandatory.

5. URI Template Variable Mismatch

Explanation: @McpResource fails silently or throws binding errors when path variables do not match method parameters. Typos in {variable} syntax or missing @PathVariable equivalents break resource resolution. Fix: Ensure exact string matching between URI template placeholders and method parameter names. Use IDE refactoring tools to rename parameters safely. Validate resource listing endpoints before client integration.

6. Unhandled Exception Propagation

Explanation: Letting unchecked exceptions (NullPointerException, IllegalArgumentException) escape a tool method leaves error reporting to framework defaults. The agent may then receive a generic or uninformative error instead of a structured failure payload it can act on. Fix: Catch domain exceptions inside the tool method, or in a shared wrapper around it, and wrap them in McpError or return a standardized error DTO with a clear, agent-readable message. Do not rely on @ControllerAdvice alone: it handles exceptions raised by Spring MVC controller methods, and tool methods are not controller handlers.
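One way to standardize failures is a small result type that every tool returns. The sketch below is an assumption-laden illustration: `ToolOutcome`, its error codes, and `parseQuantity` are hypothetical names, and how such a DTO maps onto the protocol's error reporting depends on the framework version in use.

```java
import java.util.function.Supplier;

// Hypothetical standardized result type; not part of Spring AI or the MCP SDK.
sealed interface ToolOutcome<T> {

    record Success<T>(T value) implements ToolOutcome<T> {}

    record Failure<T>(String code, String message) implements ToolOutcome<T> {}

    // Runs the action and converts exceptions into structured failures.
    static <T> ToolOutcome<T> attempt(Supplier<T> action) {
        try {
            return new Success<>(action.get());
        } catch (IllegalArgumentException e) {
            // Caller error: safe to echo the message back to the agent.
            return new Failure<>("INVALID_ARGUMENT", e.getMessage());
        } catch (RuntimeException e) {
            // Unexpected error: expose the exception type but not internal details.
            return new Failure<>("INTERNAL_ERROR", "Unexpected failure: " + e.getClass().getSimpleName());
        }
    }
}

final class ToolOutcomeDemo {

    // Hypothetical domain operation used to trigger both paths.
    static int parseQuantity(String raw) {
        int quantity = Integer.parseInt(raw); // NumberFormatException is an IllegalArgumentException
        if (quantity < 0) {
            throw new IllegalArgumentException("quantity must be non-negative");
        }
        return quantity;
    }

    public static void main(String[] args) {
        System.out.println(ToolOutcome.attempt(() -> parseQuantity("12")));
        System.out.println(ToolOutcome.attempt(() -> parseQuantity("-5")));
    }
}
```

Because the interface is sealed, a caller can switch over Success and Failure exhaustively, so no failure path is silently dropped.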

7. Missing Milestone Repository Configuration

Explanation: Spring AI 2.0 milestones are not published to Maven Central. Omitting the milestone repository causes dependency resolution failures during build. Fix: Explicitly declare https://repo.spring.io/milestone in pom.xml or build.gradle. Pin the exact milestone version (2.0.0-M6) to avoid unexpected breaking changes between milestones.

Production Bundle

Action Checklist

Verify Java 21 and Spring Boot 3.5+ runtime compatibility before dependency resolution
Declare Spring milestone repository explicitly in build configuration
Replace manual ToolCallback registrations with @McpTool annotations
Use Java records for all tool return types to ensure predictable schema generation
Inject McpSyncRequestContext or McpAsyncRequestContext as the first method parameter
Validate generated JSON schemas via a tools/list JSON-RPC request before client integration
Configure transport strategy based on deployment topology (stateful vs stateless)
Implement global exception translation to MCP-compliant error responses

Decision Matrix

Scenario	Recommended Approach	Why	Cost Impact
Local CLI agent tooling	`stdio` transport	Zero network overhead, process isolation, ideal for single-user development	None (local only)
Internal microservice API	`streamable-http` (WebMVC)	Session affinity, bidirectional notifications, reverse-proxy compatible	Moderate (requires sticky sessions or session store)
Serverless / autoscaled deployment	`stateless-streamable-http`	No session affinity required, horizontal scaling friendly, request-driven	Low (pay-per-invocation, no session infrastructure)
High-throughput reactive workloads	`streamable-http` (WebFlux)	Non-blocking I/O, backpressure support, efficient thread utilization	Higher (requires reactive ecosystem alignment)

Configuration Template

spring:
  ai:
    mcp:
      server:
        transport: streamable-http
        path: /api/mcp
        name: warehouse-inventory-server
        version: 1.2.0
        stateless: false
      logging:
        level: INFO
      jackson:
        serialization:
          write-dates-as-timestamps: false
          fail-on-empty-beans: false
        deserialization:
          fail-on-unknown-properties: true

Quick Start Guide

Initialize Project: Create a Spring Boot 3.5+ project with Java 21. Add the spring-ai-starter-mcp-server-webmvc dependency and declare the Spring milestone repository.
Define Capabilities: Annotate service methods with @McpTool, @McpResource, or @McpPrompt. Inject context parameters where progress or logging is required. Return Java records for predictable serialization.
Configure Transport: Set spring.ai.mcp.server.transport to streamable-http for local testing. Adjust path, name, and version to match your deployment conventions.
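Validate Endpoints: Start the application and send a tools/list JSON-RPC request to http://localhost:8080/api/mcp, for example with an MCP client or inspector tool. Opening a /tools/list path in a browser will not work, because tools/list is a JSON-RPC method name rather than a URL. Verify that schemas match method signatures and that context parameters are excluded.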
Connect Client: Configure your MCP client (Claude Code, custom agent, or test harness) to point to the server endpoint. Execute tool calls and monitor progress channels in the client UI.
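For the validation step, it can help to see what a tools/list call looks like on the wire. The sketch below only builds the request with the JDK's HTTP client types and does not send it; the endpoint URL is taken from the configuration template, and `ToolsListRequestSketch` is a hypothetical helper. A real streamable HTTP session normally begins with an initialize handshake, which is omitted here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ToolsListRequestSketch {

    // JSON-RPC 2.0 body for the MCP tools/list method.
    static String toolsListBody(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id + ",\"method\":\"tools/list\"}";
    }

    // Builds (but does not send) a POST to the MCP endpoint.
    static HttpRequest toolsListRequest(URI endpoint, int id) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                // Streamable HTTP servers may answer with plain JSON or an event stream.
                .header("Accept", "application/json, text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(toolsListBody(id)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = toolsListRequest(URI.create("http://localhost:8080/api/mcp"), 1);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(toolsListBody(1));
    }
}
```

Sending the request with HttpClient and reading the inputSchema of each returned tool is then enough to confirm that no context parameter has leaked into the contract.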
