Step 1: Endpoint Registry and Transport Configuration
Instead of scattering URLs across configuration files, centralize service endpoints in a dedicated registry. This approach simplifies environment switching and enables consistent transport settings across all MCP connections.
@Configuration
public class AgentEndpointRegistry {

    @Value("${ai.mcp.services.order}")
    private String orderEndpoint;

    @Value("${ai.mcp.services.payment}")
    private String paymentEndpoint;

    @Value("${ai.mcp.services.inventory}")
    private String inventoryEndpoint;

    @Value("${ai.mcp.services.validation}")
    private String validationEndpoint;

    public List<String> resolveActiveEndpoints() {
        return List.of(orderEndpoint, paymentEndpoint, inventoryEndpoint, validationEndpoint);
    }

    public HttpMcpTransport buildTransport(String targetUrl) {
        return new HttpMcpTransport.Builder()
                .sseUrl(targetUrl)
                .logRequests(true)
                .logResponses(true)
                .connectTimeout(Duration.ofSeconds(5))
                .readTimeout(Duration.ofSeconds(10))
                .build();
    }
}
Architecture Rationale:
- Centralizing endpoints prevents configuration drift across environments.
- Explicit timeout settings prevent indefinite blocking when a downstream service degrades.
- Request/response logging is enabled by default for observability, but should be toggled off in high-throughput production environments to reduce I/O overhead.
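One way to make a centralized registry fail fast on a misconfigured environment is to validate every endpoint at startup. The sketch below is plain Java and is not part of Spring or LangChain4j; `EndpointValidator` and its method are hypothetical names introduced for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: checks that every configured endpoint is an absolute http(s) URL.
class EndpointValidator {

    // Returns one message per invalid entry; an empty list means all endpoints look usable.
    static List<String> validate(Map<String, String> endpointsByService) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> entry : endpointsByService.entrySet()) {
            String service = entry.getKey();
            String url = entry.getValue();
            if (url == null || url.isBlank()) {
                errors.add(service + ": endpoint is missing");
                continue;
            }
            try {
                URI uri = new URI(url);
                String scheme = uri.getScheme();
                boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
                if (!http || uri.getHost() == null) {
                    errors.add(service + ": not an absolute http(s) URL: " + url);
                }
            } catch (URISyntaxException e) {
                errors.add(service + ": malformed URL: " + url);
            }
        }
        return errors;
    }
}
```

A registry could call such a check from an initialization hook and refuse to start when the returned list is non-empty, so a missing environment variable surfaces at boot rather than on the first tool call.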
Step 2: Tool Provider Configuration
LangChain4j's McpToolProvider acts as a runtime bridge between the LLM and distributed services. It queries each MCP server during initialization, aggregates tool schemas, and exposes them as a unified capability set.
@Configuration
@RequiredArgsConstructor
public class AgentToolingConfiguration {

    private final AgentEndpointRegistry endpointRegistry;
    private final ChatModel primaryLanguageModel;

    @Bean
    public McpToolProvider distributedToolResolver() {
        List<McpClient> serviceClients = endpointRegistry.resolveActiveEndpoints()
                .stream()
                .map(url -> new DefaultMcpClient.Builder()
                        .transport(endpointRegistry.buildTransport(url))
                        .build())
                .toList();
        return McpToolProvider.builder()
                .mcpClients(serviceClients)
                .build();
    }

    @Bean
    public DynamicAgentInterface orchestrationAgent(McpToolProvider toolResolver) {
        return AiServices.builder(DynamicAgentInterface.class)
                .chatModel(primaryLanguageModel)
                .toolProvider(toolResolver)
                .maxSequentialToolsInvocations(6)
                .build();
    }
}
Architecture Rationale:
- toolProvider replaces static .tools() registration. The framework handles schema serialization and runtime dispatch automatically.
- maxSequentialToolsInvocations is a critical safety valve. It caps the number of tool calls the LLM can chain while answering a single request. Without a tight cap, ambiguous prompts can trigger long, costly invocation loops.
- The provider abstracts service ownership. The agent never sees URLs or network topology; it only interacts with function signatures.
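To make the "unified capability set" idea concrete, here is a minimal plain-Java model of aggregating tool names from several servers into one lookup table. It is only an illustration of the concept and does not reflect how McpToolProvider is implemented; `ToolCatalog`, its methods, and the duplicate-name policy are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of aggregating tools from several servers into one lookup table.
class ToolCatalog {

    // tool name -> server that owns it, in registration order
    private final Map<String, String> ownerByTool = new LinkedHashMap<>();

    // Registers the tools one server advertises; rejects duplicate names across servers.
    void register(String serverName, List<String> toolNames) {
        for (String tool : toolNames) {
            String existing = ownerByTool.putIfAbsent(tool, serverName);
            if (existing != null && !existing.equals(serverName)) {
                throw new IllegalStateException(
                        "Tool '" + tool + "' is exposed by both " + existing + " and " + serverName);
            }
        }
    }

    // Resolves which server should receive a call; the caller never handles URLs.
    String ownerOf(String toolName) {
        String owner = ownerByTool.get(toolName);
        if (owner == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        return owner;
    }

    int size() {
        return ownerByTool.size();
    }
}
```

Rejecting duplicate names at registration time is one possible policy; it keeps routing unambiguous when two services happen to publish a tool with the same name.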
Step 3: Runtime Execution Flow
When a user submits a query, the execution pipeline follows a deterministic cycle:
- Schema Injection: LangChain4j calls tools/list on each registered MCP server during agent initialization. Schemas are serialized into functionDeclaration objects and injected into the system prompt.
- Reasoning & Selection: The LLM parses the user query against available tool descriptions and generates a functionCall payload with resolved arguments.
- Framework Interception: LangChain4j intercepts the call, routes it to the correct MCP client, and executes an HTTP POST to the /mcp/message endpoint.
- Service Execution: The target service processes the request, runs the underlying business logic, and returns a structured result.
- Response Mapping: The framework wraps the result in a functionResponse envelope and feeds it back to the LLM for final answer generation.
This cycle repeats until the LLM determines the query is satisfied or hits the sequential invocation limit. Independent calls within a chain can execute concurrently when virtual threads are enabled, significantly reducing total wall-clock time.
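The cycle above can be simulated in a few lines of plain Java. This is a sketch of the control flow only: `ToolLoopSketch`, `Planner`, and the dispatcher function are hypothetical stand-ins for the model and the framework, and the sketch simply stops at the cap without modeling how a real framework reports an exhausted budget.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical simulation of the select/dispatch/feed-back cycle with an invocation cap.
class ToolLoopSketch {

    // Stand-in for a model turn: returns the next tool to call, or null when ready to answer.
    interface Planner {
        String nextTool(List<String> resultsSoFar);
    }

    // Runs the cycle until the planner stops asking for tools or the cap is reached.
    static List<String> run(Planner planner, Function<String, String> dispatcher, int maxInvocations) {
        List<String> results = new ArrayList<>();
        while (results.size() < maxInvocations) {
            String tool = planner.nextTool(results);   // reasoning and selection
            if (tool == null) {
                break;                                  // the model is satisfied
            }
            results.add(dispatcher.apply(tool));        // interception, execution, response mapping
        }
        return results;
    }
}
```

A planner that never returns null shows why the cap matters: the loop ends after exactly `maxInvocations` dispatches instead of running indefinitely.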
Pitfall Guide
1. Unbounded Tool Chaining
Explanation: LLMs will aggressively chain tools when given open-ended prompts. Without explicit limits, a single query can trigger dozens of sequential calls, exhausting rate limits and inflating latency.
Fix: Always configure maxSequentialToolsInvocations. Set conservative limits (3-6) based on expected workflow complexity. Monitor invocation counts in production and adjust dynamically if needed.
2. Synchronous HTTP Blocking
Explanation: Each MCP tool call is an HTTP request. In a traditional thread pool, sequential chains block worker threads, causing thread starvation under concurrent load.
Fix: Enable Spring Boot virtual threads. They allow the framework to park waiting I/O operations without consuming OS threads, enabling true parallel execution for independent calls.
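As a rough illustration of running independent calls side by side, the helper below submits a batch of calls to an executor and collects results in input order. `ParallelToolCalls` is a hypothetical name, and the executor is passed in so the same code works with a fixed pool or, on Java 21 and later, with `Executors.newVirtualThreadPerTaskExecutor()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper: runs independent tool calls concurrently, results in input order.
class ParallelToolCalls {

    static List<String> runAll(ExecutorService executor, List<Callable<String>> calls) {
        try {
            // invokeAll blocks until every task has finished and preserves input order.
            List<Future<String>> futures = executor.invokeAll(calls);
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();          // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for tool calls", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A tool call failed", e.getCause());
        }
    }
}
```

With platform threads, each blocked call holds a worker for the duration of the HTTP round trip; with virtual threads the same blocking code parks cheaply, which is the effect the fix above relies on.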
3. Ambiguous Tool Schemas
Explanation: The LLM relies entirely on parameter names and descriptions to resolve arguments. Vague naming (getData, processRequest) or missing type hints causes misrouted calls or invalid payloads.
Fix: Enforce strict naming conventions (getStockByProduct, calculateFraudRisk). Include parameter constraints in descriptions (e.g., threshold: integer (1-100)). Validate schemas against OpenAPI-style strictness before deployment.
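A pre-deployment check along these lines can be as simple as the sketch below. `ToolSchemaLint` is hypothetical, and both the list of "vague" names and the rule that every parameter needs a description are assumptions to adapt to your own conventions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-deployment lint for tool definitions.
class ToolSchemaLint {

    // Names too generic for a model to route on; this list is an assumption.
    private static final Set<String> VAGUE_NAMES = Set.of("getData", "processRequest", "handle", "run");

    // parameterDescriptions maps parameter name -> description text.
    static List<String> check(String toolName, Map<String, String> parameterDescriptions) {
        List<String> problems = new ArrayList<>();
        if (VAGUE_NAMES.contains(toolName)) {
            problems.add("tool name '" + toolName + "' is too generic");
        }
        for (Map.Entry<String, String> parameter : parameterDescriptions.entrySet()) {
            String description = parameter.getValue();
            if (description == null || description.isBlank()) {
                problems.add("parameter '" + parameter.getKey() + "' has no description");
            }
        }
        return problems;
    }
}
```

Running such a lint in CI against the tool list each service publishes catches naming regressions before an agent ever sees them.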
4. Silent Service Degradation
Explanation: When an MCP server returns HTTP 5xx or times out, the framework propagates the error as a function result. The LLM may retry indefinitely or hallucinate a fallback response.
Fix: Implement circuit breaker patterns at the transport layer. Configure fallback prompts that instruct the agent to skip unavailable tools gracefully. Log degradation events separately from normal tool calls.
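The circuit breaker idea can be sketched without any library. The class below is a deliberately small, count-based version for illustration; `SimpleCircuitBreaker` is a hypothetical name, and production code would more likely rely on an established resilience library.

```java
import java.util.function.Supplier;

// Hypothetical count-based circuit breaker, kept minimal for illustration.
class SimpleCircuitBreaker {

    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Runs the call unless the breaker is open; returns the fallback on failure or while open.
    synchronized <T> T call(Supplier<T> remoteCall, T fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback;                       // open: skip the remote call entirely
        }
        try {
            T result = remoteCall.get();           // closed, or a trial call after the open window
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now;                    // trip (or re-trip) the breaker
            }
            return fallback;
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

Returning an explicit fallback value such as "service unavailable" gives the agent a stable signal to skip the tool, rather than a raw error it might retry.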
5. Mixing Local and Remote Tools
Explanation: Developers often register local @Tool methods alongside MCP providers in the same agent. This creates inconsistent execution paths and complicates testing.
Fix: Establish a strict architectural boundary. Use @Tool exclusively for same-JVM operations (e.g., sub-agent delegation, local data transformations). Reserve MCP providers for cross-service or external API interactions.
6. Downstream Rate Limit Exhaustion
Explanation: Downstream services often enforce strict rate limits. An agent chaining multiple tools can easily exceed thresholds, triggering 429 responses that degrade user experience.
Fix: Implement client-side throttling or request queuing. Use Resilience4j or Spring Cloud Circuit Breaker to enforce concurrency limits. Add exponential backoff for retryable HTTP status codes.
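For the backoff part of this fix, a minimal plain-Java sketch looks like the following. `BackoffRetry` is a hypothetical helper, the set of retryable status codes is an assumption, and the request is modeled as a function returning an HTTP status code.

```java
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical retry helper with exponential backoff for retryable HTTP status codes.
class BackoffRetry {

    // Assumed set of retryable statuses; adjust to what your services actually return.
    private static final Set<Integer> RETRYABLE = Set.of(429, 502, 503, 504);

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis.
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20);   // keep the shift small
        return Math.min(delay, maxMillis);
    }

    // Calls `request` until it returns a non-retryable status or attempts run out.
    // Returns the last status observed.
    static int execute(IntSupplier request, int maxAttempts, long baseMillis, long maxMillis) {
        int status = request.getAsInt();
        for (int attempt = 0; attempt < maxAttempts - 1 && RETRYABLE.contains(status); attempt++) {
            try {
                Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

The sketch omits jitter for brevity; adding a random component to each delay is a common refinement that keeps many clients from retrying in lockstep.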
7. Tool Overexposure
Explanation: Registering all available tools for every agent increases prompt token consumption and confuses the LLM with irrelevant capabilities.
Fix: Implement tool namespace filtering. Create agent-specific provider instances that only expose capabilities relevant to their domain. Use metadata tags in MCP server definitions to enable selective discovery.
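Namespace filtering can be illustrated with a small predicate over tool names. The sketch assumes a hypothetical `namespace.toolName` naming convention, which is not something MCP or LangChain4j mandates; `ToolNamespaceFilter` is likewise an illustrative name.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical filter: keeps only tools whose "namespace.tool" prefix is allowed for an agent.
class ToolNamespaceFilter {

    static List<String> visibleTools(List<String> allTools, Set<String> allowedNamespaces) {
        return allTools.stream()
                .filter(name -> {
                    int dot = name.indexOf('.');
                    // Tools without a namespace prefix are hidden by default.
                    return dot > 0 && allowedNamespaces.contains(name.substring(0, dot));
                })
                .collect(Collectors.toList());
    }
}
```

An order-handling agent might be given only the `inventory` and `validation` namespaces, so payment tools never enter its prompt.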
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single JVM, low latency, static tools | @Tool annotation | Zero network overhead, compile-time validation | Lowest compute cost |
| Cross-service, dynamic capabilities, multi-tenant | Dynamic MCP Provider | Decouples agent from service topology, enables runtime discovery | Moderate network cost, higher flexibility |
| High-throughput, strict rate limits, predictable workflows | Direct REST/gRPC clients | Bypasses LLM reasoning overhead, enables precise concurrency control | Lowest latency, highest development cost |
| Mixed environment (local + remote) | Hybrid provider + namespace filtering | Balances performance with distributed capability access | Optimized token usage, moderate infra cost |
Configuration Template
# application.yml
ai:
  mcp:
    services:
      order: ${MCP_ORDER_URL:http://localhost:3000/sse}
      payment: ${MCP_PAYMENT_URL:http://localhost:8091/sse}
      inventory: ${MCP_INVENTORY_URL:http://localhost:8092/sse}
      validation: ${MCP_VALIDATION_URL:http://localhost:8090/sse}
    transport:
      connect-timeout: 5s
      read-timeout: 10s
      log-requests: false
      log-responses: false
  agent:
    max-sequential-invocations: 5
    virtual-threads: true
spring:
  threads:
    virtual:
      enabled: true
  ai:
    chat:
      model:
        provider: openai
        options:
          model: gpt-4o
          temperature: 0.2
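The `${NAME:default}` placeholders in the template take the environment variable when it is set and fall back to the default otherwise. The sketch below re-implements that rule in plain Java purely for illustration; it is not Spring's resolver, and `PlaceholderSketch` is a hypothetical name. Note that the default itself contains colons, so only the first colon separates the name from the default.

```java
import java.util.Map;

// Hypothetical re-implementation of "${NAME:default}" resolution, for illustration only.
class PlaceholderSketch {

    static String resolve(String expression, Map<String, String> environment) {
        if (!expression.startsWith("${") || !expression.endsWith("}")) {
            return expression;                            // literal value, nothing to resolve
        }
        String body = expression.substring(2, expression.length() - 1);
        int separator = body.indexOf(':');                 // first colon splits name from default
        String name = separator < 0 ? body : body.substring(0, separator);
        String fallback = separator < 0 ? null : body.substring(separator + 1);
        String value = environment.get(name);
        if (value != null) {
            return value;
        }
        if (fallback == null) {
            throw new IllegalArgumentException("No value or default for " + name);
        }
        return fallback;
    }
}
```

With an empty environment, the order endpoint resolves to the localhost default; once `MCP_ORDER_URL` is set, that value wins.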
// Production-ready agent builder with observability hooks
// Note: toolExecutionListener and ToolInvocationMetrics are shown as illustrative hooks;
// check which tool-execution callbacks your LangChain4j version actually exposes.
@Bean
public ProductionAgentInterface secureAgent(
        McpToolProvider toolResolver,
        ChatModel languageModel,
        ToolInvocationMetrics metricsCollector
) {
    return AiServices.builder(ProductionAgentInterface.class)
            .chatModel(languageModel)
            .toolProvider(toolResolver)
            .maxSequentialToolsInvocations(5)
            .toolExecutionListener((call, result, duration) -> {
                metricsCollector.recordInvocation(call.name(), duration, result.success());
            })
            .build();
}
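The metrics collector passed to the bean above is not defined in this guide. One possible shape, assuming only the `recordInvocation(name, duration, success)` call seen in the bean, is a thread-safe in-memory counter like the sketch below; `InMemoryToolMetrics` and its accessor methods are hypothetical.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory collector matching a recordInvocation(name, duration, success) call.
class InMemoryToolMetrics {

    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> failures = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> totalMillis = new ConcurrentHashMap<>();

    void recordInvocation(String toolName, Duration duration, boolean success) {
        calls.computeIfAbsent(toolName, k -> new LongAdder()).increment();
        totalMillis.computeIfAbsent(toolName, k -> new LongAdder()).add(duration.toMillis());
        if (!success) {
            failures.computeIfAbsent(toolName, k -> new LongAdder()).increment();
        }
    }

    long callCount(String toolName) {
        return sum(calls, toolName);
    }

    long failureCount(String toolName) {
        return sum(failures, toolName);
    }

    // Mean latency in milliseconds; 0 when the tool has not been called.
    double meanMillis(String toolName) {
        long n = callCount(toolName);
        return n == 0 ? 0.0 : (double) sum(totalMillis, toolName) / n;
    }

    private static long sum(Map<String, LongAdder> counters, String key) {
        LongAdder adder = counters.get(key);
        return adder == null ? 0 : adder.sum();
    }
}
```

In a real deployment these counters would more likely be exported through whatever metrics system the service already uses; the in-memory version is mainly useful for tests and local debugging.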
Quick Start Guide
- Initialize MCP Servers: Deploy your backend services with MCP-compatible endpoints. Verify each exposes a /sse stream and responds to tools/list requests.
- Configure Endpoints: Add service URLs to application.yml using environment variables for production and localhost defaults for development.
- Register Tool Provider: Create a @Configuration class that builds McpToolProvider with your endpoint list and attaches it to AiServices.builder().
- Enable Virtual Threads: Add spring.threads.virtual.enabled=true to your configuration to unlock concurrent tool execution.
- Test Chain Execution: Submit a multi-step query (e.g., "Check stock for product X and calculate fraud risk for order Y"). Verify tool calls route correctly and complete within expected latency thresholds.
The dynamic MCP provider model transforms agents from static script runners into adaptive system operators. By enforcing execution boundaries, optimizing concurrency, and maintaining strict schema discipline, you can deploy distributed AI workflows that scale predictably under production load.