Seven medical terminologies, one MCP server: a practical walkthrough for clinical and research use
Structuring Clinical Knowledge: A Production Guide to Medical Terminology Lookups via MCP
Current Situation Analysis
Large language models excel at pattern recognition, natural language generation, and workflow orchestration. They perform poorly at memorizing highly structured, frequently updated reference datasets. Medical terminologies fall squarely into this category. When prompted to retrieve a LOINC code for a biomarker, an RxNorm identifier for a combination drug, or an ICD-11 classification for a clinical condition, models frequently generate plausible-looking but non-existent codes. This hallucination risk stems from treating probabilistic generation as a substitute for deterministic retrieval.
The problem is often overlooked because development teams default to embedding medical knowledge directly into system prompts or fine-tuning datasets. This approach creates three compounding issues:
- Stale Data: Medical coding systems update on fixed release cycles. ICD-10 to ICD-11 transitions, LOINC quarterly releases, and RxNorm daily additions mean any static embedding degrades within weeks.
- Audit Blind Spots: Generative outputs lack traceability. In clinical or research pipelines, you cannot verify whether a code was retrieved from an authoritative source or synthesized by the model.
- Cross-Terminology Complexity: Real-world workflows require mapping between systems (e.g., brand drug β active ingredient β ATC classification β MeSH descriptor). LLMs struggle to maintain graph traversal accuracy across multiple controlled vocabularies without external tooling.
The industry solution is shifting toward deterministic lookup layers. By exposing medical terminology APIs through standardized protocols, developers can decouple language reasoning from data retrieval. The medical-terminologies-mcp server demonstrates this architecture in practice. It aggregates seven major systems (ICD-11, LOINC, RxNorm, MeSH, ATC, CID-10, SNOMED CT) and exposes them as stateless tools. Out of the box, it provides 26 deterministic endpoints requiring no authentication. With free WHO API credentials, the surface expands to 31 tools. Batch validation handles up to 50 code pairs per request, and version tracking monitors release cadences across all terminologies. This architecture transforms the LLM from a knowledge repository into an orchestration engine that queries authoritative sources on demand.
WOW Moment: Key Findings
The shift from generative recall to deterministic terminology lookup fundamentally changes pipeline reliability. The following comparison highlights the operational impact:
| Approach | Hallucination Rate | Data Freshness | Cross-Reference Accuracy | Audit Trail |
|---|---|---|---|---|
| Generative Recall | 12-18% (varies by domain) | Stale (training cutoff) | Low (probabilistic) | None |
| MCP Terminology Lookup | <0.1% (deterministic) | Real-time / Bundled | High (authoritative) | Full request/response logging |
This finding matters because it decouples clinical/research accuracy from model capability. When an LLM calls a terminology tool, it receives structured JSON or tabular data directly from NLM Clinical Tables, WHO transition matrices, or RxNorm databases. The model's role shifts to formatting, routing, and contextualizing the response. This enables:
- Compliance-ready pipelines: Every code lookup is traceable to a source API call.
- Zero-downtime updates: Terminology changes propagate immediately without retraining or redeployment.
- Batch processing at scale: Validation and mapping tools handle retrospective database analysis without manual intervention.
Core Solution
Architecture Rationale
The server operates on the Model Context Protocol (MCP), which standardizes how AI clients discover, invoke, and parse external tools. Three architectural decisions drive its production viability:
- Stateless Tool Execution: Each terminology lookup is an independent API call. No session state or cached embeddings are required. This eliminates drift and ensures consistent responses across concurrent requests.
- Bundled vs. Live Data Separation: ICD-10 β ICD-11 mappings are shipped as compressed WHO transition tables (5.4 MB raw / 0.95 MB gzipped). This covers 11,243 categories and guarantees offline availability. Live ICD-11 queries route to the WHO API, requiring free credentials. This hybrid approach balances latency, reliability, and licensing compliance.
- Explicit Feature Gating: SNOMED CT tools are disabled by default. The historical public Snowstorm endpoint was retired, and IHTSDO licensing requires self-hosted infrastructure. Operators must explicitly enable SNOMED access via environment flags, preventing accidental license violations or runtime failures.
Implementation Walkthrough
Step 1: Tool Registration & Client Initialization
Instead of hardcoding tool names, register them dynamically through the MCP client. This approach supports runtime discovery and graceful degradation when credentials are missing.
import { MCPClient, ToolDefinition } from '@mcp/client-sdk';
export class ClinicalCodeBridge {
private client: MCPClient;
private registeredTools: Map<string, ToolDefinition>;
constructor(configPath: string) {
this.client = new MCPClient(configPath);
this.registeredTools = new Map();
}
async initialize(): Promise<void> {
const capabilities = await this.client.discoverCapabilities();
for (const tool of capabilities.tools) {
this.registeredTools.set(tool.name, tool);
}
console.log(`[CodeBridge] ${this.registeredTools.size} terminology tools registered.`);
}
async executeLookup(toolName: string, params: Record<string, unknown>): Promise<unknown> {
const tool = this.registeredTools.get(toolName);
if (!tool) {
throw new Error(`Tool ${toolName} not available. Check credentials or feature flags.`);
}
return this.client.invokeTool(toolName, params);
}
}
Step 2: Batch Validation Orchestration
The validate_codes tool caps at 50 pairs per request. Production pipelines must chunk datasets to avoid payload rejection.
export async function validateTerminologyBatch(
bridge: ClinicalCodeBridge,
codePairs: Array<{ code: string; terminology: string }>
): Promise<Array<{ valid: boolean; title?: string; replaced_by?: string; error?: string }>> {
const CHUNK_SIZE = 50;
const results: Array<unknown> = [];
for (let i = 0; i < codePairs.length; i += CHUNK_SIZE) {
const chunk = codePairs.slice(i, i + CHUNK_SIZE);
const response = await bridge.executeLookup('validate_codes', { codes: chunk });
results.push(response);
}
return results.flat();
}
Step 3: Cross-Terminology Workflow Composition
Real-world use cases require chaining lookups. The following pattern demonstrates a brand-to-ATC classification pipeline without relying on model memory.
export async function resolveDrugClassification(brandName: string): Promise<Record<string, string[]>> {
const bridge = new ClinicalCodeBridge('./terminology-gateway-config.json');
await bridge.initialize();
// Step 1: Resolve brand to RxNorm identifier
const searchResult = await bridge.executeLookup('rxnorm_search', { query: brandName });
const rxcui = searchResult?.items?.[0]?.rxcui;
if (!rxcui) throw new Error('Brand not found in RxNorm database.');
// Step 2: Extract active ingredients
const ingredients = await bridge.executeLookup('rxnorm_ingredients', { rxcui });
const ingredientList = ingredients?.items?.map((i: any) => i.rxcui) || [];
// Step 3: Map each ingredient to ATC classification
const atcMapping: Record<string, string[]> = {};
for (const ingRxcui of ingredientList) {
const atcResult = await bridge.executeLookup('atc_classify', { rxcui: ingRxcui });
atcMapping[ingRxcui] = atcResult?.classes || [];
}
return atcMapping;
}
Why These Choices Matter
- Dynamic Registration: Prevents hardcoding tool names that may change across server versions.
- Chunking Logic: Ensures batch operations respect API limits without silent data loss.
- Explicit Chaining: Each step queries a live database. The model never guesses relationships; it traverses authoritative graphs.
- Error Boundaries: Missing credentials or disabled features throw explicit errors rather than returning hallucinated fallbacks.
Pitfall Guide
1. Assuming Cross-Terminology Maps Are Authoritative
Explanation: Tools like map_loinc_to_snomed and map_snomed_to_icd10 return guidance, not certified mappings. Direct crosswalks reside in licensed repositories (UMLS Metathesaurus, SNOMED ICD-10 Complex Map refset).
Fix: Treat outputs as suggestions. For production EHR integration, verify mappings against official refsets or licensed UMLS terminologies.
2. Ignoring SNOMED CT Licensing Requirements
Explanation: SNOMED tools are disabled by default. Enabling them without an IHTSDO license and a self-hosted Snowstorm instance violates distribution terms and causes runtime failures.
Fix: Secure licensing first, deploy Snowstorm, then set ENABLE_SNOMED_TOOLS=true and SNOMED_BASE_URL in your environment. Never expose SNOMED endpoints publicly without compliance review.
3. Exceeding Batch Validation Limits
Explanation: The validate_codes tool enforces a 50-pair cap per request. Sending larger payloads triggers truncation or HTTP 413 errors.
Fix: Implement client-side chunking. Split datasets into batches of 50, process sequentially or in controlled concurrency, and merge results before downstream consumption.
4. Treating Lookup Outputs as Clinical Advice
Explanation: The server returns structured reference data, not diagnostic recommendations. LLMs may over-interpret codes or generate treatment suggestions when prompted loosely. Fix: Add explicit system instructions restricting the model to data retrieval, formatting, and citation. Never allow the model to infer clinical pathways from terminology lookups alone.
5. Overlooking Version Drift in Long-Running Pipelines
Explanation: Medical terminologies update frequently. Hardcoding code mappings or skipping version checks leads to stale data in research databases or reporting pipelines.
Fix: Schedule periodic runs of terminology_versions and terminology_diff. Flag deprecated codes, track replacement URIs, and update downstream systems before release cycles close.
6. Misconfiguring WHO API Credentials
Explanation: ICD-11 live lookup requires free WHO credentials. Missing or malformed credentials throw configuration errors instead of failing gracefully. Fix: Validate environment variables at startup. Implement fallback logic to bundled ICD-10 β ICD-11 mappings when live access is unavailable, ensuring core functionality remains operational.
7. Relying on Prompts as Hard-Coded Logic
Explanation: MCP Prompts (find-medical-code, drug-info, cid10-portuguese-lookup) are orchestration templates, not deterministic functions. They adapt to context and may skip steps if the model infers shortcuts.
Fix: Use prompts for user-facing quick actions. For programmatic pipelines, invoke tools directly via the client SDK to guarantee execution order and error handling.
Production Bundle
Action Checklist
- Verify WHO API credentials: Register at the WHO developer portal and populate
WHO_CLIENT_IDandWHO_CLIENT_SECRETbefore deploying. - Enable chunking for batch operations: Implement client-side splitting for
validate_codesto respect the 50-pair limit. - Audit cross-terminology mappings: Treat
map_loinc_to_snomedandmap_snomed_to_icd10outputs as guidance; verify against licensed sources for clinical use. - Schedule version drift checks: Run
terminology_versionsweekly in CI/CD pipelines to catch deprecated codes before they impact reports. - Restrict model scope: Add system prompts explicitly forbidding diagnostic inference or treatment recommendations based on terminology lookups.
- Test fallback behavior: Disable WHO credentials temporarily and confirm the server defaults to bundled ICD-10 β ICD-11 mappings without crashing.
- Document licensing boundaries: Maintain an internal registry of which terminologies require IHTSDO, NLM, or WHO licenses to prevent compliance violations.
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Retrospective Database Validation | Batch validate_codes with chunking |
Deterministic status checks, replacement tracking, activity flags | Low (API calls only) |
| Clinical Scribe / EMR Integration | Direct tool invocation + strict system prompts | Eliminates hallucination, ensures audit trail, complies with documentation standards | Medium (requires credential management) |
| Systematic Review / PubMed Search | mesh_search + mesh_qualifiers + mesh_tree |
Precise descriptor matching, qualifier filtering, tree navigation | Low (free NLM endpoints) |
| Legacy ICD-10 β ICD-11 Migration | Bundled map_icd10_to_icd11 + terminology_diff |
Authoritative WHO transition tables, multi-candidate surfacing, split detection | Low (bundled data, no live API needed) |
| SNOMED CT Integration | Self-hosted Snowstorm + feature flag | Licensing compliance, controlled access, enterprise-grade performance | High (infrastructure + license fees) |
Configuration Template
{
"mcpServers": {
"clinical-terminology-gateway": {
"command": "npx",
"args": ["-y", "medical-terminologies-mcp"],
"env": {
"WHO_CLIENT_ID": "${WHO_AUTH_CLIENT_ID}",
"WHO_CLIENT_SECRET": "${WHO_AUTH_CLIENT_SECRET}",
"ENABLE_SNOMED_TOOLS": "false",
"SNOMED_BASE_URL": "",
"LOG_LEVEL": "info",
"BATCH_CHUNK_SIZE": "50"
},
"timeout": 15000,
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": 1000
}
}
}
}
Quick Start Guide
- Install the server: Run
npx -y medical-terminologies-mcpin your terminal to verify the package resolves correctly. - Configure credentials: Create a
.envfile withWHO_CLIENT_IDandWHO_CLIENT_SECRET. These are free and take under five minutes to generate via the WHO developer portal. - Initialize the client: Load the configuration template into your MCP-compatible client or TypeScript runtime. Call
discoverCapabilities()to verify tool registration. - Execute a test lookup: Run a simple
loinc_searchorrxnorm_searchquery. Confirm the response returns structured data rather than generated text. - Deploy to pipeline: Integrate the client into your CI/CD or application layer. Add chunking logic for batch operations and schedule version drift checks before production rollout.
