dation and probabilistic inference are decoupled.
Core Solution
The production-ready architecture implements a three-tier pipeline: (1) FHIR resource validation and normalization, (2) async API orchestration with rate limiting and circuit breaking, and (3) LLM routing with clinical guardrails and deterministic fallbacks.
1. FHIR Resource Validation & Normalization
Pydantic models enforce strict schema compliance before data enters the inference layer. This prevents malformed payloads from triggering LLM parsing errors or compliance violations.
import re
from typing import List, Literal, Optional

# validator is the pydantic v1-style API; pydantic v2 deprecates it in favor of field_validator.
from pydantic import BaseModel, Field, validator


class ValidatedPatient(BaseModel):
    # FHIR JSON uses camelCase keys, so they are mapped through aliases.
    resource_type: Literal["Patient"] = Field(default="Patient", alias="resourceType")
    id: str
    name: List[dict]
    birth_date: Optional[str] = Field(default=None, alias="birthDate")
    identifier: Optional[List[dict]] = None

    @validator("birth_date")
    def validate_iso_date(cls, v):
        # The FHIR date type allows partial precision: YYYY, YYYY-MM, or YYYY-MM-DD.
        if v and not re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", v):
            raise ValueError("birthDate must follow FHIR date format (YYYY, YYYY-MM, or YYYY-MM-DD)")
        return v

    @validator("identifier")
    def validate_system_prefix(cls, v):
        # When present, Identifier.system must start with a recognized URI scheme.
        if v:
            for ident in v:
                if "system" in ident and not ident["system"].startswith(("http://", "https://", "urn:")):
                    raise ValueError("Identifier system must use a valid URI scheme")
        return v
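The model above covers validation; the normalization half of this tier is not shown. A minimal sketch of what it might look like follows, assuming the inference layer only needs an id, a display name, and a birth date. The helper names display_name and normalize_patient are hypothetical, not part of the original pipeline.

```python
from typing import List, Optional


def display_name(names: List[dict]) -> Optional[str]:
    """Build a display string from FHIR HumanName entries.

    Prefers the entry whose use is "official"; otherwise takes the first entry.
    """
    if not names:
        return None
    chosen = next((n for n in names if n.get("use") == "official"), names[0])
    if chosen.get("text"):
        return chosen["text"].strip()
    parts = list(chosen.get("given", []))
    if chosen.get("family"):
        parts.append(chosen["family"])
    return " ".join(p.strip() for p in parts if p and p.strip()) or None


def normalize_patient(resource: dict) -> dict:
    """Flatten an already validated Patient resource (hypothetical output shape)."""
    return {
        "id": resource["id"],
        "display_name": display_name(resource.get("name", [])),
        "birth_date": resource.get("birthDate"),
    }
```

Flattening after validation keeps the prompt context small and avoids passing fields the model does not need.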
2. Async API Orchestration with Circuit Breaking
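Health APIs enforce strict rate limits. The orchestrator combines exponential backoff, token bucket rate limiting, and circuit breaking to keep the pipeline stable. The client below covers the first of these: it retries transient failures with exponential backoff and caps in-flight requests with a semaphore.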
import asyncio

import httpx
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential


def _is_transient(exc: BaseException) -> bool:
    # Retry timeouts, 429s, and 5xx responses; other 4xx errors will not succeed on retry.
    if isinstance(exc, httpx.TimeoutException):
        return True
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        return status == 429 or status >= 500
    return False


class FHIRClient:
    def __init__(self, base_url: str, api_key: str):
        self.client = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}", "Accept": "application/fhir+json"},
        )
        self.rate_limiter = asyncio.Semaphore(10)  # Fixed cap on concurrent in-flight requests

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=30),
        retry=retry_if_exception(_is_transient),
    )
    async def fetch_patient(self, patient_id: str) -> dict:
        async with self.rate_limiter:
            response = await self.client.get(f"/Patient/{patient_id}")
            response.raise_for_status()
            return response.json()
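The client above handles backoff and a concurrency cap, but the token bucket and circuit breaker named in this section are not shown. The following is a minimal sketch of both using only the standard library. The class names, the guarded_call helper, and the default thresholds are assumptions for illustration, not the original implementation.

```python
import asyncio
import time


class TokenBucket:
    """Allow roughly `rate` calls per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Refill in proportion to elapsed time, capped at capacity.
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep just long enough for one token to accumulate.
                await asyncio.sleep((1 - self.tokens) / self.rate)


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `reset_after` seconds."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def before_call(self) -> None:
        if self.opened_at is None:
            return
        if time.monotonic() - self.opened_at < self.reset_after:
            raise CircuitOpenError("circuit open; skipping upstream call")
        # Cool-down elapsed: fall through and let a trial call reach the upstream.

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


async def guarded_call(bucket: TokenBucket, breaker: CircuitBreaker, func, *args):
    """Run an async callable behind the breaker and the bucket."""
    breaker.before_call()
    await bucket.acquire()
    try:
        result = await func(*args)
    except Exception:
        breaker.record_failure()
        raise
    breaker.record_success()
    return result
```

Checking the breaker before taking a token means rejected calls do not consume rate-limit budget. One way to wire this in would be to pass a client's fetch_patient method as func.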
3. LLM Routing with Clinical Guardrails
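Deterministic rules intercept low-confidence or high-risk queries, routing them to fallback clinical logic or human review. Guardrails enforce PHI redaction and output schema constraints. In the example below, the output schema is enforced through a Pydantic response model, and any answer with a self-reported confidence under 0.75 or with detected PII is flagged for review; PHI redaction is assumed to run before the call.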
import json

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel, Field


class ClinicalResponse(BaseModel):
    summary: str = Field(description="Concise clinical summary, max 3 sentences")
    confidence: float = Field(ge=0.0, le=1.0)  # Self-reported by the model, not a calibrated probability
    requires_review: bool
    pii_detected: bool = False


client = instructor.patch(AsyncOpenAI())  # Adds response_model support to chat.completions.create


async def route_clinical_query(fhir_data: dict, query: str) -> ClinicalResponse:
    # PHI redaction step omitted for brevity
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a clinical AI strategist. Return structured responses only."},
            # Serialize as JSON rather than relying on the Python dict repr.
            {"role": "user", "content": f"FHIR Context: {json.dumps(fhir_data)}\nQuery: {query}"},
        ],
        response_model=ClinicalResponse,
        temperature=0.2,
    )
    # Escalate to human review on low confidence or detected PII.
    if response.confidence < 0.75 or response.pii_detected:
        response.requires_review = True
    return response
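The example above omits the PHI redaction step and the deterministic interception of high-risk queries. The sketch below shows one way a pre-LLM guardrail could do both. The regex patterns and the high-risk term list are hypothetical placeholders; pattern matching of this kind is illustrative only and is not sufficient for HIPAA de-identification.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumption): real PHI detection needs far broader coverage.
_PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical trigger terms for queries that should bypass the LLM.
_HIGH_RISK_TERMS = ("dosage", "dose", "overdose", "contraindication")


@dataclass
class GuardrailDecision:
    redacted_query: str
    phi_found: bool
    route: str  # "llm" or "deterministic"


def pre_llm_guardrail(query: str) -> GuardrailDecision:
    """Redact matching PHI, then decide whether the query may reach the LLM."""
    redacted = query
    phi_found = False
    for label, pattern in _PHI_PATTERNS.items():
        redacted, count = pattern.subn(f"[{label}]", redacted)
        phi_found = phi_found or count > 0
    lowered = redacted.lower()
    high_risk = any(
        re.search(rf"\b{re.escape(term)}\b", lowered) for term in _HIGH_RISK_TERMS
    )
    return GuardrailDecision(redacted, phi_found, "deterministic" if high_risk else "llm")
```

A caller would pass redacted_query to route_clinical_query only when route is "llm", and hand everything else to rule-based logic or human review.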
Pitfall Guide
- Ignoring FHIR Versioning & Profile Constraints: FHIR R4 and R5 differ in resource structure and mandatory fields. Deploying against R4 schemas while the EHR vendor serves R5 payloads causes silent validation failures. Always pin to explicit FHIR versions and validate against implementation guides (IGs).
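- Bypassing Clinical Guardrails for Latency: Removing confidence thresholds or PHI redaction to shave milliseconds increases hallucination risk and HIPAA violation exposure. Input guardrails such as PHI redaction must run before LLM invocation, and output checks such as confidence thresholds must run on every response before it is returned, never as an after-the-fact audit.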
- Mishandling PHI Data Residency & Audit Trails: Storing or routing protected health information through non-compliant regions or third-party logging services triggers regulatory breaches. Implement region-locked inference, zero-retention logging, and immutable audit trails for every query.
- Over-Optimizing LLM Prompts Without Ground Truth Validation: Tuning prompts against synthetic datasets creates false confidence. Always validate against de-identified real-world EHR extracts and measure F1 against clinician-annotated benchmarks.
- Neglecting API Rate Limiting & Circuit Breaking: Health APIs enforce strict concurrency limits. Without token buckets and circuit breakers, burst traffic causes cascading failures and retry loops on HTTP 429 responses that degrade downstream AI services.
- Skipping Fallback to Deterministic Rules: LLMs should augment, not replace, clinical logic. Always implement rule-based fallbacks (e.g., SNOMED-CT mapping, dosage calculators) for high-stakes queries where probabilistic output is unacceptable.
- Treating "Health Intelligence" as Generic NLP: Clinical text contains nested abbreviations, temporal references, and negation patterns that break standard tokenization. Use domain-specific embeddings and clinical NLP pipelines (e.g., medspaCy, scispaCy) before LLM routing.
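The first pitfall, version pinning, lends itself to a startup check. A FHIR server publishes a CapabilityStatement at its metadata endpoint, and that resource carries a fhirVersion field (for example 4.0.1 for R4 or 5.0.0 for R5). The sketch below compares that value against the release the pipeline is pinned to. The function and exception names are hypothetical, and the CapabilityStatement is assumed to have been fetched and parsed into a dict already.

```python
# Release labels keyed by the major.minor prefix of CapabilityStatement.fhirVersion.
_RELEASES = {"3.0": "STU3", "4.0": "R4", "4.3": "R4B", "5.0": "R5"}


class FHIRVersionMismatch(RuntimeError):
    """Raised when the server's FHIR release differs from the pinned release."""


def check_fhir_release(capability_statement: dict, expected: str = "R4") -> str:
    """Return the server's release label, or raise if it is not the expected one."""
    version = capability_statement.get("fhirVersion", "")
    release = _RELEASES.get(".".join(version.split(".")[:2]))
    if release != expected:
        raise FHIRVersionMismatch(
            f"server reports fhirVersion={version!r}, pipeline is pinned to {expected}"
        )
    return release
```

Failing fast at startup turns a version mismatch into an explicit error rather than a stream of per-resource validation failures.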
Deliverables
- Architecture Blueprint: Complete system diagram detailing FHIR validation layers, async orchestration topology, LLM routing logic, and compliance boundaries. Includes data flow annotations for PHI handling and audit trail placement.
- Deployment & Compliance Checklist: 42-point verification matrix covering FHIR version pinning, rate-limit configuration, guardrail thresholds, PHI redaction validation, region-locked inference, and clinician review SLAs.
- Configuration Templates: Production-ready YAML/JSON manifests for API routing rules, circuit-breaking parameters, guardrail confidence thresholds, and FHIR profile mappings. Includes environment-specific overrides for staging vs. production compliance tiers.
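As a rough illustration of how environment-specific overrides might be loaded and checked, the sketch below merges per-environment values over defaults from a JSON document. All key names are hypothetical and do not reflect the actual templates. The default concurrency of 10 and confidence threshold of 0.75 mirror the examples earlier in this section, while the production override of 0.85 is invented for the example.

```python
import json
from dataclasses import dataclass

# Hypothetical manifest; key names and the production override are assumptions.
EXAMPLE_MANIFEST = """
{
  "defaults": {"fhir_release": "R4", "max_concurrency": 10, "confidence_threshold": 0.75},
  "overrides": {"production": {"confidence_threshold": 0.85}}
}
"""


@dataclass(frozen=True)
class PipelineConfig:
    fhir_release: str
    max_concurrency: int
    confidence_threshold: float

    def __post_init__(self):
        # Reject values that would silently disable a guardrail.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0 and 1")
        if self.max_concurrency < 1:
            raise ValueError("max_concurrency must be at least 1")


def load_config(raw: str, environment: str) -> PipelineConfig:
    """Merge the environment's overrides over the defaults and validate the result."""
    doc = json.loads(raw)
    merged = {**doc["defaults"], **doc.get("overrides", {}).get(environment, {})}
    return PipelineConfig(**merged)
```

Validating at load time means a mistyped threshold fails at deployment instead of weakening the review gate in production.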