nts, and delivers consistent BI metrics alongside unstructured insights.
Core Solution
Building a production-grade compound AI system requires explicit state management, deterministic routing, and layered validation. The architecture below uses a TypeScript-based orchestration layer inspired by LangGraph's state machine paradigm, integrated with Microsoft Fabric's OneLake storage, Delta Parquet formats, and serverless T-SQL endpoints.
Architecture Overview
- Query Router: Classifies incoming requests into semantic, structured, or cached categories using confidence thresholds.
- Hybrid Retriever: Dispatches to either a vector index (for unstructured content) or a T-SQL engine (for relational metrics).
- Semantic Cache: Stores query-response pairs with version-aware TTL to eliminate redundant computation.
- Guardrail Validator: Enforces schema compliance, data governance rules, and output sanitization before returning results.
Implementation: State Machine & Routing Logic
The orchestration layer maintains explicit state transitions. Each query passes through classification, routing, retrieval, validation, and response assembly.
import { StateGraph, END, START } from "@langchain/langgraph";
interface AgentState {
query: string;
classification: 'semantic' | 'structured' | 'cached';
confidence: number;
retrievedData: string | Record<string, unknown>;
validated: boolean;
response: string;
error?: string;
}
const classifyQuery = (state: AgentState): AgentState => {
const numericPattern = /\b(sum|total|average|count|growth|revenue|profit|q[1-4]|ytd)\b/i;
const isStructured = numericPattern.test(state.query);
return {
...state,
classification: isStructured ? 'structured' : 'semantic',
confidence: isStructured ? 0.89 : 0.76
};
};
const routeRetrieval = async (state: AgentState): Promise<AgentState> => {
if (state.classification === 'structured') {
// Delegate to serverless T-SQL endpoint via Fabric
const sqlQuery = `SELECT SUM(revenue) FROM sales WHERE quarter = 'Q3'`;
const result = await executeTSQL(sqlQuery);
return { ...state, retrievedData: result };
}
// Delegate to vector index for unstructured content
const embeddings = await embedText(state.query);
const chunks = await searchOneLakeVectors(embeddings, { topK: 3 });
return { ...state, retrievedData: chunks.map(c => c.text).join('\n') };
};
const validateOutput = (state: AgentState): AgentState => {
if (state.classification === 'structured') {
const data = state.retrievedData as Record<string, unknown>;
const hasRequiredFields = 'sum_revenue' in data && typeof data.sum_revenue === 'number';
return { ...state, validated: hasRequiredFields, error: hasRequiredFields ? undefined : 'Schema validation failed' };
}
return { ...state, validated: true };
};
const graph = new StateGraph<AgentState>({
channels: {
query: { value: null, reducer: (a, b) => b },
classification: { value: null, reducer: (a, b) => b },
confidence: { value: 0, reducer: (a, b) => b },
retrievedData: { value: null, reducer: (a, b) => b },
validated: { value: false, reducer: (a, b) => b },
response: { value: '', reducer: (a, b) => b },
error: { value: undefined, reducer: (a, b) => b }
}
})
.addNode('classify', classifyQuery)
.addNode('route', routeRetrieval)
.addNode('validate', validateOutput)
.addEdge(START, 'classify')
.addEdge('classify', 'route')
.addEdge('route', 'validate')
.addEdge('validate', END);
const compiledGraph = graph.compile();
Architecture Decisions & Rationale
State Machine Over Linear Pipelines: LangGraph-style state machines enforce explicit control flow. Unlike linear chains where failures cascade silently, state machines allow conditional branching, retry logic, and clear error boundaries. This is critical for BI workloads where a failed SQL execution must not corrupt the semantic retrieval path.
Semantic Cache with Version Awareness: Standard caches fail in BI because underlying data changes frequently. The cache layer must track dataset versions (e.g., Delta Parquet transaction logs) and invalidate entries when source tables are updated. This prevents stale metrics from being served while preserving cost savings for repeated analytical queries.
Deterministic Guardrails: Prompt instructions alone cannot enforce data governance. The validation node checks output schemas, verifies numeric ranges, and ensures compliance with corporate data policies before the response reaches the user. This transforms the system from probabilistic to auditable.
OneLake & Delta Parquet Integration: Microsoft Fabric's unified storage eliminates data silos. Delta Parquet provides ACID transactions, schema evolution, and time travel. The vector index and SQL engine both reference the same underlying files, ensuring consistency between unstructured embeddings and structured aggregates.
Pitfall Guide
1. Semantic Drift on Numeric Queries
Explanation: Vector similarity matches linguistic patterns, not mathematical operations. Queries containing "total", "growth", or "average" routed to semantic search return irrelevant document snippets.
Fix: Implement regex or lightweight classifier thresholds that force structured queries to the SQL engine. Never rely on embedding distance for aggregation requests.
2. Cache Staleness in Financial Workloads
Explanation: Caching BI responses without version tracking serves outdated metrics when source data refreshes.
Fix: Tie cache keys to Delta Parquet transaction IDs or table version stamps. Implement TTL policies that align with data refresh schedules (e.g., 15 minutes for real-time dashboards, 24 hours for daily reports).
3. Over-Engineered Routing Logic
Explanation: Creating dozens of classification branches increases latency and maintenance overhead.
Fix: Use a two-tier router: primary classification (semantic vs structured) followed by confidence-based fallback. If confidence drops below 0.75, execute hybrid retrieval and merge results.
4. Guardrail Bypass via Prompt Injection
Explanation: Users or downstream systems may inject instructions that override validation rules.
Fix: Separate prompt construction from output validation. Run guardrails on the final response payload, not the input prompt. Enforce strict JSON schema validation and numeric range checks post-generation.
5. Context Window Bloat from Raw Chunks
Explanation: Feeding unprocessed document chunks into the LLM causes attention dilution and token waste.
Fix: Apply a pre-processing extraction step. Use a lightweight model or rule-based parser to pull key metrics, tables, and summaries before passing content to the main agent.
6. Schema-Agnostic SQL Generation
Explanation: LLM-generated queries reference non-existent columns or incorrect table names, causing runtime failures.
Fix: Inject a schema-aware validation layer. Map user intent to a predefined metric catalog. Validate generated SQL against the lakehouse metadata before execution.
7. Single-Point Failure in Orchestration
Explanation: Tightly coupling routing, retrieval, and validation creates brittle dependencies.
Fix: Decouple components via message queues or event-driven patterns. Implement circuit breakers for SQL endpoints and vector indices. Log routing decisions for auditability.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Ad-hoc BI Analytics | Structured SQL Routing + Semantic Cache | Deterministic aggregation prevents hallucination; cache reduces repeated computation | Low ($0.02-0.04/query) |
| Real-Time Dashboard | Hybrid Retrieval + Fast Cache Invalidation | Balances live metrics with policy context; version-aware TTL ensures freshness | Medium ($0.05-0.08/query) |
| Compliance Audit | Guardrail-Validated SQL + Immutable Logs | Schema enforcement and audit trails satisfy regulatory requirements | Low (compute-heavy, token-light) |
| Unstructured Policy Q&A | Semantic Retrieval + Summarization Pre-processor | Vector search excels at document matching; summarization reduces context bloat | Medium ($0.06-0.10/query) |
| Mixed Query Workloads | Compound Router + Fallback Hybrid | Adapts dynamically to user intent; prevents single-path failure | Medium-High ($0.07-0.12/query) |
Configuration Template
orchestrator:
routing:
semantic_threshold: 0.75
structured_keywords: ["sum", "total", "average", "count", "growth", "q[1-4]", "ytd"]
fallback_mode: hybrid
cache:
ttl_seconds: 900
version_tracking: delta_parquet_transaction_id
max_entries: 50000
guardrails:
schema_validation: strict
numeric_range_check: true
governance_policy: corporate_bi_standard_v2
storage:
lakehouse: fabric_onelake
format: delta_parquet
sql_endpoint: serverless_tsql
vector_index: semantic_search_v1
telemetry:
log_routing_decisions: true
track_cache_hits: true
alert_on_validation_failure: true
Quick Start Guide
- Initialize the State Machine: Deploy the TypeScript orchestration layer with the provided graph definition. Configure the classification node to parse incoming queries against the structured keyword list.
- Connect to Fabric Storage: Point the hybrid retriever to your OneLake Delta Parquet tables. Configure the T-SQL endpoint for structured aggregation and the vector index for unstructured retrieval.
- Deploy Semantic Cache: Set up the cache layer with version-aware invalidation. Align TTL values with your data refresh cadence to prevent stale metric delivery.
- Enable Guardrails: Attach schema validation and numeric range checks to the output node. Test with known BI queries to verify deterministic behavior.
- Validate & Monitor: Run a regression suite covering structured, semantic, and mixed queries. Enable telemetry logging to track routing accuracy, cache performance, and validation pass rates. Iterate thresholds based on production telemetry.