at classifies intent and complexity, routing requests to the optimal resource.
- Heuristics/Regex: For deterministic tasks (e.g., format validation).
- Small Fine-Tuned Models: For high-volume, domain-specific classification.
- RAG + Mid-Tier Model: For reasoning over proprietary data.
- Frontier Model: For complex, novel reasoning or low-confidence scenarios.
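The tiers above can be sketched as a single routing function. This is a minimal sketch under stated assumptions, not the article's router: the `RouteSignals` shape, the `chooseTier` name, and both thresholds are hypothetical, and a real system would derive the signals from an intent classifier.

```typescript
type Tier = 'heuristic' | 'fine_tuned' | 'rag' | 'frontier';

// Hypothetical routing signals (assumption: produced upstream by a classifier).
interface RouteSignals {
  intent: string;
  classifierConfidence: number; // 0..1, confidence in the detected intent
  complexity: number;           // 0..1, estimated reasoning difficulty
  needsProprietaryData: boolean;
}

// Illustrative intent groups; the names are examples, not a fixed taxonomy.
const DETERMINISTIC_INTENTS = new Set(['format_validation', 'status_check']);
const HIGH_VOLUME_INTENTS = new Set(['ticket_categorization', 'sentiment_analysis']);

function chooseTier(
  s: RouteSignals,
  minConfidence = 0.85,       // assumed threshold
  maxMidTierComplexity = 0.7, // assumed threshold
): Tier {
  // Low confidence or high complexity escalates straight to the frontier model.
  if (s.classifierConfidence < minConfidence || s.complexity > maxMidTierComplexity) {
    return 'frontier';
  }
  if (DETERMINISTIC_INTENTS.has(s.intent)) return 'heuristic';
  if (HIGH_VOLUME_INTENTS.has(s.intent)) return 'fine_tuned';
  if (s.needsProprietaryData) return 'rag';
  // Anything unrecognized is treated as novel and sent to the frontier model.
  return 'frontier';
}
```

Checking confidence first means a misclassified intent is never handed to a cheap deterministic path; whether that trade-off is right depends on how costly a wrong heuristic answer is in your product.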
3. Automated Data Flywheels
The moat is built by capturing implicit feedback (user edits, acceptance/rejection, time-to-approve) and explicit feedback (thumbs up/down) to continuously improve the system without manual annotation overhead.
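One way to turn "user edits" into a numeric implicit signal is to compare the AI output with what the user finally submitted. The sketch below assumes a character-level Levenshtein distance is an acceptable proxy; `levenshtein` and `editSimilarity` are hypothetical helper names, and token-level or semantic comparisons may suit some domains better.

```typescript
// Classic dynamic-programming Levenshtein distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1.0 means the user kept the output verbatim; values near 0 mean a near-total rewrite.
function editSimilarity(aiOutput: string, submitted: string): number {
  const longest = Math.max(aiOutput.length, submitted.length);
  if (longest === 0) return 1; // two empty strings are identical
  return 1 - levenshtein(aiOutput, submitted) / longest;
}
```

A similarity near 1 can be logged as an implicit acceptance and a low value as an implicit rejection, without asking the user for a rating.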
TypeScript Implementation: Differentiated Agent Pattern
The following architecture demonstrates a production-ready agent that implements routing, context enrichment, and feedback capture.
import { z } from 'zod';

// Domain-specific schema for structured output
const ActionSchema = z.object({
  action: z.enum(['suggest', 'execute', 'clarify']),
  confidence: z.number().min(0).max(1),
  payload: z.any(),
  reasoning: z.string().optional(),
});
type Action = z.infer<typeof ActionSchema>;

interface RoutingConfig {
  smallModelThreshold: number;
  fallbackModel: string;
  enableFeedbackCapture: boolean;
}

interface UserContext {
  userId: string;
  domainData: Record<string, any>;
  history: Array<{ input: string; output: Action; feedback?: number }>;
}

export class DifferentiatedAgent {
  private router: ModelRouter;
  private contextEngine: ContextEngine;
  private feedbackStore: FeedbackStore;
  private config: RoutingConfig;

  constructor(config: RoutingConfig, router: ModelRouter, contextEngine: ContextEngine, feedbackStore: FeedbackStore) {
    this.config = config;
    this.router = router;
    this.contextEngine = contextEngine;
    this.feedbackStore = feedbackStore;
  }

  async process(input: string, context: UserContext): Promise<Action> {
    // 1. Enrich context with domain data and history
    const enrichedContext = await this.contextEngine.enrich(input, context);

    // 2. Route based on complexity and confidence requirements
    const route = this.router.determineRoute(input, enrichedContext);

    // 3. Execute with appropriate model strategy
    let result: Action;
    try {
      result = await this.executeStrategy(route, input, enrichedContext);
    } catch (error) {
      // Fallback mechanism for resilience (also reached on a heuristic mismatch)
      result = await this.executeStrategy({ model: this.config.fallbackModel, strategy: 'full_rag' }, input, enrichedContext);
    }

    // 4. Capture implicit feedback signals immediately
    if (this.config.enableFeedbackCapture) {
      // Fire-and-forget: a feedback-store failure must not fail the request
      void this.feedbackStore
        .captureImplicit({
          userId: context.userId,
          input,
          result,
          timestamp: Date.now(),
          sessionContext: enrichedContext.sessionId,
        })
        .catch((err) => console.error('Feedback capture failed', err));
    }
    return result;
  }

  private async executeStrategy(route: Route, input: string, context: EnrichedContext): Promise<Action> {
    // Each case is wrapped in braces so its const declarations stay scoped to that case
    switch (route.strategy) {
      case 'heuristic': {
        return this.applyHeuristics(input, context);
      }
      case 'fine_tuned': {
        return this.router.callSmallModel(route.model, input, context);
      }
      case 'rag': {
        const docs = await this.contextEngine.retrieveDocs(input, context.domainData);
        return this.router.callRagModel(route.model, input, docs);
      }
      case 'full_rag': {
        const fullDocs = await this.contextEngine.retrieveDocs(input, context.domainData, { topK: 10 });
        // Note: callFrontier takes no model argument, so route.model is not used on this path
        return this.router.callFrontier(input, fullDocs, context.history);
      }
      default:
        throw new Error('Unknown route strategy');
    }
  }

  private applyHeuristics(input: string, context: EnrichedContext): Action {
    // Example: Deterministic validation or template filling
    // Avoids inference cost entirely for trivial tasks
    if (input.includes('status_check')) {
      return {
        action: 'execute',
        confidence: 1.0,
        payload: { status: context.domainData.currentStatus },
        reasoning: 'Heuristic match for status check.',
      };
    }
    // Thrown so that process() falls back to the frontier path
    throw new Error('Heuristic mismatch');
  }
}

// Supporting interfaces for the architecture
interface ModelRouter {
  determineRoute(input: string, context: EnrichedContext): Route;
  callSmallModel(model: string, input: string, context: EnrichedContext): Promise<Action>;
  callRagModel(model: string, input: string, docs: any[]): Promise<Action>;
  callFrontier(input: string, docs: any[], history: any[]): Promise<Action>;
}

interface ContextEngine {
  enrich(input: string, context: UserContext): Promise<EnrichedContext>;
  retrieveDocs(input: string, domainData: any, options?: { topK: number }): Promise<any[]>;
}

interface FeedbackStore {
  captureImplicit(feedback: ImplicitFeedback): Promise<void>;
}

interface Route {
  model: string;
  strategy: 'heuristic' | 'fine_tuned' | 'rag' | 'full_rag';
}

interface EnrichedContext extends UserContext {
  sessionId: string;
  embedding: number[];
  retrievedContext: any[];
}

interface ImplicitFeedback {
  userId: string;
  input: string;
  result: Action;
  timestamp: number;
  sessionContext: string;
}
Architecture Decisions:
- Structured Output: Enforcing Zod schemas ensures reliability and allows downstream systems to act on AI responses deterministically. This is critical for workflow entanglement.
- Router Pattern: Decouples model selection from business logic. Allows A/B testing models and swapping providers without refactoring.
- Implicit Feedback: Capturing timestamp, sessionContext, and result enables offline analysis of user behavior (e.g., if a user immediately re-prompts, the first result was poor). This data fuels model fine-tuning and prompt optimization.
- Heuristic Fallback: Prioritizing deterministic logic for trivial tasks drastically reduces costs and latency, improving the user experience for high-frequency actions.
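The offline analysis mentioned under Implicit Feedback (an immediate re-prompt suggests the first result was poor) could look roughly like this. It is a sketch under assumptions: `FeedbackRecord` mirrors only some fields of the feedback payload, `flagQuickReprompts` is a hypothetical name, and the 30-second window is an arbitrary starting point to tune against your own data.

```typescript
interface FeedbackRecord {
  userId: string;
  sessionContext: string; // session identifier
  input: string;
  timestamp: number;      // epoch milliseconds
}

// Returns the records that were followed by another prompt in the same
// session within `windowMs`, i.e. results the user likely found unsatisfying.
function flagQuickReprompts(records: FeedbackRecord[], windowMs = 30000): FeedbackRecord[] {
  // Group records by session so unrelated sessions never affect each other.
  const bySession = new Map<string, FeedbackRecord[]>();
  for (const r of records) {
    const list = bySession.get(r.sessionContext) ?? [];
    list.push(r);
    bySession.set(r.sessionContext, list);
  }

  const flagged: FeedbackRecord[] = [];
  bySession.forEach((list) => {
    // Sort a copy chronologically, then compare each record with its successor.
    const sorted = [...list].sort((a, b) => a.timestamp - b.timestamp);
    for (let i = 0; i < sorted.length - 1; i++) {
      if (sorted[i + 1].timestamp - sorted[i].timestamp <= windowMs) {
        flagged.push(sorted[i]);
      }
    }
  });
  return flagged;
}
```

Flagged records are candidates for negative training examples or prompt review; a quick follow-up can also be a legitimate next question, so treat the flag as a weak signal.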
Pitfall Guide
- Optimizing for Accuracy Over Latency in the Wrong Place
  - Mistake: Using a slow frontier model for a task where a small model or heuristic suffices, causing UI lag.
  - Fix: Implement latency budgets per workflow step. Use the router to enforce these budgets. If a task requires <200ms, restrict routing to heuristics or cached responses.
- Ignoring Implicit Feedback Signals
  - Mistake: Relying solely on explicit "thumbs up/down" ratings, which have low signal volume.
  - Fix: Instrument the app to capture implicit signals: copy actions, edit distance between AI output and user submission, time spent reviewing, and abandonment rates. Because every interaction produces them, these signals can supply far more data for the flywheel than explicit ratings (the article's estimate is 10x).
- Building a "Chat" Interface Instead of a Workflow
  - Mistake: Defaulting to a chat UI for all AI features. This increases cognitive load and friction.
  - Fix: Design AI as a background processor that enhances existing UI elements. Use AI to pre-fill forms, suggest next actions, or generate drafts inline. The interface should guide the user, not require them to prompt.
- Data Leakage and Privacy Violations
  - Mistake: Sending sensitive domain data to third-party APIs without proper sanitization or PII redaction.
  - Fix: Implement a Sanitizer middleware in the context engine. Use on-prem or VPC-based models for sensitive data. Define clear data retention policies. Trust is a differentiator; breaches destroy it instantly.
- Over-Engineering the Model Layer, Under-Engineering the Data Pipeline
  - Mistake: Spending weeks tuning prompts while the RAG retrieval pipeline returns irrelevant chunks.
  - Fix: Invest in data quality. Implement chunking strategies tailored to your domain (e.g., semantic chunking for code, hierarchical chunking for docs). Evaluate retrieval quality with recall metrics before optimizing generation.
- Assuming Model Performance is Static
  - Mistake: Setting up a model and never re-evaluating. Models degrade as domain data shifts, or better/cheaper models emerge.
  - Fix: Implement an automated evaluation harness. Run nightly regression tests against a golden dataset. Alert on drift. Use shadow deployments to test new models against production traffic safely.
- Neglecting Cost Attribution
  - Mistake: Treating inference cost as a generic infrastructure expense.
  - Fix: Tag every request with feature, user segment, and route. Analyze cost per outcome. If a feature costs $5 per user to run but generates $2 in value, it must be optimized or removed.
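The cost attribution fix in the last pitfall can be illustrated with a small aggregation over tagged request logs. This is a sketch only: the `RequestLog` fields, the `succeeded` flag as the definition of an "outcome", and the function name are assumptions, and real systems would usually run this in a warehouse query.

```typescript
// Hypothetical per-request log entry carrying the tags the fix describes.
interface RequestLog {
  feature: string;
  segment: string;
  route: string;
  costUsd: number;
  succeeded: boolean; // assumption: true when the request produced a useful outcome
}

interface FeatureCost {
  totalCostUsd: number;
  outcomes: number;
  costPerOutcome: number | null; // null when no outcome has been recorded yet
}

function costPerOutcomeByFeature(logs: RequestLog[]): Record<string, FeatureCost> {
  const out: Record<string, FeatureCost> = {};
  for (const log of logs) {
    const entry = out[log.feature] ?? { totalCostUsd: 0, outcomes: 0, costPerOutcome: null };
    // Failed requests still cost money, so they count toward the total.
    entry.totalCostUsd += log.costUsd;
    if (log.succeeded) entry.outcomes += 1;
    entry.costPerOutcome = entry.outcomes > 0 ? entry.totalCostUsd / entry.outcomes : null;
    out[log.feature] = entry;
  }
  return out;
}
```

Grouping by `segment` or `route` instead of `feature` follows the same pattern and shows which routes are responsible for the spend.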
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Volume, Low Risk Tasks (e.g., formatting, classification) | Fine-Tuned Small Model or Heuristic | Maximizes throughput, minimizes latency and cost. Small models can outperform frontier models on narrow tasks. | Decrease (90%+ cost reduction vs. frontier) |
| Complex Reasoning on Proprietary Data | RAG + Mid-Tier Model | Provides accuracy via context retrieval without the cost of training on all data. Balances capability and expense. | Moderate |
| Novel/Edge Case Reasoning | Frontier Model with Fallback | Ensures capability for hard cases. Use only when router confidence is low or complexity is high. | Increase (Use sparingly) |
| High Sensitivity/Compliance Data | VPC/On-Prem Model or No AI | Prevents data leakage. Differentiation through trust and security. | Variable (Infrastructure cost vs. API cost) |
| User-Facing Creative Generation | Frontier Model + Human-in-the-Loop | Quality variance is high. Use frontier for creativity but require user review/validation. | High (Justified by value) |
Configuration Template
Use this configuration to define routing rules and feedback collection in your system. This template supports dynamic adjustment of thresholds based on load and business priorities.
# ai-product-config.yaml
routing:
  default_fallback: "gpt-4o-mini"
  strategies:
    - name: "heuristic_router"
      model: "internal_heuristics"
      priority: 1
      conditions:
        - intent: "status_check"
        - intent: "format_validation"
    - name: "domain_classifier"
      model: "mistral-7b-finetuned-v2"
      priority: 2
      confidence_threshold: 0.85
      conditions:
        - intent: "ticket_categorization"
        - intent: "sentiment_analysis"
    - name: "rag_assistant"
      model: "claude-3-haiku"
      priority: 3
      conditions:
        - intent: "knowledge_retrieval"
      retrieval:
        top_k: 5
        hybrid_search: true
    - name: "complex_reasoning"
      model: "gpt-4o"
      priority: 4
      conditions:
        - intent: "code_generation"
        - intent: "strategic_planning"
      fallback: "rag_assistant"
feedback:
  collection:
    implicit:
      - event: "output_edit"
        weight: 0.8
      - event: "accept_suggestion"
        weight: 1.0
      - event: "re_prompt"
        weight: -0.5
    explicit:
      - event: "thumbs_up"
        weight: 1.0
      - event: "thumbs_down"
        weight: -1.0
  pipeline:
    destination: "s3://data-moat/feedback-raw/"
    retention_days: 365
    anonymization: true
evaluation:
  harness:
    dataset: "golden-domain-v1"
    metrics:
      - "accuracy"
      - "latency_p95"
      - "cost_per_token"
    schedule: "0 2 * * *" # Daily at 2 AM
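The template does not say how the feedback weights are combined, so the following is only one plausible reading: average the configured weights of the events observed for a single AI response. `EVENT_WEIGHTS` copies the values from the template; `scoreInteraction` and the averaging rule are assumptions.

```typescript
// Weights copied from the feedback.collection section of the template.
const EVENT_WEIGHTS: Record<string, number> = {
  output_edit: 0.8,
  accept_suggestion: 1.0,
  re_prompt: -0.5,
  thumbs_up: 1.0,
  thumbs_down: -1.0,
};

// Assumed aggregation: mean weight of the recognized events for one response.
// Returns 0 (neutral) when no configured event was observed.
function scoreInteraction(
  events: string[],
  weights: Record<string, number> = EVENT_WEIGHTS,
): number {
  let total = 0;
  let counted = 0;
  for (const event of events) {
    const weight = weights[event];
    if (weight === undefined) continue; // ignore events missing from the config
    total += weight;
    counted += 1;
  }
  return counted === 0 ? 0 : total / counted;
}
```

A sum, a recency-weighted mean, or a learned model over the same events would be equally defensible; the point is that the weights give every response a comparable score for the flywheel.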
Quick Start Guide
- Audit Your Data Assets: Identify the top 3 data sources that are unique to your product. These are your potential moats. Ensure they are structured and accessible to your AI pipeline.
- Instrument Feedback: Add implicit feedback capture to your existing AI features. Log user interactions, edits, and acceptance rates. Store this in a dedicated feedback store.
- Deploy a Basic Router: Implement a simple router that intercepts AI requests. Route trivial intents to heuristics or cached responses. Route the rest to your current model. Measure the cost savings immediately.
- Build a RAG Baseline: For one key feature, implement a RAG pipeline using your unique data. Compare the accuracy and user satisfaction against the generic model. Iterate on chunking and retrieval strategies.
- Evaluate and Iterate: Run your evaluation harness against the new RAG system. Quantify the improvement in accuracy and reduction in hallucinations. Use this data to justify further investment in the data flywheel.
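For the RAG baseline and evaluation steps above, retrieval quality can be quantified before looking at generation. The sketch below computes recall@k over a golden dataset; the `GoldenQuery` shape and the synchronous `retrieve` callback are assumptions made for illustration, and queries without labeled relevant documents are skipped.

```typescript
interface GoldenQuery {
  query: string;
  relevantIds: string[]; // document IDs a human labeled as relevant
}

// Fraction of the relevant documents that appear in the top-k retrieved results.
// Expects a non-empty `relevant` list.
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
  const topK = new Set(retrieved.slice(0, k));
  const hits = relevant.filter((id) => topK.has(id)).length;
  return hits / relevant.length;
}

// Mean recall@k across the golden set, using a caller-supplied retriever.
function meanRecallAtK(
  golden: GoldenQuery[],
  retrieve: (query: string) => string[],
  k: number,
): number {
  const scored = golden.filter((g) => g.relevantIds.length > 0);
  if (scored.length === 0) return 0;
  const sum = scored.reduce(
    (acc, g) => acc + recallAtK(retrieve(g.query), g.relevantIds, k),
    0,
  );
  return sum / scored.length;
}
```

Running `meanRecallAtK` for the old and new chunking strategies on the same golden set gives a direct before/after comparison to report alongside accuracy.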
By focusing on workflow entanglement, intelligent routing, and automated data flywheels, you transform AI from a cost center and commodity feature into a defensible product moat. The goal is not to build a better chatbot; it is to build a system that grows more indispensable, and pulls further ahead of competitors, with every user interaction.