uttons have low engagement (<2%). You must capture implicit signals: edit distance (did the user modify the output?), copy rate, acceptance rate, and time-to-next-action.
3. Self-Correction Loops: Before returning a response to the user, the system should perform a lightweight validation pass. If confidence is low or validation fails, trigger a regeneration or fallback strategy without user intervention.
4. Personalization via Vector Context: Retention increases when the AI "remembers" user preferences. Implement a user-specific vector store for long-term memory, updating embeddings based on successful interactions.
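The self-correction loop in item 2 can be sketched as a small retry wrapper. This is a minimal illustration, not a definitive implementation: `generateWithSelfCorrection`, `Validator`, and `isValidJson` are hypothetical names, the JSON check stands in for whatever validation your product needs, and the generator is synchronous here only to keep the sketch short (a real model call would be asynchronous).

```typescript
// Hypothetical types and names, introduced for illustration only.
type ValidationResult = { ok: true } | { ok: false; reason: string };
type Validator = (output: string) => ValidationResult;

interface CorrectionResult {
  output: string;
  attempts: number;
  usedFallback: boolean;
}

// Example validation pass: the output must be non-empty, parseable JSON.
const isValidJson: Validator = (output) => {
  if (output.trim().length === 0) return { ok: false, reason: 'empty output' };
  try {
    JSON.parse(output);
    return { ok: true };
  } catch {
    return { ok: false, reason: 'not valid JSON' };
  }
};

// Regenerates until validation passes or retries run out, then falls back.
// The generator receives the previous failure reason so it can adjust the prompt.
function generateWithSelfCorrection(
  generate: (attempt: number, lastFailure?: string) => string,
  validate: Validator,
  maxRetries: number,
  fallback: string
): CorrectionResult {
  let lastFailure: string | undefined;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const output = generate(attempt, lastFailure);
    const result = validate(output);
    if (result.ok) return { output, attempts: attempt + 1, usedFallback: false };
    lastFailure = result.reason;
  }
  return { output: fallback, attempts: maxRetries + 1, usedFallback: true };
}
```

Returning `attempts` and `usedFallback` alongside the output lets the caller log how often correction fires, which is itself an implicit quality signal.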
Step-by-Step Implementation
1. Define Retention-Centric Metrics
Move beyond standard analytics. Implement a RetentionScore calculator.
// src/metrics/retention.ts
export interface InteractionMetrics {
  taskId: string;
  userId: string;
  latencyMs: number;
  cost: number;
  model: string;
  confidence: number;
  outcome: 'success' | 'failure' | 'abandoned';
  userEdits: boolean;
  timeToNextActionMs: number;
}

export class RetentionAnalyzer {
  /**
   * Calculates a weighted score, clamped to the 0-100 range, indicating the
   * likelihood of user retention based on the interaction quality.
   */
  calculateRetentionScore(metrics: InteractionMetrics): number {
    let score = 0;

    // Outcome is the strongest predictor ('abandoned' neither adds nor subtracts)
    if (metrics.outcome === 'success') score += 50;
    if (metrics.outcome === 'failure') score -= 30;

    // User edits indicate dissatisfaction even on "success"
    if (metrics.userEdits) score -= 15;

    // Latency penalty (cumulative: -10 above 3 s, -30 in total above 5 s)
    if (metrics.latencyMs > 3000) score -= 10;
    if (metrics.latencyMs > 5000) score -= 20;

    // Confidence alignment
    if (metrics.confidence < 0.6) score -= 10;

    // Bonus for low time-to-next-action (flow state)
    if (metrics.timeToNextActionMs < 2000 && metrics.outcome === 'success') {
      score += 10;
    }

    return Math.max(0, Math.min(100, score));
  }
}
2. Implement the Adaptive Router
The router intercepts requests, evaluates context, and selects the optimal model strategy.
// src/ai/router.ts
import { LLMProvider } from './providers';
import { ConfidenceEvaluator } from './evaluation';
import { FeedbackCollector } from './feedback';

export interface RouterConfig {
  fallbackModel: string;
  maxLatencyMs: number;
  confidenceThreshold: number;
  enableSelfCorrection: boolean;
}

export class AdaptiveRouter {
  constructor(
    private config: RouterConfig,
    private providers: Record<string, LLMProvider>,
    private evaluator: ConfidenceEvaluator,
    private feedback: FeedbackCollector
  ) {}

  // Helper methods referenced below (classifyComplexity, selectModel,
  // executeWithTimeout, recordMetrics, handleFailure, getHigherCapabilityModel)
  // are omitted from this listing.

  async routeRequest(prompt: string, context: any): Promise<string> {
    const startTime = Date.now();

    // 1. Classify complexity and stakes
    const complexity = await this.classifyComplexity(prompt);
    const model = this.selectModel(complexity);

    try {
      // 2. Execute with timeout
      const response = await this.executeWithTimeout(model, prompt, context);
      let content: string = response.content;

      // 3. Evaluate confidence
      const confidence = await this.evaluator.assess(response, prompt);

      // 4. Handle low confidence
      if (confidence < this.config.confidenceThreshold) {
        if (this.config.enableSelfCorrection) {
          // Await inside the try block so a failed retry reaches the catch below
          content = await this.handleLowConfidence(prompt, context, response);
        } else {
          // Log implicit failure signal
          this.feedback.recordImplicitSignal({ type: 'low_confidence', model });
        }
      }

      // 5. Record metrics for retention analysis.
      // Note: model and confidence describe the first attempt, even after a retry.
      this.recordMetrics({
        model,
        latencyMs: Date.now() - startTime,
        confidence,
        outcome: 'success'
      });
      return content;
    } catch (error) {
      // 6. Fallback strategy
      return this.handleFailure(error, prompt, context);
    }
  }

  private async handleLowConfidence(prompt: string, context: any, initialResponse: any): Promise<string> {
    // Strategy: Retry with higher capability model or add constraints
    const retryModel = this.getHigherCapabilityModel();
    const refinedPrompt = `${prompt}\n\nConstraints: Ensure factual accuracy. If unsure, state limitations.`;
    const retryResponse = await this.providers[retryModel].complete(refinedPrompt, context);

    this.feedback.recordImplicitSignal({
      type: 'self_correction_triggered',
      initialModel: initialResponse.model,
      retryModel
    });
    return retryResponse.content;
  }
}
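The router calls `executeWithTimeout` without showing it. One common way to build such a helper is to race the model call against a timer, sketched below. `withTimeout` and `TimeoutError` are hypothetical names, not part of the article's code, and the sketch only stops waiting: it does not cancel the underlying request.

```typescript
// Hypothetical helper, shown as one possible basis for executeWithTimeout.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

// Resolves or rejects with `work` if it settles within `ms`,
// otherwise rejects with a TimeoutError.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Inside the router, the call could look like `withTimeout(provider.complete(prompt, context), this.config.maxLatencyMs)`, with the rejection handled by the existing catch block. If the provider client accepts an abort signal, wiring one in would also free the connection on timeout.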
3. Build the Feedback Loop
Retention improves when the system learns from user behavior.
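// src/feedback/collector.ts

// Signal and storage shapes are inferred from how the collector uses them.
export type ImplicitSignal = { type: string; timestamp?: number; [key: string]: unknown };

export interface FeedbackStorage {
  saveExplicit(entry: { userId: string; messageId: string; rating: number; comment?: string }): void;
  saveBatch(batch: ImplicitSignal[]): Promise<void>;
}

export class FeedbackCollector {
  // In-memory buffer for high-throughput implicit signals
  private signalBuffer: ImplicitSignal[] = [];
  private flushInterval = 5000; // ms

  constructor(private storage: FeedbackStorage) {
    // Catch flush errors so a storage outage does not become an unhandled rejection
    setInterval(() => {
      this.flush().catch((err) => console.error('Feedback flush failed', err));
    }, this.flushInterval);
  }

  recordImplicitSignal(signal: ImplicitSignal) {
    this.signalBuffer.push({
      timestamp: Date.now(),
      ...signal
    });
  }

  recordExplicitFeedback(userId: string, messageId: string, rating: number, comment?: string) {
    this.storage.saveExplicit({ userId, messageId, rating, comment });
    // Trigger the recovery flow immediately if the rating is critical
    if (rating <= 1) {
      void this.triggerRecoveryFlow(userId, messageId);
    }
  }

  private async flush() {
    if (this.signalBuffer.length === 0) return;
    const batch = [...this.signalBuffer];
    this.signalBuffer = [];
    try {
      await this.storage.saveBatch(batch);
    } catch (err) {
      // Put the batch back so signals are not lost on a failed write
      this.signalBuffer = [...batch, ...this.signalBuffer];
      throw err;
    }
    // Update routing weights based on recent performance
    // (updateRoutingWeights is omitted from this listing)
    await this.updateRoutingWeights(batch);
  }

  private async triggerRecoveryFlow(userId: string, messageId: string) {
    // Notify product team or trigger automated follow-up
    // E.g., "We noticed a bad response. Here's a corrected version."
    console.log(`Recovery triggered for user ${userId}, message ${messageId}`);
  }
}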
4. Architecture Rationale
- Why TypeScript? Type safety is critical in the retention layer where data structures flow between evaluation, routing, and storage. Shared interfaces catch schema drift in feedback signals at compile time.
- Why Async Buffering? Feedback collection must not block the user response. Implicit signals are batched to reduce storage I/O and latency impact.
- Why Self-Correction? Users tolerate a slightly longer wait for a correct answer better than a fast incorrect one. Self-correction spends extra latency inside the system so that fewer wrong answers reach the user, which improves perceived reliability.
Pitfall Guide
1. Optimizing for Accuracy Over Consistency
Mistake: Focusing on improving average accuracy metrics while ignoring variance.
Impact: Users encounter unpredictable quality. A model that is 90% accurate but fails catastrophically on 10% of edge cases will churn users faster than a model that is 80% accurate but consistent.
Best Practice: Monitor the tail of the quality distribution (for example, the worst-scoring slice of responses), not just the average. Implement guardrails that catch edge cases and provide safe fallbacks rather than risky guesses.
2. Ignoring Latency Jitter
Mistake: Optimizing for average latency while allowing P99 spikes.
Impact: AI interactions feel conversational. A 10-second spike breaks the flow and signals system instability. Users attribute jitter to "broken" AI.
Best Practice: Implement Progressive Streaming with fallback text. If the model is slow, stream a placeholder or partial response to maintain engagement. Set hard timeouts and trigger fallbacks at P95 thresholds.
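A P95 threshold can be derived from a rolling window of recent latencies. The sketch below uses the nearest-rank percentile method; `percentile` and `shouldFallback` are hypothetical helpers, and the window size and refresh policy are left to the caller.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile requires at least one sample');
  if (p <= 0 || p > 100) throw new Error('p must be in the range (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation in integers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// True once the current request has run longer than the recent P95.
function shouldFallback(elapsedMs: number, recentLatenciesMs: number[]): boolean {
  return elapsedMs > percentile(recentLatenciesMs, 95);
}
```

Comparing against a measured P95 rather than a fixed constant lets the timeout track real provider behavior, at the cost of needing enough samples for the percentile to be meaningful.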
3. The "Black Box" Feedback Gap
Mistake: Relying solely on explicit thumbs-up/down buttons.
Impact: Feedback volume is too low to drive meaningful improvements. You miss critical signals like user edits, which indicate dissatisfaction even when the user accepts the output.
Best Practice: Track Implicit Signals: edit distance, copy-paste frequency, time-to-next-action, and prompt rephrasing. These are high-fidelity indicators of user satisfaction.
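Edit distance can be turned into a normalized signal by comparing the generated text with what the user finally kept. The sketch below uses Levenshtein distance; `levenshtein` and `editRatio` are illustrative names, and the comparison works on UTF-16 code units, which is an approximation for text with emoji or combining characters.

```typescript
// Levenshtein distance using two rolling rows of the dynamic-programming table.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// 0 means the user kept the output unchanged; 1 means it was fully rewritten.
function editRatio(generated: string, finalText: string): number {
  const longest = Math.max(generated.length, finalText.length);
  return longest === 0 ? 0 : levenshtein(generated, finalText) / longest;
}
```

A ratio is easier to threshold than a raw distance because it is comparable across short and long outputs. For very long documents the quadratic cost may matter, so a diff over lines or tokens could be a better fit.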
4. Static Context Windows
Mistake: Sending the full conversation history with every request regardless of relevance.
Impact: Increased cost, higher latency, and context dilution leading to hallucinations. Retention suffers as the AI forgets recent instructions or mixes up topics.
Best Practice: Implement Dynamic Context Truncation. Use a relevance scorer to select only the most pertinent turns for the current query. Summarize older turns when necessary.
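A rough sketch of dynamic context truncation follows. It assumes a deliberately crude relevance scorer (word overlap with the current query) and a four-characters-per-token estimate; a production system would more likely use embeddings and the model's real tokenizer. `Turn`, `relevance`, and `selectContext` are hypothetical names.

```typescript
interface Turn {
  role: 'user' | 'assistant';
  text: string;
}

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Crude estimate: about four characters per token (an assumption, not a tokenizer).
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Fraction of distinct query words that also appear in the turn.
function relevance(query: string, turn: Turn): number {
  const queryWords = new Set(tokenize(query));
  if (queryWords.size === 0) return 0;
  const turnWords = new Set(tokenize(turn.text));
  let hits = 0;
  queryWords.forEach((w) => { if (turnWords.has(w)) hits++; });
  return hits / queryWords.size;
}

// Greedily keeps the most relevant turns that fit the budget,
// then restores chronological order for the prompt.
function selectContext(history: Turn[], query: string, maxTokens: number): Turn[] {
  const ranked = history
    .map((turn, index) => ({ turn, index, score: relevance(query, turn) }))
    .sort((a, b) => b.score - a.score || b.index - a.index); // ties: prefer recent turns
  const chosen: typeof ranked = [];
  let used = 0;
  for (const item of ranked) {
    const cost = estimateTokens(item.turn.text);
    if (used + cost > maxTokens) continue;
    chosen.push(item);
    used += cost;
  }
  return chosen.sort((a, b) => a.index - b.index).map((item) => item.turn);
}
```

Restoring chronological order after selection matters because models read the retained turns as a conversation, not as a ranked list.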
5. Cost-Driven Model Downgrades
Mistake: Automatically routing to cheaper models to save costs without quality checks.
Impact: Short-term cost savings lead to long-term retention loss. The customer acquisition cost (CAC) required to replace churned users far exceeds the token savings.
Best Practice: Use Value-Based Routing. Route based on the business value of the request. High-value workflows (e.g., generating code for production) always use high-capability models. Low-value workflows (e.g., brainstorming tags) can use cheaper models.
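Value-based routing can be as simple as a lookup from workflow to value tier, then from tier to model. In the sketch below the workflow names and the tier assignments are invented for illustration, and the model names are borrowed from this article's configuration template. Unknown workflows default to the high tier so that nothing is downgraded by accident.

```typescript
type ValueTier = 'low' | 'medium' | 'high';

interface RouteDecision {
  model: string;
  selfCorrection: boolean;
}

// Tier-to-model mapping (illustrative; adjust to your own providers).
const ROUTES: Record<ValueTier, RouteDecision> = {
  low: { model: 'gpt-4o-mini', selfCorrection: false },
  medium: { model: 'gpt-4o', selfCorrection: false },
  high: { model: 'claude-3-opus', selfCorrection: true },
};

// Hypothetical workflow names. A Map avoids accidental matches on
// inherited object properties such as "constructor".
const WORKFLOW_TIERS = new Map<string, ValueTier>([
  ['tag-brainstorm', 'low'],
  ['email-draft', 'medium'],
  ['production-codegen', 'high'],
]);

function routeByValue(workflow: string): RouteDecision {
  const tier = WORKFLOW_TIERS.get(workflow) ?? 'high'; // unknown work is treated as high value
  return ROUTES[tier];
}
```

Keeping the tier table separate from the model table means product owners can reclassify a workflow without touching provider configuration.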
6. Lack of Explainability
Mistake: Providing answers without sources or reasoning for complex queries.
Impact: Users cannot verify correctness, leading to distrust. In professional workflows, unverifiable AI output is unusable.
Best Practice: Implement Citation and Reasoning Display. For RAG-based responses, always show sources. For complex reasoning, offer an optional "Show thought process" toggle to build trust.
7. Cold Start Personalization
Mistake: Treating all users identically, ignoring user-specific patterns.
Impact: The AI feels generic. Retention drops because the product doesn't adapt to the user's domain or style.
Best Practice: Build a User Preference Vector. Store embeddings of successful interactions per user. Use these to personalize prompts (e.g., "User prefers concise code comments"). Update vectors continuously based on feedback.
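One way to keep a preference vector current is an exponential moving average over the embeddings of successful interactions. The sketch below assumes that interpretation of a decay factor such as 0.95 and assumes embeddings arrive as plain number arrays of equal length; `updatePreference` and `cosineSimilarity` are hypothetical helpers, not the API of any vector store.

```typescript
// Scales a vector to unit length; a zero vector is returned unchanged.
function normalize(v: number[]): number[] {
  const norm = Math.hypot(...v);
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Blends the stored preference with a new successful interaction.
// A decay of 0.95 keeps 95% of the old vector on each update.
function updatePreference(current: number[], interaction: number[], decay = 0.95): number[] {
  if (current.length !== interaction.length) throw new Error('dimension mismatch');
  const blended = current.map((x, i) => decay * x + (1 - decay) * interaction[i]);
  return normalize(blended);
}

// Cosine similarity in [-1, 1]; returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const denom = Math.hypot(...a) * Math.hypot(...b);
  return denom === 0 ? 0 : dot / denom;
}
```

With a decay this close to 1, a single interaction barely moves the vector, which helps with the cold-start noise of the first few sessions. Repeated interactions in a new direction still shift the preference steadily.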
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Volume, Low Stakes (e.g., brainstorming) | Static cheap model + Implicit feedback only | Retention driven by speed and volume; quality variance tolerated. | Low |
| Low Volume, High Stakes (e.g., legal analysis) | High-capability model + Citations + Self-Correction | Trust is paramount; errors cause immediate churn. Cost is secondary. | High |
| Real-time Chat Interface | Streaming + P95 Timeout Fallback + Dynamic Context | Latency sensitivity is extreme; jitter kills retention. | Medium |
| Batch Processing / Async | Confidence Routing + Retry Logic + User Review Queue | Users can wait; focus on accuracy and verifiability. | Medium |
| Personalized Assistant | User Vector Memory + Preference Tuning | Retention relies on adaptation to user style over time. | Medium-High |
Configuration Template
Use this YAML configuration to define retention policies and routing rules.
# retention-config.yaml
retention:
  metrics:
    enabled: true
    implicit_signals:
      - edit_distance
      - copy_rate
      - time_to_next_action
    explicit_feedback:
      threshold: 2 # Ratings <= 2 trigger immediate action
  routing:
    strategies:
      - name: "default"
        model: "gpt-4o-mini"
        confidence_threshold: 0.7
        max_latency_ms: 2000
        fallback_model: "gpt-4o"
        fallback_on: ["low_confidence", "timeout"]
      - name: "high_stakes"
        model: "claude-3-opus"
        confidence_threshold: 0.85
        max_latency_ms: 4000
        fallback_model: "claude-3-opus-retry"
        fallback_on: ["low_confidence"]
        self_correction: true
        max_retries: 2
  context:
    max_tokens: 4000
    strategy: "dynamic_relevance"
    summary_threshold: 10000
  personalization:
    enabled: true
    vector_store: "pgvector"
    update_frequency: "on_success"
    decay_rate: 0.95 # Memory decay factor
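Configuration like this is easy to mistype, so it can help to validate each routing strategy after the YAML has been parsed into plain objects. The sketch below is one possible check; `RoutingStrategy` mirrors the field names in the template, while `validateStrategy` and the list of known triggers are assumptions drawn from the values that appear there.

```typescript
interface RoutingStrategy {
  name: string;
  model: string;
  confidence_threshold: number;
  max_latency_ms: number;
  fallback_model: string;
  fallback_on: string[];
}

// Assumed to be the full set of triggers, based on the template's values.
const KNOWN_TRIGGERS = new Set(['low_confidence', 'timeout']);

// Returns a list of human-readable problems; an empty list means the strategy looks sane.
function validateStrategy(s: RoutingStrategy): string[] {
  const problems: string[] = [];
  if (!(s.confidence_threshold >= 0 && s.confidence_threshold <= 1)) {
    problems.push(`${s.name}: confidence_threshold must be between 0 and 1`);
  }
  if (!(s.max_latency_ms > 0)) {
    problems.push(`${s.name}: max_latency_ms must be positive`);
  }
  if (s.fallback_model === s.model) {
    problems.push(`${s.name}: fallback_model is the same as model`);
  }
  for (const trigger of s.fallback_on) {
    if (!KNOWN_TRIGGERS.has(trigger)) {
      problems.push(`${s.name}: unknown fallback trigger "${trigger}"`);
    }
  }
  return problems;
}
```

Running this at startup and refusing to boot on a non-empty result surfaces a bad threshold before it silently changes routing behavior in production.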
Quick Start Guide
- Initialize Retention SDK:
    npm install @codcompass/ai-retention
- Configure Router:
  Create retention.config.ts using the template above. Define your models and thresholds.
    import { AdaptiveRouter } from '@codcompass/ai-retention';
    const router = new AdaptiveRouter(config, providers, evaluator, feedback);
- Wrap AI Calls:
  Replace direct model calls with the router.
    // Before
    const response = await openai.chat.completions.create({ ... });
    // After
    const response = await router.routeRequest(userPrompt, context);
- Deploy Feedback Hook:
  Add the feedback collector to your frontend to capture implicit signals.
    import { FeedbackCollector } from '@codcompass/ai-retention';
    const feedback = new FeedbackCollector();
    feedback.on('edit', (data) => feedback.recordImplicitSignal(data));
- Monitor Retention Score:
  Query the retention analytics endpoint to view your RetentionScore distribution and identify drift.
    curl https://api.yourdomain.com/analytics/retention/daily