and action dispatch. This allows product teams to modify retention logic without redeploying core services.
Architecture Decisions and Rationale
- Event Ingestion: Use a message broker (Kafka, Redis Streams, or AWS Kinesis) to decouple event producers from consumers. This ensures no data loss during traffic spikes.
- Stream Processing: Implement a stateful stream processor (e.g., Apache Flink, KSQL, or a custom Node.js stream handler) to evaluate rules against user state in real-time.
- User State Store: Maintain a low-latency store (Redis or DynamoDB) for user profiles and retention scores. This store must support atomic updates to prevent race conditions.
- Action Dispatcher: A service that handles the execution of retention actions (notifications, UI flags, discounts). This service must enforce rate limiting and preference checks.
Technical Implementation
The following TypeScript implementation demonstrates a modular retention engine. It includes an event schema, a rule evaluation engine, and a dispatcher with rate limiting.
1. Event Schema and Type Safety
Define strict schemas to ensure data integrity across the pipeline.
// retention-events.ts
export interface RetentionEvent {
id: string;
userId: string;
type: string;
timestamp: Date;
payload: Record<string, unknown>;
sessionId: string;
}
// Type guards for specific event types
export interface FeatureAdoptedEvent extends RetentionEvent {
type: 'feature_adopted';
payload: {
featureId: string;
timeToFirstAction: number; // ms
};
}
export interface SessionAbandonedEvent extends RetentionEvent {
type: 'session_abandoned';
payload: {
lastPage: string;
duration: number; // ms
errorCount: number;
};
}
export function isFeatureAdopted(event: RetentionEvent): event is FeatureAdoptedEvent {
return event.type === 'feature_adopted';
}
2. Retention Rule Engine
A rule engine that evaluates conditions against user state and triggers actions.
// rule-engine.ts
import { RetentionEvent, FeatureAdoptedEvent } from './retention-events';
import { UserStateStore } from './user-store';
import { ActionDispatcher } from './action-dispatcher';
export interface RetentionRule {
id: string;
name: string;
condition: (event: RetentionEvent, state: any) => boolean;
action: (userId: string, state: any) => Promise<void>;
cooldown: number; // ms
}
export class RetentionEngine {
constructor(
private store: UserStateStore,
private dispatcher: ActionDispatcher,
private rules: RetentionRule[]
) {}
async processEvent(event: RetentionEvent): Promise<void> {
const userState = await this.store.getUserState(event.userId);
// Update state based on event
await this.store.updateUserState(event.userId, this.deriveStateUpdate(event));
// Evaluate rules
for (const rule of this.rules) {
if (rule.condition(event, userState)) {
const lastTriggered = await this.store.getLastTriggered(rule.id, event.userId);
const now = Date.now();
if (!lastTriggered || (now - lastTriggered > rule.cooldown)) {
await rule.action(event.userId, userState);
await this.store.recordTrigger(rule.id, event.userId, now);
}
}
}
}
private deriveStateUpdate(event: RetentionEvent): Partial<any> {
// Logic to transform event into state delta
if (event.type === 'session_abandoned') {
return { consecutiveAbandonments: (event as any).payload.errorCount > 0 ? 1 : 0 };
}
return {};
}
}
3. Example Rule: Onboarding Friction Detection
A concrete rule that detects users abandoning sessions due to errors and triggers a support intervention.
// rules/onboarding-friction.ts
import { RetentionRule, SessionAbandonedEvent } from '../rule-engine';
import { RetentionEvent } from '../retention-events';
export const onboardingFrictionRule: RetentionRule = {
id: 'rule-onboarding-friction',
name: 'Onboarding Error Intervention',
condition: (event: RetentionEvent, state: any) => {
if (event.type !== 'session_abandoned') return false;
const sessEvent = event as SessionAbandonedEvent;
// Trigger if user is in onboarding, has errors, and is a new user
return (
state.onboardingStage === 'active' &&
sessEvent.payload.errorCount > 2 &&
state.accountAgeHours < 24
);
},
action: async (userId: string, state: any) => {
// Dispatch a contextual in-app message or email
await dispatcher.send(userId, {
type: 'in_app_message',
template: 'onboarding_help',
data: { stage: state.onboardingStage }
});
},
cooldown: 3600000 // 1 hour cooldown to prevent spam
};
4. Action Dispatcher with Rate Limiting
Prevents notification fatigue by enforcing global and per-user limits.
// action-dispatcher.ts
import { RedisClientType } from 'redis';
export class ActionDispatcher {
constructor(private redis: RedisClientType) {}
async send(userId: string, action: any): Promise<boolean> {
// Check global rate limit
const globalKey = `rl:global:${action.type}`;
const globalCount = await this.redis.incr(globalKey);
if (globalCount === 1) {
await this.redis.expire(globalKey, 60); // Reset every minute
}
if (globalCount > 1000) return false; // Global cap
// Check user-specific rate limit
const userKey = `rl:user:${userId}:${action.type}`;
const userCount = await this.redis.incr(userKey);
if (userCount === 1) {
await this.redis.expire(userKey, 3600); // Reset every hour
}
if (userCount > 3) return false; // User cap: max 3 emails/hour
// Execute action (e.g., push notification, email)
await this.executeAction(userId, action);
return true;
}
private async executeAction(userId: string, action: any): Promise<void> {
// Integration with notification providers
console.log(`Dispatching ${action.type} to ${userId}`, action);
}
}
Architecture Rationale
- Idempotency: The
recordTrigger method ensures that if an event is replayed or processed twice, the action is not duplicated.
- Decoupling: Rules are defined separately from the engine. Product teams can add new rules by updating the configuration without touching the core service code.
- Scalability: The Redis-backed rate limiter allows the dispatcher to scale horizontally while maintaining consistent throttling logic.
Pitfall Guide
Common Mistakes and Best Practices
-
High Latency Feedback Loops
- Mistake: Processing retention events in nightly cron jobs.
- Impact: Interventions arrive too late to influence behavior.
- Best Practice: Implement stream processing for critical retention signals. Use WebSockets or Server-Sent Events for real-time in-app updates.
-
Notification Fatigue
- Mistake: Triggering actions based on single events without frequency capping.
- Impact: Users mute notifications or uninstall the app.
- Best Practice: Implement multi-tier rate limiting (global, per-user, per-channel). Use suppression lists based on user preferences.
-
Schema Drift and Data Quality
- Mistake: Allowing event payloads to change without versioning, breaking downstream rules.
- Impact: Rules fail silently, or worse, trigger incorrect actions.
- Best Practice: Enforce schema validation at ingestion. Use schema registries. Version events and handle legacy schemas gracefully.
-
Coupling Retention Logic to Business Code
- Mistake: Hardcoding retention triggers inside API handlers.
- Impact: Retention logic becomes untestable and requires full deployments to change.
- Best Practice: Emit events and let the retention engine handle logic. Keep business services stateless regarding retention.
-
Ignoring the Cold Start Problem
- Mistake: Applying churn prediction models to new users without sufficient data.
- Impact: False positives lead to annoying interventions for engaged new users.
- Best Practice: Implement a "warm-up" period for new users. Use heuristic-based rules for onboarding and switch to ML models only after sufficient signal accumulation.
-
Lack of Experimentation Framework
- Mistake: Rolling out retention campaigns to 100% of users without A/B testing.
- Impact: Inability to measure lift; risk of negative impact on retention.
- Best Practice: Integrate retention actions with an experimentation platform. Always run holdout groups. Measure incremental lift, not just correlation.
-
Privacy Violations in Behavioral Tracking
- Mistake: Storing PII in analytics events or tracking sensitive behaviors without consent.
- Impact: GDPR/CCPA violations, legal risk, loss of trust.
- Best Practice: Anonymize PII at ingestion. Implement consent management. Ensure retention rules only operate on pseudonymized IDs.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Early Stage Startup | Polling-Based + Batch | Low engineering overhead; sufficient for small user bases; rapid iteration. | Low infrastructure cost; higher manual effort. |
| Scaling SaaS Product | Real-Time Stream | Handles volume; enables contextual interventions; reduces churn via low latency. | Medium infrastructure cost; requires skilled engineering. |
| High-Churn B2C App | ML Predictive + Real-Time | Proactive retention; identifies subtle churn signals; maximizes LTV. | High infrastructure cost; complex ML ops; highest ROI potential. |
| Regulated Industry | Batch with Privacy Shield | Ensures data governance; allows audit trails; minimizes real-time PII exposure. | Medium cost; slower retention response; compliance overhead. |
Configuration Template
A YAML-based configuration for defining retention rules without code changes. This can be loaded by the rule engine at runtime.
# retention-rules.yaml
rules:
- id: rule-onboarding-error
name: "Onboarding Error Intervention"
trigger_event: "session_abandoned"
condition:
and:
- path: "payload.errorCount"
operator: "gt"
value: 2
- path: "state.onboardingStage"
operator: "eq"
value: "active"
- path: "state.accountAgeHours"
operator: "lt"
value: 24
action:
type: "in_app_message"
template: "onboarding_help"
cooldown_ms: 3600000
- id: rule-weekly-engagement
name: "Weekly Re-engagement"
trigger_event: "timer_weekly"
condition:
and:
- path: "state.lastActiveDays"
operator: "gte"
value: 7
- path: "state.plan"
operator: "eq"
value: "free"
action:
type: "email"
template: "weekly_digest_promo"
cooldown_ms: 604800000
rate_limits:
global:
email: 1000
push: 5000
user:
email: 3
push: 10
window_minutes: 60
Quick Start Guide
-
Initialize Schema Registry: Define your retention events in a schema file and register them with your event broker. Ensure all services validate against this schema.
npm install @codcompass/retention-schema
# Configure schema registry URL in .env
-
Deploy Rule Engine Service: Use the provided TypeScript template to deploy the retention engine. Connect it to your message broker and Redis store.
docker-compose up -d retention-engine redis
-
Load Configuration: Upload the retention-rules.yaml to your configuration store or load it via the admin API. Verify rules are loaded by checking the engine logs.
-
Simulate Events: Use a CLI tool or test script to emit sample events and verify that rules trigger actions correctly. Check Redis for cooldown entries.
npx retention-cli simulate --event session_abandoned --user user_123
-
Monitor and Iterate: Set up a dashboard for rule triggers, action dispatch success rates, and retention lift. Refine conditions based on performance data.
Retention is a system, not a feature. By engineering your retention infrastructure with real-time capabilities, strict schema governance, and decoupled logic, you transform user engagement from a reactive marketing activity into a predictable, scalable engineering outcome.