Decisions
- Event-Driven Ingestion: Growth signals must be captured in real time. Use a high-throughput event bus (e.g., Kafka, Redpanda) to stream user interactions to the inference layer.
- Real-Time Feature Store: User context must be available with sub-100ms latency. Implement a low-latency feature store (e.g., Redis, DynamoDB) populated by stream processing.
- Multi-Armed Bandit (MAB) Strategy: Replace A/B testing with Thompson Sampling or UCB algorithms. MABs dynamically allocate traffic to winning variants, maximizing reward while exploring new options.
- LLM for Dynamic Personalization: Use LLMs not for generation alone, but for contextual routing. The LLM decides the optimal next step based on user sentiment, intent, and historical behavior.
- Guardrails & Fallbacks: Production AI requires deterministic fallbacks. If the AI confidence score is low or latency exceeds SLA, the system must revert to a heuristic or cached baseline.
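As a rough illustration of the Multi-Armed Bandit bullet above, the sketch below shows UCB1 arm selection over per-variant counts. The function name `ucb1_select` and the `(impressions, conversions)` tuple layout are assumptions made for this example; they do not come from any particular library.

```python
import math
from typing import Dict, Tuple

# variant_id -> (impressions, conversions); this layout is an assumption of the sketch
ArmStats = Dict[str, Tuple[int, int]]


def ucb1_select(stats: ArmStats) -> str:
    """Pick the variant with the highest UCB1 score."""
    if not stats:
        raise ValueError("at least one variant is required")

    # Explore first: a variant that has never been shown is served immediately.
    for variant, (impressions, _) in stats.items():
        if impressions == 0:
            return variant

    total = sum(impressions for impressions, _ in stats.values())

    def score(variant: str) -> float:
        impressions, conversions = stats[variant]
        mean = conversions / impressions                      # exploitation term
        bonus = math.sqrt(2 * math.log(total) / impressions)  # exploration term
        return mean + bonus

    return max(stats, key=score)


# Equal exposure: the variant with the higher conversion rate wins.
print(ucb1_select({"A": (100, 10), "B": (100, 20)}))
# Equal conversion rate: the less-explored variant gets the larger bonus.
print(ucb1_select({"A": (1000, 100), "B": (10, 1)}))
```

The exploration bonus shrinks as a variant accumulates impressions, which is how traffic drifts toward winners without starving new options.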
Step-by-Step Implementation
1. Define the Loop
Identify the trigger, action, variable, and reward.
- Trigger: User visits pricing page.
- Action: AI selects pricing card layout and incentive.
- Variable: Layout type, discount amount, social proof text.
- Reward: Click-through to checkout.
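One way to make the loop definition concrete is to write it down as data. The sketch below is illustrative only: the `GrowthLoop` class, the event names, and the discount and social proof options are hypothetical, while the layout names and the `checkout_click` reward mirror the configuration template later in this article.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List


@dataclass(frozen=True)
class GrowthLoop:
    trigger: str                      # event that starts the loop
    action: str                       # decision the AI makes
    variables: Dict[str, List[str]]   # each variable and its allowed options
    reward: str                       # event that counts as success

    def arm_count(self) -> int:
        """Number of distinct variants if every combination is an arm."""
        return prod(len(options) for options in self.variables.values())


pricing_loop = GrowthLoop(
    trigger="pricing_page_view",
    action="select_pricing_card",
    variables={
        "layout": ["standard", "urgency", "comparison"],
        "discount": ["0%", "5%", "10%"],
        "social_proof": ["none", "customer_count"],
    },
    reward="checkout_click",
)
print(pricing_loop.arm_count())  # 3 * 3 * 2 = 18
```

Counting arms up front is a useful sanity check: every extra option multiplies the traffic a bandit needs before it can separate the variants.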
2. Instrumentation
Ensure every user interaction emits a structured event with a user_id, session_id, variant_id, and outcome.
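A minimal sketch of such an event follows, assuming a dataclass serialized to JSON before it is published to the bus. The `GrowthEvent` name, the `ts_ms` timestamp field, and the outcome vocabulary are assumptions added for illustration; only the four fields named above come from the text.

```python
import json
import time
from dataclasses import asdict, dataclass

# Hypothetical outcome vocabulary; the article does not define one.
ALLOWED_OUTCOMES = {"impression", "click", "conversion"}


@dataclass(frozen=True)
class GrowthEvent:
    user_id: str
    session_id: str
    variant_id: str
    outcome: str
    ts_ms: int  # emission time in epoch milliseconds (assumed extra field)

    def __post_init__(self) -> None:
        # Reject events that could not be joined back to a decision later.
        for name in ("user_id", "session_id", "variant_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be non-empty")
        if self.outcome not in ALLOWED_OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


event = GrowthEvent("123", "abc", "A", "click", ts_ms=int(time.time() * 1000))
print(event.to_json())
```

Validating at emission time keeps malformed events out of the feedback loop, where they would otherwise silently bias reward estimates.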
3. Inference Service
Build a lightweight API that queries the feature store, runs the MAB or LLM inference, and returns the decision.
4. Feedback Integration
Asynchronously process outcomes to update the bandit's reward estimates (or model weights). Each conversion raises a variant's estimated reward and therefore its probability of selection; each non-conversion lowers it.
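The feedback step can be sketched with a Beta-Bernoulli Thompson Sampling bandit, assuming a binary reward (converted or not). The `BetaBandit` class and its method names are hypothetical; a production system would persist the counts rather than hold them in memory.

```python
import random
from typing import Dict, Iterable, Optional


class BetaBandit:
    """Thompson Sampling with a Beta(1, 1) prior per variant."""

    def __init__(self, variants: Iterable[str], seed: Optional[int] = None) -> None:
        self.alpha: Dict[str, float] = {v: 1.0 for v in variants}  # 1 + conversions
        self.beta: Dict[str, float] = {v: 1.0 for v in variants}   # 1 + non-conversions
        self._rng = random.Random(seed)

    def select(self) -> str:
        # Draw one plausible conversion rate per variant and serve the best draw.
        samples = {
            v: self._rng.betavariate(self.alpha[v], self.beta[v]) for v in self.alpha
        }
        return max(samples, key=samples.get)

    def update(self, variant: str, converted: bool) -> None:
        # Feedback integration: one outcome shifts the posterior for one variant.
        if converted:
            self.alpha[variant] += 1.0
        else:
            self.beta[variant] += 1.0

    def mean(self, variant: str) -> float:
        return self.alpha[variant] / (self.alpha[variant] + self.beta[variant])


bandit = BetaBandit(["A", "B"], seed=42)
for _ in range(30):
    bandit.update("B", converted=True)
    bandit.update("A", converted=False)

picks = [bandit.select() for _ in range(1000)]
print(bandit.mean("A"), bandit.mean("B"), picks.count("B"))
```

Because selection is a random draw from each posterior, a variant with little data still gets served occasionally, which is the exploration behaviour a fixed A/B split lacks.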
Code Example: AI Growth Decision Engine
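The following Python/FastAPI example sketches a growth decision endpoint that uses a Multi-Armed Bandit with an LLM fallback and a deterministic control fallback. The feature store, model, and router are mocks standing in for production services.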
import asyncio
import time
from typing import Any, Dict, Set

import fastapi
from pydantic import BaseModel

app = fastapi.FastAPI()

INFERENCE_TIMEOUT_S = 0.05    # 50ms SLA for the bandit path
LLM_TIMEOUT_S = 0.5           # Assumed budget for the LLM path; tune to your SLA
CONFIDENCE_THRESHOLD = 0.7

# Placeholder bandit state; in production, load this from the feature store
bandit_state: Dict[str, Any] = {}

# References to in-flight logging tasks so they are not garbage-collected
_background_tasks: Set[asyncio.Task] = set()


# Mock services standing in for production dependencies
class FeatureStore:
    async def get_user_context(self, user_id: str) -> Dict[str, Any]:
        # In production: Query Redis/DynamoDB with <20ms latency
        return {"tenure": 5, "last_page": "pricing", "intent_score": 0.85}


class GrowthModel:
    async def predict_variant(
        self, context: Dict[str, Any], bandit_state: Dict[str, Any]
    ) -> Dict[str, Any]:
        # A Thompson Sampling implementation belongs here; this stub returns a fixed decision
        return {
            "variant": "A",
            "confidence": 0.92,
            "reasoning": "High intent user responds to urgency",
        }


class LLMRouter:
    async def route(self, context: Dict[str, Any]) -> Dict[str, Any]:
        # An LLM would decide the dynamic incentive here; this stub returns a fixed decision
        return {"variant": "B", "payload": {"incentive": "10%"}}


feature_store = FeatureStore()
growth_model = GrowthModel()
llm_router = LLMRouter()


class GrowthRequest(BaseModel):
    user_id: str
    event_type: str
    session_id: str


class GrowthResponse(BaseModel):
    variant: str
    payload: Dict[str, Any]
    latency_ms: float
    model_source: str


@app.post("/growth/decide", response_model=GrowthResponse)
async def growth_decision(request: GrowthRequest) -> GrowthResponse:
    start_time = time.perf_counter()

    # 1. Fetch Context
    context = await feature_store.get_user_context(request.user_id)

    # 2. AI Decision with Timeout Guardrail
    try:
        # Run inference with strict timeout
        decision = await asyncio.wait_for(
            growth_model.predict_variant(context, bandit_state),
            timeout=INFERENCE_TIMEOUT_S,
        )
        if decision["confidence"] < CONFIDENCE_THRESHOLD:
            # Fallback to LLM for nuanced decision if model is uncertain;
            # the LLM call is also time-bounded so it cannot stall the request
            decision = await asyncio.wait_for(
                llm_router.route(context), timeout=LLM_TIMEOUT_S
            )
            source = "llm_fallback"
        else:
            source = "bandit"
    except asyncio.TimeoutError:
        # Deterministic fallback on latency breach
        decision = {"variant": "control", "payload": {}}
        source = "fallback"

    # 3. Calculate Latency
    latency = (time.perf_counter() - start_time) * 1000

    # 4. Log for Async Feedback Loop without delaying the response
    task = asyncio.create_task(
        log_growth_event(request.user_id, request.event_type, decision, source, latency)
    )
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)

    return GrowthResponse(
        variant=decision["variant"],
        payload=decision.get("payload", {}),
        latency_ms=latency,
        model_source=source,
    )


async def log_growth_event(*args: Any) -> None:
    # Fire-and-forget logging to event bus
    pass
Architecture Notes:
- Latency SLA: The asyncio.wait_for call bounds inference time so that AI inference never blocks the user experience. Growth decisions must be faster than page render times.
- Confidence Thresholds: Low-confidence bandit predictions are handed to the LLM router, and a timeout reverts to the control variant. This keeps uncertain or slow predictions from harming conversion.
- Source Tracking: Logging model_source allows analysis of whether LLMs add value over classical models, helping manage token costs.
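To illustrate the source-tracking point, the sketch below groups logged outcomes by model_source and computes a conversion rate for each. The `conversion_by_source` helper and the sample rows are made up for this example; real input would come from the event bus, and a real comparison would also need enough volume per source to be meaningful.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def conversion_by_source(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """events yields (model_source, converted) pairs; returns rate per source."""
    shown: Dict[str, int] = defaultdict(int)
    converted: Dict[str, int] = defaultdict(int)
    for source, did_convert in events:
        shown[source] += 1
        converted[source] += int(did_convert)
    return {source: converted[source] / shown[source] for source in shown}


# Made-up sample rows, only to show the shape of the input.
sample_log = [
    ("bandit", True),
    ("bandit", False),
    ("llm_fallback", True),
    ("fallback", False),
]
print(conversion_by_source(sample_log))
```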
Pitfall Guide: 7 Engineering Traps
- Hallucination in Critical Paths: Using LLMs to generate pricing or legal text without deterministic constraints. Mitigation: Use LLMs for routing/selection, not generation of constrained data. Validate outputs against a schema.
- Latency-Induced Churn: AI inference that adds more than 200ms to page load can measurably reduce conversion. Mitigation: Edge inference, aggressive caching, and pre-computation of user segments.
- Reward Misalignment: Optimizing for clicks instead of revenue. Mitigation: Define the reward function carefully. Use proxy metrics only if validated against long-term LTV.
- Context Window Exhaustion: Feeding raw user history to LLMs inflates costs and latency. Mitigation: Summarize history, use vector embeddings for retrieval, and limit context to relevant signals.
- Data Leakage: Training models on future data or including target variables in features. Mitigation: Implement strict temporal splits in training pipelines and feature validation.
- Cost Blowout: Unbounded token usage during traffic spikes. Mitigation: Implement token budgeting, caching responses for identical contexts, and rate limiting.
- Ignoring Privacy: Sending PII to third-party AI APIs without anonymization. Mitigation: Hash identifiers, strip PII before inference, and use on-prem models for sensitive data.
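The privacy mitigation can be sketched as a sanitization step that runs before any context leaves your infrastructure. This is a minimal example under stated assumptions: the `PII_FIELDS` list, the `user_ref` key, and both function names are hypothetical, and a keyed hash is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical list of fields treated as PII; adapt it to your own schema.
PII_FIELDS = {"email", "name", "phone", "ip_address"}


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Keyed SHA-256 hash so the raw identifier never leaves the service."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize_context(context: Dict[str, Any], secret: bytes) -> Dict[str, Any]:
    """Drop PII fields and replace user_id with a stable pseudonym."""
    clean = {
        key: value
        for key, value in context.items()
        if key not in PII_FIELDS and key != "user_id"
    }
    if "user_id" in context:
        clean["user_ref"] = pseudonymize(str(context["user_id"]), secret)
    return clean


raw = {"user_id": "123", "email": "user@example.com", "intent_score": 0.85}
print(sanitize_context(raw, secret=b"rotate-me"))
```

Using an HMAC instead of a bare hash means an attacker who knows the identifier format cannot rebuild the mapping without the secret.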
Production Bundle
Decision Matrix
| Use Case | Recommended Approach | Latency Req | Cost Profile | Complexity |
|---|---|---|---|---|
| Dynamic Pricing | Predictive ML + Rules | <50ms | Low | Medium |
| Personalized Onboarding | MAB + LLM Routing | <100ms | Medium | High |
| Churn Prediction | Batch ML + Trigger | N/A | Low | Low |
| Content Generation | LLM with Cache | <500ms | High | Medium |
| Support Triage | Classification Model | <200ms | Low | Low |
Guidance: Use Predictive ML for structured decisions (pricing, scoring). Use LLMs for unstructured context and routing. Always cache responses where user context is identical to reduce cost.
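The caching guidance can be sketched as a small in-memory cache keyed by a hash of the canonicalized context. The `DecisionCache` class is hypothetical and single-process only; a shared store such as Redis would play this role in a multi-instance deployment.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class DecisionCache:
    """TTL cache that treats contexts with identical content as identical keys."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    @staticmethod
    def key_for(context: Dict[str, Any]) -> str:
        # sort_keys makes the key independent of dict insertion order
        canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, context: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        key = self.key_for(context)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, decision = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return decision

    def put(self, context: Dict[str, Any], decision: Dict[str, Any]) -> None:
        self._entries[self.key_for(context)] = (self._clock() + self._ttl, decision)


cache = DecisionCache(ttl_seconds=300)
cache.put({"last_page": "pricing", "tenure": 5}, {"variant": "A"})
print(cache.get({"tenure": 5, "last_page": "pricing"}))  # same content, different key order
```

Raw context rarely repeats exactly, so in practice the key is usually built from a coarsened context (segment, intent bucket) rather than every field.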
Configuration Template
Use this YAML configuration to manage growth experiments and AI parameters in a declarative manner.
growth_engine:
  version: "1.2.0"
  global:
    latency_sla_ms: 80
    fallback_variant: "control"
    confidence_threshold: 0.75
  experiments:
    - id: "pricing_dynamic_v1"
      type: "thompson_sampling"
      variants:
        - id: "A"
          weight: 1.0
          payload: { layout: "standard", incentive: null }
        - id: "B"
          weight: 1.0
          payload: { layout: "urgency", incentive: "10%" }
      reward_function: "checkout_click"
      guardrails:
        max_incentive_discount: "15%"
        allowed_layouts: ["standard", "urgency", "comparison"]
    - id: "onboarding_llm_v1"
      type: "llm_router"
      model: "gpt-4-mini"
      prompt_template: "onboarding_router_v1.txt"
      cache_ttl_seconds: 300
      fallback: "rule_based_onboarding"
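As one possible reading of the guardrails section, the sketch below checks a proposed payload against `max_incentive_discount` and `allowed_layouts` and substitutes a known-safe payload on any violation. The function names and the choice to reject rather than clamp are assumptions; the guardrail values are passed as a plain dict, as they would be after parsing the YAML.

```python
from typing import Any, Dict, Optional


def parse_percent(value: Optional[str]) -> float:
    """Convert '10%' to 10.0; None (no incentive) counts as 0.0."""
    if value is None:
        return 0.0
    return float(value.strip().rstrip("%"))


def enforce_guardrails(
    payload: Dict[str, Any],
    guardrails: Dict[str, Any],
    fallback_payload: Dict[str, Any],
) -> Dict[str, Any]:
    """Return payload if it satisfies every guardrail, else fallback_payload."""
    try:
        discount = parse_percent(payload.get("incentive"))
        limit = parse_percent(guardrails["max_incentive_discount"])
    except (ValueError, AttributeError):
        # Unparseable or non-string incentive: treat it as a violation.
        return fallback_payload
    if payload.get("layout") not in guardrails["allowed_layouts"]:
        return fallback_payload
    if discount < 0 or discount > limit:
        return fallback_payload
    return payload


guardrails = {
    "max_incentive_discount": "15%",
    "allowed_layouts": ["standard", "urgency", "comparison"],
}
control = {"layout": "standard", "incentive": None}

print(enforce_guardrails({"layout": "urgency", "incentive": "10%"}, guardrails, control))
print(enforce_guardrails({"layout": "urgency", "incentive": "25%"}, guardrails, control))
```

Running this check on every model or LLM output, not just at configuration load, is what makes the limits hold even when the model proposes something outside them.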
Quick Start Guide
- Initialize SDK: Install the growth engine SDK and configure API keys for your inference provider.
    pip install codcompass-growth-sdk
    growth-cli init --project my-app
- Configure Events: Add the snippet to your frontend/backend to emit growth events.
    // Frontend example
    growth.track('pricing_view', { user_id: '123', session_id: 'abc' });
- Deploy Decision Endpoint: Spin up the inference service using the provided Docker template.
    docker-compose up -d growth-engine
- Monitor Dashboard: Access the growth dashboard to view real-time variant performance, latency metrics, and cost analysis. Adjust weights or thresholds via the UI or config file.
Editor's Note: Growth hacking with AI is not about replacing human intuition; it is about scaling it. The engineers who win will be those who treat growth as a continuous, algorithmic optimization problem, building systems that learn faster than the competition can react. Focus on architecture, guardrails, and data quality. The lift will follow.