nsistency across sessions and devices.
3. Event Decoupling: Experiment exposure and conversion events must be decoupled. The application should emit events to a stream; the analysis engine correlates exposure with conversion asynchronously.
4. Server-First Routing: For critical user flows and latency-sensitive applications, allocation should occur at the edge or server to prevent client-side re-renders and layout shifts.
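Here is a minimal sketch of the analysis side of event decoupling. The application only appends exposure and conversion events to a stream. A separate job later credits each conversion to the variant from the user's earliest exposure that precedes it. The event shapes and the `attributeConversions` function are hypothetical, and an in-memory array stands in for a real event stream.

```typescript
// Hypothetical event shapes; a real pipeline would add schema versions, event IDs, etc.
interface ExposureEvent {
  type: 'exposure';
  experimentId: string;
  variantId: string;
  userId: string;
  timestamp: number;
}

interface ConversionEvent {
  type: 'conversion';
  metric: string;
  userId: string;
  timestamp: number;
}

type ExperimentEvent = ExposureEvent | ConversionEvent;

// Counts conversion events per variant for one experiment. Conversions carry no
// experiment information; they are attributed to the user's earliest exposure,
// and only if that exposure happened at or before the conversion. Events may arrive in any order.
function attributeConversions(
  events: ExperimentEvent[],
  experimentId: string,
  metric: string
): Map<string, number> {
  // Earliest exposure per user for this experiment.
  const firstExposure = new Map<string, ExposureEvent>();
  for (const e of events) {
    if (e.type !== 'exposure' || e.experimentId !== experimentId) continue;
    const prev = firstExposure.get(e.userId);
    if (!prev || e.timestamp < prev.timestamp) firstExposure.set(e.userId, e);
  }

  const counts = new Map<string, number>();
  for (const e of events) {
    if (e.type !== 'conversion' || e.metric !== metric) continue;
    const exposure = firstExposure.get(e.userId);
    if (!exposure || exposure.timestamp > e.timestamp) continue; // converted before exposure
    counts.set(exposure.variantId, (counts.get(exposure.variantId) ?? 0) + 1);
  }
  return counts;
}
```

Because attribution happens after the fact, the client never has to know which experiments a conversion belongs to. Late-arriving events are handled simply by re-running the job.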
Technical Implementation (TypeScript)
The following implementation demonstrates a type-safe, deterministic allocation engine, a React integration hook, and a utility for detecting Sample Ratio Mismatch (SRM).
1. Experiment Schema and Configuration
Define a strict schema for experiment configuration to enforce type safety and validation.
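```typescript
export type ExperimentStatus = 'draft' | 'running' | 'stopped' | 'archived';

export interface ExperimentVariant {
  id: string;
  weight: number; // fraction of traffic, 0.0 to 1.0
  metadata?: Record<string, unknown>;
}

export interface ExperimentConfig {
  id: string;
  name: string;
  status: ExperimentStatus;
  targetId: 'user' | 'device' | 'session'; // unit of randomization
  variants: ExperimentVariant[];
  guardrailMetrics: string[];
  createdAt: string; // ISO 8601 timestamp
  updatedAt: string; // ISO 8601 timestamp
}

// Validates the variant set: at least one variant, unique IDs, every weight in
// [0, 1], and weights summing to 1.0 within floating-point tolerance.
// Status is deliberately not checked here, so draft configs can still be loaded;
// allocation itself skips experiments that are not running.
export function validateConfig(config: ExperimentConfig): boolean {
  if (config.variants.length === 0) return false;
  const ids = new Set(config.variants.map((v) => v.id));
  if (ids.size !== config.variants.length) return false;
  if (config.variants.some((v) => v.weight < 0 || v.weight > 1)) return false;
  const totalWeight = config.variants.reduce((sum, v) => sum + v.weight, 0);
  return Math.abs(totalWeight - 1.0) < 0.0001;
}
```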
2. Allocation Engine
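Allocate traffic with deterministic hashing: hash the experiment ID together with the targeting unit, then map the result onto the cumulative variant weights. The same input always produces the same hash, so a given user always sees the same variant. This prevents "flipping" between variants, which would corrupt the data.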
```typescript
import { createHash } from 'crypto'; // Node.js built-in; browsers need a different hash implementation
// Assumes ExperimentConfig and validateConfig from the schema above are in scope.

export class AllocationEngine {
  private configs: Map<string, ExperimentConfig> = new Map();

  loadConfig(config: ExperimentConfig): void {
    if (!validateConfig(config)) {
      throw new Error(`Invalid configuration for experiment ${config.id}`);
    }
    this.configs.set(config.id, config);
  }

  /**
   * Allocates a targeting unit (user, device, or session ID) to a variant using
   * deterministic hashing. The experiment ID acts as a salt, so a unit's bucket in
   * one experiment is independent of its bucket in any other experiment.
   */
  allocate(experimentId: string, targetValue: string): string | null {
    const config = this.configs.get(experimentId);
    if (!config || config.status !== 'running') return null;

    // Create a stable hash for the allocation
    const hashInput = `${experimentId}:${targetValue}`;
    const hash = createHash('sha256').update(hashInput).digest('hex');

    // Map the first 8 hex chars (32 bits) to a number in [0, 1).
    // Dividing by 2^32 rather than 0xFFFFFFFF keeps the result strictly below 1.
    const hashValue = parseInt(hash.substring(0, 8), 16) / 0x100000000;

    let cumulativeWeight = 0;
    for (const variant of config.variants) {
      cumulativeWeight += variant.weight;
      if (hashValue < cumulativeWeight) {
        return variant.id;
      }
    }
    // Floating-point rounding can leave the cumulative sum just below 1.0;
    // a hash in that sliver belongs to the last variant's range.
    return config.variants[config.variants.length - 1].id;
  }
}
```
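Floating-point weights force the engine above to keep a fallback branch for rounding gaps. One common alternative is to express weights as integer basis points summing to 10,000 and to map the hash to an integer bucket. Every bucket then belongs to exactly one variant. This is a sketch of that alternative, not part of the engine above. `bucketFor` and `allocateBucketed` are hypothetical names, and the hash is SHA-256 from Node's built-in `crypto` module.

```typescript
import { createHash } from 'crypto';

interface BucketedVariant {
  id: string;
  basisPoints: number; // share of traffic in 1/10,000ths; all variants must sum to 10,000
}

const TOTAL_BUCKETS = 10_000;

// Maps (experimentId, unitId) to a stable integer bucket in [0, 10000).
function bucketFor(experimentId: string, unitId: string): number {
  const hash = createHash('sha256').update(`${experimentId}:${unitId}`).digest();
  // First 4 bytes as an unsigned 32-bit integer. Reducing modulo 10,000 adds a
  // negligible bias because 2^32 is not an exact multiple of 10,000.
  return hash.readUInt32BE(0) % TOTAL_BUCKETS;
}

function allocateBucketed(
  experimentId: string,
  unitId: string,
  variants: BucketedVariant[]
): string {
  const total = variants.reduce((sum, v) => sum + v.basisPoints, 0);
  if (total !== TOTAL_BUCKETS) {
    throw new Error(`Basis points must sum to ${TOTAL_BUCKETS}, got ${total}`);
  }
  const bucket = bucketFor(experimentId, unitId);
  let upperBound = 0;
  for (const variant of variants) {
    upperBound += variant.basisPoints;
    if (bucket < upperBound) return variant.id;
  }
  // Unreachable: the final upper bound is exactly TOTAL_BUCKETS.
  throw new Error('unreachable');
}
```

Integer arithmetic makes the validation exact (no tolerance needed) and removes the fallback branch entirely. The cost is that splits are limited to 0.01% granularity.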
3. React Integration Hook
Provide a React hook that abstracts allocation and exposure tracking. This hook integrates with the allocation engine and emits exposure events.
```typescript
import { useState, useEffect, useCallback } from 'react';
// AllocationEngine is the class defined in the previous step.

// Minimal typing for a global analytics client exposing track(event, properties).
declare global {
  interface Window {
    analytics?: { track: (event: string, properties: Record<string, unknown>) => void };
  }
}

interface UseExperimentResult<T> {
  variant: T | null;
  isLoading: boolean;
  trackExposure: () => void;
}

export function useExperiment<T extends string>(
  experimentId: string,
  userId: string | null,
  allocationEngine: AllocationEngine
): UseExperimentResult<T> {
  const [variant, setVariant] = useState<T | null>(null);
  const [isLoading, setIsLoading] = useState(true);

  const trackExposure = useCallback(() => {
    // Only log an exposure once a variant has actually been assigned.
    if (variant) {
      // Emit to event bus/analytics pipeline
      window.analytics?.track('experiment_exposed', {
        experimentId,
        variantId: variant,
        userId,
        timestamp: Date.now()
      });
    }
  }, [experimentId, variant, userId]);

  useEffect(() => {
    if (!userId) {
      // No targeting unit yet: clear any previous assignment.
      setVariant(null);
      setIsLoading(false);
      return;
    }
    // Allocation runs synchronously against configs already loaded into the engine.
    const allocatedVariant = allocationEngine.allocate(experimentId, userId);
    setVariant(allocatedVariant as T | null);
    setIsLoading(false);
  }, [experimentId, userId, allocationEngine]);

  return { variant, isLoading, trackExposure };
}
```
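`trackExposure` can run on every mount or re-render, so the same user may emit several exposure events for one experiment. Analysis pipelines usually deduplicate, but a small client-side guard keeps the stream clean. The `ExposureLogger` class below is a hypothetical sketch. Its `emit` callback stands in for whatever analytics client you use.

```typescript
type ExposurePayload = {
  experimentId: string;
  variantId: string;
  userId: string;
  timestamp: number;
};

// Emits at most one exposure per (experiment, user) for the lifetime of this
// instance, e.g. one browser session if it is created once at app start.
class ExposureLogger {
  private readonly seen = new Set<string>();
  private readonly emit: (payload: ExposurePayload) => void;

  constructor(emit: (payload: ExposurePayload) => void) {
    this.emit = emit;
  }

  // Returns true if an event was emitted, false if it was a duplicate.
  log(experimentId: string, variantId: string, userId: string): boolean {
    const key = `${experimentId}:${userId}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.emit({ experimentId, variantId, userId, timestamp: Date.now() });
    return true;
  }
}
```

A single shared instance could be called from inside `trackExposure` in place of the direct `window.analytics` call.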
4. SRM Detection Utility
Include a utility to detect Sample Ratio Mismatch during implementation testing. This helps engineers catch routing bugs before launching.
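```typescript
// Chi-squared critical values at alpha = 0.05, keyed by degrees of freedom.
const CHI_SQUARE_CRITICAL_05: Record<number, number> = {
  1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
  6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919,
};

export function detectSRM(
  expectedRatio: number[],
  observedCounts: number[]
): { isMismatch: boolean; chiSquare: number; degreesOfFreedom: number } {
  if (expectedRatio.length !== observedCounts.length || expectedRatio.length < 2) {
    throw new Error('expectedRatio and observedCounts must have the same length (at least 2)');
  }
  if (expectedRatio.some((r) => r <= 0)) throw new Error('Expected ratios must be positive');

  // Pearson's chi-squared goodness-of-fit statistic
  const totalObserved = observedCounts.reduce((a, b) => a + b, 0);
  const totalExpected = expectedRatio.reduce((a, b) => a + b, 0);
  let chiSquare = 0;
  for (let i = 0; i < expectedRatio.length; i++) {
    const expected = (expectedRatio[i] / totalExpected) * totalObserved;
    chiSquare += (observedCounts[i] - expected) ** 2 / expected;
  }

  // Simplified critical-value lookup (in production, use a statistical library
  // to compute an exact p-value).
  const degreesOfFreedom = expectedRatio.length - 1;
  const critical = CHI_SQUARE_CRITICAL_05[degreesOfFreedom];
  if (critical === undefined) {
    throw new Error(`No critical value tabulated for ${degreesOfFreedom} degrees of freedom`);
  }
  return { isMismatch: chiSquare > critical, chiSquare, degreesOfFreedom };
}
```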
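The simplified lookup in `detectSRM` covers only a handful of cases. Without a statistics library, you can still compute an actual p-value. The chi-squared upper tail equals the regularized upper incomplete gamma function Q(df/2, x/2). The sketch below evaluates it with the standard series and continued-fraction expansions (the approach popularized by Numerical Recipes), using a Lanczos approximation for log-gamma. `chiSquarePValue` and `logGamma` are illustrative helpers; cross-check them against a vetted library before relying on them.

```typescript
// Natural log of the gamma function (Lanczos approximation, g = 7, n = 9).
// Valid for z >= 0.5, which covers a = df / 2 for every df >= 1.
function logGamma(z: number): number {
  const c = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
  ];
  const zm1 = z - 1;
  let x = c[0];
  for (let i = 1; i < c.length; i++) x += c[i] / (zm1 + i);
  const t = zm1 + 7.5;
  return 0.5 * Math.log(2 * Math.PI) + (zm1 + 0.5) * Math.log(t) - t + Math.log(x);
}

// P(X > x) for X ~ chi-squared(df), i.e. Q(df / 2, x / 2).
function chiSquarePValue(x: number, df: number): number {
  if (x <= 0) return 1;
  const a = df / 2;
  const z = x / 2;
  const logPrefix = a * Math.log(z) - z - logGamma(a);

  if (z < a + 1) {
    // Series for the lower regularized gamma P(a, z); return Q = 1 - P.
    let term = 1 / a;
    let sum = term;
    for (let n = 1; n < 500; n++) {
      term *= z / (a + n);
      sum += term;
      if (Math.abs(term) < Math.abs(sum) * 1e-13) break;
    }
    return Math.max(0, 1 - sum * Math.exp(logPrefix));
  }

  // Continued fraction for Q(a, z), evaluated with the modified Lentz method.
  const tiny = 1e-300;
  let b = z + 1 - a;
  let c = 1 / tiny;
  let d = 1 / b;
  let h = d;
  for (let i = 1; i < 500; i++) {
    const an = -i * (i - a);
    b += 2;
    d = an * d + b;
    if (Math.abs(d) < tiny) d = tiny;
    c = b + an / c;
    if (Math.abs(c) < tiny) c = tiny;
    d = 1 / d;
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < 1e-13) break;
  }
  return Math.exp(logPrefix) * h;
}
```

With the statistic and degrees of freedom from an SRM check, `chiSquarePValue(chiSquare, degreesOfFreedom)` yields a p-value that can be compared against whatever threshold your team has chosen.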
Pitfall Guide
Common Mistakes and Best Practices
- Peeking at Results Early
  - Mistake: Checking statistical significance repeatedly during the experiment and stopping as soon as the p-value drops below 0.05. Each additional look gives random noise another chance to cross the threshold. The actual false positive rate therefore ends up well above the nominal 5% and keeps growing with the number of peeks.
  - Best Practice: Pre-register the sample size based on a power analysis. If early stopping is required, use a sequential testing method designed for it (e.g., SPRT).
- Ignoring Sample Ratio Mismatch (SRM)
  - Mistake: Launching experiments without verifying that the observed traffic split matches the configuration. SRM usually indicates a bug in hashing, routing, or client-side filtering.
  - Best Practice: Implement automated SRM checks in the CI/CD pipeline and the dashboard. Block analysis if SRM is detected.
- Metric Definition Ambiguity
  - Mistake: Defining metrics loosely (e.g., "engagement") without a precise SQL or event definition. Different teams then calculate the same metric differently, leading to conflicting results.
  - Best Practice: Define metrics as code or in a semantic layer. Version each metric definition and keep it immutable for the duration of the experiment.
- Network Effects and Interference
  - Mistake: Running experiments on features where users interact with one another (e.g., social feeds, marketplaces) without accounting for interference between treatment and control groups.
  - Best Practice: Use cluster-based randomization or graph-based allocation for networked products. Monitor global metrics for spillover effects.
- Lack of Guardrail Metrics
  - Mistake: Optimizing for a primary metric while degrading user experience or system performance.
  - Best Practice: Always define guardrail metrics (e.g., latency, error rate, retention). Automatically halt experiments if a guardrail is breached.
- Hardcoding Variants in Business Logic
  - Mistake: Scattering checks such as `if (variant === 'B') { ... }` across components. This makes cleanup difficult and increases bundle size.
  - Best Practice: Use configuration-driven UI rendering. The application should render components based on variant metadata, not hardcoded logic.
- Inadequate Cleanup Strategy
  - Mistake: Leaving experiment code and configuration active after the experiment concludes.
  - Best Practice: Automate cleanup. When an experiment is marked "archived," the platform should open a PR that removes its configuration and flag the associated code for removal in the next sprint.
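The advice to pre-register the sample size can be made concrete with the standard normal-approximation formula for a two-sided, two-sample test of proportions:

n per variant = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)²

The sketch below hardcodes z-values for α = 0.05 and 80% power. `sampleSizePerVariant` is a hypothetical helper. For anything beyond a rough estimate, a dedicated power-analysis tool is preferable.

```typescript
// z-scores for a two-sided alpha of 0.05 and 80% power.
const Z_ALPHA_HALF = 1.959964; // z_{1 - 0.05/2}
const Z_BETA = 0.841621;       // z_{0.80}

// Approximate number of users needed per variant to detect a change from
// baselineRate to baselineRate + minDetectableEffect (absolute difference).
function sampleSizePerVariant(baselineRate: number, minDetectableEffect: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate + minDetectableEffect;
  if (p1 <= 0 || p1 >= 1 || p2 <= 0 || p2 >= 1 || minDetectableEffect === 0) {
    throw new Error('Rates must lie strictly between 0 and 1 and the effect must be non-zero');
  }
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const n = ((Z_ALPHA_HALF + Z_BETA) ** 2 * variance) / (p2 - p1) ** 2;
  return Math.ceil(n); // round up: a fractional user is not enough
}
```

For example, `sampleSizePerVariant(0.10, 0.02)` (10% baseline, 2-point absolute lift) returns 3839 users per variant. Halving the detectable effect roughly quadruples the requirement, which is why the minimum detectable effect should be fixed before launch rather than adjusted while peeking.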
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Low Traffic / Critical Flow | Server-Side Allocation | Ensures consistency; prevents client-side re-renders; lower latency impact. | Low infrastructure cost; higher engineering effort for edge routing. |
| High Traffic / UI Polish | Client-Side Allocation | Reduces server load; faster iteration; easier A/B testing of UI components. | Medium CDN cost; potential layout shift risk. |
| Complex Multi-Step Flows | Orchestration Platform | Manages state across steps; handles complex traffic splitting; provides unified analytics. | High platform cost; requires integration effort. |
| Rapid Prototyping | Feature Flag with Randomization | Quick setup; low overhead; suitable for internal testing or low-risk changes. | Low cost; limited statistical rigor. |
Configuration Template
Use this JSON template to define experiments in your configuration store. It keeps the traffic split, metrics, variant metadata, and lifecycle settings in one document so they can be validated and managed together.
```json
{
  "experimentId": "exp-checkout-flow-v2",
  "name": "Checkout Flow Optimization",
  "status": "running",
  "targetId": "user",
  "trafficSplit": {
    "control": 0.5,
    "variant_a": 0.25,
    "variant_b": 0.25
  },
  "metrics": {
    "primary": "checkout_completion_rate",
    "guardrails": ["p95_latency_ms", "api_error_rate"]
  },
  "variants": {
    "control": { "metadata": { "ui_version": "v1" } },
    "variant_a": { "metadata": { "ui_version": "v2", "button_color": "blue" } },
    "variant_b": { "metadata": { "ui_version": "v2", "button_color": "green" } }
  },
  "lifecycle": {
    "startDate": "2024-01-15T00:00:00Z",
    "endDate": "2024-02-15T00:00:00Z",
    "autoArchive": true,
    "cleanupTaskId": "JIRA-12345"
  }
}
```
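The template stores the split as a `trafficSplit` map keyed by variant name. The `ExperimentConfig` interface from the schema section instead expects a `variants` array with per-variant weights. A hypothetical adapter like the one below could bridge the two before calling `validateConfig`. The field mappings it uses (`experimentId` to `id`, `metrics.guardrails` to `guardrailMetrics`) are assumptions. To stay self-contained, the sketch declares its own trimmed-down types and omits the timestamp fields.

```typescript
// Shape of the JSON template (only the fields this adapter reads).
interface ExperimentTemplate {
  experimentId: string;
  name: string;
  status: 'draft' | 'running' | 'stopped' | 'archived';
  targetId: 'user' | 'device' | 'session';
  trafficSplit: Record<string, number>;
  metrics: { primary: string; guardrails: string[] };
  variants: Record<string, { metadata?: Record<string, unknown> }>;
}

// Trimmed stand-in for ExperimentConfig (createdAt/updatedAt omitted).
interface AdaptedConfig {
  id: string;
  name: string;
  status: ExperimentTemplate['status'];
  targetId: ExperimentTemplate['targetId'];
  variants: { id: string; weight: number; metadata?: Record<string, unknown> }[];
  guardrailMetrics: string[];
}

function fromTemplate(t: ExperimentTemplate): AdaptedConfig {
  // Every variant with metadata must also receive traffic, and vice versa.
  for (const id of Object.keys(t.variants)) {
    if (!(id in t.trafficSplit)) throw new Error(`Variant "${id}" has no traffic allocation`);
  }
  // Non-numeric object keys keep their insertion order, so the first key in
  // trafficSplit (e.g. "control") becomes variants[0].
  const variants = Object.entries(t.trafficSplit).map(([id, weight]) => {
    const definition = t.variants[id];
    if (!definition) throw new Error(`trafficSplit references unknown variant "${id}"`);
    return { id, weight, metadata: definition.metadata };
  });
  return {
    id: t.experimentId,
    name: t.name,
    status: t.status,
    targetId: t.targetId,
    variants,
    guardrailMetrics: t.metrics.guardrails,
  };
}
```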
Quick Start Guide
- Initialize the Engine: Install the experimentation SDK and initialize the allocation engine with your configuration source.

```bash
npm install @codcompass/experiment-sdk
```

```typescript
import { ExperimentEngine } from '@codcompass/experiment-sdk';

const engine = new ExperimentEngine({ configSource: 'remote' });
```

- Define Your First Experiment: Create a configuration file matching the template above and deploy it to your configuration store.
- Wrap Your Component: Use the provided hook to allocate users and render variants. Record the exposure only once a variant has been assigned. An effect with an empty dependency array runs a single time, while `variant` is still `null`, so it would never log the exposure.

```typescript
const { variant, trackExposure } = useExperiment('exp-checkout-flow-v2', userId, engine);

useEffect(() => {
  if (variant) trackExposure();
}, [variant, trackExposure]);
```

- Track Conversions: Emit conversion events when the user completes the target action.

```typescript
engine.trackConversion('checkout_completion_rate', { value: 1 });
```

- Monitor and Analyze: View real-time allocation, SRM status, and metric trends in your experimentation dashboard. Halt the experiment if guardrails are breached.
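Halting on guardrail breaches works best as an automated check rather than a manual dashboard review. The hypothetical sketch below compares each guardrail metric for a treatment against control using a relative tolerance. The threshold format (`maxRelativeIncrease`) and the "higher is worse" convention are assumptions. A production system would also require the difference to be statistically significant before halting.

```typescript
interface GuardrailThreshold {
  metric: string;              // e.g. "p95_latency_ms"
  maxRelativeIncrease: number; // 0.05 = treatment may be at most 5% worse than control
}

type MetricSnapshot = Record<string, number>;

// Returns the guardrail metrics whose treatment value exceeds control by more
// than the allowed relative increase. Assumes higher values are worse for
// every metric; metrics missing from either snapshot are skipped.
function breachedGuardrails(
  control: MetricSnapshot,
  treatment: MetricSnapshot,
  thresholds: GuardrailThreshold[]
): string[] {
  const breached: string[] = [];
  for (const { metric, maxRelativeIncrease } of thresholds) {
    const baseline = control[metric];
    const observed = treatment[metric];
    if (baseline === undefined || observed === undefined) continue; // no data yet
    if (observed > baseline * (1 + maxRelativeIncrease)) breached.push(metric);
  }
  return breached;
}
```

A scheduler could run this against each treatment arm periodically and stop the experiment whenever the returned list is non-empty.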