ent configuration. We use a hashing algorithm to map users to buckets. MurmurHash3 or xxHash are preferred for their speed and distribution properties.
Architecture Decision: Use a salted hash that includes the experiment key, layer index, and user ID. This ensures that changing traffic percentages or adding new experiments does not re-assign users in unrelated layers.
import { createHash } from 'crypto';

export interface ExperimentContext {
  userId: string;
  deviceId?: string;
  attributes: Record<string, string | number | boolean>;
}

export interface ExperimentDefinition {
  key: string;
  layer: string;
  variants: Variant[];
  trafficAllocation: number; // 0 to 10000 (basis points)
  hashVersion: number;
}

export interface Variant {
  key: string;
  weight: number; // 0 to 10000; weights within an experiment should sum to 10000
  payload?: Record<string, any>; // optional per-variant settings, as used in the configuration template
}

export class AssignmentEngine {
  private readonly HASH_MODULO = 10000;

  /**
   * Computes the variant for a user in a specific experiment.
   * Uses a layered hashing strategy to ensure orthogonality.
   */
  public evaluate(
    context: ExperimentContext,
    experiment: ExperimentDefinition
  ): string {
    // 1. Check traffic allocation.
    // The traffic hash is keyed on the layer, not the experiment key, so a user
    // lands in the same traffic bucket for every experiment in that layer.
    const trafficHash = this.hash(
      `${experiment.layer}:${context.userId}:${experiment.hashVersion}`
    );
    const trafficBucket = trafficHash % this.HASH_MODULO;
    if (trafficBucket >= experiment.trafficAllocation) {
      // User outside traffic. Callers cannot tell this apart from a real
      // 'control' assignment.
      return 'control';
    }

    // 2. Compute variant assignment using layer-specific salt.
    // Including layer in salt ensures experiments in different layers
    // are statistically independent.
    const assignmentInput = `${experiment.layer}:${experiment.key}:${context.userId}:${experiment.hashVersion}`;
    const assignmentHash = this.hash(assignmentInput);
    const assignmentBucket = assignmentHash % this.HASH_MODULO;

    // 3. Map bucket to variant based on cumulative weights
    let cumulativeWeight = 0;
    for (const variant of experiment.variants) {
      cumulativeWeight += variant.weight;
      if (assignmentBucket < cumulativeWeight) {
        return variant.key;
      }
    }

    // Reached only when the weights sum to less than HASH_MODULO
    return 'control';
  }

  private hash(input: string): number {
    // Using xxhash-wasm in production for performance;
    // crypto.createHash is synchronous and sufficient for TS example.
    // The first 8 hex characters of the digest give an unsigned 32-bit integer.
    return parseInt(
      createHash('sha256').update(input).digest('hex').substring(0, 8),
      16
    );
  }
}
2. Configuration Distribution and Caching
Experiments change frequently. Fetching configurations from a database on every request is untenable. The framework must use a local cache with delta updates.
Rationale: A pub/sub subscription or a polling loop keeps a local in-memory store (e.g., a Map, or a ConcurrentHashMap on the JVM) up to date. Evaluation then becomes an in-memory lookup with no network round trip. The configuration should be versioned so that each update can be swapped in atomically.
import { EventEmitter } from 'events';

export class ExperimentConfigManager extends EventEmitter {
  private experiments: Map<string, ExperimentDefinition> = new Map();
  private version: number = 0;

  public async sync(configUrl: string): Promise<void> {
    // Fetch delta or full config from remote store (Redis/S3/HTTP).
    // The body is parsed as JSON with the shape { version, experiments }.
    const response = await fetch(configUrl);
    if (!response.ok) {
      // Throwing leaves the last good config in place
      throw new Error(`Config sync failed: HTTP ${response.status}`);
    }
    const payload = await response.json();

    if (payload.version > this.version) {
      // Build the new map first, then swap the reference in one assignment
      this.experiments = new Map(
        Object.entries(payload.experiments)
      );
      this.version = payload.version;
      this.emit('configUpdate', this.version);
    }
  }

  public getExperiment(key: string): ExperimentDefinition | undefined {
    return this.experiments.get(key);
  }
}
3. Layering Strategy
To run multiple experiments simultaneously without interference, experiments are assigned to layers. Experiments in the same layer are meant to be mutually exclusive, which requires giving each one a disjoint share of the layer's traffic buckets; experiments in different layers are orthogonal.
Implementation: The layer identifier is part of the hash input. If Experiment A is in Layer 1 and Experiment B is in Layer 2, the hash for A is salted with Layer 1's identifier and the hash for B with Layer 2's. With a well-distributed hash, a user's assignment in A tells you essentially nothing about their assignment in B, which preserves statistical independence in practice.
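One way to get mutual exclusivity inside a layer is to give each experiment a disjoint slice of the layer's 10,000 buckets. The sketch below is an illustration under assumptions: the `LayerSlot` type, both function names, and the second experiment key are hypothetical, and the layer bucket is assumed to come from hashing the layer and user ID as the traffic check does.

```typescript
// Hypothetical types for illustration; not part of the framework above.
interface LayerSlot {
  experimentKey: string;
  start: number; // inclusive bucket, 0..9999
  end: number;   // exclusive bucket, 1..10000
}

// Returns true when no two slots share a bucket and every slot stays in range.
function slotsAreDisjoint(slots: LayerSlot[], buckets = 10000): boolean {
  const sorted = [...slots].sort((a, b) => a.start - b.start);
  let previousEnd = 0;
  for (const slot of sorted) {
    if (slot.start < previousEnd || slot.end <= slot.start || slot.end > buckets) {
      return false;
    }
    previousEnd = slot.end;
  }
  return true;
}

// Maps a user's layer bucket to at most one experiment in that layer.
function experimentForBucket(layerBucket: number, slots: LayerSlot[]): string | null {
  const hit = slots.find((s) => layerBucket >= s.start && layerBucket < s.end);
  return hit ? hit.experimentKey : null;
}

const uiLayer: LayerSlot[] = [
  { experimentKey: "checkout_button_color", start: 0, end: 5000 },
  { experimentKey: "checkout_copy_test", start: 5000, end: 7000 }, // hypothetical second experiment
];
```

Buckets 7000 to 9999 stay unassigned here, which leaves room to ramp either experiment without moving users who are already enrolled.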
4. Analytics Integration
The framework must emit assignment events immediately upon evaluation to ensure the analytics dataset matches the user experience.
export interface AnalyticsClient {
  track(event: string, properties: Record<string, any>): void;
}

export class ExperimentClient {
  constructor(
    private engine: AssignmentEngine,
    private configManager: ExperimentConfigManager,
    private analytics: AnalyticsClient
  ) {}

  public getVariant(
    experimentKey: string,
    context: ExperimentContext
  ): string {
    const experiment = this.configManager.getExperiment(experimentKey);
    if (!experiment) {
      // Fail-open to control in production
      return 'control';
    }

    const variant = this.engine.evaluate(context, experiment);

    // Emit assignment event for analysis
    this.analytics.track('experiment_viewed', {
      experimentKey,
      variant,
      userId: context.userId,
      timestamp: Date.now(),
    });

    return variant;
  }
}
Pitfall Guide
1. Peeking and Early Stopping
Mistake: Checking results daily and stopping the test as soon as p-value < 0.05.
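Impact: Each additional look is another chance to cross the threshold by luck, so the false positive rate climbs well above the nominal 5% (up to 40% with frequent checks). A fixed-horizon p-value is valid only when read once, at the pre-calculated sample size.
Best Practice: Fix the sample size before launching, or use a sequential testing method (e.g., the Sequential Probability Ratio Test). If interim monitoring is required, use a group-sequential design with an alpha-spending function so that the overall error rate stays at the chosen alpha.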
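Fixing the sample size in advance can be sketched with the standard two-proportion approximation. This is a rough planning aid, not a replacement for a proper power analysis: the function name is hypothetical, and the sketch assumes a two-sided alpha of 0.05 and 80% power, with the matching normal quantiles hardcoded.

```typescript
// Standard normal quantiles, hardcoded to keep the sketch dependency-free.
const Z_ALPHA_TWO_SIDED_05 = 1.959964; // alpha = 0.05, two-sided
const Z_POWER_80 = 0.841621;           // 80% power

// Approximate users needed per variant to detect a change from
// baselineRate to expectedRate in a conversion-rate metric.
function requiredSamplePerVariant(baselineRate: number, expectedRate: number): number {
  const delta = expectedRate - baselineRate;
  if (delta === 0) {
    throw new Error("expectedRate must differ from baselineRate");
  }
  // Sum of the Bernoulli variances of the two arms
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  const z = Z_ALPHA_TWO_SIDED_05 + Z_POWER_80;
  return Math.ceil((z * z * variance) / (delta * delta));
}

// Example: 10% baseline conversion, hoping to detect a move to 11%
const perVariant = requiredSamplePerVariant(0.10, 0.11);
console.log(`Users needed per variant: ${perVariant}`);
```

The required sample grows with the inverse square of the effect size, so halving the detectable lift roughly quadruples the traffic needed.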
2. Hash Bias and Modulo Arithmetic
Mistake: Taking hash % N when the hash output space is small relative to N and not a multiple of it, or using a weak hash function.
Impact: Introduces systematic bias: the lowest buckets receive more hash values than the rest, so traffic is distributed unevenly between variants.
Best Practice: Use a high-quality hash function like MurmurHash3. If using modulo, ensure the hash output space is much larger than N so that the bias is negligible (a 32-bit hash with N = 10000 qualifies), or map the hash digest onto the bucket range by scaling it as a fraction.
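The size of the modulo bias can be computed directly. The sketch below assumes a hash that is uniform over `space` values; both function names are hypothetical. It also shows the scaling alternative, which does not remove the unevenness but spreads the extra values across the range instead of concentrating them in the lowest buckets.

```typescript
// Worst-case relative over-representation of a bucket under `hash % n`,
// for a hash uniformly distributed over `space` values.
function moduloBias(space: number, n: number): number {
  const base = Math.floor(space / n); // values every bucket receives
  const remainder = space % n;        // buckets 0..remainder-1 receive one extra
  return remainder === 0 ? 0 : 1 / base;
}

// Alternative mapping: scale an unsigned 32-bit hash into [0, n)
// instead of taking a remainder.
function scaleToBucket(hash32: number, n: number): number {
  return Math.floor((hash32 / 2 ** 32) * n);
}

// An 8-bit hash into 100 buckets is badly skewed; a 32-bit hash into 10000 is not.
console.log(moduloBias(2 ** 8, 100));
console.log(moduloBias(2 ** 32, 10000));
```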
3. Simpson's Paradox in Aggregation
Mistake: Reporting aggregate conversion rates without segmenting by traffic source or device.
Impact: A variant may appear to win overall but lose in every subgroup due to confounding variables (e.g., more mobile traffic assigned to the losing variant).
Best Practice: Always analyze stratified metrics. Ensure randomization is balanced across key covariates. Implement automated checks for covariate imbalance during test initialization.
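The reversal is easy to reproduce with a small stratified report. The counts below are invented purely for illustration, and the `SegmentCounts` type and `conversionRate` helper are hypothetical names: treatment looks better in aggregate only because most of its users are on the higher-converting segment.

```typescript
interface SegmentCounts {
  segment: string;
  variant: string;
  users: number;
  conversions: number;
}

// Conversion rate for a variant, optionally restricted to one segment.
function conversionRate(rows: SegmentCounts[], variant: string, segment?: string): number {
  const selected = rows.filter(
    (r) => r.variant === variant && (segment === undefined || r.segment === segment)
  );
  const users = selected.reduce((sum, r) => sum + r.users, 0);
  const conversions = selected.reduce((sum, r) => sum + r.conversions, 0);
  return users === 0 ? 0 : conversions / users;
}

// Illustrative data only: device mix is badly imbalanced between variants.
const rows: SegmentCounts[] = [
  { segment: "desktop", variant: "control", users: 200, conversions: 40 },
  { segment: "mobile", variant: "control", users: 800, conversions: 40 },
  { segment: "desktop", variant: "treatment", users: 800, conversions: 152 },
  { segment: "mobile", variant: "treatment", users: 200, conversions: 8 },
];

for (const segment of [undefined, "desktop", "mobile"]) {
  const control = conversionRate(rows, "control", segment);
  const treatment = conversionRate(rows, "treatment", segment);
  console.log(segment ?? "overall", { control, treatment });
}
```

A device split this lopsided under random assignment is itself a warning sign, which is why the covariate balance check belongs at test initialization.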
4. Layering Collisions
Mistake: Placing correlated experiments in the same layer or failing to update layer assignments when reusing experiment keys.
Impact: Experiments interfere with each other, making it impossible to attribute effects correctly.
Best Practice: Maintain a registry of layers. Document which experiments reside in which layer. When an experiment concludes, archive the key rather than reusing it, or increment the hashVersion to reset assignment.
5. Novelty and Primacy Effects
Mistake: Interpreting short-term lifts caused by user curiosity as permanent value.
Impact: Rolling out a feature that provides no long-term benefit, or worse, annoys users once the novelty wears off.
Best Practice: Run tests for a sufficient duration to capture behavior stabilization (typically 2-4 business cycles). Monitor retention metrics alongside conversion metrics.
6. Metric Volatility and Guardrail Neglect
Mistake: Optimizing for a single metric (e.g., clicks) without monitoring guardrails (e.g., latency, error rate, support tickets).
Impact: A variant may increase clicks by 5% but increase server costs by 20% or degrade accessibility, resulting in a net negative impact.
Best Practice: Define primary, secondary, and guardrail metrics before launch. Implement automated alerts if guardrails breach thresholds.
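An automated guardrail check can be as small as a relative-change comparison. This is a minimal sketch under assumptions: the `Guardrail` type and `breachedGuardrails` function are hypothetical, every guardrail metric is assumed to be "higher is worse", and the thresholds and values are illustrative.

```typescript
interface Guardrail {
  metric: string;
  maxRelativeIncrease: number; // e.g. 0.10 allows up to a 10% increase over control
}

// Returns the names of guardrail metrics whose relative increase over
// control exceeds the allowed threshold.
function breachedGuardrails(
  guardrails: Guardrail[],
  control: Record<string, number>,
  treatment: Record<string, number>
): string[] {
  return guardrails
    .filter((g) => {
      const base = control[g.metric];
      const observed = treatment[g.metric];
      if (base === undefined || observed === undefined || base === 0) {
        return false; // not enough data to evaluate this guardrail
      }
      return (observed - base) / base > g.maxRelativeIncrease;
    })
    .map((g) => g.metric);
}

const breaches = breachedGuardrails(
  [
    { metric: "p95_latency_ms", maxRelativeIncrease: 0.10 },
    { metric: "error_rate", maxRelativeIncrease: 0.10 },
  ],
  { p95_latency_ms: 200, error_rate: 0.010 },
  { p95_latency_ms: 250, error_rate: 0.0105 }
);
console.log(breaches);
```

A production check would also account for noise in each metric, for example by comparing confidence intervals instead of point estimates.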
7. Inconsistent Assignment Across Devices
Mistake: Relying on device IDs or cookies for assignment in a multi-device user journey.
Impact: Users see different variants on mobile vs. desktop, causing confusion and data fragmentation.
Best Practice: Use a persistent, authenticated user ID for assignment. If the user is anonymous, use a stable device fingerprint but upgrade to user ID upon login, accepting the re-assignment cost or using a hybrid key strategy.
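One possible reading of the hybrid key strategy is to remember which key first bucketed a user and keep using it after login. The `Identity` type and `BucketingKeyResolver` class below are hypothetical, and the in-memory map stands in for whatever persistent store a real deployment would use.

```typescript
interface Identity {
  userId?: string;   // present once the user is authenticated
  deviceId?: string; // stable anonymous identifier
}

class BucketingKeyResolver {
  // userId -> key first used to bucket that user (in-memory for this sketch)
  private readonly linkedKeys = new Map<string, string>();

  resolve(identity: Identity): string {
    const deviceKey = identity.deviceId ? `device:${identity.deviceId}` : undefined;

    if (!identity.userId) {
      if (!deviceKey) {
        throw new Error("identity needs a userId or a deviceId");
      }
      return deviceKey; // anonymous traffic is bucketed by device
    }

    const existing = this.linkedKeys.get(identity.userId);
    if (existing) {
      return existing; // same key on every device after the first login
    }

    // First authenticated sighting: keep the pre-login device key if there
    // is one, so the variant does not change at login.
    const key = deviceKey ?? `user:${identity.userId}`;
    this.linkedKeys.set(identity.userId, key);
    return key;
  }
}
```

The trade-off in this sketch is that two accounts first seen on a shared device inherit the same key, so their assignments are correlated.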
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Startup / MVP | Third-party SaaS (e.g., LaunchDarkly, Optimizely) | Zero infra overhead, rapid setup, built-in analytics. | High recurring SaaS cost; limits data ownership. |
| High Traffic / Privacy-Sensitive | Server-Side Custom Framework | Full data control, sub-ms latency, compliance with GDPR/CCPA. | Medium dev cost; requires engineering maintenance. |
| Global App with Edge Needs | Hybrid Edge Framework (e.g., Cloudflare Workers) | Lowest latency globally, reduces origin load, consistent UX. | High complexity; edge runtime constraints. |
| Mobile App Offline Mode | Local Evaluation with Sync | Functionality without network; assignment persists offline. | Medium complexity; requires local storage management. |
Configuration Template
# experiments.yaml
version: 142
updated_at: "2024-05-20T10:00:00Z"
experiments:
  checkout_button_color:
    layer: "ui_optimization"
    hash_version: 1
    traffic_allocation: 5000 # 50%
    targeting:
      - attribute: "country"
        op: "in"
        values: ["US", "CA", "UK"]
    variants:
      - key: "control"
        weight: 5000
        payload:
          color: "#333333"
      - key: "treatment_a"
        weight: 5000
        payload:
          color: "#FF5733"
  recommendation_algorithm_v2:
    layer: "ml_ranking"
    hash_version: 1
    traffic_allocation: 2000 # 20%
    variants:
      - key: "control"
        weight: 5000
      - key: "treatment_b"
        weight: 5000
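The template uses snake_case fields and keys each experiment by name, while the `ExperimentDefinition` interface expects camelCase fields and a `key` property. A small normalization step, sketched below, bridges the two and rejects weights that do not sum to 10,000. The type and function names are hypothetical, and the input is assumed to be the template already parsed into a plain object.

```typescript
interface RawVariant {
  key: string;
  weight: number;
  payload?: Record<string, unknown>;
}

interface RawExperiment {
  layer: string;
  hash_version: number;
  traffic_allocation: number;
  variants: RawVariant[];
}

interface NormalizedExperiment {
  key: string;
  layer: string;
  hashVersion: number;
  trafficAllocation: number;
  variants: RawVariant[];
}

// Converts one parsed template entry into the camelCase shape and validates it.
function normalizeExperiment(key: string, raw: RawExperiment): NormalizedExperiment {
  const totalWeight = raw.variants.reduce((sum, v) => sum + v.weight, 0);
  if (totalWeight !== 10000) {
    throw new Error(`${key}: variant weights sum to ${totalWeight}, expected 10000`);
  }
  if (raw.traffic_allocation < 0 || raw.traffic_allocation > 10000) {
    throw new Error(`${key}: traffic_allocation must be between 0 and 10000`);
  }
  return {
    key,
    layer: raw.layer,
    hashVersion: raw.hash_version,
    trafficAllocation: raw.traffic_allocation,
    variants: raw.variants,
  };
}
```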
Quick Start Guide
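- Initialize the Client:
const engine = new AssignmentEngine();
const config = new ExperimentConfigManager();
// Note: sync() parses the response body as JSON, so this endpoint must serve the configuration as JSON
await config.sync('https://config.store/experiments.yaml');
// analyticsClient is your own AnalyticsClient implementation
const client = new ExperimentClient(engine, config, analyticsClient);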
- Evaluate an Experiment:
const context: ExperimentContext = {
  userId: 'user_12345',
  attributes: { country: 'US', plan: 'pro' }
};
const variant = client.getVariant('checkout_button_color', context);
- Apply Variant Logic:
const experiment = config.getExperiment('checkout_button_color');
// getExperiment may return undefined, so guard the lookup
const variantDef = experiment?.variants.find(v => v.key === variant);
if (variantDef?.payload?.color) {
  renderButton({ color: variantDef.payload.color });
}
- Track Conversion:
// When user completes checkout
analyticsClient.track('checkout_completed', {
  experimentKey: 'checkout_button_color',
  variant: variant,
  revenue: 49.99
});
- Verify Assignment:
Check the analytics dashboard for experiment_viewed events to confirm that the observed traffic split matches traffic_allocation and the variant weights within statistical tolerance.
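"Within statistical tolerance" can be made concrete with a sample ratio mismatch check: a chi-square goodness-of-fit test of observed counts against the configured weights. The sketch below is limited to two variants, uses a hardcoded critical value for p = 0.001 with one degree of freedom, and its function and constant names are hypothetical.

```typescript
// Chi-square goodness-of-fit statistic for observed counts vs. configured weights.
function chiSquareStatistic(observed: number[], weights: number[]): number {
  if (observed.length === 0 || observed.length !== weights.length) {
    throw new Error("observed and weights must be non-empty and the same length");
  }
  const total = observed.reduce((sum, count) => sum + count, 0);
  const weightSum = weights.reduce((sum, w) => sum + w, 0);
  return observed.reduce((stat, count, i) => {
    const expected = (total * weights[i]) / weightSum;
    return stat + (count - expected) ** 2 / expected;
  }, 0);
}

// Critical value of chi-square with 1 degree of freedom at p = 0.001.
// Valid for exactly two variants; more variants need a different value.
const CHI2_CRITICAL_DF1_P001 = 10.828;

function hasSampleRatioMismatch(
  observed: [number, number],
  weights: [number, number]
): boolean {
  return chiSquareStatistic(observed, weights) > CHI2_CRITICAL_DF1_P001;
}

// Example: a 50/50 experiment that recorded 5200 vs. 4800 exposures
console.log(hasSampleRatioMismatch([5200, 4800], [5000, 5000]));
```

A flagged mismatch usually points to a bug in assignment or event logging, so it is worth investigating before reading any metric results.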