number;
  close: number;
  volume: number;
}

interface StrategyContext {
  // Only contains bars where bar.timestamp < currentDecisionTime
  confirmedHistory: MarketBar[];
  // Pre-computed rolling statistics to prevent leakage
  rollingStats: RollingStatistics;
}

// The causal execution loop
class CausalBacktestEngine {
  private dataFeed: DataFeed;
  private strategy: Strategy;

  constructor(feed: DataFeed, strat: Strategy) {
    this.dataFeed = feed;
    this.strategy = strat;
  }

  run(): BacktestResult {
    const results: Trade[] = [];
    const portfolio = new Portfolio();

    for (let t = 1; t < this.dataFeed.length; t++) {
      // CRITICAL: slice(0, t) yields bars 0..t-1 only.
      // Bar t opens at the decision time; its high, low, and close are still in the future.
      const decisionTime = this.dataFeed[t].timestamp;
      const availableData = this.dataFeed.slice(0, t);
      const context: StrategyContext = {
        confirmedHistory: availableData,
        rollingStats: this.calculateRollingStats(availableData)
      };

      // Strategy generates signal based ONLY on confirmed data
      const signal = this.strategy.evaluate(context);
      if (signal.action === 'ENTER') {
        // The signal used bars 0..t-1, so the earliest realistic fill is the
        // open of bar t (the next bar after the last confirmed one).
        // Filling at the close of bar t-1 would assume zero latency.
        const fillPrice = this.dataFeed[t].open;
        const trade = portfolio.execute(signal, fillPrice, decisionTime);
        results.push(trade);
      }
    }
    return this.analyze(results, portfolio);
  }
}
```
#### 2. Execution Modeling and Fill Logic
Even with strict data slicing, execution assumptions can introduce bias. If a strategy detects a breakout using the high of a bar, filling at that same high is unrealistic: the high is only confirmed once the bar closes, after the opportunity to trade at it has passed.
* **Rationale:** Use `Next-Bar Open` fills for market orders to simulate the delay between signal generation and order placement. For limit orders, model partial fills and slippage based on order book depth or historical volatility.
* **Implementation:** The `execute` method in the engine should enforce a minimum delay. If the strategy signals on the close of bar `t`, the earliest fill is the open of bar `t+1`, or a simulated limit fill within bar `t+1` gated by a fill-probability model.
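The fill rules above can be sketched as two small pure functions. This is a minimal illustration, not the engine's actual `execute` logic: the `Bar` and `Side` types and the names `marketFillPrice` and `limitFillPrice` are hypothetical, slippage is assumed to be a fixed number of basis points, and a limit order is assumed to fill whenever the next bar's range touches the limit price (a stricter model would require the price to trade through it).

```typescript
// Hypothetical minimal bar shape for this sketch.
interface Bar {
  open: number;
  high: number;
  low: number;
  close: number;
}

type Side = 'BUY' | 'SELL';

// Market order: fill at the next bar's open, worsened by slippage in basis points.
function marketFillPrice(nextBar: Bar, side: Side, slippageBps: number): number {
  const slip = nextBar.open * (slippageBps / 10000);
  return side === 'BUY' ? nextBar.open + slip : nextBar.open - slip;
}

// Limit order: fills only if the next bar's range reaches the limit price.
// Returns null when the order would not have been touched.
function limitFillPrice(nextBar: Bar, side: Side, limitPrice: number): number | null {
  if (side === 'BUY') {
    if (nextBar.low > limitPrice) return null;
    // If the bar opens below the limit, the fill happens at the (better) open.
    return Math.min(limitPrice, nextBar.open);
  }
  if (nextBar.high < limitPrice) return null;
  // If the bar opens above the limit, the fill happens at the (better) open.
  return Math.max(limitPrice, nextBar.open);
}
```

Both functions take only the bar that follows the signal, which makes it hard to fill by accident at a price from the signal bar itself.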
#### 3. Point-in-Time Feature Engineering
Machine learning strategies are particularly vulnerable to leakage during feature normalization. Computing global statistics (mean, standard deviation) over the entire dataset injects future information into every feature vector.
* **Rationale:** Features must be normalized using statistics available up to the current timestamp. This requires rolling or expanding window calculations.
* **Implementation:** Use incremental algorithms for statistics.
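```typescript
class IncrementalZScore {
  private count: number = 0;
  private mean: number = 0;
  private m2: number = 0; // running sum of squared deviations (Welford's algorithm)

  update(value: number): number {
    this.count++;
    const delta = value - this.mean;
    this.mean += delta / this.count;
    const delta2 = value - this.mean;
    this.m2 += delta * delta2;
    // Sample variance is undefined for fewer than two observations
    if (this.count < 2) return 0;
    const variance = this.m2 / (this.count - 1);
    const stdDev = Math.sqrt(variance);
    // Guard against a constant series, where the spread is zero
    if (stdDev === 0) return 0;
    // Returns normalized value based ONLY on data seen so far
    return (value - this.mean) / stdDev;
  }
}
```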
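The rationale above mentions rolling windows as well as expanding ones. A rolling variant might look like the sketch below. The class name `RollingZScore` is hypothetical, and one design choice is an assumption rather than something the text prescribes: each value is scored against the previous observations only and is added to the window afterwards, so a value never contributes to its own mean and standard deviation.

```typescript
class RollingZScore {
  private readonly size: number;
  private readonly window: number[] = [];

  constructor(size: number) {
    if (size < 2) throw new Error('window size must be at least 2');
    this.size = size;
  }

  // Scores `value` against the prior observations in the window, then stores it.
  // Returns 0 while fewer than two prior observations exist or the spread is zero.
  update(value: number): number {
    let z = 0;
    const n = this.window.length;
    if (n >= 2) {
      const mean = this.window.reduce((acc, x) => acc + x, 0) / n;
      const variance =
        this.window.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / (n - 1);
      const stdDev = Math.sqrt(variance);
      z = stdDev > 0 ? (value - mean) / stdDev : 0;
    }
    this.window.push(value);
    // Drop the oldest observation once the window is full.
    if (this.window.length > this.size) this.window.shift();
    return z;
  }
}
```

Recomputing the statistics on every call costs O(window size) per update, which is usually acceptable for windows of a few hundred bars. A running-sum implementation would be faster but is more exposed to floating-point drift.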
## Pitfall Guide
Even with a robust engine, specific implementation patterns can reintroduce look-ahead bias. The following pitfalls represent the most common vectors for leakage in production systems.
| Pitfall Name | Explanation | Fix |
|---|---|---|
| The Forming Bar Trap | Accessing `close`, `high`, or `low` of the bar currently being built. The strategy acts on a value that changes until the bar closes. | Enforce `barIndex - 1` access. The engine should throw an error if the strategy requests data from the current timestamp. |
| Intra-Bar Fill Assumption | Entering a trade at the high/low of the breakout bar. The signal requires the price to reach that level, but the fill assumes execution at that exact level without slippage. | Use next-bar open for market entries. For limit entries, model fill probability based on price distribution within the bar. |
| Repainting Indicator Mirage | Using indicators like ZigZag, pivots, or certain oscillators that recalculate past values based on future data. The backtest sees the "final" shape, not the real-time evolution. | Replace with non-repainting equivalents. If repainting is unavoidable, delay signals until the indicator value stabilizes (e.g., wait for bar close confirmation). |
| Global Statistic Contamination | Normalizing features using the mean/std of the entire dataset. This centers data around future knowledge, making patterns appear more separable than they are. | Implement rolling window normalization or incremental statistics (Welford's algorithm) that update only with incoming data. |
| Survivorship Bias in Universe | Backtesting on a list of assets that exist today. This excludes delisted, bankrupt, or acquired companies, skewing returns upward. | Use point-in-time (PIT) universe data. The asset list at timestamp t must match the list available to traders at t. |
| Optimization Overfitting | Selecting parameters based on the best performance across the entire historical period. This fits noise rather than signal. | Use Walk-Forward Optimization (WFO). Optimize on a window, validate on the subsequent unseen window, and roll forward. Report only out-of-sample results. |
| Latency Ignorance | Assuming zero latency between signal generation and order submission. In reality, network delays and processing time mean the market moves before the order arrives. | Inject artificial latency in the simulation. Delay signal processing by N milliseconds or bars to model realistic infrastructure constraints. |
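The fix for the Forming Bar Trap, having the engine throw when a strategy reaches for an unconfirmed bar, can be sketched as a guarded accessor. The `TimedBar` type and the `ConfirmedHistory` class are hypothetical names for illustration. The sketch assumes bars are sorted by ascending timestamp and that a bar counts as confirmed when its timestamp is strictly earlier than the decision time, matching the `bar.timestamp < currentDecisionTime` rule used for `confirmedHistory`.

```typescript
// Hypothetical minimal bar shape; timestamp is assumed to be the bar's open time.
interface TimedBar {
  timestamp: number;
  close: number;
}

class ConfirmedHistory {
  private readonly bars: readonly TimedBar[];
  private readonly decisionTime: number;

  constructor(allBars: readonly TimedBar[], decisionTime: number) {
    this.bars = allBars;
    this.decisionTime = decisionTime;
  }

  // Returns the bar at `index`, or throws if it was not yet closed at decision time.
  at(index: number): TimedBar {
    const bar = this.bars[index];
    if (bar === undefined) throw new RangeError(`no bar at index ${index}`);
    if (bar.timestamp >= this.decisionTime) {
      throw new Error(`look-ahead violation: bar ${index} is not confirmed at decision time`);
    }
    return bar;
  }

  // Returns the most recent fully closed bar.
  last(): TimedBar {
    for (let i = this.bars.length - 1; i >= 0; i--) {
      if (this.bars[i].timestamp < this.decisionTime) return this.bars[i];
    }
    throw new Error('no confirmed bars available yet');
  }
}
```

Failing loudly turns a silent leak into a test failure, which is usually easier to diagnose than an inflated Sharpe ratio.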
## Production Bundle
This section provides actionable artifacts to implement causal integrity in your backtesting workflow.
### Action Checklist
### Decision Matrix
Use this matrix to select the appropriate causal modeling approach based on strategy characteristics and data availability.
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-Frequency Strategy | Next-bar open fill + explicit latency injection | High-frequency strategies are sensitive to microstructure; next-bar open is conservative but safe. Latency models capture execution risk. | High (Requires low-latency infrastructure) |
| Swing Trading | Rolling window normalization + WFO | Swing strategies benefit from robust feature scaling. WFO ensures parameters adapt to regime changes. | Medium (Computational cost of WFO) |
| ML Classification | Incremental scaler + PIT universe | ML models are highly sensitive to leakage. Incremental scaling prevents future info. PIT universe prevents survivorship bias. | High (Data costs for PIT, compute for incremental updates) |
| Mean Reversion | Limit order model + slippage sensitivity | Mean reversion often uses limit orders. Modeling fill probability is critical to avoid overestimating entry quality. | Low (Simulation complexity only) |
### Configuration Template
This TypeScript configuration object defines the causal constraints for a backtesting run. Integrate this into your engine initialization to enforce strict mode.
```typescript
interface CausalConfig {
  causality: {
    // Enforce strict t-1 data access. Throws error if strategy accesses current bar.
    strictMode: boolean;
    // Minimum delay in bars between signal and fill.
    minFillDelayBars: number;
  };
  execution: {
    // Fill model: 'nextOpen', 'limit', or 'vwap'.
    fillModel: 'nextOpen' | 'limit' | 'vwap';
    // Slippage in basis points applied to fills.
    slippageBps: number;
    // Latency in milliseconds injected before order processing.
    latencyMs: number;
  };
  data: {
    // Universe type: 'current' (risky) or 'pointInTime' (safe).
    universeType: 'current' | 'pointInTime';
    // Normalization: 'global' (risky) or 'rolling' (safe).
    normalization: 'global' | 'rolling';
    // Rolling window size for statistics.
    rollingWindowSize: number;
  };
  validation: {
    // Enable Walk-Forward Optimization.
    walkForward: boolean;
    // Optimization window size in bars.
    optWindowSize: number;
    // Validation window size in bars.
    valWindowSize: number;
  };
}

const strictConfig: CausalConfig = {
  causality: { strictMode: true, minFillDelayBars: 1 },
  execution: { fillModel: 'nextOpen', slippageBps: 5, latencyMs: 50 },
  data: { universeType: 'pointInTime', normalization: 'rolling', rollingWindowSize: 200 },
  validation: { walkForward: true, optWindowSize: 1000, valWindowSize: 250 }
};
```
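To show how `optWindowSize` and `valWindowSize` might translate into actual walk-forward splits, here is a small sketch. The function `walkForwardWindows` and the `WalkForwardWindow` type are hypothetical, and the stepping rule is an assumption: each step advances by one validation window, so validation segments are contiguous and never overlap.

```typescript
// Index ranges are half-open: start inclusive, end exclusive.
interface WalkForwardWindow {
  optStart: number;
  optEnd: number;
  valStart: number;
  valEnd: number;
}

function walkForwardWindows(
  totalBars: number,
  optWindowSize: number,
  valWindowSize: number
): WalkForwardWindow[] {
  if (optWindowSize <= 0 || valWindowSize <= 0) {
    throw new Error('window sizes must be positive');
  }
  const windows: WalkForwardWindow[] = [];
  // Advance by one validation window so out-of-sample segments never overlap.
  for (
    let start = 0;
    start + optWindowSize + valWindowSize <= totalBars;
    start += valWindowSize
  ) {
    const optEnd = start + optWindowSize;
    windows.push({
      optStart: start,
      optEnd,
      // Validation begins exactly where optimization ends.
      valStart: optEnd,
      valEnd: optEnd + valWindowSize,
    });
  }
  return windows;
}
```

With the template's values of 1000 and 250 bars, a 2000-bar history yields four windows, the last one validating on bars 1750 to 1999. Parameters chosen on each optimization range are scored only on the validation range that follows it.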
### Quick Start Guide
Follow these steps to establish a causally strict backtesting environment in under five minutes.
1. **Initialize Engine with Strict Config:** Create your backtesting instance using the `strictConfig` template above. Ensure `strictMode` is enabled to prevent accidental data access violations.
2. **Load Point-in-Time Data:** Ingest historical data that includes a `universe` column or a separate PIT universe feed. Verify that delisted assets are present in the windows where they actually traded and that no asset appears before its listing date.
3. **Implement Strategy with Delayed Access:** Write your strategy logic using the `StrategyContext`. Access data via `context.confirmedHistory[context.confirmedHistory.length - 1]` to guarantee you are using the last closed bar.
4. **Run Validation Suite:** Execute the backtest with Walk-Forward Optimization enabled. Review the out-of-sample Sharpe ratio and drawdown. If the delta between in-sample and out-of-sample metrics exceeds 20%, investigate parameter sensitivity or feature leakage.
5. **Deploy to Paper Trading:** Once the causal backtest shows stable out-of-sample performance, deploy to a paper trading environment. Compare live execution metrics against the backtest to validate the execution model assumptions.
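The 20% check in the validation step can be expressed as a small helper. This sketch assumes the "delta" is a relative difference measured against the in-sample value, which the text does not specify; the function names `relativeDegradation` and `needsInvestigation` are hypothetical.

```typescript
// Relative drop from in-sample to out-of-sample, as a fraction of the in-sample value.
// Positive means out-of-sample is worse for a higher-is-better metric such as Sharpe.
function relativeDegradation(inSample: number, outOfSample: number): number {
  if (inSample === 0) throw new Error('in-sample metric must be non-zero');
  return (inSample - outOfSample) / Math.abs(inSample);
}

// Flags a metric pair whose relative gap exceeds the threshold in either direction.
function needsInvestigation(
  inSample: number,
  outOfSample: number,
  threshold: number = 0.2
): boolean {
  return Math.abs(relativeDegradation(inSample, outOfSample)) > threshold;
}
```

For example, an in-sample Sharpe of 2.0 against an out-of-sample Sharpe of 1.5 is a 25% drop and would be flagged, while 1.0 against 0.9 would not.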