ining from serving, enforcing temporal data integrity, and implementing continuous validation. The architecture follows an event-driven pipeline: ingestion → feature validation → model serving → drift monitoring → automated retraining.
Step-by-Step Implementation
- Temporal Feature Engineering: Construct lag, rolling, and calendar features without lookahead bias. Use a feature store to version transformations and enforce point-in-time correctness.
- Model Training & Quantile Output: Train an ensemble model that outputs P10, P50, and P90 forecasts. Quantile regression replaces point estimates with probabilistic ranges, enabling risk-aware decision making.
- TypeScript Serving Layer: Deploy a Fastify-based prediction service that validates incoming payloads, fetches latest features, calls the model endpoint, and returns calibrated intervals.
- Drift & Performance Monitoring: Track feature drift (PSI), prediction error (MAPE/WAPE), and calibration decay. Trigger retraining when thresholds are breached.
- Automated Retraining Pipeline: Use an event bus to queue retraining jobs. Version models in a registry, run shadow deployments, and promote only when validation metrics improve.
TypeScript Serving & Monitoring Implementation
// src/server/forecast-service.ts
import Fastify from 'fastify';
import { z } from 'zod';
import { ModelClient } from './model-client';
import { DriftMonitor } from './drift-monitor';
import { ForecastConfig } from '../config/types';
const forecastSchema = z.object({
tenantId: z.string().uuid(),
horizon: z.number().int().min(1).max(90),
granularity: z.enum(['daily', 'weekly', 'monthly']),
includeUncertainty: z.boolean().default(true),
});
export class ForecastService {
private readonly fastify = Fastify({ logger: true });
private readonly modelClient: ModelClient;
private readonly driftMonitor: DriftMonitor;
constructor(private readonly config: ForecastConfig) {
this.modelClient = new ModelClient(config.modelEndpoint);
this.driftMonitor = new DriftMonitor(config.monitoring);
}
async initialize() {
this.fastify.post('/v1/forecast', async (request, reply) => {
const payload = forecastSchema.parse(request.body);
const features = await this.modelClient.fetchFeatures(payload.tenantId, payload.granularity);
const validation = this.driftMonitor.validateFeatureDistribution(features);
if (validation.driftDetected) {
this.driftMonitor.queueRetraining(payload.tenantId);
this.fastify.log.warn(`Drift detected for ${payload.tenantId}. Shadow mode active.`);
}
const forecast = await this.modelClient.predict({
features,
horizon: payload.horizon,
quantiles: payload.includeUncertainty ? [0.1, 0.5, 0.9] : [0.5],
});
return reply.status(200).send({
tenantId: payload.tenantId,
predictions: forecast,
metadata: {
modelVersion: this.config.modelVersion,
calibrationStatus: validation.calibrationScore,
driftAlert: validation.driftDetected,
},
});
});
await this.fastify.listen({ port: Number(this.config.port), host: '0.0.0.0' });
}
}
// src/monitoring/drift-monitor.ts
import { PopulationStabilityIndex } from '../utils/statistics';
import { ForecastConfig } from '../config/types';
export class DriftMonitor {
constructor(private readonly config: ForecastConfig['monitoring']) {}
validateFeatureDistribution(features: Record<string, number>[]) {
const psi = PopulationStabilityIndex.calculate(
features.map(f => f.revenue_lag_7d),
this.config.baselineDistribution
);
const driftDetected = psi > this.config.psiThreshold;
const calibrationScore = this.assessCalibration(features);
return { driftDetected, calibrationScore, psi };
}
queueRetraining(tenantId: string) {
// Push to Redis stream / SQS / Kafka for async training pipeline
console.log(`[RETRAIN] Queued for ${tenantId}`);
}
private assessCalibration(features: Record<string, number>[]): number {
// Simplified Brier score approximation for production monitoring
const observedVsPredicted = features.map(f => ({
observed: f.actual_revenue,
predicted: f.p50_forecast,
}));
const brier = observedVsPredicted.reduce((sum, o) =>
sum + Math.pow(o.observed - o.predicted, 2), 0
) / observedVsPredicted.length;
return Math.min(1, brier / 1000); // Normalized 0-1
}
}
Architecture Decisions & Rationale
- Separation of Training & Serving: Python handles model training, GPU acceleration, and PyTorch/XGBoost optimizations. TypeScript manages low-latency serving, feature validation, and monitoring hooks. This prevents model dependencies from bloating the API runtime.
- Quantile Regression over Point Forecasts: Business decisions require risk bounds. P10/P90 intervals enable inventory provisioning, cash reserve planning, and SLA negotiation.
- Event-Driven Retraining: Batch retraining on fixed schedules ignores regime shifts. Event-driven triggers (drift PSI > threshold, MAPE degradation, campaign launch) ensure models adapt without unnecessary compute spend.
- Feature Store Integration: Point-in-time correctness prevents leakage. Versioned features guarantee reproducibility and enable rollback when new transformations degrade performance.
Pitfall Guide
- Lookahead Bias in Feature Engineering: Using future revenue to calculate rolling averages or lag features corrupts training data. Temporal joins must enforce strict cutoff dates.
- Ignoring Prediction Intervals: Point forecasts create false confidence. Without quantile outputs, teams cannot size risk buffers or allocate capital efficiently.
- Static Feature Sets Without Versioning: Adding a new marketing channel or pricing tier without feature versioning breaks reproducibility. Unversioned features make debugging forecast failures impossible.
- Treating Drift Detection as Optional: Model decay is inevitable. Without PSI tracking and calibration monitoring, forecasts drift silently until financial variance reports trigger emergency fixes.
- Overfitting to Recent Volatility: Retraining exclusively on the last 30 days causes models to chase noise. Rolling window training with exponential decay or seasonality-aware sampling prevents volatility lock-in.
- Missing Cross-Channel Attribution: Revenue forecasting isolated from marketing spend, pricing changes, or competitor activity produces blind forecasts. Multi-modal feature ingestion is non-negotiable.
- No Shadow Deployment or Rollback: Promoting models directly to production without A/B validation or canary routing risks catastrophic variance. Automated rollback on MAPE degradation is mandatory.
Best Practices from Production:
- Enforce temporal cross-validation (expanding window) instead of random train/test splits.
- Store features in a versioned store with point-in-time join capabilities.
- Implement quantile regression for probabilistic outputs.
- Monitor calibration decay alongside accuracy metrics.
- Route 10% of traffic to shadow models before promotion.
- Log all feature snapshots with predictions for forensic analysis.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Early-stage SaaS (<$5M ARR) | Ensemble ML + Batch Serving | Low infrastructure overhead, fast iteration, sufficient accuracy for runway planning | $120-200/mo |
| High-volume E-commerce | Transformer (TFT) + Streaming Inference | Handles non-stationary demand, campaign spikes, and multi-sku attribution with low latency | $380-550/mo |
| Enterprise with Legacy Data | Hybrid (Prophet baseline + XGBoost residual correction) | Stabilizes noisy historical data while capturing non-linear patterns without full GPU dependency | $250-400/mo |
Configuration Template
# forecast-pipeline.config.yaml
pipeline:
version: "2.1.0"
environment: "production"
data_ingestion:
sources:
- type: "warehouse"
table: "analytics.revenue_transactions"
partition_key: "event_date"
- type: "api"
endpoint: "https://ads-platform.internal/v1/campaign_spend"
refresh_interval: "6h"
feature_store:
versioning: true
point_in_time_correct: true
retention_days: 365
model:
architecture: "xgboost_quantile"
quantiles: [0.1, 0.5, 0.9]
training_window: "180d"
retrain_trigger:
psi_threshold: 0.25
mape_threshold: 15.0
manual_override: true
serving:
framework: "fastify"
port: 8080
concurrency: 200
timeout_ms: 1500
monitoring:
drift:
metric: "psi"
threshold: 0.25
baseline_refresh: "7d"
performance:
metric: "wape"
threshold: 12.0
evaluation_window: "30d"
alerting:
channels: ["slack", "pagerduty"]
severity_map:
psi_breach: "warning"
calibration_decay: "critical"
Quick Start Guide
- Initialize the repository: Clone the forecasting service template and install dependencies (
npm install).
- Configure environment variables: Set
MODEL_ENDPOINT, FEATURE_STORE_URL, and monitoring thresholds in .env.
- Run the feature validation script:
npm run validate-features to verify temporal joins and PSI baselines against your warehouse.
- Start the serving layer:
npm start launches the Fastify server on port 8080. Test with curl -X POST http://localhost:8080/v1/forecast -H "Content-Type: application/json" -d '{"tenantId":"550e8400-e29b-41d4-a716-446655440000","horizon":30,"granularity":"daily"}'.
- Verify monitoring hooks: Check logs for drift validation responses and confirm shadow model routing is active before promoting to full traffic.