tional routing, production gateways must handle header preservation, connection pooling, and graceful degradation.
import { createServer } from 'http';
import { proxy } from 'http-proxy-middleware';
const LEGACY_UPSTREAM = 'http://legacy-core.internal:8080';
const MODERN_UPSTREAM = 'http://routing-engine.internal:3000';
interface TrafficPolicy {
legacyWeight: number;
modernWeight: number;
shadowMode: boolean;
featureToggle: boolean;
circuitBreakerThreshold: number;
}
const trafficConfig: TrafficPolicy = {
legacyWeight: 0.95,
modernWeight: 0.05,
shadowMode: true,
featureToggle: process.env.ENABLE_NEW_ROUTING === 'true',
circuitBreakerThreshold: 0.15
};
function resolveTarget(req: any): string {
if (!trafficConfig.featureToggle) return LEGACY_UPSTREAM;
const random = Math.random();
return random < trafficConfig.modernWeight
? MODERN_UPSTREAM
: LEGACY_UPSTREAM;
}
const server = createServer((req, res) => {
const target = resolveTarget(req);
proxy({
target,
changeOrigin: true,
onProxyReq: (proxyReq, req) => {
proxyReq.setHeader('X-Migration-Target', target === MODERN_UPSTREAM ? 'modern' : 'legacy');
proxyReq.setHeader('X-Trace-Id', req.headers['x-trace-id'] || crypto.randomUUID());
},
onProxyRes: (proxyRes, req, res) => {
if (trafficConfig.shadowMode && target === MODERN_UPSTREAM) {
// Fire-and-forget shadow request to legacy for comparison
fetch(`${LEGACY_UPSTREAM}${req.url}`, {
method: req.method,
body: req.body,
headers: { 'X-Shadow-Request': 'true' }
}).catch(() => {});
}
},
onError: (err, req, res) => {
console.error(`Proxy failure targeting ${target}:`, err.message);
res.writeHead(502, { 'Content-Type': 'application/json' });
res.end(JSON.stringify({ error: 'upstream_unavailable' }));
}
})(req, res);
});
server.listen(4000, () => console.log('Migration gateway running on :4000'));
Step 2: Implement Shadow Validation
Shadow testing duplicates incoming requests to both systems without returning the modern service’s response to the client. This allows direct comparison of outputs, latency, and error rates under identical conditions. The validation layer must handle payload normalization, timeout management, and statistical aggregation.
interface ValidationResult {
latencyDelta: number;
payloadMatch: boolean;
statusCodes: { legacy: number; modern: number };
traceId: string;
}
async function compareResponses(legacyRes: Response, modernRes: Response, traceId: string): Promise<ValidationResult> {
const [legacyData, modernData] = await Promise.all([
legacyRes.json(),
modernRes.json()
]);
const legacyLatency = Number(legacyRes.headers.get('X-Response-Time') || 0);
const modernLatency = Number(modernRes.headers.get('X-Response-Time') || 0);
return {
latencyDelta: modernLatency - legacyLatency,
payloadMatch: JSON.stringify(legacyData) === JSON.stringify(modernData),
statusCodes: {
legacy: legacyRes.status,
modern: modernRes.status
},
traceId
};
}
Step 3: Gradual Traffic Ramping
Traffic distribution should follow a controlled schedule. Start at 1-5%, monitor error budgets and latency percentiles, then incrementally increase. The routing configuration should be externalized to a configuration service or feature flag provider to enable real-time adjustments without redeployment. Implement circuit breakers to automatically revert traffic if the modern service exceeds failure thresholds.
Architecture Rationale
- Proxy Layer Separation: Decouples routing logic from business code. Allows independent scaling and configuration updates without touching application logic.
- Hexagonal Architecture for New Service: Enforces strict boundary separation between core domain logic and external adapters (databases, APIs). This makes the new service portable and testable without legacy dependencies.
- Shadow Testing First: Validates behavioral parity before exposing users to the new system. Prevents silent data corruption or logic drift.
- Externalized Traffic Weights: Enables operations teams to adjust risk exposure dynamically based on real-time telemetry. Supports automated canary analysis pipelines.
- Circuit Breaking Integration: Protects against cascading failures during ramp-up. Automatically shifts traffic back to legacy if error rates breach defined SLOs.
Pitfall Guide
-
Shared Database Coupling
Explanation: Both legacy and modern services read/write to the same tables, causing transaction conflicts, schema migration deadlocks, and lock contention.
Fix: Implement the Database-per-Service pattern during extraction. Use event-driven synchronization or dual-write adapters with idempotency keys until the legacy system is fully decommissioned.
-
Ignoring Idempotency & State Divergence
Explanation: Network retries or traffic shifts cause duplicate requests. Without idempotency, financial or inventory operations execute multiple times, corrupting state.
Fix: Enforce idempotency keys on all mutating endpoints. Implement distributed locks or optimistic concurrency control in the modern service. Validate request signatures before processing.
-
Feature Flag Sprawl
Explanation: Accumulating dozens of migration flags creates configuration drift, increases cognitive load, and complicates rollback procedures.
Fix: Treat flags as temporary infrastructure. Set expiration dates, automate cleanup via CI/CD pipelines, and consolidate routing logic into a single configuration source. Implement flag lifecycle management.
-
Skipping Shadow Validation
Explanation: Direct traffic cutover without parallel testing masks behavioral differences until users report issues. Silent failures are harder to diagnose post-cutover.
Fix: Mandate a minimum 7-day shadow phase. Compare response payloads, status codes, and latency distributions. Only proceed when divergence falls below a defined threshold (typically <2%).
-
Aggressive Traffic Ramping
Explanation: Jumping from 5% to 50% traffic overwhelms the new service’s capacity planning and hides performance bottlenecks.
Fix: Follow a logarithmic ramp schedule (1% → 5% → 15% → 30% → 60% → 100%). Validate at each step against SLOs before proceeding. Use automated canary analysis to gate progression.
-
Inadequate Observability During Transition
Explanation: Legacy and modern systems use different logging formats, trace IDs, or metric schemas, making cross-system debugging impossible.
Fix: Standardize correlation IDs at the proxy layer. Implement unified tracing (OpenTelemetry) and ensure both services emit compatible metrics to a single dashboard. Alert on cross-service latency spikes.
-
Treating the Proxy as Permanent
Explanation: The routing layer accumulates business logic over time, becoming a new monolith that defeats the purpose of migration.
Fix: Define a strict contract for the proxy: routing, traffic splitting, and header injection only. Move all business logic into the modern service. Plan proxy decommissioning alongside legacy retirement.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Monolith with unclear business rules | Strangler Fig | Preserves live validation while extracting domains | Low upfront, predictable operational cost |
| Compliance-heavy system with strict audit trails | Parallel Change + Shadow Testing | Ensures regulatory parity before cutover | Medium (dual infrastructure during transition) |
| Greenfield product with no legacy debt | Direct Deployment | No migration overhead required | Minimal |
| Legacy system with tightly coupled shared DB | Database-per-Service + Event Sync | Prevents transaction conflicts and schema lock | High initial engineering, low long-term risk |
Configuration Template
# migration-gateway-config.yaml
routing:
proxy_port: 4000
legacy_upstream: "http://legacy-core.internal:8080"
modern_upstream: "http://routing-engine.internal:3000"
traffic_control:
feature_flag: "ENABLE_NEW_ROUTING"
initial_weight: 0.05
ramp_schedule:
- week: 1
weight: 0.05
- week: 3
weight: 0.15
- week: 5
weight: 0.30
- week: 7
weight: 0.60
- week: 9
weight: 1.00
validation:
shadow_mode: true
comparison_timeout_ms: 2000
divergence_threshold: 0.02
alert_on_status_mismatch: true
observability:
correlation_header: "X-Trace-Id"
metrics_endpoint: "/metrics"
log_format: "json"
circuit_breaker:
failure_threshold: 0.15
recovery_timeout_sec: 60
Quick Start Guide
- Identify Extraction Boundary: Select a single API route or domain with minimal cross-dependencies. Document its current behavior, latency baseline, and error rate. Isolate it from shared state where possible.
- Deploy Routing Proxy: Initialize the gateway with the configuration template. Set initial traffic weight to 5% and enable shadow mode. Verify header injection, trace propagation, and upstream connectivity.
- Run Shadow Validation: Route production traffic through the proxy for 48-72 hours. Compare legacy and modern responses using the validation pipeline. Resolve any payload mismatches, timeout violations, or latency regressions.
- Begin Traffic Ramp: Adjust traffic weights according to the schedule. Monitor error budgets, p95 latency, and system resource utilization. Roll back immediately if thresholds are breached. Automate progression using canary analysis tools.
- Iterate & Decommission: Repeat extraction for additional domains. Once the modern service handles 100% of traffic and passes stability checks, retire the legacy system, remove the proxy layer, and archive migration artifacts.