oning, establish an IaC baseline for the target environment. Import existing legacy infrastructure where possible to create a unified state file. This prevents configuration drift and enables diff-based validation.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
// Define migration project configuration
const config = new pulumi.Config();
const migrationEnv = config.require("environment");
const legacySubnetIds = config.requireObject<string[]>("legacySubnetIds");
// Create the target VPC for migrated workloads
const targetVpc = new aws.ec2.Vpc("migration-vpc", {
    cidrBlock: "10.0.0.0/16",
    enableDnsHostnames: true,
    enableDnsSupport: true,
    tags: {
        Project: "CloudMigration",
        Environment: migrationEnv,
        ManagedBy: "Pulumi",
    },
});
// Look up legacy subnets by ID. Subnet.get() reads their current state into the stack
// but does not manage them; use `pulumi import` (or the `import` resource option)
// to bring them under Pulumi management.
const importedSubnets = legacySubnetIds.map((id, index) =>
    aws.ec2.Subnet.get(`legacy-subnet-${index}`, id)
);
export const vpcId = targetVpc.id;
export const legacySubnetCidrs = importedSubnets.map((subnet) => subnet.cidrBlock);
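Reading subnets with `Subnet.get()` does not place them under management, so a separate import step is needed to build the unified state file. A minimal sketch, assuming the Pulumi CLI's bulk-import file format for `pulumi import --file <path>` (a `resources` array of `type`/`name`/`id` entries); `buildImportFile` is a hypothetical helper, the `imported-subnet` name prefix is chosen so it does not collide with the `legacy-subnet-N` lookups, and the subnet IDs are placeholders.

```typescript
// One entry in a Pulumi bulk-import file (assumed format for `pulumi import --file`)
interface ImportEntry {
    type: string; // Pulumi type token of the resource
    name: string; // logical name the resource will get in the stack
    id: string;   // cloud provider ID of the existing resource
}

interface ImportFile {
    resources: ImportEntry[];
}

// Hypothetical helper: build an import file for existing subnets, dropping duplicate IDs
function buildImportFile(subnetIds: string[], namePrefix: string = "imported-subnet"): ImportFile {
    const unique = subnetIds.filter((id, i) => subnetIds.indexOf(id) === i);
    return {
        resources: unique.map((id, index) => ({
            type: "aws:ec2/subnet:Subnet",
            name: `${namePrefix}-${index}`,
            id: id,
        })),
    };
}

// Usage with placeholder IDs: save the JSON as import.json, then run
//   pulumi import --file import.json
const importFile = buildImportFile(["subnet-0aaa", "subnet-0bbb", "subnet-0aaa"]);
console.log(JSON.stringify(importFile, null, 2));
```

Deduplicating IDs up front avoids importing the same subnet twice under two logical names, which would leave two stack resources fighting over one cloud resource.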
Step 2: Strangler Proxy Implementation
Deploy a reverse proxy or API Gateway to manage traffic routing between legacy and cloud-native endpoints. This component enables feature-flag-style cutover based on URL paths, headers, or percentage-based traffic splitting.
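// Continues the Step 1 program, which already imports `pulumi` and `aws` and defines `config` and `targetVpc`.
// Legacy and cloud endpoint URLs. The ALB cannot forward to a URL directly:
// targets are registered by IP address (legacyTg) or instance ID (cloudTg).
const legacyEndpoint = config.require("legacyEndpointUrl");
const cloudEndpoint = config.require("cloudEndpointUrl");
// Assumed config keys for values this snippet uses but Step 1 does not define
const publicSubnetIds = config.requireObject<string[]>("publicSubnetIds"); // public subnets of targetVpc in >= 2 AZs
const certificateArn = config.require("certificateArn"); // ACM certificate for the HTTPS listener
// Security group allowing inbound HTTPS to the ALB
const securityGroup = new aws.ec2.SecurityGroup("strangler-alb-sg", {
    vpcId: targetVpc.id,
    ingress: [{ protocol: "tcp", fromPort: 443, toPort: 443, cidrBlocks: ["0.0.0.0/0"] }],
    egress: [{ protocol: "-1", fromPort: 0, toPort: 0, cidrBlocks: ["0.0.0.0/0"] }],
});
// ALB for traffic management
const alb = new aws.lb.LoadBalancer("strangler-alb", {
    internal: false,
    loadBalancerType: "application",
    securityGroups: [securityGroup.id],
    subnets: publicSubnetIds,
    tags: { Name: "strangler-proxy" },
});
// Target groups for dual-run
const legacyTg = new aws.lb.TargetGroup("legacy-tg", {
    port: 443,
    protocol: "HTTPS",
    targetType: "ip",
    vpcId: targetVpc.id,
    healthCheck: {
        enabled: true,
        path: "/health",
        matcher: "200",
    },
});
const cloudTg = new aws.lb.TargetGroup("cloud-tg", {
    port: 8080,
    protocol: "HTTP",
    targetType: "instance",
    vpcId: targetVpc.id,
    healthCheck: {
        enabled: true,
        path: "/api/health",
        matcher: "200",
    },
});
// HTTPS listener: requests that match no rule go to the legacy target group
const listener = new aws.lb.Listener("strangler-listener", {
    loadBalancerArn: alb.arn,
    port: 443,
    protocol: "HTTPS",
    sslPolicy: "ELBSecurityPolicy-TLS-1-2-2017-01",
    certificateArn: certificateArn,
    defaultActions: [{
        type: "forward",
        targetGroupArn: legacyTg.arn,
    }],
});
// Incremental cutover rule: weighted forwarding for /api/v2/*
// (weighted target group entries take `arn`, not `targetGroupArn`)
const cutoverRule = new aws.lb.ListenerRule("api-cutover", {
    listenerArn: listener.arn,
    priority: 100,
    actions: [{
        type: "forward",
        forward: {
            targetGroups: [
                { arn: cloudTg.arn, weight: 10 }, // Start with 10% of matching traffic
                { arn: legacyTg.arn, weight: 90 },
            ],
        },
    }],
    conditions: [{
        pathPattern: {
            values: ["/api/v2/*"],
        },
    }],
});
export const albDnsName = alb.dnsName;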
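To reason about where a given request lands under the listener configuration above, the routing decision can be modeled locally. A minimal sketch under a simplified model: rules are evaluated in ascending priority, `*` in a path pattern matches any run of characters and `?` exactly one, and a weighted forward picks a target in proportion to its weight. The ALB performs this selection itself; `route`, `pickWeighted`, and the rule shape are hypothetical and only useful for reasoning or unit tests.

```typescript
interface WeightedTarget { name: string; weight: number; }
interface Rule { priority: number; pathPattern: string; targets: WeightedTarget[]; }

// Convert an ALB-style path pattern ("*" = any characters, "?" = one character) to a RegExp
function patternToRegExp(pattern: string): RegExp {
    const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Pick a target in proportion to its weight; `rand` is a number in [0, 1)
function pickWeighted(targets: WeightedTarget[], rand: number): string {
    const total = targets.reduce((sum, t) => sum + t.weight, 0);
    if (total <= 0) throw new Error("at least one target needs a positive weight");
    let threshold = rand * total;
    for (const t of targets) {
        if (threshold < t.weight) return t.name;
        threshold -= t.weight;
    }
    return targets[targets.length - 1].name; // guards against floating-point edge cases
}

// First matching rule (lowest priority number) wins; otherwise the listener default applies
function route(path: string, rules: Rule[], defaultTarget: string, rand: number): string {
    const ordered = rules.slice().sort((a, b) => a.priority - b.priority);
    for (const rule of ordered) {
        if (patternToRegExp(rule.pathPattern).test(path)) {
            return pickWeighted(rule.targets, rand);
        }
    }
    return defaultTarget;
}

// Mirrors the api-cutover rule: /api/v2/* split 10% cloud / 90% legacy, all else legacy
const rules: Rule[] = [{
    priority: 100,
    pathPattern: "/api/v2/*",
    targets: [{ name: "cloud", weight: 10 }, { name: "legacy", weight: 90 }],
}];
console.log(route("/api/v2/orders", rules, "legacy", 0.05)); // cloud (5 < 10)
console.log(route("/api/v2/orders", rules, "legacy", 0.5));  // legacy
console.log(route("/api/v1/orders", rules, "legacy", 0.05)); // legacy (no rule matches)
```

Injecting `rand` instead of calling `Math.random()` inside keeps the model deterministic, so a test can sweep values across [0, 1) and check that roughly the configured share reaches each backend.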
Step 3: Data Migration and Consistency
Stateful migrations require Change Data Capture (CDC) to maintain consistency during the dual-write phase. Deploy a CDC pipeline using tools like Debezium or AWS DMS to replicate data from legacy databases to cloud-native stores.
Architecture Decision: Use bidirectional replication only when immediate rollback is required. Unidirectional replication reduces conflict resolution complexity. Implement application-level dual-writes for critical transactions where CDC latency exceeds SLA thresholds.
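// Example DMS replication configuration
// Assumes `dmsSecurityGroup`, `subnetGroup` (an aws.dms.ReplicationSubnetGroup), and
// `cloudDbCluster` (an aws.rds.Cluster) are defined elsewhere in the program.
const replicationInstance = new aws.dms.ReplicationInstance("cdc-instance", {
    replicationInstanceId: "migration-cdc",
    replicationInstanceClass: "dms.c4.xlarge",
    allocatedStorage: 100,
    multiAz: true,
    publiclyAccessible: false,
    vpcSecurityGroupIds: [dmsSecurityGroup.id],
    replicationSubnetGroupId: subnetGroup.replicationSubnetGroupId,
    tags: { Name: "migration-cdc" },
});
// Source: legacy MySQL. CDC requires binary logging with binlog_format=ROW on the source.
const endpointLegacy = new aws.dms.Endpoint("legacy-db-endpoint", {
    endpointId: "legacy-mysql-source",
    endpointType: "source",
    engineName: "mysql",
    username: config.require("dbUser"),
    password: config.requireSecret("dbPassword"),
    serverName: config.require("legacyDbHost"),
    port: 3306,
});
// Target: Aurora MySQL (the DMS engine name is "aurora")
const endpointCloud = new aws.dms.Endpoint("cloud-db-endpoint", {
    endpointId: "aurora-mysql-target",
    endpointType: "target",
    engineName: "aurora",
    username: config.require("dbUser"),
    password: config.requireSecret("dbPassword"),
    serverName: cloudDbCluster.endpoint,
    port: 3306,
});
// Replication task: full load, then ongoing CDC for all schemas and tables
const cdcTask = new aws.dms.ReplicationTask("cdc-task", {
    replicationTaskId: "migration-full-load-and-cdc",
    migrationType: "full-load-and-cdc",
    replicationInstanceArn: replicationInstance.replicationInstanceArn,
    sourceEndpointArn: endpointLegacy.endpointArn,
    targetEndpointArn: endpointCloud.endpointArn,
    tableMappings: JSON.stringify({
        rules: [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": { "schema-name": "%", "table-name": "%" },
            "rule-action": "include",
        }],
    }),
});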
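For the critical transactions mentioned in the architecture decision above, where CDC latency would breach the SLA, an application-level dual-write can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions: the legacy database stays the system of record, the cloud write is best-effort, and failed cloud writes are queued by key for reconciliation instead of failing the user request. `Store`, `InMemoryStore`, and `DualWriter` are hypothetical names; real code would wrap actual database clients and persist the retry queue.

```typescript
// Minimal key-value abstraction standing in for real database clients (hypothetical)
interface Store {
    put(key: string, value: string): void;
    get(key: string): string | undefined;
}

// In-memory stand-in; `failWrites` simulates an outage of this store
class InMemoryStore implements Store {
    private data: { [key: string]: string } = Object.create(null);
    constructor(public failWrites: boolean = false) {}
    put(key: string, value: string): void {
        if (this.failWrites) throw new Error("write failed");
        this.data[key] = value;
    }
    get(key: string): string | undefined {
        return key in this.data ? this.data[key] : undefined;
    }
}

// Legacy is the system of record; cloud writes are best-effort and failures are queued by key
class DualWriter {
    readonly pendingKeys: string[] = [];
    constructor(private legacy: Store, private cloud: Store) {}

    write(key: string, value: string): void {
        this.legacy.put(key, value); // a legacy failure fails the request, as before the migration
        try {
            this.cloud.put(key, value);
        } catch {
            if (this.pendingKeys.indexOf(key) === -1) this.pendingKeys.push(key);
        }
    }

    // Copy the *current* legacy value for each queued key, so a stale queued write can
    // never overwrite a newer one. Keys that fail again stay queued. Returns keys applied.
    reconcile(): number {
        const keys = this.pendingKeys.splice(0, this.pendingKeys.length);
        let applied = 0;
        for (const key of keys) {
            const current = this.legacy.get(key);
            if (current === undefined) continue; // deletes are out of scope for this sketch
            try {
                this.cloud.put(key, current);
                applied++;
            } catch {
                this.pendingKeys.push(key);
            }
        }
        return applied;
    }
}

const legacyStore = new InMemoryStore();
const cloudStore = new InMemoryStore();
const writer = new DualWriter(legacyStore, cloudStore);
cloudStore.failWrites = true;  // simulate a cloud outage
writer.write("order:1", "v1");
writer.write("order:1", "v2");
cloudStore.failWrites = false; // cloud recovers
const applied = writer.reconcile();
console.log(applied, cloudStore.get("order:1")); // 1 v2
```

Queuing keys rather than values is the key design choice here: replaying queued values in order could resurrect `v1` after `v2`, whereas re-reading the system of record always converges on the latest state.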
Step 4: Cutover and Validation
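Execute the cutover using automated canary analysis: monitor error rates, latency, and business metrics for the cloud target group against the legacy baseline. The ALB does not roll back on its own; if thresholds are breached, automation should reset the listener rule weights so the Strangler proxy routes traffic back to the legacy environment.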
- Freeze Schema Changes: Prevent DDL operations on legacy databases during final sync.
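- Validate Lag: Confirm replication lag is near zero before switching (for AWS DMS, check the task's CDCLatencySource and CDCLatencyTarget CloudWatch metrics).
- Switch DNS/Proxy: Update weights in the listener rule to 100% cloud.
- Decommission: Terminate legacy resources after the validation period.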
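The cutover loop above can be expressed as a small decision function that an automation job (for example, a CI/CD step that then runs `pulumi up` with new weights) might call after each observation window. A minimal sketch under assumed thresholds: it compares the canary (cloud) against the legacy baseline on error rate and p99 latency, holds while replication lag is above a limit, and otherwise advances through a hypothetical weight schedule. The metric fields, thresholds, and `decide` function are illustrative and not taken from any AWS API.

```typescript
interface WindowMetrics {
    errorRate: number;    // fraction of failed requests in the window, 0..1
    p99LatencyMs: number;
}

interface Thresholds {
    maxErrorRateDelta: number;    // allowed canary error rate above baseline
    maxLatencyRatio: number;      // allowed canary p99 as a multiple of baseline p99
    maxReplicationLagSec: number;
}

type Decision = { action: "rollback" | "hold" | "promote"; cloudWeight: number };

// Hypothetical schedule: percentage of matching traffic sent to the cloud target group
const WEIGHT_STEPS = [10, 25, 50, 100];

function decide(currentCloudWeight: number, canary: WindowMetrics, baseline: WindowMetrics,
                replicationLagSec: number, t: Thresholds): Decision {
    // Any regression beyond the thresholds sends all traffic back to legacy
    if (canary.errorRate - baseline.errorRate > t.maxErrorRateDelta ||
        canary.p99LatencyMs > baseline.p99LatencyMs * t.maxLatencyRatio) {
        return { action: "rollback", cloudWeight: 0 };
    }
    // Healthy, but data is not yet in sync: keep the current split
    if (replicationLagSec > t.maxReplicationLagSec) {
        return { action: "hold", cloudWeight: currentCloudWeight };
    }
    const remaining = WEIGHT_STEPS.filter((w) => w > currentCloudWeight);
    if (remaining.length === 0) {
        return { action: "hold", cloudWeight: currentCloudWeight }; // already at 100%
    }
    return { action: "promote", cloudWeight: remaining[0] };
}

const thresholds: Thresholds = { maxErrorRateDelta: 0.005, maxLatencyRatio: 1.2, maxReplicationLagSec: 5 };
const baseline: WindowMetrics = { errorRate: 0.002, p99LatencyMs: 180 };
// Healthy canary at 10% with 1 s of lag: promote to the next step (25%)
console.log(decide(10, { errorRate: 0.003, p99LatencyMs: 190 }, baseline, 1, thresholds));
```

The returned `cloudWeight` maps directly onto the cloud entry of the weighted listener rule, with the legacy weight set to `100 - cloudWeight`.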
Pitfall Guide
1. Egress Cost Blindness
Migrating large datasets between availability zones or regions incurs significant egress fees. Teams often miscalculate costs by ignoring cross-AZ traffic for database replication and backup operations.
- Mitigation: Co-locate compute and storage within the same AZ where possible. Use VPC endpoints for AWS services to avoid NAT gateway charges. Implement traffic shaping and compression for replication streams.
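A back-of-the-envelope estimate makes cross-AZ replication cost visible before it shows up on the bill. A minimal sketch, assuming cross-AZ traffic is billed per GB in each direction (the $0.01/GB-per-direction rate in the example is an assumption to replace with current pricing for your region and service) and that compression shrinks the replicated volume by a fixed ratio; `estimateCrossAzCostUsd` is a hypothetical helper.

```typescript
interface TransferEstimateInput {
    gbPerDay: number;              // uncompressed replication volume crossing AZs
    days: number;                  // length of the dual-run period
    ratePerGbPerDirection: number; // USD per GB, charged when leaving one AZ and again entering the other
    compressionRatio: number;      // compressed size / original size; 1 means no compression
}

// Each billed GB is charged twice: once out of the source AZ, once into the destination AZ
function estimateCrossAzCostUsd(i: TransferEstimateInput): number {
    const billedGb = i.gbPerDay * i.days * i.compressionRatio;
    return billedGb * i.ratePerGbPerDirection * 2;
}

const uncompressed = estimateCrossAzCostUsd({ gbPerDay: 500, days: 30, ratePerGbPerDirection: 0.01, compressionRatio: 1 });
const compressed = estimateCrossAzCostUsd({ gbPerDay: 500, days: 30, ratePerGbPerDirection: 0.01, compressionRatio: 0.4 });
console.log(uncompressed.toFixed(2), compressed.toFixed(2)); // 300.00 120.00
```

Running the same numbers for a dual-run period of realistic length shows why compression and same-AZ placement are worth doing before the migration rather than after.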
2. DNS TTL Latency
During cutover, clients and resolvers that still hold cached DNS records can keep sending traffic to decommissioned endpoints, resulting in failed connections or 5xx errors.
- Mitigation: Lower record TTLs to 60 seconds well before cutover, waiting at least one full previous TTL (for example, 24 hours if the old TTL was 86400 seconds) so cached records have expired. Use weighted routing in load balancers rather than relying solely on DNS for traffic shifting. Implement client-side retry logic with exponential backoff.
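The client-side retry mitigation can look like the following. A minimal sketch of exponential backoff with "full jitter" (each delay drawn uniformly between 0 and an exponentially growing cap); it is synchronous and takes injected `sleep` and `random` functions so the policy can be tested without real timers. `retryWithBackoff`, `backoffDelayMs`, and the limits in the example are hypothetical; production code would usually be asynchronous and retry only idempotent requests or connection-level failures.

```typescript
interface RetryOptions {
    maxAttempts: number;          // total tries, including the first
    baseDelayMs: number;
    maxDelayMs: number;
    random: () => number;         // returns [0, 1); Math.random in production
    sleep: (ms: number) => void;  // injected so tests need no real timers
}

// Full jitter: delay = random * min(maxDelay, base * 2^retryIndex)
function backoffDelayMs(retryIndex: number, o: RetryOptions): number {
    const cap = Math.min(o.maxDelayMs, o.baseDelayMs * Math.pow(2, retryIndex));
    return Math.floor(o.random() * cap);
}

function retryWithBackoff<T>(operation: () => T, o: RetryOptions): T {
    let lastError: unknown;
    for (let attempt = 0; attempt < o.maxAttempts; attempt++) {
        try {
            return operation();
        } catch (err) {
            lastError = err;
            if (attempt < o.maxAttempts - 1) o.sleep(backoffDelayMs(attempt, o)); // no sleep after the last try
        }
    }
    throw lastError;
}

// Simulate an endpoint that fails twice (e.g. a stale DNS answer) and then succeeds
let calls = 0;
const delays: number[] = [];
const result = retryWithBackoff(() => {
    calls++;
    if (calls < 3) throw new Error("connection refused");
    return "ok";
}, {
    maxAttempts: 5, baseDelayMs: 100, maxDelayMs: 2000,
    random: () => 0.5, // fixed so the example is deterministic
    sleep: (ms) => { delays.push(ms); },
});
console.log(result, calls, delays); // "ok" after 3 calls; slept 50 ms, then 100 ms
```

Jitter spreads retries from many clients over time, which matters at cutover when a whole fleet may hit the same stale record at once.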
3. Stateful Coupling in Stateless Migrations
Rehosting stateful components (sessions, caches) without externalizing state leads to data loss during scaling events.
- Mitigation: Externalize session state to managed Redis or DynamoDB. Design services so that compute instances hold no local state between requests. Validate session affinity requirements and remove them where possible.
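Externalizing session state means a request handler keeps nothing in process memory between requests: it loads the session by ID from a shared store and writes it back. A minimal sketch, assuming a `SessionStore` interface that a managed Redis or DynamoDB client would implement in production; the in-memory `FakeSessionStore`, the 30-minute TTL, and the `handleAddToCart` handler are hypothetical stand-ins so the pattern runs on its own.

```typescript
interface Session { userId: string; cart: string[]; }

// What the handler depends on; a Redis- or DynamoDB-backed client would implement this
interface SessionStore {
    load(sessionId: string, nowMs: number): Session | undefined;
    save(sessionId: string, session: Session, ttlMs: number, nowMs: number): void;
}

// In-memory stand-in for tests; it models a store shared by all handler instances
class FakeSessionStore implements SessionStore {
    private items: { [id: string]: { session: Session; expiresAt: number } } = Object.create(null);
    load(sessionId: string, nowMs: number): Session | undefined {
        const item = this.items[sessionId];
        if (!item || item.expiresAt <= nowMs) return undefined; // expired, like a TTL'd key
        return { userId: item.session.userId, cart: item.session.cart.slice() };
    }
    save(sessionId: string, session: Session, ttlMs: number, nowMs: number): void {
        const copy = { userId: session.userId, cart: session.cart.slice() };
        this.items[sessionId] = { session: copy, expiresAt: nowMs + ttlMs };
    }
}

const SESSION_TTL_MS = 30 * 60 * 1000;

// Stateless handler: any instance can serve any request, so no session affinity is needed
function handleAddToCart(store: SessionStore, sessionId: string, userId: string,
                         item: string, nowMs: number): Session {
    const session = store.load(sessionId, nowMs) || { userId: userId, cart: [] };
    session.cart.push(item);
    store.save(sessionId, session, SESSION_TTL_MS, nowMs);
    return session;
}

const store = new FakeSessionStore();
handleAddToCart(store, "s-1", "u-42", "book", 0);                 // served by instance A
const afterSecond = handleAddToCart(store, "s-1", "u-42", "lamp", 1000); // served by instance B
console.log(afterSecond.cart); // book, lamp
```

Because the handler only touches the store, scaling events that add or remove instances cannot lose a session; only the store's own TTL ends it.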
4. IAM Drift and Privilege Escalation
Migration scripts often run with broad permissions, creating security vulnerabilities that persist post-migration.
- Mitigation: Apply least-privilege IAM roles to migration agents. Use Pulumi or Terraform to enforce IAM policies as code. Implement automated drift detection to identify unauthorized permission changes.
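Least privilege is easier to keep when policy documents are checked automatically, for example in CI before `pulumi up`. A minimal sketch that scans an IAM policy document (the standard `Version`/`Statement` JSON shape) for `Allow` statements whose actions or resources use broad wildcards. The `findBroadStatements` helper, the example policy, and its bucket name are hypothetical; a finding is a prompt for review rather than proof of a problem (some read-only actions only support `*` resources), and this does not replace IAM Access Analyzer or a full policy evaluator.

```typescript
interface Statement {
    Sid?: string;
    Effect: "Allow" | "Deny";
    Action: string | string[];
    Resource: string | string[];
}
interface PolicyDocument { Version: string; Statement: Statement[]; }

const asArray = (v: string | string[]): string[] => (typeof v === "string" ? [v] : v);

// Flags Allow statements granting "*" or "service:*" actions, or a "*" resource
function findBroadStatements(policy: PolicyDocument): string[] {
    const findings: string[] = [];
    policy.Statement.forEach((s, index) => {
        if (s.Effect !== "Allow") return;
        const label = s.Sid || `Statement[${index}]`;
        const broadActions = asArray(s.Action).filter((a) => a === "*" || /:\*$/.test(a));
        if (broadActions.length > 0) findings.push(`${label}: wildcard action ${broadActions.join(", ")}`);
        if (asArray(s.Resource).indexOf("*") !== -1) findings.push(`${label}: wildcard resource`);
    });
    return findings;
}

// Example migration-agent policy with one scoped and one overly broad statement
const migrationAgentPolicy: PolicyDocument = {
    Version: "2012-10-17",
    Statement: [
        {
            Sid: "ReadMigrationBucket",
            Effect: "Allow",
            Action: ["s3:GetObject", "s3:ListBucket"],
            Resource: ["arn:aws:s3:::example-migration-bucket", "arn:aws:s3:::example-migration-bucket/*"],
        },
        { Sid: "TooBroad", Effect: "Allow", Action: "dms:*", Resource: "*" },
    ],
};
console.log(findBroadStatements(migrationAgentPolicy));
// TooBroad: wildcard action dms:*, TooBroad: wildcard resource
```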
5. Data Consistency Failures
Relying solely on batch migration for active databases causes data loss during the cutover window.
- Mitigation: Implement CDC for continuous replication. Validate data integrity using checksums and row counts. Perform dry-run migrations to measure replication lag and identify blocking transactions.
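Row counts catch missing rows but not silently changed values, so the two checks are usually combined. A minimal sketch, assuming both sides can be read as rows keyed by primary key: each row is serialized with sorted column names and hashed with 32-bit FNV-1a, then the report lists missing, extra, and mismatched keys. FNV-1a is used here only for brevity; it is non-cryptographic and a 32-bit hash can collide, so a production comparison would typically use a stronger digest or the databases' own checksum features. `compareTables` and the row shape are hypothetical.

```typescript
type Row = { [column: string]: string | number | null };

// 32-bit FNV-1a over UTF-16 code units; consistent as long as both sides hash the same way.
// The shift sum multiplies by the FNV prime 16777619 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 2^0.
function fnv1a32(text: string): number {
    let hash = 0x811c9dc5;
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
    }
    return hash >>> 0;
}

// Sort column names so column order does not change the checksum
function rowChecksum(row: Row): number {
    const cols = Object.keys(row).sort();
    return fnv1a32(JSON.stringify(cols.map((c) => [c, row[c]])));
}

interface ComparisonReport {
    sourceCount: number;
    targetCount: number;
    missingInTarget: string[];
    extraInTarget: string[];
    mismatched: string[];
}

function compareTables(source: Row[], target: Row[], pk: string): ComparisonReport {
    const index = (rows: Row[]) => {
        const byKey: { [key: string]: number } = Object.create(null);
        rows.forEach((r) => { byKey[String(r[pk])] = rowChecksum(r); });
        return byKey;
    };
    const src = index(source);
    const tgt = index(target);
    const report: ComparisonReport = {
        sourceCount: source.length, targetCount: target.length,
        missingInTarget: [], extraInTarget: [], mismatched: [],
    };
    Object.keys(src).forEach((key) => {
        if (!(key in tgt)) report.missingInTarget.push(key);
        else if (src[key] !== tgt[key]) report.mismatched.push(key);
    });
    Object.keys(tgt).forEach((key) => { if (!(key in src)) report.extraInTarget.push(key); });
    return report;
}

const legacyRows: Row[] = [
    { id: 1, email: "a@example.com", balance: 100 },
    { id: 2, email: "b@example.com", balance: 250 },
    { id: 3, email: "c@example.com", balance: 75 },
];
const cloudRows: Row[] = [
    { balance: 100, email: "a@example.com", id: 1 }, // same data, different column order
    { id: 2, email: "b@example.com", balance: 205 }, // drifted value
];
const report = compareTables(legacyRows, cloudRows, "id");
console.log(report); // counts 3 vs 2; row 3 missing in target; row 2 mismatched
```

For large tables the same idea is usually applied per key range (hash buckets of rows), so only the mismatching ranges need a row-by-row comparison.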
6. Network Latency and TCP Optimization
Legacy applications tuned for low-latency datacenter networks may degrade in the cloud, where round-trip times between services are often higher and more variable.
- Mitigation: Enable TCP window scaling, and use jumbo frames (9001 MTU) for traffic that stays within a VPC on supported instance types. Place dependent services in the same region. Implement connection pooling and keep-alive settings. Profile application network I/O before migration.
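Connection pooling amortizes connection setup (the TCP handshake plus any TLS handshake, each costing at least one round trip) across many requests, which matters more as round-trip times grow. A minimal, synchronous sketch of a bounded pool that reuses idle connections before opening new ones; the `Pool` class, `createConnection` factory, and size limit are hypothetical, and the pooling built into your database driver or HTTP client (such as a keep-alive agent) is usually preferable in practice.

```typescript
interface Connection { id: number; }

class Pool {
    private idle: Connection[] = [];
    private inUse = 0;
    private created = 0;

    constructor(private maxSize: number, private createConnection: (id: number) => Connection) {}

    // Reuse an idle connection if one exists; open a new one only while under maxSize
    acquire(): Connection | undefined {
        const reused = this.idle.pop();
        if (reused) {
            this.inUse++;
            return reused;
        }
        if (this.inUse >= this.maxSize) return undefined; // caller should queue or back off
        this.inUse++;
        this.created++;
        return this.createConnection(this.created);
    }

    // Return a connection for reuse instead of closing it (keep-alive)
    release(conn: Connection): void {
        this.inUse--;
        this.idle.push(conn);
    }

    createdCount(): number {
        return this.created;
    }
}

let opened = 0;
const pool = new Pool(2, (id) => { opened++; return { id: id }; });
for (let i = 0; i < 100; i++) {
    const conn = pool.acquire();
    if (conn) pool.release(conn); // simulated request: borrow, use, return
}
console.log(opened); // 1: a single connection served all 100 sequential requests
```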
7. Lack of Automated Rollback
Manual rollback procedures are slow and error-prone, extending downtime during failures.
- Mitigation: Automate rollback via IaC state management. Implement blue/green deployment patterns. Define clear rollback triggers based on monitoring metrics. Test rollback procedures in staging environments.
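A rollback trigger that fires on a single bad datapoint tends to flap. A common pattern, similar to the "M out of N datapoints" evaluation in CloudWatch alarms, is to require several breaching periods within a recent window. A minimal sketch, assuming one error-rate sample per evaluation period; the `RollbackTrigger` class and its parameters are hypothetical, and its output would drive the automated rollback (for example, resetting the listener rule weights to 100% legacy).

```typescript
class RollbackTrigger {
    private recent: boolean[] = [];

    // Fire when at least `breachesToFire` of the last `windowSize` periods exceed `threshold`
    constructor(private threshold: number, private breachesToFire: number, private windowSize: number) {
        if (breachesToFire > windowSize) throw new Error("breachesToFire must be <= windowSize");
    }

    // Record one period's error rate; returns true when a rollback should start
    observe(errorRate: number): boolean {
        this.recent.push(errorRate > this.threshold);
        if (this.recent.length > this.windowSize) this.recent.shift();
        const breaches = this.recent.filter((b) => b).length;
        return breaches >= this.breachesToFire;
    }
}

// Roll back when 3 of the last 5 periods exceed a 2% error rate
const trigger = new RollbackTrigger(0.02, 3, 5);
const samples = [0.01, 0.05, 0.01, 0.03, 0.01, 0.04];
const decisions = samples.map((rate) => trigger.observe(rate));
console.log(decisions); // only the sixth sample fires: the single 5% spike alone does not
```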
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Legacy Monolith with Tight Deadline | Rehost with Strangler | Rapid migration while enabling incremental modernization | Medium initial, High long-term if not refactored |
| Stateless API Service | Refactor to Serverless | Eliminates infrastructure management and scales automatically | Low TCO, High initial dev effort |
| High-Volume Database | Replatform to Managed Service | Reduces operational overhead and improves availability | Medium cost, High reliability gain |
| Real-time Stream Processing | Refactor to Cloud-Native Streams | Leverages native scaling and integration capabilities | Medium cost, High performance |
| Compliance-Heavy Workload | Rehost with Strict IaC | Maintains control over data locality and security posture | High compliance cost, Low risk |
Configuration Template
Pulumi TypeScript Migration Stack Template
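import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const config = new pulumi.Config();
const env = config.get("environment") || "dev";
// Migration Landing Zone
const vpc = new aws.ec2.Vpc("migration-lz", {
    cidrBlock: "10.0.0.0/16",
    tags: { Environment: env, Phase: "Migration" },
});
// Dual-Run Target Groups
const legacyTg = new aws.lb.TargetGroup("legacy", {
    port: 80, protocol: "HTTP", vpcId: vpc.id, targetType: "ip",
});
const cloudTg = new aws.lb.TargetGroup("cloud", {
    port: 8080, protocol: "HTTP", vpcId: vpc.id, targetType: "instance",
});
// Traffic Splitting Configuration
// Use `??`, not `||`: with `||`, an explicit weight of 0 (e.g. legacyWeight=0 at final cutover)
// would silently fall back to the default.
const splitConfig = {
    legacyWeight: config.getNumber("legacyWeight") ?? 90,
    cloudWeight: config.getNumber("cloudWeight") ?? 10,
};
if (splitConfig.legacyWeight + splitConfig.cloudWeight !== 100) {
    throw new Error("legacyWeight and cloudWeight must sum to 100");
}
// Existing Strangler ALB, looked up by ARN ("albArn" is an assumed config key)
const alb = aws.lb.LoadBalancer.get("existing-strangler-alb", config.require("albArn"));
// Output for CI/CD integration: target group ARNs and weights feed the weighted listener rule
export const legacyTgArn = legacyTg.arn;
export const cloudTgArn = cloudTg.arn;
export const albDnsName = alb.dnsName;
export const migrationPhase = pulumi.interpolate`
Phase: ${env}
Traffic Split: Legacy ${splitConfig.legacyWeight}% / Cloud ${splitConfig.cloudWeight}%
`;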
Quick Start Guide
- Initialize IaC Project: Run `pulumi new aws-typescript` to create a migration project with TypeScript support.
- Configure Environment: Set required configuration values with `pulumi config set` for legacy endpoints, credentials, and traffic weights (add `--secret` for credentials such as `dbPassword`).
- Deploy Sandbox: Run `pulumi up` to provision the migration landing zone, VPC, and Strangler proxy in an isolated environment.
- Validate Connectivity: Test routing rules by sending requests to the ALB DNS name and verifying traffic distribution.
- Integrate CDC: Deploy database replication instances and configure endpoints to begin data synchronization.
- Monitor Metrics: Configure CloudWatch alarms for latency, error rates, and replication lag to trigger automated alerts.