ovisioned instances. Instances averaging below 10–15% CPU utilization are prime candidates for downsizing. Committing to an oversized instance type multiplies waste across the entire commitment term.
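The screening step can be sketched in TypeScript as a simple filter over exported utilization figures. The UtilizationSample shape and the findDownsizeCandidates helper are hypothetical names. The 15% default mirrors the upper end of the range above, and the fleet values are made-up sample data. Only CPU is checked, so memory-bound instances need a separate review.

```typescript
// Hypothetical shape for exported utilization data; not an AWS SDK type.
interface UtilizationSample {
  instanceId: string;
  instanceType: string;
  avgCpuPercent: number; // average CPU utilization over the lookback window
}

// Returns instances whose average CPU sits below the threshold, lowest first,
// so the most over-provisioned instances are reviewed before any commitment is modeled.
function findDownsizeCandidates(
  samples: UtilizationSample[],
  thresholdPercent = 15
): UtilizationSample[] {
  return samples
    .filter(s => s.avgCpuPercent < thresholdPercent)
    .sort((a, b) => a.avgCpuPercent - b.avgCpuPercent);
}

// Made-up sample data for illustration only.
const fleet: UtilizationSample[] = [
  { instanceId: "i-aaa", instanceType: "m6i.2xlarge", avgCpuPercent: 8 },
  { instanceId: "i-bbb", instanceType: "c6i.large", avgCpuPercent: 55 },
  { instanceId: "i-ccc", instanceType: "r6i.xlarge", avgCpuPercent: 12 }
];
const candidates = findDownsizeCandidates(fleet);
console.log(candidates.map(c => c.instanceId)); // [ 'i-aaa', 'i-ccc' ]
```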
Phase 2: Processor Architecture Alignment
AWS instance names encode the processor vendor in the suffix letter that follows the generation number (the g in m6g.large, for example):
- i or no suffix: Intel Xeon (m6i, m5)
- a: AMD EPYC (m6a)
- g: AWS Graviton, ARM64 (m6g)
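A heuristic parser makes the convention concrete. It is a sketch under stated assumptions: parseInstanceType and its result shape are hypothetical, and it reads the vendor from the letters after the generation number rather than from the family prefix (g4dn is an Intel-based GPU instance, not Graviton). It also special-cases a1, the first-generation Graviton family, which carries no suffix. Treat DescribeInstanceTypes (ProcessorInfo.SupportedArchitectures) as the source of truth, because some families, such as the Apple-silicon mac2 instances, do not follow the pattern.

```typescript
type Vendor = "intel" | "amd" | "graviton";

interface ParsedInstanceType {
  family: string;     // e.g. "m", "c", "g", "hpc"
  generation: number; // e.g. 6
  attributes: string; // letters after the generation, e.g. "g", "gd", "i", "a"
  size: string;       // e.g. "large", "16xlarge", "metal-24xl"
  vendor: Vendor;
}

// Heuristic parser for names such as "m6g.large" or "c6gn.16xlarge".
function parseInstanceType(name: string): ParsedInstanceType | null {
  const match = /^([a-z]+)(\d+)([a-z-]*)\.([a-z0-9-]+)$/.exec(name);
  if (!match) return null;
  const [, family, gen, attributes, size] = match;
  let vendor: Vendor;
  if (attributes.includes("g") || family === "a") {
    vendor = "graviton"; // "g" suffix, or the a1 family (first-generation Graviton)
  } else if (attributes.includes("a")) {
    vendor = "amd";
  } else {
    vendor = "intel"; // "i" or no processor letter
  }
  return { family, generation: Number(gen), attributes, size, vendor };
}

for (const t of ["m6g.large", "m6i.large", "m6a.large", "m5.xlarge", "g4dn.xlarge"]) {
  console.log(t, parseInstanceType(t)?.vendor);
}
```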
For stateless Linux workloads such as API gateways, containerized services, and web servers, Graviton instances typically deliver comparable or better performance at roughly 10–20% lower hourly cost than equivalent x86 instances. Migration often requires no application code changes, provided dependencies are compiled for ARM64 or shipped as container images with multi-architecture manifests. Validate compatibility with AWS tooling such as the Porting Advisor for Graviton, then shift eligible workloads to g-suffixed instances before purchasing commitments.
Phase 3: Commitment Coverage Modeling
Commitments are billed whether or not you use them: unused hourly commitment is charged in full. A practical target is 80–85% coverage of baseline compute, leaving 15–20% on On-Demand to absorb traffic spikes and architectural shifts. Over-committing is harder to reverse than under-committing, because a shortfall can be closed by buying more commitment while surplus commitment stays on the bill until the term ends. Compute Savings Plans are preferred over Reserved Instances for most fleets because they apply across instance families, sizes, operating systems, and regions, preserving architectural flexibility as workloads evolve.
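The coverage model can be sketched as follows. It assumes you have an hourly series of On-Demand-equivalent compute spend and a blended Savings Plans discount for your instance mix; both inputs are yours to supply. planCommitment and percentile are hypothetical helpers, and the 25% discount in the usage lines is a placeholder rather than a published rate. Using a low percentile of hourly spend as the baseline is one common convention, not an AWS rule. The key step is that the commitment is expressed after the discount, because Savings Plans bill the hourly commitment at discounted rates.

```typescript
interface CommitmentPlan {
  baselineOnDemandPerHour: number; // steady-state spend at On-Demand rates
  commitmentPerHour: number;       // Savings Plans $/hour to purchase
}

// Nearest-rank percentile of a numeric series (p in 0..100).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function planCommitment(
  hourlyOnDemandSpend: number[],
  coveragePercent: number,  // e.g. 80-85
  expectedDiscount: number, // e.g. 0.25 for 25% off On-Demand (assumed, workload-specific)
  baselinePercentile = 10   // a low percentile approximates the always-on floor
): CommitmentPlan {
  const baseline = percentile(hourlyOnDemandSpend, baselinePercentile);
  // The commitment is denominated at Savings Plans rates, i.e. after the discount.
  const commitment = baseline * (coveragePercent / 100) * (1 - expectedDiscount);
  return { baselineOnDemandPerHour: baseline, commitmentPerHour: commitment };
}

const hourly = [5, 6, 5, 10, 12, 5, 7, 6, 5, 8]; // sample On-Demand $/hour, not real data
const plan = planCommitment(hourly, 80, 0.25);
console.log(plan.commitmentPerHour.toFixed(2)); // prints 3.00
```

Committing the raw 80% of On-Demand baseline ($4.00/hour here) would cover more than the baseline once the discount is applied, which is exactly the over-commitment the phase warns against.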
Phase 4: Automated Fleet Configuration
Manual commitment reviews operate on monthly or quarterly cycles, which is misaligned with cloud-native scaling patterns. Automate instance selection and commitment tracking using infrastructure-as-code and custom metrics pipelines. The following TypeScript module demonstrates a programmatic approach to defining a mixed-instance Auto Scaling Group, calculating coverage targets, and enforcing Graviton-first selection policies.
import { EC2Client, DescribeInstanceTypesCommand } from "@aws-sdk/client-ec2";
import { AutoScalingClient, CreateAutoScalingGroupCommand } from "@aws-sdk/client-auto-scaling";
interface FleetConfig {
  workloadName: string;
  // Steady-state hourly compute spend expressed at Savings Plans rates, because a
  // Savings Plan commitment is billed in discounted dollars per hour.
  baselineHourlySpend: number;
  targetCoveragePercent: number;
  // Exact instance types, or wildcards such as "m6g.*" accepted by the instance-type filter.
  allowedFamilies: string[];
  architecturePreference: "graviton" | "intel" | "amd";
  spotAllocationStrategy: "lowest-price" | "capacity-optimized";
}
class ComputeFleetOptimizer {
  private ec2Client: EC2Client;
  private asgClient: AutoScalingClient;
  constructor(region: string) {
    this.ec2Client = new EC2Client({ region });
    this.asgClient = new AutoScalingClient({ region });
  }
  async calculateCommitmentTarget(config: FleetConfig): Promise<number> {
    const coverage = config.targetCoveragePercent / 100;
    return config.baselineHourlySpend * coverage;
  }
  async resolveInstanceOptions(config: FleetConfig): Promise<string[]> {
    // A short allow-list fits in one response; paginate with NextToken for broad wildcards.
    const command = new DescribeInstanceTypesCommand({
      Filters: [{ Name: "instance-type", Values: config.allowedFamilies }]
    });
    const response = await this.ec2Client.send(command);
    // Architecture alone cannot separate Intel from AMD: both report x86_64.
    const wantedArch: string = config.architecturePreference === "graviton" ? "arm64" : "x86_64";
    return (response.InstanceTypes ?? [])
      // Check every supported architecture, not only the first entry (e.g. ["i386", "x86_64"]).
      .filter(inst => (inst.ProcessorInfo?.SupportedArchitectures ?? []).some(arch => arch === wantedArch))
      .map(inst => inst.InstanceType ?? "")
      .filter(Boolean);
  }
  async deployMixedInstanceAsg(config: FleetConfig, launchTemplateId: string, subnetIds: string[]): Promise<void> {
    const instanceOptions = await this.resolveInstanceOptions(config);
    if (instanceOptions.length === 0) {
      throw new Error(`No allowed instance types match the ${config.architecturePreference} preference`);
    }
    const minSize = 2;
    const command = new CreateAutoScalingGroupCommand({
      AutoScalingGroupName: `${config.workloadName}-fleet`,
      // A VPC group needs subnets; the API takes them as one comma-separated string.
      VPCZoneIdentifier: subnetIds.join(","),
      MinSize: minSize,
      MaxSize: 50,
      DesiredCapacity: 5,
      // Use MixedInstancesPolicy instead of a top-level LaunchTemplate; a group takes one or the other.
      MixedInstancesPolicy: {
        InstancesDistribution: {
          // Keep the always-on floor (MinSize) On-Demand, where the Savings Plan applies.
          // The original ceil(coverage% * 0.1) mixed a percentage with an instance count.
          OnDemandBaseCapacity: minSize,
          OnDemandPercentageAboveBaseCapacity: 20,
          SpotAllocationStrategy: config.spotAllocationStrategy
        },
        LaunchTemplate: {
          // The template's AMI must match the architecture of every override (arm64 for Graviton).
          LaunchTemplateSpecification: { LaunchTemplateId: launchTemplateId },
          Overrides: instanceOptions.map(type => ({ InstanceType: type }))
        }
      },
      Tags: [
        { Key: "Environment", Value: "production", PropagateAtLaunch: true },
        { Key: "CostCenter", Value: "compute-optimization", PropagateAtLaunch: true }
      ]
    });
    await this.asgClient.send(command);
  }
}
// Usage example
const fleetConfig: FleetConfig = {
  workloadName: "api-gateway-prod",
  baselineHourlySpend: 4.50, // at Savings Plans rates
  targetCoveragePercent: 85,
  allowedFamilies: ["m6g.large", "m6i.large", "c6g.large", "c6i.large"],
  architecturePreference: "graviton", // keeps only m6g.large and c6g.large
  spotAllocationStrategy: "capacity-optimized"
};
const optimizer = new ComputeFleetOptimizer("us-east-1");
optimizer
  .calculateCommitmentTarget(fleetConfig)
  .then(target => console.log(`Recommended hourly commitment: $${target.toFixed(2)}`))
  .catch(err => console.error("Commitment calculation failed:", err));
Architecture Decisions & Rationale:
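- Graviton-First Policy: When architecturePreference is graviton, the resolver keeps only instance types that support arm64, so x86 entries in the allow-list are dropped at deployment time. The launch template must therefore reference an ARM64 AMI.
- Mixed Instance Distribution: OnDemandBaseCapacity and OnDemandPercentageAboveBaseCapacity keep a stable On-Demand baseline while Spot Instances absorb variable load. Only the On-Demand portion consumes Savings Plan commitment; Spot usage is billed at Spot prices. The capacity-optimized strategy reduces interruption probability compared to lowest-price.
- Commitment Calculation: The commitment target is baseline hourly spend multiplied by the target coverage percentage. Because a Savings Plan commitment is denominated in discounted dollars per hour, the baseline must be expressed at Savings Plans rates; feeding in On-Demand spend overstates the commitment. Leaving the remainder uncovered reserves an On-Demand buffer and reduces the risk of over-commitment.
- Tagging Strategy: Cost allocation tags (Environment, CostCenter) are propagated at launch, enabling granular billing breakdowns in AWS Cost Explorer without manual reconciliation once the tags are activated as cost allocation tags in the Billing console.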
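How OnDemandBaseCapacity and OnDemandPercentageAboveBaseCapacity combine can be checked with a little arithmetic. This sketch assumes the above-base On-Demand share is rounded up in favor of On-Demand, which matches our reading of the EC2 Auto Scaling documentation. splitCapacity is a hypothetical helper, so confirm the split against a live group. For example, with a desired capacity of 5, a base of 2, and 20% above base, the 3 instances above base yield 0.6 On-Demand, which rounds up to 1. The result is 3 On-Demand and 2 Spot.

```typescript
interface CapacitySplit {
  onDemand: number;
  spot: number;
}

// Approximates how a mixed-instances group divides its desired capacity,
// assuming the above-base On-Demand share is rounded up.
function splitCapacity(
  desired: number,
  onDemandBase: number,
  onDemandPctAboveBase: number
): CapacitySplit {
  const base = Math.min(desired, onDemandBase); // a base larger than desired makes everything On-Demand
  const aboveBase = desired - base;
  const onDemandAbove = Math.ceil((aboveBase * onDemandPctAboveBase) / 100);
  return { onDemand: base + onDemandAbove, spot: aboveBase - onDemandAbove };
}

const split = splitCapacity(5, 2, 20);
console.log(split); // { onDemand: 3, spot: 2 }
```

Only the onDemand figure draws on the Savings Plan, so this is the number to compare against the commitment model, not the group's total size.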
Pitfall Guide
1. Premature Commitment Purchasing
Explanation: Teams purchase Savings Plans or Reserved Instances before running a rightsizing audit. Committing to oversized instances locks in waste for 1–3 years.
Fix: Export 30-day CPU/memory metrics from Compute Optimizer (memory data requires the CloudWatch agent on each instance). Downsize instances averaging below 20% utilization before modeling commitment coverage.
2. Processor Suffix Blindness
Explanation: Defaulting to Intel (i or no suffix) instances for Linux workloads that would run unchanged on Graviton (g). This forgoes the 10–20% price-performance advantage Graviton typically offers.
Fix: Validate ARM64 compatibility using container multi-arch manifests or native compilation. Migrate stateless tiers to g-suffixed instances before purchasing commitments.
3. Over-Commitment to Fixed Terms
Explanation: Purchasing 3-year Reserved Instances for workloads with volatile scaling patterns or planned architecture migrations. Unused commitments are billed in full with no refunds.
Fix: Use Compute Savings Plans with 1-year terms. Maintain a 15–20% On-Demand buffer to absorb architectural shifts and traffic spikes.
4. Misapplying Spot Instances
Explanation: Placing stateful databases, latency-sensitive APIs, or real-time processing workloads on Spot. AWS can reclaim Spot capacity with only a two-minute interruption notice, and workloads that cannot drain within that window suffer service degradation or data loss.
Fix: Restrict Spot to batch jobs, CI/CD runners, stateless web tiers, and ML training pipelines with checkpointing. Implement interruption handlers that gracefully drain connections and persist state.
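An interruption handler can be sketched around the instance metadata service. The /latest/meta-data/spot/instance-action path returns 404 until AWS schedules an interruption, then returns a small JSON document with the action and its time. This is a minimal sketch, assuming Node.js 18+ for the global fetch and IMDSv2 enabled on the instance. watchForInterruption, parseInstanceAction, imdsGet, and the drain callback are hypothetical names. The same notice is also published as an EventBridge event if push delivery suits your fleet better than polling.

```typescript
// Body of the spot/instance-action document, e.g. {"action":"terminate","time":"2017-09-18T08:22:00Z"}
interface InstanceAction {
  action: "hibernate" | "stop" | "terminate";
  time: string;
}

// Minimal HTTP result shape so the watcher can be exercised without a real instance.
interface HttpResult { status: number; body: string; }
type HttpGet = (path: string) => Promise<HttpResult>;

const INSTANCE_ACTION_PATH = "/latest/meta-data/spot/instance-action";

// IMDS answers 404 on this path until an interruption has been scheduled.
function parseInstanceAction(res: HttpResult): InstanceAction | null {
  if (res.status === 404) return null;
  if (res.status !== 200) throw new Error(`Unexpected IMDS status ${res.status}`);
  return JSON.parse(res.body) as InstanceAction;
}

// IMDSv2 getter: request a session token, then read the path with it.
async function imdsGet(path: string): Promise<HttpResult> {
  const base = "http://169.254.169.254";
  const tokenRes = await fetch(`${base}/latest/api/token`, {
    method: "PUT",
    headers: { "X-aws-ec2-metadata-token-ttl-seconds": "60" }
  });
  const token = await tokenRes.text();
  const res = await fetch(`${base}${path}`, { headers: { "X-aws-ec2-metadata-token": token } });
  return { status: res.status, body: await res.text() };
}

// Polls every intervalMs; on the first notice, runs the drain callback once and stops.
async function watchForInterruption(
  onNotice: (notice: InstanceAction) => Promise<void>,
  get: HttpGet = imdsGet,
  intervalMs = 5000
): Promise<void> {
  for (;;) {
    const notice = parseInstanceAction(await get(INSTANCE_ACTION_PATH));
    if (notice) {
      await onNotice(notice);
      return;
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

On an instance, the callback passed to watchForInterruption would deregister from the load balancer and persist in-flight state. The injectable get parameter exists so the loop can be tested off EC2.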
5. Stale Recommendation Reliance
Explanation: Trusting AWS Cost Explorer's commitment recommendations, which are based on 72-hour-old data. In dynamic environments, this lag creates coverage gaps and unexpected On-Demand charges.
Fix: Build a custom metrics pipeline that aggregates utilization at hourly or finer granularity, and refresh commitment targets weekly rather than monthly.
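The weekly review reduces to two ratios. Utilization measures how much of the commitment was consumed, and coverage measures how much eligible spend the commitment absorbed. The summarizeCoverage helper below is hypothetical and simplifies by assuming a single blended discount, so that one hour of commitment absorbs commitment / (1 − discount) of On-Demand-equivalent spend. Real Savings Plans rates vary by instance family and region, and Cost Explorer remains the billing record. This calculation is only a faster internal signal.

```typescript
interface CoverageSummary {
  utilizationPercent: number; // share of purchased commitment actually consumed
  coveragePercent: number;    // share of eligible On-Demand-equivalent spend absorbed
  onDemandOverflow: number;   // On-Demand-equivalent spend left uncovered
}

function summarizeCoverage(
  commitmentPerHour: number,    // Savings Plans $/hour
  blendedDiscount: number,      // assumed single discount, e.g. 0.25
  hourlyOnDemandSpend: number[] // eligible spend per hour at On-Demand rates
): CoverageSummary {
  // On-Demand-equivalent spend one hour of commitment can absorb.
  const absorbPerHour = commitmentPerHour / (1 - blendedDiscount);
  let covered = 0;
  let total = 0;
  for (const spend of hourlyOnDemandSpend) {
    covered += Math.min(spend, absorbPerHour); // unused commitment in an hour is lost
    total += spend;
  }
  const capacity = absorbPerHour * hourlyOnDemandSpend.length;
  return {
    utilizationPercent: capacity === 0 ? 0 : (covered / capacity) * 100,
    coveragePercent: total === 0 ? 0 : (covered / total) * 100,
    onDemandOverflow: total - covered
  };
}

const weekly = summarizeCoverage(3, 0.25, [4, 2, 8, 4]); // sample hours, not real data
console.log(weekly.utilizationPercent, weekly.coveragePercent.toFixed(1)); // 87.5 77.8
```

Utilization drifting well below 100% signals over-commitment; coverage falling while overflow grows signals a gap that the next weekly refresh can close.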
6. Ignoring GPU Commitment Nuances
Explanation: Treating accelerated computing instances (p3, p4d, g4dn, trn1, inf2) like general-purpose compute. GPU workloads have different scaling patterns and higher absolute costs.
Fix: Model GPU fleets separately. EC2 Instance Savings Plans commit to a single instance family in one region in exchange for deeper discounts than Compute Savings Plans, so they often yield higher absolute dollar savings for stable ML training pipelines tied to a specific accelerator family.
7. Neglecting Network & Storage I/O Costs
Explanation: Focusing exclusively on CPU/memory pricing while ignoring data transfer and EBS I/O charges, which Savings Plans do not discount. Storage-optimized (i3, i4i, d3) and memory-optimized (r5, r6i, x2idn) instances carry different I/O baselines that affect total cost of ownership.
Fix: Include network throughput and EBS volume type in rightsizing calculations. Move high-I/O workloads to instances with local NVMe instance storage or higher EBS bandwidth before committing.
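Folding I/O into the comparison is plain arithmetic. The monthlyTco helper below is hypothetical and assumes the 730-hour month used by the AWS Pricing Calculator. Every price is an input you look up for your region and volume type; the values below are placeholders, not AWS prices, chosen to show how a cheaper hourly instance can lose once provisioned storage performance is counted.

```typescript
interface CostInputs {
  instanceHourly: number;        // compute $/hour (after any Savings Plan discount)
  ebsGb: number;                 // provisioned EBS capacity in GB
  ebsGbMonthPrice: number;       // $/GB-month for the chosen volume type
  ebsPerformanceMonthly: number; // provisioned IOPS/throughput charges, 0 if none
  transferOutGb: number;         // billable data transfer per month
  transferGbPrice: number;       // $/GB for that transfer
}

interface MonthlyCost { compute: number; storage: number; network: number; total: number; }

const HOURS_PER_MONTH = 730;

function monthlyTco(c: CostInputs): MonthlyCost {
  const compute = c.instanceHourly * HOURS_PER_MONTH;
  const storage = c.ebsGb * c.ebsGbMonthPrice + c.ebsPerformanceMonthly;
  const network = c.transferOutGb * c.transferGbPrice;
  return { compute, storage, network, total: compute + storage + network };
}

// Placeholder prices for two hypothetical candidates.
const candidateA = monthlyTco({ instanceHourly: 0.1, ebsGb: 500, ebsGbMonthPrice: 0.1,
  ebsPerformanceMonthly: 40, transferOutGb: 100, transferGbPrice: 0.05 });
const candidateB = monthlyTco({ instanceHourly: 0.12, ebsGb: 500, ebsGbMonthPrice: 0.1,
  ebsPerformanceMonthly: 0, transferOutGb: 100, transferGbPrice: 0.05 });
console.log(candidateA.total.toFixed(2), candidateB.total.toFixed(2)); // 168.00 142.60
```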
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Stable web tier with predictable traffic | Compute Savings Plan (1-year) + Graviton instances | Balances discount depth with architectural flexibility; ARM reduces baseline cost | 30–40% reduction vs On-Demand |
| Batch processing / CI/CD pipelines | Spot Instances with capacity-optimized strategy | Maximizes discount (60–90%); interruption tolerance is inherent to batch workloads | 60–80% reduction vs On-Demand |
| ML training with fixed GPU requirements | EC2 Instance Savings Plan (1–3 year) on p4d/g4dn | Deeper discounts on a single accelerator family; stable workload justifies the rigid commitment | 40–66% reduction vs On-Demand |
| Dev/Test environments with variable schedules | On-Demand + scheduled start/stop automation | Avoids commitment waste; automation eliminates idle billing | 50–70% reduction vs 24/7 On-Demand |
| Multi-region microservices fleet | Compute Savings Plan + mixed-instance ASGs | Cross-region flexibility; instance-family agnostic; absorbs scaling variance | 30–50% reduction vs fragmented RIs |
Configuration Template
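// aws-cdk-ec2-fleet.config.ts
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
export class ComputeFleetStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const vpc = new ec2.Vpc(this, 'ProdVpc', { maxAzs: 3 });
    const securityGroup = new ec2.SecurityGroup(this, 'FleetSG', { vpc });
    // userData lives on the launch templates: the ASG rejects userData alongside a launch template.
    const userData = ec2.UserData.forLinux({ shebang: '#!/bin/bash' });
    const rootVolume: ec2.BlockDevice[] = [{
      deviceName: '/dev/xvda',
      volume: ec2.BlockDeviceVolume.ebs(30, { volumeType: ec2.EbsDeviceVolumeType.GP3 })
    }];
    // One launch template per CPU architecture: an AMI targets either arm64 or x86_64,
    // so the x86 override (m6i.large) cannot boot the Graviton image.
    const armTemplate = new ec2.LaunchTemplate(this, 'FleetLtArm64', {
      machineImage: ec2.MachineImage.latestAmazonLinux2023({ cpuType: ec2.AmazonLinuxCpuType.ARM_64 }),
      securityGroup,
      userData,
      blockDevices: rootVolume
    });
    const x86Template = new ec2.LaunchTemplate(this, 'FleetLtX86', {
      machineImage: ec2.MachineImage.latestAmazonLinux2023({ cpuType: ec2.AmazonLinuxCpuType.X86_64 }),
      securityGroup,
      userData,
      blockDevices: rootVolume
    });
    new autoscaling.AutoScalingGroup(this, 'MixedInstanceAsg', {
      vpc,
      minCapacity: 2,
      maxCapacity: 50,
      desiredCapacity: 5,
      // With mixedInstancesPolicy, the launch template goes inside the policy, not on the group.
      mixedInstancesPolicy: {
        launchTemplate: armTemplate,
        instancesDistribution: {
          onDemandBaseCapacity: 1,
          onDemandPercentageAboveBaseCapacity: 20,
          spotAllocationStrategy: autoscaling.SpotAllocationStrategy.CAPACITY_OPTIMIZED
        },
        // Overrides take ec2.InstanceType objects; x86 types point at the x86 template.
        launchTemplateOverrides: [
          { instanceType: new ec2.InstanceType('m6g.large') },
          { instanceType: new ec2.InstanceType('m6i.large'), launchTemplate: x86Template },
          { instanceType: new ec2.InstanceType('c6g.large') }
        ]
      }
      // No blockDuration: it is not an AutoScalingGroup property, and defined-duration
      // Spot blocks are no longer offered, so Spot capacity cannot be shielded from interruption.
    });
  }
}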
Quick Start Guide
- Export utilization metrics: In AWS Compute Optimizer, open the EC2 instance recommendations and export them (the export is written to an S3 bucket) to capture at least 30 days of CPU and memory utilization. Identify candidates with average utilization below 20%.
- Validate ARM compatibility: Run aws ec2 describe-instance-types --filters "Name=processor-info.supported-architecture,Values=arm64" to list Graviton-eligible instance types. Rebuild container images as multi-architecture images that include a linux/arm64 variant, and recompile native binaries for ARM64.
- Configure mixed-instance policy: Deploy the provided CDK template or adapt your Terraform/CloudFormation stack to include m6g, m6i, and c6g overrides, giving x86 overrides such as m6i their own launch template with an x86_64 AMI. Set spotAllocationStrategy to capacity-optimized and reserve 20% On-Demand above base capacity.
- Purchase Compute Savings Plan: Calculate your baseline hourly spend, convert it to Savings Plans rates (On-Demand spend × (1 − expected discount)), multiply by 0.85, and purchase a 1-year Compute Savings Plan at that hourly commitment from the Savings Plans section of the AWS Billing and Cost Management console. Monitor coverage weekly using custom metrics rather than relying on Cost Explorer's 72-hour lag.