use Prometheus as the metric backend and a Python analyzer to identify candidates for rightsizing.
# rightsizing_analyzer.py
from datetime import datetime, timedelta, timezone

import requests

PROMETHEUS_URL = "https://prometheus.internal/api/v1/query_range"
TOKEN = "your_prometheus_token"  # placeholder; load from a secret store in practice


def fetch_metric(query, step="1h", days=7):
    """Run a Prometheus range query over the last `days` days and return the JSON body."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    headers = {"Authorization": f"Bearer {TOKEN}"}
    params = {
        "query": query,
        "start": start.timestamp(),  # Prometheus accepts Unix timestamps or RFC 3339
        "end": end.timestamp(),
        "step": step,
    }
    resp = requests.get(PROMETHEUS_URL, headers=headers, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()


def analyze_candidates():
    # Idle CPU fraction per instance over 7 days. node_cpu_seconds_total is a
    # counter, so take rate() first, then average across cores per instance.
    cpu_data = fetch_metric(
        'avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1h]))'
    )
    # Memory utilization as a fraction (0-1); converted to a percentage below
    mem_data = fetch_metric(
        "1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)"
    )

    # Index memory series by instance for direct lookup (simplified for demonstration)
    mem_by_instance = {m["metric"]["instance"]: m for m in mem_data["data"]["result"]}

    # Identify instances with avg CPU < 30% AND avg memory < 40%
    candidates = []
    for cpu_res in cpu_data["data"]["result"]:
        instance = cpu_res["metric"]["instance"]
        idle_vals = [float(v[1]) for v in cpu_res["values"]]
        avg_cpu = 100 - (sum(idle_vals) / len(idle_vals) * 100)

        mem_res = mem_by_instance.get(instance)
        if mem_res:
            mem_vals = [float(v[1]) * 100 for v in mem_res["values"]]
            avg_mem = sum(mem_vals) / len(mem_vals)
            if avg_cpu < 30 and avg_mem < 40:
                candidates.append({
                    "instance": instance,
                    "avg_cpu_pct": round(avg_cpu, 2),
                    "avg_mem_pct": round(avg_mem, 2),
                    "recommendation": "DOWNSIZE",
                })
    return candidates


if __name__ == "__main__":
    results = analyze_candidates()
    for r in results:
        print(f"[{r['instance']}] CPU: {r['avg_cpu_pct']}% | MEM: {r['avg_mem_pct']}% → {r['recommendation']}")
2. Adjustment & IaC Integration
Terraform modules should parameterize instance families, sizes, and autoscaling thresholds. Rightsizing becomes a variable-driven change, not a manual edit.
# main.tf
variable "environment" { default = "production" }
variable "workload_profile" { default = "balanced" } # balanced, compute, memory, burst

locals {
  size_map = {
    balanced = "t3.medium"
    compute  = "c5.large"
    memory   = "r5.large"
    burst    = "t3a.medium"
  }
  instance_type = local.size_map[var.workload_profile]
}

resource "aws_instance" "app_server" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = local.instance_type
  monitoring    = true

  root_block_device {
    volume_size = 50
    volume_type = "gp3"
    iops        = 3000
  }

  tags = {
    Name       = "${var.environment}-app"
    CostCenter = "engineering"
    Rightsized = "true"
  }
}
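To make the "variable-driven change" concrete, here is a minimal Python sketch that turns a chosen profile into a variables file for the module above. The helper name `render_tfvars` and the file name in the usage note are assumptions for illustration; only the variable names and the four profiles come from `main.tf`.

```python
import json

# Profiles accepted by the workload_profile variable in main.tf.
VALID_PROFILES = {"balanced", "compute", "memory", "burst"}


def render_tfvars(environment: str, workload_profile: str) -> str:
    """Return JSON text for a Terraform variables file (hypothetical helper)."""
    if workload_profile not in VALID_PROFILES:
        raise ValueError(f"unknown workload_profile: {workload_profile!r}")
    values = {"environment": environment, "workload_profile": workload_profile}
    return json.dumps(values, indent=2, sort_keys=True) + "\n"
```

Writing the result to a file such as `rightsizing.auto.tfvars.json` (Terraform loads `*.auto.tfvars.json` files automatically) and committing it keeps the size change in a reviewable pull request rather than a console edit.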
3. Validation & Continuous Automation
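For containerized workloads, the Kubernetes Vertical Pod Autoscaler (VPA) and Horizontal Pod Autoscaler (HPA) close the loop: VPA adjusts resource requests (and limits proportionally) from observed consumption, while HPA adjusts the replica count. The VPA project advises against combining it with an HPA that scales on the same CPU or memory metric, so treat the manifest below as a starting point and drive one of the two from a different signal (for example, a custom metric for HPA) before enabling both on a single Deployment.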
# vpa-hpa-combined.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 2Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
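To see what the HPA targets above mean in practice, the sketch below reproduces the replica calculation described in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within the default 10% tolerance. It is a simplification that ignores pod readiness and stabilization windows; the function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_utilization / target_utilization
    # Within tolerance of the target, the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, desired))
```

With the 65% CPU target, four replicas averaging 90% scale to six, while four replicas averaging 30% drop to the floor of two. When several metrics are configured, as in the manifest above, the HPA evaluates each and uses the largest result.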
Integration Workflow
- Collect: Prometheus scrapes node/pod metrics every 15s.
- Analyze: Python script runs nightly, flags candidates, outputs JSON to a FinOps dashboard.
- Approve: Pull request updates Terraform variables or Kubernetes manifests. Policy-as-code (OPA/Conftest) validates against guardrails.
- Apply: CI/CD pipeline deploys changes to staging, runs load tests, then promotes to production.
- Monitor: VPA/HPA continuously tune container workloads; cloud metrics feed back into the analyzer for the next cycle.
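The "Analyze" step hands a JSON document to the dashboard. A minimal sketch of that payload follows, assuming the candidate dictionaries produced by `rightsizing_analyzer.py`; the report field names (`generated_at`, `candidate_count`, `candidates`) are hypothetical and should be adapted to whatever the dashboard expects.

```python
import json
from datetime import datetime, timezone


def build_report(candidates, generated_at=None):
    """Wrap analyzer candidates in a timestamped report (hypothetical schema)."""
    generated_at = generated_at or datetime.now(timezone.utc)
    return {
        "generated_at": generated_at.isoformat(),
        "candidate_count": len(candidates),
        # Sort for stable diffs between nightly runs.
        "candidates": sorted(candidates, key=lambda c: c["instance"]),
    }


def to_json(report):
    """Serialize the report for upload or for committing alongside a pull request."""
    return json.dumps(report, indent=2)
```

Keeping the output deterministic (sorted, explicitly timestamped) makes it easier to diff one night's candidates against the next.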
Pitfall Guide
1. Rightsizing Without Performance Validation
Risk: Downsizing instances or reducing resource limits without validating workload characteristics causes latency spikes, timeouts, and SLA breaches.
Mitigation: Always pair rightsizing with load testing. Establish performance baselines (p95/p99 latency, error rates, throughput) before the change and compare them afterward. Implement rollback triggers that fire when any of these metrics degrades by more than 5%.
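The 5% rollback rule can be expressed as a small comparison between the baseline and post-change measurements. This is a sketch under assumptions: the metric names are hypothetical, and it presumes every baseline value is positive.

```python
# Metrics where a higher value is worse; anything else is treated as lower-is-worse.
HIGHER_IS_WORSE = {"p95_latency_ms", "p99_latency_ms", "error_rate"}


def regressions(baseline, current, threshold_pct=5.0):
    """Return {metric: degradation %} for metrics that degraded beyond the threshold."""
    flagged = {}
    for name, base in baseline.items():
        if base <= 0:
            raise ValueError(f"baseline for {name!r} must be positive")
        change_pct = (current[name] - base) / base * 100
        # Flip the sign for metrics such as throughput, where a drop is the regression.
        worse_pct = change_pct if name in HIGHER_IS_WORSE else -change_pct
        if worse_pct > threshold_pct:
            flagged[name] = round(worse_pct, 2)
    return flagged


def should_roll_back(baseline, current, threshold_pct=5.0):
    """True when at least one metric degraded by more than threshold_pct."""
    return bool(regressions(baseline, current, threshold_pct))
```

For example, a p95 latency moving from 200 ms to 215 ms is a 7.5% degradation and would trigger a rollback, while a p99 moving from 400 ms to 410 ms (2.5%) would not.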
2. Ignoring Burst and Seasonal Patterns
Risk: Rightsizing based on average utilization misses peak traffic windows (e.g., marketing campaigns, end-of-month processing, holiday spikes).
Mitigation: Analyze percentiles (p75, p95) and time-series patterns. Use predictive autoscaling or scheduled scaling policies. Retain headroom for documented peak events.
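A short sketch shows why averages mislead. It uses a nearest-rank percentile and a made-up day of hourly CPU samples; the 60% p95 limit is an assumed threshold for illustration, not a value from this guide.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Average plus the percentiles recommended above."""
    return {
        "avg": round(sum(samples) / len(samples), 1),
        "p75": percentile(samples, 75),
        "p95": percentile(samples, 95),
    }


def is_downsize_candidate(samples, avg_limit=30, p95_limit=60):
    """Require both a low average and a low p95 (p95_limit is an assumed value)."""
    stats = summarize(samples)
    return stats["avg"] < avg_limit and stats["p95"] < p95_limit


# Hypothetical day: 20 quiet hours at 15% CPU and a 4-hour peak at 85%.
hourly_cpu = [15] * 20 + [85] * 4
```

Here `summarize(hourly_cpu)` gives an average of 26.7%, which passes a 30% average-only filter, yet the p95 is 85%, so `is_downsize_candidate(hourly_cpu)` returns `False`.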
3. Static Rightsizing in Dynamic Environments
Risk: Applying fixed instance sizes or static resource requests to stateless, event-driven, or microservice architectures creates immediate drift as traffic patterns evolve.
Mitigation: Replace static sizing with autoscaling (HPA/VPA, AWS Auto Scaling, GCP Instance Groups). Couple with policy gates that prevent manual overrides without justification.
4. Focusing Only on Compute, Ignoring Storage & Network
Risk: Compute rightsizing yields diminishing returns if provisioned IOPS, throughput, snapshots, and data transfer remain oversized. Storage and egress often account for 30–40% of cloud spend.
Mitigation: Audit EBS/Persistent Disk types, snapshot retention, and S3/GCS lifecycle policies. Rightsize network attachments (ENIs, load balancer capacity, NAT gateway throughput). Align storage tiers with access frequency.
5. Lack of Cross-Team Alignment (FinOps vs Engineering)
Risk: Finance mandates cuts without engineering context, leading to shadow IT, workarounds, or degraded systems. Engineering resists changes due to fear of blame.
Mitigation: Establish a FinOps cadence with shared dashboards, cost allocation tags, and joint review meetings. Frame rightsizing as reliability and agility improvement, not just cost reduction.
6. Automating Without Governance or Approval Workflows
Risk: Fully automated rightsizing pipelines can apply aggressive changes during maintenance windows, trigger cascading failures, or violate compliance requirements.
Mitigation: Implement staged automation: detect → recommend → require approval → apply → validate. Use policy-as-code to enforce minimum thresholds, tag compliance, and change windows. Maintain audit logs for all rightsizing actions.
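The staged flow can be modeled as a small state machine that refuses to move past the approval gate without a named approver and records every transition. The class, stage names, and log fields below are hypothetical; a real pipeline would persist the log and delegate policy checks to OPA/Conftest.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# detect -> recommend -> require approval -> apply -> validate
STAGES = ("detected", "recommended", "approved", "applied", "validated")


@dataclass
class RightsizingChange:
    resource: str
    action: str
    stage: str = "detected"
    audit_log: list = field(default_factory=list)

    def advance(self, actor: str, approver: Optional[str] = None) -> str:
        """Move to the next stage; the approval gate needs a named approver."""
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError("change is already validated")
        next_stage = STAGES[position + 1]
        if next_stage == "approved" and not approver:
            raise PermissionError("a named approver is required before apply")
        self.stage = next_stage
        # Every transition is appended to the audit trail.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "stage": next_stage,
            "actor": actor,
            "approver": approver if next_stage == "approved" else None,
        })
        return next_stage
```

Because `advance` raises before mutating state, a rejected transition leaves the change at "recommended" and adds nothing to the audit log.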
Production Bundle
Rightsizing Checklist
Pre-Flight
Execution
Post-Validation
Decision Matrix
| Condition | Utilization | Burst Pattern | Business Criticality | Recommended Action |
|---|---|---|---|---|
| Low steady-state, no peaks | <25% CPU/Mem | None | Low/Medium | Downsize 1–2 instance tiers or reduce resource requests |
| Moderate with predictable peaks | 30–50% | Scheduled/seasonal | High | Keep current size; implement scheduled scaling |
| High with unpredictable spikes | >75% | Unpredictable | Critical | Upsize or enable predictive autoscaling; add caching/queueing |
| Idle or orphaned | <5% for >14 days | None | Any | Decommission or archive; snapshot volumes if needed |
| Storage-heavy, low IOPS usage | <30% provisioned IOPS | None | Any | Switch to gp3/io2, reduce IOPS, apply lifecycle policies |
| Network bottlenecked | <40% CPU, >80% net utilization | High | High | Rightsize NIC/ENI, upgrade instance family, optimize egress |
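The compute rows of the matrix can be encoded as a first-pass classifier. This sketch covers only the utilization, burst-pattern, and idle-duration columns; business criticality and the storage and network rows are left out, and anything that falls between the table's bands is routed to manual review. The function name and return labels are illustrative.

```python
def recommend(utilization_pct, burst_pattern="none", idle_days=0):
    """Map the matrix's compute rows to an action label (simplified sketch)."""
    if utilization_pct < 5 and idle_days > 14:
        return "DECOMMISSION"
    if utilization_pct < 25 and burst_pattern == "none":
        return "DOWNSIZE"
    if 30 <= utilization_pct <= 50 and burst_pattern == "scheduled":
        return "KEEP_AND_SCHEDULE_SCALING"
    if utilization_pct > 75:
        return "UPSIZE_OR_PREDICTIVE_AUTOSCALING"
    # Gaps between bands (for example 25-30% or 50-75%) need a human decision.
    return "MANUAL_REVIEW"
```

The idle check runs first so that an orphaned instance is decommissioned rather than merely downsized.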
# variables.tf
variable "app_name" { type = string }
variable "environment" { type = string }
variable "cpu_target_pct" { default = 65 }
variable "mem_target_pct" { default = 70 }
variable "min_replicas" { default = 2 }
variable "max_replicas" { default = 8 }

# deployment.tf
resource "kubernetes_deployment" "app" {
  metadata {
    name      = var.app_name
    namespace = var.environment
  }
  spec {
    replicas = var.min_replicas
    # The selector is required and must match the pod template labels
    selector {
      match_labels = { app = var.app_name }
    }
    template {
      metadata {
        labels = { app = var.app_name }
      }
      spec {
        container {
          name  = var.app_name
          image = "registry.internal/${var.app_name}:latest"
          resources {
            requests = { cpu = "250m", memory = "256Mi" }
            limits   = { cpu = "500m", memory = "512Mi" }
          }
        }
      }
    }
  }
}

resource "kubernetes_horizontal_pod_autoscaler_v2" "app_hpa" {
  metadata {
    name      = "${var.app_name}-hpa"
    namespace = var.environment # must match the Deployment's namespace
  }
  spec {
    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = var.app_name
    }
    min_replicas = var.min_replicas
    max_replicas = var.max_replicas
    metric {
      type = "Resource"
      resource {
        name = "cpu"
        target {
          type                = "Utilization"
          average_utilization = var.cpu_target_pct
        }
      }
    }
    metric {
      type = "Resource"
      resource {
        name = "memory"
        target {
          type                = "Utilization"
          average_utilization = var.mem_target_pct
        }
      }
    }
  }
}
Quick Start (2-Week Implementation)
Days 1–2: Discovery & Tagging
- Deploy monitoring agents (Prometheus node exporter, cloud agent)
- Enforce tagging policy via CI/CD and OPA/Conftest
- Export current inventory to CSV/JSON with cost tags
Days 3–5: Baseline & Analysis
- Run rightsizing_analyzer.py or an equivalent cloud-native tool
- Filter out test/dev, batch jobs, and data pipelines
- Generate candidate list with p95/p99 metrics
Days 6–8: Pilot & Validation
- Select 3–5 non-critical workloads
- Update Terraform variables or Kubernetes manifests
- Run synthetic load tests; validate latency/error thresholds
- Deploy to staging; monitor for 48 hours
Days 9–11: Production Rollout
- Schedule change window with stakeholder sign-off
- Apply IaC changes; enable autoscaling policies
- Monitor dashboards; prepare rollback scripts
Days 12–14: Automation & Handoff
- Integrate analyzer into CI/CD pipeline (nightly cron)
- Create FinOps dashboard with cost savings tracking
- Document runbook, train platform team, set quarterly review cadence
Infrastructure rightsizing is a continuous discipline, not a project. By embedding observation, policy guardrails, and automated validation into your delivery lifecycle, you transform cost optimization from a reactive expense into a strategic enabler of performance, agility, and sustainable scale.