collects resource usage from Kubelets. Without this, neither HPA nor VPA can function.
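2. HPA Controller: Watches HorizontalPodAutoscaler objects, queries metrics, and calculates desired replica counts. It applies the result by updating the scale subresource of the target workload (for example, a Deployment), which in turn adjusts its ReplicaSet.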
3. VPA Components:
* Recommender: Analyzes usage history and calculates resource recommendations.
* Updater: Identifies pods that should be updated based on recommendations and evicts them.
* Admission Controller: Intercepts pod creation to apply recommended resources (used in Initial, Recreate, and Auto modes; Initial mode relies on it exclusively).
Step-by-Step Implementation
1. Deploy Metrics Server
Ensure the Metrics Server is running with --kubelet-insecure-tls (for local dev) or proper certificate configuration for production.
# metrics-server-deployment.yaml snippet
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: metrics-server
          image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
          args:
            - --cert-dir=/tmp
            - --secure-port=10250
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
2. Horizontal Pod Autoscaler Configuration
HPA v2 supports multiple metrics and behavior policies to control scaling velocity.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
Rationale: The behavior field is critical. scaleDown stabilization prevents premature pod removal during transient lulls. Policies limit the rate of change, protecting downstream dependencies.
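To make the interaction between metrics and the behavior field concrete, the following is a simplified Python sketch of how a replica recommendation could be derived for the manifest above. It uses the documented HPA formula (ceil of current replicas times the metric-to-target ratio, with a 10% default tolerance) and a reduced model of a Percent scale-up policy. The function names are hypothetical, and the real controller also handles stabilization windows, missing metrics, and pod readiness, which are omitted here.

```python
import math


def desired_replicas(current: int, metric_value: float, target_value: float,
                     tolerance: float = 0.1) -> int:
    """Core HPA formula: ceil(current * metric / target).

    If the metric is within the tolerance band around the target,
    the replica count is left unchanged.
    """
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * metric_value / target_value)


def cap_scale_up(current: int, desired: int, percent: int) -> int:
    """Simplified Percent policy: grow by at most `percent` of current per period."""
    max_allowed = math.ceil(current * (1 + percent / 100))
    return min(desired, max_allowed)


def recommend(current: int, metrics: list, scale_up_percent: int,
              min_replicas: int, max_replicas: int) -> int:
    """metrics is a list of (observed_value, target_value) pairs."""
    # With several metrics, the largest per-metric proposal wins.
    desired = max(desired_replicas(current, value, target)
                  for value, target in metrics)
    if desired > current:
        desired = cap_scale_up(current, desired, scale_up_percent)
    # Finally clamp to the minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas, CPU at 90% (target 60) and 250 req/s per pod (target 100).
    # The raw proposal is 10, but the 50% scale-up policy caps this step at 6.
    print(recommend(4, [(90, 60), (250, 100)], 50, 2, 10))
```

In this sketch the request-rate metric alone would ask for 10 replicas, yet the policy only allows a move from 4 to 6 in one period, which is the rate limiting the rationale describes.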
3. Vertical Pod Autoscaler Configuration
VPA must be configured carefully based on the presence of HPA.
Scenario A: VPA Only (No HPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
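As a rough illustration of what minAllowed and maxAllowed do, the sketch below clamps a raw recommendation into the bounds from the Scenario A manifest. The helper names and the sample recommendation are hypothetical; CPU is expressed in millicores and memory in bytes for simplicity, and this is not the VPA's actual implementation.

```python
MIB = 1024 * 1024

# Bounds taken from the Scenario A manifest: (minAllowed, maxAllowed).
POLICY = {
    "cpu": (100, 2000),                 # millicores: 100m .. 2 cores
    "memory": (128 * MIB, 2048 * MIB),  # bytes: 128Mi .. 2Gi
}


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def apply_policy(recommendation: dict) -> dict:
    """Clamp each recommended resource to its allowed range."""
    return {
        resource: clamp(value, *POLICY[resource])
        for resource, value in recommendation.items()
    }


if __name__ == "__main__":
    # Hypothetical raw recommendation: 50m CPU and 3Gi memory.
    raw = {"cpu": 50, "memory": 3 * 1024 * MIB}
    print(apply_policy(raw))  # CPU is raised to 100m, memory capped at 2Gi
```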
Scenario B: HPA and VPA Combined
When HPA manages replica count, VPA should only set resources at pod creation to avoid evicting pods that HPA relies on.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa-combined
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  updatePolicy:
    updateMode: "Initial" # Critical: VPA only sets resources on new pods
  resourcePolicy:
    containerPolicies:
      - containerName: "app"
        mode: "Auto"
Integration with Cluster Autoscaler
Autoscaling pods is futile if the cluster cannot provision nodes. Ensure the Cluster Autoscaler is configured with appropriate --scale-down-utilization-threshold and --scale-down-delay-after-add to complement HPA/VPA actions.
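The effect of --scale-down-utilization-threshold and --scale-down-delay-after-add can be approximated with a small model. The sketch below assumes node utilization is the larger of the CPU and memory request-to-allocatable ratios, and that a node is only considered for removal once the delay since the last scale-up has passed. The function names and sample numbers are hypothetical, and the real Cluster Autoscaler applies further checks (for example, whether the node's pods can be rescheduled elsewhere).

```python
def node_utilization(cpu_requested_m: int, cpu_allocatable_m: int,
                     mem_requested: int, mem_allocatable: int) -> float:
    """Utilization based on pod requests, not live usage (simplified)."""
    return max(cpu_requested_m / cpu_allocatable_m,
               mem_requested / mem_allocatable)


def is_scale_down_candidate(utilization: float, threshold: float,
                            seconds_since_scale_up: int,
                            delay_after_add_seconds: int) -> bool:
    """A node qualifies only if it is under the threshold and the delay has elapsed."""
    return (utilization < threshold
            and seconds_since_scale_up >= delay_after_add_seconds)


if __name__ == "__main__":
    # Hypothetical node: 1000m of 4000m CPU and 4 of 16 GiB memory requested.
    util = node_utilization(1000, 4000, 4, 16)
    print(util)                                          # 0.25
    print(is_scale_down_candidate(util, 0.5, 900, 600))  # True
    print(is_scale_down_candidate(util, 0.5, 120, 600))  # False: too soon after scale-up
```

Because the calculation uses requests, VPA changes to requests shift node utilization and therefore which nodes the Cluster Autoscaler treats as removable.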
Pitfall Guide
1. The HPA/VPA Eviction Loop
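Mistake: Running HPA and VPA against the same workload with VPA updateMode: "Auto", especially when both act on CPU or memory.
Impact: VPA decides a pod needs different resources and evicts it. The workload controller replaces the pod, and because HPA computes utilization relative to requests, the changed requests shift the HPA's numbers and trigger further replica changes. The two controllers keep reacting to each other, which creates pod churn, adds API server load, and can cause downtime.
Fix: Set VPA to updateMode: "Initial" or "Off" when HPA is present.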
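A small worked example shows why the two controllers interfere when they act on the same resource. HPA's Utilization target is a percentage of the pod's request, so a change in requests alters the HPA's view even when real load is constant. The numbers and function names below are hypothetical, and the replica formula is the simplified ceil(current * utilization / target) without tolerance handling.

```python
import math


def cpu_utilization_percent(usage_m: float, request_m: float) -> float:
    """Utilization as HPA sees it: usage relative to the pod's request."""
    return 100 * usage_m / request_m


def hpa_desired(current: int, utilization: float, target: float) -> int:
    """Simplified HPA formula, ignoring the tolerance band."""
    return math.ceil(current * utilization / target)


if __name__ == "__main__":
    replicas, usage_m, target = 6, 300, 60

    # Before VPA acts: request 500m, so utilization is 60% and HPA holds steady.
    before = cpu_utilization_percent(usage_m, 500)
    print(before, hpa_desired(replicas, before, target))  # 60.0 6

    # VPA doubles the request to 1000m: the same load now reads as 30%,
    # and HPA wants to halve the replica count.
    after = cpu_utilization_percent(usage_m, 1000)
    print(after, hpa_desired(replicas, after, target))    # 30.0 3
```

With fewer replicas, per-pod usage rises again, which can prompt a new recommendation from VPA and another correction from HPA.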
2. VPA "Auto" Mode OOM Kills
Mistake: Switching VPA to Auto mode on a production workload immediately.
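Impact: In Auto mode the Updater evicts running pods to apply new values, which restarts them. VPA also raises requests (and, by default, limits proportionally) based on observed usage, so a memory leak or unbounded cache can be misread as genuine demand. The higher limit lets the leak grow until the container is OOM-killed or the node runs out of memory.
Fix: Run VPA with updateMode: "Off" (recommendation only) for at least one full business cycle and review the recommendations before enabling updates. Apply minAllowed and maxAllowed constraints.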
3. Missing Metrics Server or TLS Errors
Mistake: Deploying HPA/VPA without verifying Metrics Server health.
Impact: HPA targets show <unknown> and the ScalingActive condition turns False, so no scaling occurs. VPA fails to generate recommendations.
Fix: Check kubectl get --raw "/apis/metrics.k8s.io/v1beta1" to verify API availability. Ensure Kubelet certificates are trusted.
4. HPA Scale-Down Flapping
Mistake: Setting stabilizationWindowSeconds for scale-down too low (or to 0). When the field is omitted, the default is 300 seconds.
Impact: HPA removes pods during a brief lull, then traffic returns and it has to scale up again right away. The churn wastes resources and increases latency while new pods start.
Fix: Keep the scale-down stabilizationWindowSeconds longer than your typical traffic micro-bursts (e.g., 300s).
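The stabilization window can be pictured as taking the highest recommendation seen within the window before scaling down. The sketch below is a minimal model of that rule; the function name and the sample history are hypothetical, and the real controller tracks recommendations internally rather than from a list like this.

```python
def stabilized_scale_down(recommendations: list, now: int,
                          window_seconds: int) -> int:
    """Return the replica count HPA would settle on for scale-down.

    recommendations is a list of (timestamp_seconds, desired_replicas).
    Only entries inside the window count, and the highest one wins,
    so a short lull cannot remove pods on its own.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)


if __name__ == "__main__":
    # Hypothetical history: a spike to 10, then a lull.
    history = [(0, 10), (60, 4), (120, 5), (180, 4)]
    print(stabilized_scale_down(history, now=180, window_seconds=300))  # 10
    print(stabilized_scale_down(history, now=180, window_seconds=60))   # 5
```

With a 300-second window the spike at t=0 still holds the replica count at 10, while a 60-second window already lets the count fall to 5.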
5. Requests vs. Limits Mismatch
Mistake: Setting CPU requests high but limits low, or vice versa, without understanding QoS classes.
Impact: Under node pressure, BestEffort pods are evicted first, followed by Burstable pods whose usage exceeds their requests; Guaranteed pods go last. VPA may recommend requests that push pods into Guaranteed QoS, changing eviction priority unexpectedly.
Fix: Align requests with VPA recommendations. Use maxAllowed to prevent VPA from creating Guaranteed pods if you prefer Burstable for cost reasons.
6. Custom Metrics Cardinality
Mistake: Using high-cardinality labels in Custom Metrics for HPA.
Impact: Metrics adapter overload; HPA cannot aggregate metrics efficiently; slow reconciliation.
Fix: Aggregate metrics at the exporter level or use low-cardinality labels (e.g., namespace, deployment) rather than pod_id or user_id.
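A quick way to see the cost of a label is to estimate the number of time series it produces. As a rough upper bound, the series count for one metric is the product of the distinct values of each label. The function name and label counts below are hypothetical.

```python
from math import prod


def estimated_series(label_cardinalities: dict) -> int:
    """Upper bound on series for one metric: product of distinct label values."""
    return prod(label_cardinalities.values())


if __name__ == "__main__":
    # Hypothetical label sets for a single request-rate metric.
    high = {"namespace": 1, "deployment": 1, "pod_id": 20, "user_id": 5000}
    low = {"namespace": 1, "deployment": 1}
    print(estimated_series(high))  # 100000
    print(estimated_series(low))   # 1
```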
7. Ignoring Cluster Capacity
Mistake: Setting maxReplicas on HPA without calculating node capacity.
Impact: HPA scales toward maxReplicas, but the cluster cannot schedule the pods, so they remain Pending. Cluster Autoscaler will not add nodes if the pending pods do not fit on any node type it can provision.
Fix: Estimate maxReplicas as floor(NodeAllocatable / PodRequests) * NodeCount, using the most constrained resource. Ensure Cluster Autoscaler supports the required node sizes.
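The capacity estimate is easy to get wrong if pods are divided across the cluster as a whole rather than per node, because a pod cannot be split between nodes. The sketch below computes pods per node with integer division on each resource, takes the most constrained one, and multiplies by the node count. The function name and sample figures are hypothetical, and it ignores DaemonSets, system pods, and other workloads sharing the nodes.

```python
def max_schedulable_replicas(node_allocatable: dict, pod_request: dict,
                             node_count: int) -> int:
    """Pods that fit, assuming identical nodes and no other workloads."""
    # Whole pods per node, limited by the scarcest resource.
    per_node = min(node_allocatable[res] // pod_request[res]
                   for res in pod_request)
    return per_node * node_count


if __name__ == "__main__":
    # Hypothetical nodes: 3900m CPU and 14000Mi memory allocatable each.
    allocatable = {"cpu_m": 3900, "memory_mi": 14000}
    request = {"cpu_m": 500, "memory_mi": 1024}
    # CPU allows 7 pods per node, memory 13, so CPU is the limit.
    print(max_schedulable_replicas(allocatable, request, node_count=5))  # 35
```

A naive cluster-wide division on CPU (3900 / 500 * 5) would suggest 39 replicas, four more than can actually be scheduled in this example.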
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Steady traffic, varying memory usage | VPA (Auto) | Optimizes memory requests, reduces over-provisioning waste. | High reduction in waste. |
| Bursty traffic, predictable size | HPA | Scales replicas to handle throughput spikes. | Increases cost during spikes, saves during lulls. |
| Bursty traffic + varying size | HPA + VPA (Initial) | HPA handles spikes; VPA ensures pods are right-sized. | Optimal balance of cost and performance. |
| Event-driven batch jobs | KEDA or CronHPA | HPA/VPA react to metrics; KEDA reacts to event sources (queues). | High efficiency; scales to zero. |
| Strict latency requirements | Static + HPA (Aggressive) | VPA evictions cause restart latency. HPA adds cold-start latency. | Higher cost for reserved capacity. |
Configuration Template
Copy-paste template for a production workload using HPA and VPA safely.
# hpa-vpa-production.yaml
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: production-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
        - type: Percent
          value: 20
          periodSeconds: 120
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: production-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: production-app
  updatePolicy:
    updateMode: "Initial" # Safe mode when HPA is present
  resourcePolicy:
    containerPolicies:
      - containerName: "app-container"
        mode: "Auto"
        minAllowed:
          cpu: 250m
          memory: 256Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
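Before adopting the template, it can help to estimate how quickly the scaleUp policy lets the workload grow. The sketch below counts how many scaling steps a Pods policy needs to get from one replica count to another. The function name is hypothetical, and the model ignores the scale-up stabilization window and pod startup time, so treat the result as a lower bound.

```python
import math


def scale_up_steps(current: int, target: int, pods_per_period: int) -> int:
    """Number of scaling steps needed when each step adds at most pods_per_period."""
    if target <= current:
        return 0
    return math.ceil((target - current) / pods_per_period)


if __name__ == "__main__":
    # Template values: minReplicas 3, maxReplicas 20, 4 pods per 60s period.
    steps = scale_up_steps(3, 20, 4)
    print(steps)  # 5 steps: 3 -> 7 -> 11 -> 15 -> 19 -> 20
```

Since consecutive steps are at least one periodSeconds (60s) apart, going from minimum to maximum takes several minutes in this template; raise the policy value if spikes arrive faster than that.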
Quick Start Guide
1. Install Metrics Server:
   kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
   Patch for local clusters:
   kubectl patch deployment metrics-server -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
2. Apply VPA in recommendation-only mode:
   Create vpa-recommend.yaml with updateMode: "Off" and apply:
   kubectl apply -f vpa-recommend.yaml
   Wait at least 24 hours to gather data.
3. Review and Apply HPA:
   Check kubectl describe hpa (if one exists) or kubectl top pods to determine thresholds. Create hpa.yaml with appropriate metrics and apply:
   kubectl apply -f hpa.yaml
4. Activate VPA:
   Update VPA to updateMode: "Initial" (if HPA is active) or "Auto" (if HPA is not active). Apply changes:
   kubectl apply -f vpa-active.yaml
5. Verify:
   Generate load and monitor. A field selector accepts one value per field, so query each event reason separately:
   kubectl get hpa -w
   kubectl get vpa -w
   kubectl get events --field-selector reason=FailedScale
   kubectl get events --field-selector reason=FailedUpdate