
Kubernetes Horizontal Pod Autoscaler (HPA)

Author: Venkata Sudhakar

The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas in a Deployment or StatefulSet based on observed metrics. When CPU utilisation rises above a threshold (for example, 70%), the HPA adds more Pods to distribute the load. When utilisation falls, it removes Pods to save resources. This gives you elastic scalability without manual intervention - your application automatically handles traffic spikes and scales back down during quiet periods, optimising both performance and cost.
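The scaling decision follows the standard formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A small sketch with illustrative values:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilisation: float,
                     target_utilisation: float) -> int:
    # Standard HPA formula: desired = ceil(current * actual / target)
    return math.ceil(current_replicas * current_utilisation / target_utilisation)

# 4 replicas running at 90% CPU against a 70% target -> scale up to 6
print(desired_replicas(4, 90, 70))

# 4 replicas at 35% against a 70% target -> scale down to 2
print(desired_replicas(4, 35, 70))
```

The real controller also applies tolerance bands and the configured scale-up/scale-down behavior before acting on this number.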

HPA requires the Kubernetes Metrics Server to be running in the cluster, which collects resource usage statistics from the kubelet on each node. For CPU and memory based scaling, no additional setup is required beyond the Metrics Server. For custom metrics (queue depth, request latency, messages per second), you need an external metrics adapter such as KEDA (Kubernetes Event-Driven Autoscaling) or the Prometheus Adapter. KEDA is particularly popular for scaling based on Kafka consumer lag, which is essential for data pipeline workloads.

The example below shows how to configure HPA for CPU-based scaling on a Spring Boot Deployment, followed by a KEDA ScaledObject that scales on Kafka consumer lag.
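A manifest along these lines would produce the status output shown below. The names, namespace, and thresholds are inferred from that output; the behavior values match the stabilization windows discussed later in this article.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale up when average CPU exceeds 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80    # second metric; HPA acts on the higher demand
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before removing Pods
```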


It gives the following output,

# Monitor HPA status
kubectl get hpa myapp-hpa -n production -w

NAME       REFERENCE         TARGETS           MINPODS  MAXPODS  REPLICAS
myapp-hpa  Deployment/myapp  25%/70%, 40%/80%  2        10       2
myapp-hpa  Deployment/myapp  73%/70%, 45%/80%  2        10       2   <- spike!
myapp-hpa  Deployment/myapp  73%/70%, 45%/80%  2        10       4   <- scaled up
myapp-hpa  Deployment/myapp  48%/70%, 38%/80%  2        10       4   <- stable
myapp-hpa  Deployment/myapp  22%/70%, 30%/80%  2        10       4   <- cooling down
myapp-hpa  Deployment/myapp  22%/70%, 30%/80%  2        10       2   <- scaled down

# scaleDown.stabilizationWindowSeconds=300 means HPA waits 5 minutes
# before scaling down to prevent thrashing on brief traffic drops
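The KEDA ScaledObject mentioned in the introduction is not reproduced above; a sketch consistent with the status output below might look like the following. The broker address, topic, and consumer group names are illustrative assumptions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor                           # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.production.svc:9092   # assumed broker address
        consumerGroup: order-processor                # assumed consumer group
        topic: orders                                 # assumed topic
        lagThreshold: "100"   # target lag per replica
```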

It gives the following output,

# KEDA scales replicas based on total consumer lag / lagThreshold
# lag=0 -> 1 replica (minReplicaCount)
# lag=100 -> 1 replica
# lag=500 -> 5 replicas
# lag=2000 -> 20 replicas (maxReplicaCount)

kubectl get scaledobject kafka-order-processor-scaler -n production
NAME                          SCALETARGETKIND  MIN  MAX  TRIGGERS  READY
kafka-order-processor-scaler  Deployment       1    20   kafka     True

# When Kafka lag spikes (e.g. a burst of 1000 new orders):
kubectl get pods -n production -l app=order-processor
NAME                              READY  STATUS
order-processor-abc-1             1/1    Running
order-processor-abc-2             1/1    Running
order-processor-abc-3             1/1    Running   <- KEDA added replicas
order-processor-abc-4             1/1    Running
order-processor-abc-5             1/1    Running

# As lag decreases, KEDA scales back down automatically
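The replica counts in the comments above follow KEDA's lag-to-replica mapping: essentially ceil(totalLag / lagThreshold), clamped to the ScaledObject's min and max. A sketch assuming lagThreshold=100, min=1, max=20:

```python
import math

def keda_replicas(total_lag: int, lag_threshold: int = 100,
                  min_replicas: int = 1, max_replicas: int = 20) -> int:
    # ceil(lag / threshold), clamped to the ScaledObject's min/max replica counts
    desired = math.ceil(total_lag / lag_threshold)
    return min(max_replicas, max(min_replicas, desired))

for lag in (0, 100, 500, 2000):
    print(lag, "->", keda_replicas(lag))
# 0 -> 1, 100 -> 1, 500 -> 5, 2000 -> 20
```

In practice KEDA exposes the lag as an external metric and the HPA controller performs this calculation, but the resulting replica counts are the same.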

HPA tuning tips:

- Always set resource requests: HPA cannot calculate CPU utilisation without resources.requests.cpu being set on the container. If requests are not defined, the HPA target shows <unknown>.
- stabilizationWindowSeconds: set a longer scale-down window (300s) than scale-up window (60s) to avoid thrashing. Traffic spikes are sudden; idle periods should be confirmed before removing Pods.
- Start with CPU HPA, graduate to KEDA: CPU-based HPA is simple and works well for web APIs. For data pipelines and event-driven workloads where CPU is not the bottleneck, KEDA with Kafka lag or queue depth metrics gives much more accurate scaling signals.
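Following the first tip, the scaled Deployment's container spec needs a resources block such as the following (the specific values are illustrative):

```yaml
# Inside the Deployment's container spec
resources:
  requests:
    cpu: 500m        # utilisation percentages are computed against this value
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 1Gi
```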

