Kubernetes Horizontal Pod Autoscaler (HPA)
Author: Venkata Sudhakar
The Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas in a Deployment or StatefulSet based on observed metrics. When CPU utilisation rises above a threshold (for example, 70%), the HPA adds more Pods to distribute the load. When utilisation falls, it removes Pods to save resources. This gives you elastic scalability without manual intervention - your application automatically handles traffic spikes and scales back down during quiet periods, optimising both performance and cost.

HPA requires the Kubernetes Metrics Server to be running in the cluster; it collects resource usage statistics from the kubelet on each node. For CPU- and memory-based scaling, no additional setup is required beyond the Metrics Server. For custom metrics (queue depth, request latency, messages per second), you need an external metrics adapter such as KEDA (Kubernetes Event-Driven Autoscaling) or the Prometheus Adapter. KEDA is particularly popular for scaling based on Kafka consumer lag, which is essential for data pipeline workloads.

The example below shows how to configure HPA for CPU-based scaling on a Spring Boot Deployment, followed by a KEDA ScaledObject that scales on Kafka consumer lag.
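The HPA manifest itself is not shown here, so this is a sketch consistent with the kubectl output that follows: targets of 70% CPU and 80% memory, 2-10 replicas, Deployment myapp in namespace production, and the stabilization windows described in the comments below. Treat the exact field values as illustrative.

```yaml
# Illustrative HPA manifest (assumed; reconstructed from the output shown below).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # scale up when average CPU exceeds 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80      # second target shown in the TARGETS column
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60  # react quickly to spikes
    scaleDown:
      stabilizationWindowSeconds: 300 # wait 5 minutes before removing Pods
```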
Monitoring the HPA gives the following output:
# Monitor HPA status
kubectl get hpa myapp-hpa -n production -w
NAME        REFERENCE          TARGETS            MINPODS   MAXPODS   REPLICAS
myapp-hpa   Deployment/myapp   25%/70%, 40%/80%   2         10        2
myapp-hpa   Deployment/myapp   73%/70%, 45%/80%   2         10        2    <- spike!
myapp-hpa   Deployment/myapp   73%/70%, 45%/80%   2         10        4    <- scaled up
myapp-hpa   Deployment/myapp   48%/70%, 38%/80%   2         10        4    <- stable
myapp-hpa   Deployment/myapp   22%/70%, 30%/80%   2         10        4    <- cooling down
myapp-hpa   Deployment/myapp   22%/70%, 30%/80%   2         10        2    <- scaled down
# scaleDown.stabilizationWindowSeconds=300 means HPA waits 5 minutes
# before scaling down to prevent thrashing on brief traffic drops
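For the Kafka-driven case, the KEDA ScaledObject is likewise not shown; a sketch consistent with the output below would use a lagThreshold of 100 (total consumer lag divided by 100, bounded by 1 and 20, matches the replica counts listed). The bootstrapServers, topic, and consumerGroup values are placeholders, not taken from the original.

```yaml
# Illustrative KEDA ScaledObject (assumed; broker, topic and consumer
# group names are placeholders).
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-order-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor       # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.kafka.svc:9092   # placeholder broker address
        consumerGroup: order-processor-group     # placeholder consumer group
        topic: orders                            # placeholder topic
        lagThreshold: "100"   # target lag per replica: replicas ~= total lag / 100
```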
Querying the ScaledObject gives the following output:
# KEDA scales replicas based on total consumer lag / lagThreshold
# lag=0 -> 1 replica (minReplicaCount)
# lag=100 -> 1 replica
# lag=500 -> 5 replicas
# lag=2000 -> 20 replicas (maxReplicaCount)
kubectl get scaledobject kafka-order-processor-scaler -n production
NAME                           SCALETARGETKIND   MIN   MAX   TRIGGERS   READY
kafka-order-processor-scaler   Deployment        1     20    kafka      True
# When Kafka lag spikes (e.g. a burst of 1000 new orders):
kubectl get pods -n production -l app=order-processor
NAME                    READY   STATUS
order-processor-abc-1   1/1     Running
order-processor-abc-2   1/1     Running
order-processor-abc-3   1/1     Running   <- KEDA added replicas
order-processor-abc-4   1/1     Running
order-processor-abc-5   1/1     Running
# As lag decreases, KEDA scales back down automatically
HPA tuning tips:
- Always set resource requests. HPA cannot calculate CPU utilisation without resources.requests.cpu being set on the container; if requests are not defined, the HPA TARGETS column shows <unknown>.
- stabilizationWindowSeconds. Set a longer scale-down window (300s) than scale-up window (60s) to avoid thrashing. Traffic spikes are sudden; idle periods should be confirmed before removing Pods.
- Start with CPU HPA, graduate to KEDA. CPU-based HPA is simple and works well for web APIs. For data pipelines and event-driven workloads where CPU is not the bottleneck, KEDA with Kafka lag or queue depth metrics gives much more accurate scaling signals.
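The first two tips can be sketched as manifest fragments (values are illustrative, not taken from the original):

```yaml
# Deployment container fragment: without resources.requests.cpu the HPA
# has no baseline, and the TARGETS column shows <unknown>.
containers:
  - name: myapp
    image: myapp:1.0            # placeholder image
    resources:
      requests:
        cpu: "500m"             # utilisation % is computed against this value
        memory: "512Mi"
---
# HPA behavior fragment: scale up fast, scale down slowly, to avoid
# thrashing on brief traffic dips.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300
```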