A concise but comprehensive guide to using and managing horizontal and vertical autoscaling in the Kubernetes environment.

Virtually everyone running Kubernetes infrastructure tries to use only the resources they need — nothing more and nothing less. Therefore, the resources in use must be able to scale in either direction to meet the required demands.

Fortunately, Kubernetes enables us to automate our scaling processes out of the box, so we don't have to scale and provision manually every time the need arises. This saves time and human effort, which lowers costs as well. However, the numerous controls for configuring these autoscaling functionalities can be overwhelming, even for seasoned experts.

This cheat sheet aims to demystify Kubernetes autoscaling on a resource-by-resource basis. Let’s explore the types of resources that we can autoscale and work through several practice exercises.

Types of Kubernetes Resources That Accommodate Autoscaling

In the Kubernetes ecosystem, resources are things we create. All of the following resources can be autoscaled:

  • Pod — The smallest unit of Kubernetes workload deployed
  • Node — A worker machine (virtual or physical) that runs one or more pods
  • ReplicaSet — A process that runs multiple instances of a pod to maintain stable numbers of them
  • Deployment — Manages ReplicaSets and declaratively manages pod instances. All replica instances share a single volume and PersistentVolumeClaim (PVC).
  • StatefulSet — Used in stateful applications (when deployments are used for stateless applications). A StatefulSet ensures that each pod has its own volume and PVC.
  • RAM and CPU cores — Memory and computation resources for a Kubernetes cluster

Notably, some resources, like DaemonSets, cannot be scaled. Read through this StormForge article for in-depth insight into Kubernetes resource types.

Requests and Limits

Kubernetes allows us to specify controls and checks on resource requests and limits.

  • Requests — Specifies the resources that must be reserved for a pod's container.
  • Limits — Specifies the maximum amount of a resource that a container can use.

In an ideal Kubernetes environment, we should specify both CPU and memory requests and limits. Otherwise, Kubernetes may assign generous defaults, which can inflate costs.
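
As a sketch, requests and limits are set per container under the resources field of the pod spec. The pod name, image, and values below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:
        cpu: "250m"         # reserve a quarter of a CPU core
        memory: "64Mi"      # reserve 64 MiB of memory
      limits:
        cpu: "500m"         # the container can never use more than half a core
        memory: "128Mi"     # exceeding 128 MiB gets the container OOM-killed
```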

Pod Quality of Service

Kubernetes also enables us to assign a class to a pod’s quality of service (QoS) in our preferred order of priority:

  • Guaranteed QoS class — Highest-priority pods. A pod is assigned the Guaranteed class only when every container in it has CPU and memory limits and requests, with limits equal to requests.

status:
  qosClass: Guaranteed

  • Best-Effort QoS class — Lowest-priority pods. A pod is assigned the BestEffort class when none of its containers have any CPU or memory requests or limits.

status:
  qosClass: BestEffort

  • Burstable QoS class — Midrange-priority pods, falling between the high- and low-priority classes. A pod is Burstable when at least one container has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria.

status:
  qosClass: Burstable

| Guaranteed | Best-Effort | Burstable |
| --- | --- | --- |
| Highest-priority pods | Lowest-priority pods | Mid-priority pods |
| All containers must have CPU and memory limits and requests (limits equal to requests). | Containers have no CPU or memory limits or requests. | At least one container must have a CPU or memory limit or request. |
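
For instance, a pod whose only container sets limits equal to requests receives the Guaranteed class; you can confirm the assigned class with kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'. The pod name and values below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:
        cpu: "500m"
        memory: "128Mi"
      limits:               # limits equal to requests => Guaranteed QoS
        cpu: "500m"
        memory: "128Mi"
```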

Scaling Objects and Definitions

  • Custom Resource Definition (CRD) — Enables introducing unique objects to clusters
  • HorizontalPodAutoscaler (HPA) — A built-in API object for adjusting the number of a pod's replicas based on metrics such as CPU utilization
  • VerticalPodAutoscaler (VPA) — A CRD object for setting the requests and limits for containers in pods
  • Horizontal scaling — Increases the number of pods to handle the increase in workload. The HPA manages this.
  • Vertical scaling — Adds more memory or CPU capacity to currently available pods rather than adding more pods

Prerequisites for Resource Autoscaling

  1. Create your Kubernetes cluster.
  2. Optional but recommended: Configure role-based access control (RBAC).
  3. Install the kubectl command-line tool on your workstation and connect to the cluster.
  4. Ensure that you have a deployment running for your cluster. This guide uses a deployment named python-hpa.
  5. Optional: To scale based on custom metrics, deploy a metrics adapter (such as the Prometheus Adapter) to your cluster.

Horizontal Scaling

1. Create an HPA.

Use the kubectl autoscale subcommand.

kubectl autoscale deployment python-hpa --cpu-percent=50 --min=1 --max=10

The kubectl autoscale command creates the HPA for a deployment named python-hpa.

  • --cpu-percent=50 sets the target for average CPU usage across all pods to 50%. The HPA will increase or decrease the number of pods to meet the target.
  • --min=1 and --max=10 mean that the number of replicas will stay between 1 and 10. The replicas are controlled by the python-hpa deployment in this demo; substitute the name of whichever deployment you created before running the kubectl autoscale subcommand.
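
The same autoscaler can also be defined declaratively. A minimal autoscaling/v2 manifest equivalent to the command above might look like this (apply it with kubectl apply -f):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: python-hpa
spec:
  scaleTargetRef:            # the deployment this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: python-hpa
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # same threshold as --cpu-percent=50
```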

2. Stabilization Window

Replica counts don't change instantaneously. The HPA applies a stabilization window (five minutes by default for scaling down) to avoid flapping when metrics fluctuate, so it usually takes a few minutes for the number of replicas to settle.
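
If the defaults don't suit your workload, the window is tunable through the behavior field of an autoscaling/v2 HPA; the values below are illustrative:

```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # default: wait 5 minutes before scaling down
    scaleUp:
      stabilizationWindowSeconds: 0     # default: scale up immediately
```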

3. Autoscaling on Specific Metrics

You can state specific monitored metrics in the YAML of the HPA created. This enables scaling based on metrics defined by your monitoring or observability platform.

Use the autoscaling/v2 version to get the YAML of your HPA with the following command:

kubectl get hpa python-hpa -o yaml > /tmp/hpa-v2.yaml

You can then add custom metrics under the spec.metrics field. In autoscaling/v2, each entry specifies a type (Resource, Pods, Object, or External) and a metric name, in this format:

metrics:
- type: Object
  object:
    metric:
      name: metric_name

An example for the Prometheus adapter, scaling on a per-pod metric, might look like this:

  - type: Pods
    pods:
      metric:
        name: node_network_receive_bytes
      target:
        type: AverageValue
        averageValue: 100000m

4. Status Conditions

You should check if your autoscaler can scale or if there’s a restriction that might prevent it from scaling. You can find this information with the kubectl describe hpa subcommand:

kubectl describe hpa python-hpa

The conditions are in the status.conditions field of the command-line interface (CLI) output.

Vertical Scaling

1. VPA Components

  • Admission Controller — Sets the correct resource requests on new pods as they are created
  • Recommender — Compares past and present rates of resource consumption to predict and recommend values for memory and CPU requests
  • Updater — Checks which pods have the correct resources, and evicts those with outdated resources so that they can be recreated with the updated requests

2. VPA Operation Modes

The VPA operates in the following four modes:

  • Recreate — The VPA assigns resource requests when creating pods and updates requests on existing pods by evicting and recreating them. Updates occur when the actual resources differ substantially from the recommendations.
  • Auto — Recreates pods based on recommendations (currently equivalent to Recreate)
  • Off — The VPA only gives recommendations without automatically updating requests.
  • Initial — The VPA creates requests while creating pods but never updates them later.
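
A minimal VPA object targeting the demo deployment, with the mode selected via updateMode (the VPA name here is hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: python-hpa-vpa       # hypothetical name
spec:
  targetRef:                 # the workload whose requests the VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: python-hpa
  updatePolicy:
    updateMode: "Auto"       # one of: Recreate, Auto, Off, Initial
```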

3. Limits Control

Resource policies determine the bounds the VPA respects. Basically, it keeps its recommendations between the minimum (minAllowed) and maximum (maxAllowed) values defined in the spec.resourcePolicy section of the VPA's YAML definition.
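
As a sketch, those bounds live under spec.resourcePolicy in the VPA definition; the values below are illustrative:

```yaml
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply this policy to all containers
      minAllowed:
        cpu: "100m"
        memory: "64Mi"
      maxAllowed:
        cpu: "2"
        memory: "1Gi"
```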

4. Commands for Setting Up VPA and Their Parameters

For this section, you need the following:

  • Git installed and configured on your workstation
  • Preferred metrics server deployed to your cluster

Download the VPA source code:

git clone https://github.com/kubernetes/autoscaler.git

Set up the VPA from the vertical-pod-autoscaler directory of the repository:

cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Tear down the VPA whenever you want to stop running it (run this first, too, to remove any previous VPA versions before installing):

./hack/vpa-down.sh
Note that the VPA should not be used alongside the HPA on the same CPU or memory metrics; combine them only when the HPA scales based on custom metrics. See the official guide for more information on how to use the VPA.

5. Kubernetes-Based Event-Driven Autoscaler (KEDA)

KEDA is a lightweight tool that helps you autoscale based on the number of events to be processed. You can add it to your cluster to extend functionality alongside the standard HPA and other components.

  • KEDA controls Kubernetes deployments, scaling them up and down (including down to zero replicas) based on events.
  • KEDA also serves as a metrics server for Kubernetes.

Installing KEDA gives you these four custom resources:

  • ScaledObject — Describes the desired mapping between an event source (such as RabbitMQ) and your deployment
  • ScaledJob — Describes the mapping between an event source and your Kubernetes job
  • TriggerAuthentication — Contains the authentication configuration for monitoring an event source; namespaced, and referenced by ScaledObjects and ScaledJobs in the same namespace
  • ClusterTriggerAuthentication — The cluster-scoped equivalent of TriggerAuthentication, usable from any namespace
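
As an illustration, a ScaledObject mapping a RabbitMQ queue to the demo deployment might look like this (the object name, queue name, authentication reference, and threshold are all hypothetical):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaledobject   # hypothetical name
spec:
  scaleTargetRef:
    name: python-hpa            # the deployment to scale
  minReplicaCount: 0            # KEDA can scale down to zero
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      queueName: tasks          # hypothetical queue
      mode: QueueLength
      value: "20"               # target messages per replica
    authenticationRef:
      name: rabbitmq-auth       # a TriggerAuthentication in the same namespace
```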

6. KEDA Events

KEDA emits events, such as ScaledObjectReady the first time a ScaledObject is ready and ScaledJobDeleted when a ScaledJob is deleted. Check the KEDA Events webpage to see an exhaustive list of events emitted by KEDA.


This cheat sheet provides a concise but comprehensive guide to using and managing horizontal and vertical autoscaling in the Kubernetes environment. With this information, you can confidently implement autoscaling in your Kubernetes workload.

StormForge helps you save money, time and effort by using machine learning to automate the management and optimization of your Kubernetes resources, including acting as a VPA and adjusting your HPA settings to increase efficiency. Using a solution like StormForge can provide improved efficiency compared to using the HPA or VPA standalone, and it allows for vertical and horizontal autoscaling to work together without thrashing.