Note: This is the first of a five-part series covering Kubernetes resource management and optimization. We start by describing Kubernetes resource types.

Before we dive into Kubernetes resources, let’s clarify what the term “resource” refers to here. Anything we create in a Kubernetes cluster is considered a resource: deployments, pods, services and more. For this tutorial, we’ll focus on primary resources like CPU and memory, along with other resource types like ephemeral storage and extended resources.

One aspect of cluster management is to assign these resources automatically to containers running in pods so that, ideally, each container has the resources it needs, but no more.

In this article, we’ll highlight logical resources for containers running on a cluster. We’ll break down four common Kubernetes resources developers work with on a daily basis: CPU, memory, ephemeral storage and extended resources. For each resource, we’ll explore how it’s measured within Kubernetes, review how to monitor each particular resource and highlight some best practices for optimizing resource use.

Let’s explore each primary Kubernetes resource type in depth. Then let’s see these resource types in action with some code samples.



CPU

A Kubernetes cluster typically runs on multiple machines, each with multiple CPU cores. Together, these add up to the cluster’s total number of available cores. For example, four machines with four cores each provide 16 cores.

We don’t need to work with whole numbers of cores. We can specify any fraction of a CPU core in increments of 1/1,000th of a core, called a milli-CPU (for example, half a core is 500 milli-CPU).
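As a minimal sketch of how fractional cores look in a container spec, the fragment below requests half a core and caps use at one and a half cores. The values are illustrative assumptions, not taken from the deployment later in this article.

```yaml
# Fragment of a container spec (illustrative values).
resources:
  requests:
    cpu: "500m"    # 500 milli-CPU = 0.5 core; "0.5" is an equivalent spelling
  limits:
    cpu: "1500m"   # 1.5 cores; "1.5" is an equivalent spelling
```

The smallest step Kubernetes accepts is 1m, so a value such as "0.5m" would be rejected.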

Kubernetes containers run on the Linux kernel, which provides control groups (cgroups) to limit resources. The Linux scheduler compares the CPU time a container has used (measured in internal time slices) with its defined limit to decide whether to run the container in the next time slice. We can query current CPU use with the kubectl top command, invoking it for a pod or node (kubectl top pod or kubectl top node). This requires a metrics source, such as the Kubernetes Metrics Server, running in the cluster.

We can optimize our use of processor time by making the program running in a container more efficient, either through improved algorithms and coding or by compiler optimization. The cluster user doesn’t have much influence on the speed or efficiency of precompiled containers.


Memory

The machines in a Kubernetes cluster also each have memory, which again adds up to a cluster total. For example, four machines with 32 GiB each provide 128 GiB.

The kernel also controls main memory through cgroups, as it does CPU time. If a process in a container tries to allocate memory beyond a hard limit, the kernel signals an out-of-memory (OOM) error and typically terminates the process.
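A short sketch of a memory request and hard limit in a container spec follows. The values are illustrative assumptions. The comments spell out the unit suffixes, since binary (Mi, Gi) and decimal (M, G) suffixes differ.

```yaml
# Fragment of a container spec (illustrative values).
resources:
  requests:
    memory: "64Mi"    # 64 * 2^20 bytes = 67,108,864 bytes
  limits:
    memory: "128Mi"   # hard limit: 128 * 2^20 bytes = 134,217,728 bytes
# By contrast, "128M" would mean 128 * 10^6 bytes = 128,000,000 bytes.
```

Note that a lowercase "m" suffix means "milli" for memory as well, so "128m" would request a fraction of a byte rather than 128 mebibytes.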

Optimizing memory use is largely up to the application’s developers. One step is to tune garbage collection so that a runtime with a managed heap doesn’t allocate memory beyond a hard limit. Again, the kubectl top command can provide information about memory use.

Exploring CPU and Memory

As our first in-depth example, let’s deploy three replicated containers of the popular web server NGINX to a local Kubernetes installation. We’re running a one-node “cluster” on our laptop, which only has two cores and 2 GiB of memory.

The code below defines such a pod deployment and grants each of three NGINX containers one-tenth of a core (100 milli-CPU) and 100 MiB of main memory. The code below also limits their use to double the requested values.

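apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
          limits:
            cpu: "200m"
            memory: "200Mi"
        ports:
        - containerPort: 80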

We can deploy into the default namespace like this:

kubectl apply -f nginx.yaml

The local cluster only has a single node. Use this command to return detailed information about it:

kubectl describe nodes docker-desktop

After clipping most of the output, we can examine some information about resource use:

Namespace  Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
---------  ----                               ------------  ----------  ---------------  -------------  ---
default    nginx-deployment-585bd9cc5f-djql8  100m (5%)     200m (10%)  100Mi (5%)       200Mi (10%)    66s
default    nginx-deployment-585bd9cc5f-gz98r  100m (5%)     200m (10%)  100Mi (5%)       200Mi (10%)    66s
default    nginx-deployment-585bd9cc5f-vmdnc  100m (5%)     200m (10%)  100Mi (5%)       200Mi (10%)    66s

Resource           Requests     Limits
--------           --------     ------
cpu                1150m (57%)  600m (30%)
memory             540Mi (28%)  940Mi (49%)
ephemeral-storage  0 (0%)       0 (0%)
hugepages-1Gi      0 (0%)       0 (0%)
hugepages-2Mi      0 (0%)       0 (0%)

The first table shows the CPU and memory requests and limits for each pod, just as our deployment object specified. It also displays each value as a percentage of the node’s allocatable capacity.

The second table shows the totals of all requests and limits on this node, again listed as absolute values and percentages. These totals include other containers running in the kube-system namespace that we haven’t shown here, so they don’t match the sum of the three NGINX pods above.

The above snippet’s last three lines indicate other types of resources beyond CPU and memory, which don’t have set requests or limits in this example.

Ephemeral Storage

One additional Kubernetes resource type is ephemeral storage. This is storage that doesn’t survive the pod’s life cycle. Applications often use ephemeral storage for caches or logs, but it should never hold important data, like user records. We can request or limit ephemeral storage just as we do main memory, but it’s often not as scarce a resource.
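The sketch below shows one way to request and limit ephemeral storage for a container that writes to a scratch volume. The pod name, mount path and sizes are hypothetical values chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo              # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        ephemeral-storage: "1Gi"   # used by the scheduler to pick a node
      limits:
        ephemeral-storage: "2Gi"   # upper bound for this container
    volumeMounts:
    - name: cache
      mountPath: /cache            # hypothetical path
  volumes:
  - name: cache
    emptyDir:
      sizeLimit: "500Mi"           # cap for this scratch volume
```

If usage goes beyond these limits, the kubelet can evict the pod, and the data in the emptyDir volume is lost along with it.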

So what do hugepages-1Gi and hugepages-2Mi mean in the output above? Huge pages are a Linux kernel feature that lets processes allocate main memory in pages much larger than the default page size. Larger pages can make memory access more efficient for memory-intensive workloads.

Kubernetes supports assigning such large pages to containers. These form a resource type per page size that we can request separately.

When specifying a request or limit, we set the total amount of memory, not the number of pages.

hugepages-2Mi: "100Mi"
hugepages-1Gi: "2Gi"

Here, we limit the number of 2 MiB pages to 50 (100 MiB divided by 2 MiB) and the number of 1 GiB pages to 2.
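For context, here is a hedged sketch of a complete pod that uses these two huge page limits, loosely following the pattern shown in the Kubernetes documentation. The pod name, image and mount paths are assumptions, and the node must already have huge pages of both sizes pre-allocated for the pod to be scheduled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: huge-pages-demo         # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                # placeholder image for illustration
    volumeMounts:
    - name: hugepage-2mi
      mountPath: /hugepages-2Mi
    - name: hugepage-1gi
      mountPath: /hugepages-1Gi
    resources:
      limits:
        hugepages-2Mi: "100Mi"  # 50 pages of 2 MiB
        hugepages-1Gi: "2Gi"    # 2 pages of 1 GiB
        memory: "100Mi"
      requests:
        memory: "100Mi"
  volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
  - name: hugepage-1gi
    emptyDir:
      medium: HugePages-1Gi
```

Huge page requests must equal their limits. When only limits are given, as here, Kubernetes sets the requests to the same values.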

Extended Resources

Cluster users can also define their own resource types — per cluster or node — using the extended resource type. Once we’ve defined a type and specified available units, we can use requests and limits, just as with the built-in resources we’ve used so far.

An example is:

cpu: "200m" 100

This setting limits the container to 20 percent of a core and 100 of our project’s handles.
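As a sketch of what such a fragment might look like in full, the example below uses the hypothetical resource name example.com/handles. That name is an assumption made for illustration and is not the name used in the original project. Extended resource names need a domain prefix outside of kubernetes.io.

```yaml
# Fragment of a container spec. "example.com/handles" is a hypothetical
# extended resource that the node or cluster must advertise beforehand.
resources:
  requests:
    example.com/handles: 100
  limits:
    cpu: "200m"                 # 20 percent of a core
    example.com/handles: 100    # must match the request
```

Extended resources are counted in whole units, and when both a request and a limit are given they must be equal.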

Resource Requests and Limits

Notice that resource requests and limits were key to our conversation about ephemeral storage and extended resources. This is because an end user can specify resource requests and limits in an application’s deployment manifest, which imposes some rules about how Kubernetes should treat a container or pod.

Requests indicate how much of a resource a container should have. They help the scheduler assign pods to nodes based on the amount of resources requested and available resources on those nodes.

Limits are used to indicate a hard upper boundary on how much of a resource a container can use, enforced at the operating-system level. Requests and limits are optional, but if we don’t specify a limit, a container can use most of the node’s resources, which can have negative cost or performance implications. So, we must be cautious.

Bear in mind that although a pod can contain more than one container, usually there is only one container per pod. We allocate resources to containers, but all of a pod’s containers draw from a common pool of resources at the node level.
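To illustrate the multi-container case, here is a small sketch of a pod with a main container and a sidecar. The names, images and values are hypothetical. Each container carries its own requests and limits, and the scheduler looks for a node that can satisfy the sum of the requests.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar        # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        cpu: "100m"
        memory: "100Mi"
      limits:
        cpu: "200m"
        memory: "200Mi"
  - name: log-shipper           # hypothetical sidecar
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        cpu: "50m"
        memory: "32Mi"
      limits:
        cpu: "100m"
        memory: "64Mi"
# Combined requests for scheduling: 150m CPU and 132Mi memory.
```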

In part two of this tutorial series, we’ll dive deeper into the world of Kubernetes requests and limits.

Considering Quality of Service

The resource system we’ve described so far is a fairly simple way of managing compute resources. Kubernetes builds a quality of service (QoS) system on top of it.

QoS describes a technical system’s means of offering different service levels while maintaining the best overall quality, given the hardware’s limitations. The Kubernetes QoS system assigns one of three levels to a pod: Guaranteed, Burstable and BestEffort. Refer to the Kubernetes documentation to learn how to assign these levels and how they affect pod scheduling.

The Guaranteed level offers exactly the requested and limited resources during the pod’s lifetime and suits applications like monitoring systems that run at a constant load.

The Burstable service level is suitable for pods with a basic use profile that can sometimes increase above the baseline due to increased demand. This level is ideal for databases or web servers, whose load depends on the number of incoming requests.

Finally, BestEffort makes no resource availability guarantee. So, it’s best suited for applications like batch jobs that can repeat if needed or for staging environments that aren’t mission critical.
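The three levels follow from how requests and limits are set, which the sketch below illustrates with three minimal pods. The pod names are hypothetical. Kubernetes derives the class itself, so there is no field in the manifest that sets it directly.

```yaml
# Guaranteed: every container sets CPU and memory, with requests equal to limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "200m"
        memory: "200Mi"
      limits:
        cpu: "200m"
        memory: "200Mi"
---
# Burstable: at least one CPU or memory request or limit is set,
# but the pod doesn't meet the Guaranteed criteria.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable           # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "100m"
        memory: "100Mi"
      limits:
        cpu: "200m"
        memory: "200Mi"
---
# BestEffort: no container sets any CPU or memory request or limit.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
```

By this rule, the NGINX pods from the deployment example earlier in this article fall into the Burstable class, because their limits are double their requests. The assigned class appears in the pod’s status under qosClass.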


Kubernetes clusters maintain hardware resources like CPU time, memory, ephemeral storage, and extended resources and assign them to running containers. Through a system of requests and limits, operators can tailor resource allocation to individual containers and then let the Kubernetes system assign them to nodes appropriately.

Extended resources enable us to define our own resource types and use them similarly. Kubernetes also assigns quality of service designations to pods according to requests and limits. It then uses these designations to make scheduling and termination decisions.

Kubernetes resource optimization is essential to balance costs with the end-user experience. Yet, assigning parameters by hand using this article’s methods can be time consuming, costly and difficult to scale.

We’d rather spend our time creating exciting new features that drive competitive advantage or improved user experience than worrying about optimization and resource use. The StormForge platform helps manage and optimize your Kubernetes resources automatically, using machine learning to find the best configuration based on your cost and performance goals.

To learn more about how StormForge works, check out this demo.

Stay tuned for future articles where we’ll explain resource requests and limits, how to set optimization goals and how to tackle some of the most difficult optimization challenges.