Introduction
Kubernetes is one of the most powerful and flexible software technologies to come along in years. With it, your business can deploy containerized applications and services at scale, and even automate those deployments.
But along with this flexibility comes a complexity you may not have yet experienced. The complexity doesn’t lie in the creation of the Kubernetes cluster, as that can be taken care of in seconds or minutes (depending on whether you are using a hosted or an on-premises solution). And even deploying your first application isn’t all that challenging.
Where the difficulty truly lies is in the management of resources. It is in this arena that you will run into a number of common questions and pitfalls that can lead to poor performance, costly over-provisioning, or unstable configurations that result in downtime.
To make this even more frustrating, you might be following best practices, and still find yourself up against a wall, with containers not behaving as expected or services costing you far more than your budget allows.
What do you do?
You look to improve your Kubernetes resource management.
Easier said than done. And anyone who has deployed Kubernetes applications at scale understands exactly what we’re talking about.
So let’s address the subject of resource management in Kubernetes.
What Are Resources in Kubernetes?
First we must understand what a Kubernetes resource is. To help make this clear, let’s compare a Kubernetes application to one that is installed and used via the traditional method.
Before cloud-native architectures became commonplace, monolithic applications were deployed as standalone entities that would grab as many resources as were available to them on the server. If an app needed 8GB of RAM and the server had it available, it could consume that RAM, leaving less for other applications or even causing the system to crash. Install enough of those types of applications on a server and you could quickly wind up with a resource shortage.
This led to a very cut-and-dried architecture pattern of purpose-driven, or application-driven, servers. But as platforms and applications grew, the scale became unmanageable and extremely inefficient at the enterprise level.
That’s where distributed systems come into play. These systems (operated by platforms such as Kubernetes) require careful orchestration, fencing, and planning for shared resource allocation. When done correctly, they perform and scale in ways the monoliths cannot.
In order to successfully deploy a scalable, distributed system, especially one that is cloud-native, you must pay careful attention to resource management.
Types of Resources

Two of the most important resources in Kubernetes are CPU and memory. These should be fairly straightforward: a CPU resource is the amount of processing power, measured in whole cores or fractions of a core, that you allow a container to consume, and memory is how much RAM you hand over to that container or pod.
In simple terms, you can think of it as: How much memory does your app or database need in order to keep it from crashing? Or, how many CPUs do you need to give workers so they can be as performant as possible?
Of course, there are other application-specific resources to consider as well. For example, if you're deploying a Java application in a distributed environment, you will have to tune garbage collection settings and the JVM heap size, which are critical when dividing up resources for a Java deployment.
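As a rough illustration, here is one way such JVM settings are often wired into a container spec. This is only a sketch: the Deployment name and image are placeholders, and the specific flags and values are examples to adapt, not recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service                 # hypothetical application name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-service
  template:
    metadata:
      labels:
        app: orders-service
    spec:
      containers:
      - name: orders-service
        image: example.com/orders-service:1.0   # placeholder image
        env:
        - name: JAVA_TOOL_OPTIONS
          # Size the heap relative to the container's memory limit and
          # choose a collector; the right values depend on your workload.
          value: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
        resources:
          limits:
            memory: "2Gi"              # the limit the heap percentage is based on
```

Tying the heap to the container's memory limit, rather than a fixed -Xmx value, keeps the JVM settings and the Kubernetes settings from drifting apart as you tune the limit.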
In fact, there are a vast number of resources developers have to configure in order to ensure applications run in a stable and efficient fashion. For key resources like CPU and memory, there are two very critical options to consider with Kubernetes.
What Aspects of Resources Can You Control in Kubernetes?
The two options that you must consider for each Kubernetes resource are requests and limits. These options are defined for each resource within the application manifest. Let’s consider each of them.
Requests
In Kubernetes, a Pod requests the resources needed to run its containers. With a resource request, you define the minimum amount of a particular resource, such as CPU or memory, that the application needs to function properly.
Let's say you have a container and you know the minimum amount of RAM that container needs to function as expected is 1GB, and the minimum CPU it requires is half a core. So, in your manifest you set the memory request to 1GB and the CPU request to 0.5 and deploy. The Kubernetes scheduler will then search for a node that has enough of those resources available. If Kubernetes cannot find a node in the cluster that can meet those resource requests, you're out of luck: the pod will not be scheduled.
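As a minimal sketch, the manifest for that container could express those requests like this (the pod name and image are placeholders; note that Kubernetes writes 1GB of memory as "1Gi" and half a core as either "0.5" or "500m"):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app                        # hypothetical pod name
spec:
  containers:
  - name: web-app
    image: example.com/web-app:1.0     # placeholder image
    resources:
      requests:
        memory: "1Gi"                  # minimum RAM the scheduler must reserve
        cpu: "500m"                    # half a core, equivalent to 0.5
```

The scheduler will only place this pod on a node with at least 1Gi of unreserved memory and half a CPU to spare; otherwise the pod sits in a Pending state.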
Limits
Limits are similar to requests, but they live on the opposite end of the spectrum. Where requests are the minimum amount of a resource the application requires to run properly, limits are the maximum amount of a resource the container will be allowed to use.
For example, you could define a request for a database container to be 1GB, so the cluster will never allocate less than 1GB to the container. You could then set a 2GB limit on the container, such that the cluster will never allocate more than 2GB of RAM to the database container. Similarly, you could set the CPU request to 0.5 cores and the limit to 1.
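Extending the same idea, here is a sketch of the database container described above with both requests and limits set (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db                             # hypothetical pod name
spec:
  containers:
  - name: db
    image: example.com/database:1.0    # placeholder image
    resources:
      requests:
        memory: "1Gi"                  # the scheduler reserves at least this much
        cpu: "500m"
      limits:
        memory: "2Gi"                  # exceeding this gets the container killed
        cpu: "1"                       # CPU beyond this is throttled, not killed
```

The two limits behave differently at runtime: a container that exceeds its memory limit is terminated (and usually restarted), while a container that hits its CPU limit is simply throttled.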
Finding the Sweet Spot
The trick here is, how do you find the sweet spot? Do you know, absolutely, that the database container will perform optimally somewhere between 1GB and 2GB of RAM and (maybe) 0.5 and 1 CPU? How do you tell? Do you write the manifest, deploy, and then measure the performance? If the database isn't functioning well, do you then change the requests and limits and deploy again?
And what about scaling your deployment? Is it better to scale the pod horizontally or vertically? If we scale vertically, do we have the extra capacity to support the larger requests? Will we need to step up to new node sizes? If we scale horizontally, do we have the capacity to support additional pods? Are we going to run into quota limits? Do we have additional connection limits we might hit downstream of the application?
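For the horizontal case, Kubernetes provides the HorizontalPodAutoscaler. A minimal sketch, assuming a Deployment named web-app, might look like the following; the replica bounds and the 70 percent target are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa                    # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app                      # the workload to scale (placeholder)
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70         # add pods when average CPU use passes
                                       # 70% of the pods' CPU requests
```

Notice that the utilization target is measured against the CPU request, which is another reason getting requests right matters: a request that is too high or too low skews when the autoscaler kicks in.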
Instead of needing to provision new servers with different resources, we can easily try out different configurations by changing the application manifests and redeploying to Kubernetes. Unfortunately, with all of these variables, it becomes a significant effort to keep testing and tuning the application profiles to derive the appropriate resource constraints.
Imagine the time you would spend attempting to get the perfect combination of requests and limits for that pod. And there’s no guarantee you’d find that ideal ratio of performance to cost.
When you spot a container or pod suffering from performance degradation, your initial reaction might be to max out the resources—give it 5GB of RAM and 3 cores. In theory that sounds like it might be a solid solution, but when there are so many moving parts that must interact with one another, it’s almost impossible to predict how maxing out those resources will affect adjacent pods or how it might cause your AWS, Azure, or Google Cloud bill to skyrocket.
It's All About Relationships
There's a second order of complexity you must deal with, one that brings yet another layer to the mix: the relationship between requests and limits at the pod level. This relationship determines the pod's quality of service (QoS), which falls into one of three classes:
- Guaranteed
- Burstable
- BestEffort

A pod is assigned the Guaranteed class when every one of its containers sets limits equal to requests for both CPU and memory. It is the most stable, but also the most expensive and least flexible option. For a Guaranteed quality of service, if you set a request of 1GB of RAM for a container, you also set a limit of 1GB of RAM. Guaranteed is also the most rigid with regard to Kubernetes scheduling. If you know exactly how many resources your pod requires, Guaranteed is the right choice.
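Concretely, a container resource stanza that qualifies for the Guaranteed class looks like the following fragment (values are illustrative):

```yaml
resources:
  requests:
    memory: "1Gi"
    cpu: "1"
  limits:
    memory: "1Gi"    # equal to the memory request
    cpu: "1"         # equal to the CPU request
```

Every container in the pod has to follow this pattern; otherwise the pod drops to a lower class.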

A pod falls into the Burstable class when at least one container sets a request but the pod doesn't meet the Guaranteed criteria: typically limits are higher than requests, or limits aren't set at all. This option is good for pods that are idle much of the time, but it can lead to noisy neighbors on a node. The idea behind noisy neighbors is that if a single pod is allowed to consume most of a node's resources, other pods will be starved.
Burstable pods are those that are meant to respond to a workload by using more than their requested resources when necessary. For instance, let's say you've deployed an eCommerce application that sees a very low level of activity during the day. However, at 5PM (when everyone leaves work), your customers start making purchases, which leads to a spike in usage. If you set those resources at the baseline level, your eCommerce containers are going to start crashing during the spike. And if you set those resources at the spike level, you'll be wasting resources 80 percent of the time. The problem is that if you have multiple Burstable pods on a single node and they all burst at the same time, you can wind up with an overcommitted node and, again, things start crashing.

The BestEffort class sets neither requests nor limits. This is a very risky option for resource allocation. Two things tend to happen with BestEffort pods: first, they are the first to be evicted when the node comes under resource pressure; second, because they have no limits, they can consume resources that neighboring pods on the node need, starving or throttling them.
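A BestEffort container is simply one with no resources stanza at all, as in this fragment (names are placeholders):

```yaml
containers:
- name: batch-worker                    # hypothetical name
  image: example.com/batch-worker:1.0   # placeholder image
  # no "resources:" stanza at all, which is what makes the pod BestEffort
```

Since the class is derived automatically from the requests and limits you set, Kubernetes records the result in the pod's status.qosClass field, so you can always check which class a running pod actually landed in.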
So now you have resources and relationships to consider. And remember, if Kubernetes can't find a node with enough capacity to satisfy a pod's configured requests, the pod will not be scheduled.
The complications keep piling on.

The Pitfalls of Performance Optimization in Kubernetes
It's very easy to always go with the defaults for a container deployment. You might find a blog post that walks you through the deployment of the exact pod you're looking to use. So you create your manifest exactly as described in the instructions and deploy it. Soon after, you realize something is wrong: the pods refuse to schedule onto a node, containers are getting evicted, or you're seeing unacceptable performance (or maybe a skyrocketing cloud bill).
What went wrong?
First and foremost, your use case will not be the same as the one from the article. You might have a different hosting provider, maybe you've had to add another container to the manifest (one that wasn't in the original tutorial), or it's even possible that your workload demand isn't the same as the original poster's.
The thing about Kubernetes is that every use case is different…even when it looks the same. There are so many variables that affect how containers and pods perform that you cannot count on one person's success translating into success for you.
That's where performance optimization comes into play. You may have downloaded that manifest for a very similar use case, but you'll still have to customize those resources to match your needs.
But how? When you're faced with numerous container manifests, each with multiple request and limit configurations, what do you do to make this work? You certainly don't want to deploy blind, because that's dangerous. How do you tune your applications to perfectly match your needs?
AI to the Rescue
When you deploy a container or pod to a cluster, you have to think about many variables. Quality of service, requests, and limits are just the tip of the iceberg, but they are enough to have your head spinning. You have to be able to deploy those containers and pods with confidence that they will function properly and serve the needs of your company. Whether you have a static or a dynamic load, you need a way to tailor your resource settings to a solid baseline. By doing that, you'll not only ensure those containers and pods remain deployed, you'll also save money.
But how? That’s the big question.
The answer lies in Artificial Intelligence (AI). AI provides an automated, systematic way of exploring these parameters for your applications and helping you make sense of the outcomes. Instead of a manual, trial-and-error effort to understand how each parameter interacts with others, and which parameters provide the optimal configuration, AI can manage this for you.
With a service like StormForge Optimize, your container manifests are run through numerous automated experiments that test various combinations of parameters to come up with the best combination of resources for your containers and pods. This is done far faster and more reliably than a human could manage, so you'll spend less time trying (in vain) to optimize those complicated deployments, and more time watching them run as expected.
The StormForge machine learning model explores the parameter space and comes back with various configurations to try. StormForge has its own Kubernetes controller, which runs on your cluster to try the different configurations, each one called a trial. As the controller works, the machine learning model learns from the performance of the application.
In the end, of course, the choice of parameters for your deployment is completely up to you. You'll be presented with various optimizations, so you can select the one that offers the best trade-off between low latency, high throughput, resource minimization, and reliability, whichever is most important to your business. You define which metrics matter, and the machine learning model will learn the optimal configuration to achieve that state.
Conclusion
If anyone ever told you Kubernetes was easy, they either had no idea what they were talking about, or were intentionally misleading you. Getting your Kubernetes application deployed in exactly the state you want or need can require tweaking a large number of parameters. On the enterprise level, that task will very quickly exceed the ability of your developers and admins, leaving your team either wasting time or putting your environment at risk. When that happens, your best bet is the Artificial Intelligence and machine learning found in StormForge.
Contact Us
Want to learn more about the StormForge platform, its powerful machine learning-driven capabilities, and how it propels teams past the challenges of running Kubernetes applications at scale? If yes, request a demo! We hope to speak with you soon.