The most common mistake I hear when speaking with people adopting Kubernetes is how they set the requests and limits on their pods.

The main reason teams want to move to Kubernetes is cost savings, yet they carry a legacy VM strategy into how they set the limits and requests on their pods. As in the VM world, they set the CPU limits and requests of their pods to the same value, and then wonder why they are not seeing the savings they expected. In some cases they find that their app costs even more post-migration.
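To make the anti-pattern concrete, here is a minimal sketch of the two sizing styles. The pod names, image, and resource numbers are all illustrative, not taken from any real workload:

```yaml
# Anti-pattern: VM-style sizing with requests == limits (Guaranteed QoS).
# The pod reserves its peak capacity around the clock, so the scheduler
# packs fewer pods per node and the expected savings never materialize.
apiVersion: v1
kind: Pod
metadata:
  name: web-vm-style          # hypothetical name
spec:
  containers:
  - name: web
    image: example/web:1.0    # illustrative image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
---
# Right-sized alternative (Burstable QoS): request roughly what the pod
# typically uses, and let it burst above that when spare CPU is available.
apiVersion: v1
kind: Pod
metadata:
  name: web-right-sized
spec:
  containers:
  - name: web
    image: example/web:1.0
    resources:
      requests:
        cpu: 250m             # typical usage, not peak
        memory: 1Gi
      limits:
        memory: 4Gi           # memory limit protects the node; no CPU limit, so the pod can burst
```

Because the scheduler bin-packs on requests, the second style lets far more pods share a node while the kernel's CPU scheduler absorbs the bursts.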

Adopting Kubernetes is not the same as moving to a different type of VM with autoscaling and orchestration; it is a totally different technology stack in which workloads have a very different profile. When we began our journey to Kubernetes, we made the same mistake as other teams I have spoken to, and ran into the same issue: our cost model was not proving out, and we had to figure out why and how to optimize.

The challenge was that workloads have different profiles at different times, and when you have over 40,000 pods running at any given time, trying to understand the needs of each pod is not something that scales. We knew we would need machine learning to solve this, so we worked with a company called StormForge on a product called Optimize Live. Now we let StormForge take over setting all requests and limits on our pods, which has significantly reduced our Kubernetes spend and resulted in higher availability for our customers.
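StormForge's own API aside, the general shape of handing over requests and limits to an automated recommender can be illustrated with the open-source VerticalPodAutoscaler, an analogous mechanism that observes usage and applies right-sized requests on your behalf. The target workload name here is hypothetical:

```yaml
# Sketch of automated rightsizing using the open-source VerticalPodAutoscaler
# as a stand-in for a commercial recommender. It watches actual usage of the
# target workload and rewrites pod requests to match.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa               # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # hypothetical workload to rightsize
  updatePolicy:
    updateMode: "Auto"        # recommendations are applied automatically, not just reported
```

The key design point is the same in either tool: engineers stop hand-tuning per-pod numbers, and a controller continuously reconciles requests against observed demand.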

The machine learning that StormForge uses to set these requests and limits allows our engineering and SRE teams to focus on the customer experience rather than infrastructure settings. The reduced toil and increased focus on our customers have been just as impactful as the dollars saved.

With software like StormForge, you can adjust these settings to match your business requirements, whether the goal is cost savings or performance and availability.