T. Boone Pickens once said, “Predictability can lead to failure,” and while he wasn’t an SRE, he may have actually been on to something.
When it comes to performance, we want predictability. We want consistency and to know what we can expect…every…single…time. What don’t we want? Surprises. As an SRE, I wanted to be able to predict how my applications were going to perform, even if it was impossible to predict customer behavior. Predictability leads to success – or at least high performance – right? Well…not always.
Once apps were in production, all the performance testing in the world couldn’t guarantee what would happen. And, of course, eventually, something would happen in production that we didn’t account for.
That’s why, when it comes to optimizing your Kubernetes environment, you need to be able to do so in both pre-production and production. But how are you going to do that? Sure, you could throw a ton of resources at the problem and hope that manual tuning will land on the best configuration, but that’s just not humanly possible. Resources are finite. Time is finite. And the possible combinations of tuning variables are almost infinite.
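To get a feel for how quickly that space grows, here is a minimal sketch that counts the combinations for a single service. The parameter names, the candidate values, and the 15-minute trial length are all illustrative assumptions, not StormForge settings or recommendations.

```python
from math import prod

# Hypothetical tuning knobs for ONE containerized service.
# Names and candidate values are assumptions chosen for illustration only.
tuning_space = {
    "cpu_request_millicores": [250, 500, 1000, 2000],
    "memory_request_mib": [256, 512, 1024, 2048, 4096],
    "replicas": [1, 2, 3, 4, 6, 8],
    "jvm_heap_percent": [50, 60, 70, 80],
    "gc_threads": [1, 2, 4, 8],
}


def count_configurations(space):
    """Number of distinct configurations if every combination is tried."""
    return prod(len(values) for values in space.values())


def hours_to_test_all(space, minutes_per_trial):
    """Wall-clock hours to run one load test per configuration, back to back."""
    return count_configurations(space) * minutes_per_trial / 60


if __name__ == "__main__":
    total = count_configurations(tuning_space)
    hours = hours_to_test_all(tuning_space, minutes_per_trial=15)
    print(f"{total} configurations, about {hours:.0f} hours of sequential testing")
```

Even with only five knobs and a handful of values each, this comes to 1,920 configurations, or roughly 480 hours of sequential 15-minute trials for one service. Add more parameters or more services and the count multiplies again.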
So, typically what happens is that engineers will only consider variables that they think will have an impact. They rely on past experience or go with the default recommendations. Or, more commonly, they over-provision resources. Obviously not the ideal solution for the budget, but it seems like a pretty safe bet. In essence, engineers predict what will optimize performance because they can’t possibly calibrate every possible variable to know for sure.
And this is when predictability leads to failure.
This approach fails to truly optimize resources, to verify that the tuning will actually make things better (and not worse), or to take the most cost-efficient path to the desired result. (This is when T. Boone Pickens pats himself on the back for his sage wisdom.)
At StormForge, we’ve witnessed this predicament first-hand many times. We’ve worked with customers who questioned the configuration recommendations that came from our ML-driven insights. At first they couldn’t believe that implementing the recommended changes would have any impact on performance. But they trusted us, and went ahead and made the changes. And guess what? It dramatically improved performance – by 40% or more – and they had a better-tuned application as a result.
By leveraging machine learning, the results that StormForge provides are often surprising – but in a good way! Machine learning doesn’t just look for configuration options that will work; it also looks for (and rules out) the options that will fail. The nice thing about optimization is that you learn just as much from the failed options as you do from the ones that work. That’s why StormForge can guarantee that our recommendations will deploy successfully – our machine learning leaves nothing unexplored.
With StormForge Optimize Pro, customers get proactive optimization in pre-production with deep application insights. Optimize Live then turns observability into actionability by empowering you to make adjustments in production based on intelligent configuration recommendations.
Yes, predictability matters for application performance. But predictability can also lead to failure when the “predictable” approach to optimization isn’t the best one to generate the best results. That’s why StormForge automates the optimization process and leverages machine learning to achieve and maintain Kubernetes resource efficiency at scale.
About the Author
Patrick Bergstrom
Patrick Bergstrom is Chief Technology Officer at StormForge, where he is responsible for product strategy development and for delivering innovation to StormForge customers.
Bergstrom was most recently Vice President, Site Reliability Engineering & Software Engineering, Enterprise Operations at UnitedHealth Group, where he created a globally distributed organization responsible for processes and tools to support distributed applications using modern DevOps techniques and best practices. He also led the creation of the Site Reliability Engineering category in BestBuy.com’s Web Operations group and introduced modern strategies around Data Collection, Application Monitoring, Alerting, Incident Management and Response. Bergstrom started his career in the Army National Guard as an Avionic Systems Technician, where he gained his first insight into system reliability at scale while working on U.S. Army airframes during his 12 years of service.
Bergstrom is based in Ames, Iowa, where he lives with his wife and enjoys woodworking. He also serves as Board Chair of the Economic Vitality Committee as part of Downtown Ames & Ames Chamber of Commerce. He holds a Bachelor’s Degree from Iowa State University.