StormForge Experts Discuss How Machine Learning and Rapid Experimentation Give Developers Control of Cost and Performance Trade-offs

Most organizations are quick out of the starting gate with Kubernetes. Their developers are able to stand up Pods and clusters, and build, test, and run their initial applications reliably. For DevOps teams, this early success inspires confidence and generates excitement about the technology’s possibilities.

That early enthusiasm fades, however, when teams start to experience Kubernetes’ complexity and its negative effects. As their deployments grow, it becomes increasingly difficult for developers to tune their apps for their desired balance of performance and cost. So, most teams hedge by over-provisioning. Then comes the nasty surprise when the cloud services bill arrives.

This challenge is holding back many teams and organizations from taking their Kubernetes deployments to a new level. But it need not be this way; there’s a new approach that removes hurdles that are blocking more Kubernetes success.

StormForge experts discussed this problem and the new approach to solving it in a recent webinar. In the session, they explained how machine learning and rapid experimentation technology gives developers the control they need to strike the desired balance between performance and cost. This article covers the high points of that discussion.


Great Promise Fuels Rapid Adoption

The popularity of the cloud-native approach, and of Kubernetes in particular, has skyrocketed over the past several years. A long list of expected benefits has fueled this rapid adoption. For developers, Kubernetes offers faster, easier, and more flexible ways to build, test, and deploy applications. Kubernetes also makes it easier to scale applications and to ensure that they run reliably in production.

For enterprises and other organizations, Kubernetes holds the promise of lower-cost applications and infrastructure. Faster development cycles mean shorter time-to-market for products with new features and capabilities. By boosting apps’ reliability and responsiveness, Kubernetes also creates better user experiences, which translate into things like improved employee productivity, increased revenue, and higher customer satisfaction.

Big Promises Bump Up Against Big Demands

Rich Bentley, StormForge’s Senior Director of Product Marketing, is an expert in the business aspects of cloud-native applications in general, and Kubernetes in particular. Rich kicked off the session by ticking off some of the main reasons why Kubernetes adoption has grown so rapidly.

These include:

  • Faster time-to-market for new products, or products with new features
  • Easier and more flexible scalability (as compared to monolithic apps)
  • More reliability stemming from essentially infinite cloud computing resources, and
  • Superior user/customer experiences resulting from high app performance and reliability

This technology’s great promise has teed up very high expectations on both the technology and business sides of organizations. People are expecting big things, especially with application performance, reliability, and scalability. There’s a high bar for each of these measures, and it doesn’t get lowered. With performance, apps and infrastructure need to meet or exceed users’ expectations for speed and responsiveness. Reliability is a must, since any downtime can be very costly from both the financial and reputational perspectives. With scalability, the first few apps ran very well, so that’s the expectation when deployments grow to dozens of apps and hundreds or thousands of users. Period, full stop.

But here’s the catch: cloud-native is complicated, and Kubernetes is no exception. It involves lots of dynamic, ephemeral workloads that are being run on rapidly assembled microservices. Once the job is done, containers, the applications they hold, and the microservices they use can go away just as quickly.

Then there’s the promise of more cost-effective operations. With just a few apps, DevOps teams can stay on top of their consumption of cloud resources. But when deployments begin scaling up in earnest, it becomes very difficult to pinpoint the specific level of services an organization needs, so that it can contract and pay for only that level.

Why is this so difficult?

“It’s not that there aren’t enough or the right types of controls for adjusting settings to control performance levels and cost,” said Rich Bentley. “There are plenty of them. Kubernetes has a range of settings covering things like CPU, memory, and replicas. And depending on the types of apps a team is running, there are often app-specific settings that can be adjusted. The problem isn’t a lack of appropriate controls – it’s the number of controls, and combinations thereof, which grows exponentially as the number of deployed apps grows, and how fast those controls need to be figured out and adjusted in order to make a difference. That’s how growing deployments stumble.”
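To make the settings Bentley mentions concrete, here is a minimal sketch of how CPU, memory, and replica counts appear in a Kubernetes Deployment manifest. The name, image, and resource values below are hypothetical, chosen only for illustration – they are not recommendations from the webinar:

```yaml
# Hypothetical Deployment fragment. The replica count and the resource
# requests/limits are three of the many knobs that must be tuned per app.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # placeholder name
spec:
  replicas: 3                # how many Pods to run
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: web
        image: example/web:1.0    # placeholder image
        resources:
          requests:               # what the scheduler reserves for the Pod
            cpu: 500m
            memory: 256Mi
          limits:                 # hard caps enforced at runtime
            cpu: "1"
            memory: 512Mi
```

Each value interacts with the others, and app-specific settings (heap sizes, connection pools, and the like) multiply the combinations further – which is exactly the explosion Bentley describes.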

Bentley added that “Each setting change you make has a potential impact on one or more of the three key metrics for your applications: cost, performance, and reliability. Trying to understand what’s going on with an application running in a Kubernetes container – grasping the complex inter-relationships between setting options and their impact on key metrics – is extremely difficult. Doing that quickly for dozens of apps is a job that, frankly, is beyond human capabilities.”

“That’s when you need to leverage machine intelligence, which is precisely what we do at StormForge. With our solutions, we enable developers and DevOps teams to make informed decisions about choices and settings. We also equip them to make those decisions fast enough to make a difference. That means making the right decisions and making them in the right ways so that they drive the best outcomes.”

Bentley underscored the need for new controls by sharing some recent market data on the widespread over-provisioning of containerized applications. One recent study from Datadog showed that most containers running in production use only a small percentage of their requested CPU and memory. In fact, according to the study, just under half (49%) of containers use less than 30% of their requested CPU, and 45% of them use less than 30% of their requested memory.

Multiplying that level of unused resources by the number of organizations shifting their operations to the cloud makes it clear that the amount of cloud waste now occurring is enormous, and its financial impact is staggering. Gartner is now tracking this waste, and pegs this year’s global total at around $17.6 billion.

It’s not just money being wasted by this over-provisioning. It’s energy, too – all the electricity, cooling, and ventilation needed to run the cloud service providers’ vast server farms. These data centers generate roughly as much greenhouse gas emissions annually as the airline industry.

It’s obvious that this amount of waste is untenable and needs to be curtailed. Giving engineers and developers tools that will enable them to make better, smarter resourcing decisions is a big step toward bringing more efficiency and environmental friendliness to the cloud.

Rich Bentley then briefly touched on the human side of these challenges. He cited statistics that showed the high rate of burn-out being experienced by engineers and developers who are building cloud apps. Much of that job dissatisfaction and burn-out stems from this problem of endlessly chasing better configuration settings. For humans, it’s a mundane, repetitive, and unrewarding task – one that is devoid of the challenges and opportunities for creativity that techies thrive on.

He also quickly discussed why popular tools and strategies don’t quite get developers and their teams to where they need to be:

  • Trial-and-error: This may work during early phases of deployment but quickly becomes overwhelming when things scale up.
  • Performance & load testing: This provides useful and important functionality – showing how an app will perform under load. But these tools do not tell users what to do about the problems they reveal.
  • Kubernetes Autoscaler: The Horizontal Pod Autoscaler that is part of Kubernetes helps teams to automatically scale their Pods. But it has its own settings (added to all the others), and it offers only limited help in dealing with application-specific settings.
  • Monitoring & observability tools: These are great for alerting teams about performance problems with resources running in production. The problem is that they are reactive; by the time the alert comes into the NOC, the damage usually has already been done.
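
To illustrate the point about the Horizontal Pod Autoscaler adding settings of its own, here is a minimal sketch of an HPA manifest. The names and thresholds are hypothetical, not taken from the webinar:

```yaml
# Hypothetical autoscaling/v2 HPA. minReplicas, maxReplicas, and the CPU
# utilization target are themselves new settings a team must tune. Note also
# that utilization is measured against the container's CPU *request*, so a
# poorly chosen request value skews autoscaling behavior too.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app        # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

This helps with replica counts, but it does nothing for the per-container resource requests or the app-specific settings discussed above.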

Rich closed out his section of the session by summarizing what StormForge is doing. “Trying to deal with these challenges reactively simply isn’t a viable strategy. Plus, it’s thankless drudgery. StormForge equips developers and DevOps teams to address these challenges upfront – proactively. And a big way we do that is with automated application optimization.”

Following Rich was Brad Ascar, Senior Solutions Architect for StormForge. With decades of technology experience and in-depth knowledge of everything Kubernetes, Brad works directly with StormForge customers to not only ensure that their deployments are successful, but also to maximize the business value of their investment in StormForge solutions.

Brad provided an overview of the StormForge Platform and its main features and capabilities. He then launched into a live demo in which he walked through those features and presented several detailed use cases.

A written recap of Brad’s demo really wouldn’t do it justice. The good news is that a recorded version of it is available on-demand. For anyone wanting more detailed info on the StormForge Platform and our approach to automated application optimization, we highly recommend it.

There’s a better, smarter, and more automated way to optimize the performance and efficiency of your Kubernetes applications while eliminating their hidden costs. StormForge can show you the way. Are you ready?