Introduction

Your developers and administrators spend untold hours deploying applications and services to a Kubernetes cluster capable of scaling to meet ever-growing demand. And from a certain perspective, it seems like those clustered apps are helping to keep your business running more smoothly than ever. No matter how much demand is placed on the cluster, things just keep humming.

Until they don’t. Out of nowhere, that mission-critical app stops functioning properly and your pipeline is no longer delivering as it should.

What happened? Your developers know what they’re doing and their container/pod manifests were almost poetic in their technological beauty. Certainly the failure didn’t stem from their configurations. Or did it?

One problem is more widespread than you might think: many organizations, while having evolved their software development and deployment practices to be agile, automated, and continuous, are still managing performance the same way they did for their traditional software. We call this the Kubernetes Application Performance Gap.

For example, your admins can roll out an Apache-based website and (other than adding regular content) that site will function pretty well on its own. Every so often you might find yourself needing to tweak something here and there, but for the most part it’s as close to a “set it and forget it” server application as you can get.

For performance, you may have run some load testing before deployment, but because the service is largely static, anything more is probably overkill.

That’s the traditional method of rolling out a service.

For one of its customer sites, a SaaS platform provider identified:

82% latency improvement

57% hosting cost reduction

Kubernetes is a different beast altogether. Your team might craft the perfect manifest that deploys a highly functional and seamless container/pod application or service, and you might be inclined to assume it will behave properly all on its own. You test it early and then fall into an “out of sight, out of mind” mindset, assuming the deployment no longer needs performance testing.

That’s not the case. Why? As your container deployments grow more and more complicated, they contain more moving parts and far more configuration settings. And unless those configurations are absolutely perfect, things can (and do) go wrong. And because you’re now deploying more frequently using a CI/CD pipeline approach, those containers require much more rigorous and regular optimization and testing.

That’s why you must consider proactive and continuous performance management a priority for your Kubernetes cluster.

The Application Performance Gap has a real impact on the business. A few of the consequences include:

  • Dev teams spend too much time reacting to issues, and not enough time innovating, causing your business to lag behind the competition.
  • Fire drills, war rooms, and finger-pointing are common, wasting IT team productivity, causing burnout, and hurting morale.
  • Poor service delivery causes lost customers, lost revenue, and a negative impact on your company’s reputation.

The 3 Steps to Better Kubernetes Performance Management

Closing the Application Performance Gap for Kubernetes may seem daunting, but never fear – we’ve come up with a gradual, three-step approach. Those steps are:

1. Test It

Make performance testing a regular part of your release process.

2. Optimize It

Optimize your applications for performance, stability, and cost-efficiency.

3. Automate It

Automate performance testing and optimization to drive continuous optimization.

For each of these steps, you want to consider:

People

Those who make it happen.

Process

The way it happens.

Technology

The tools people use to make the process happen.

Let’s break down these steps, one by one, so you have a better understanding of how to get proactive with your Kubernetes performance management.

Step 1

Test it.

Make performance testing a regular part of your release process

In the drive to become more agile, most enterprise businesses have adopted a number of new technologies and processes. That means Kubernetes, Docker, CI/CD, automation, and other new ways of working. These new(ish) technological wonders help businesses achieve DevOps—that nirvana where developers and operations come together to deliver software at a scale and frequency that wasn’t previously possible. To help your company reach that goal, you need to make performance testing part of the puzzle.

What is performance testing? Part of the quality assurance process, performance testing is a way to test how a system performs with regard to responsiveness and stability under load. In other words, do your containers and pods function as expected when you put them under ever-increasing workloads?

Performance Testing is a way to test how a system performs with regard to responsiveness and stability under load.
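
To make this concrete, here is what a performance test expressed as code can look like. This minimal sketch uses the YAML format of the open-source Artillery load testing tool; the target URL, traffic phases, and thresholds are illustrative placeholders, not recommendations.

```yaml
# perf/checkout-flow.yaml – a minimal load test in Artillery's YAML format.
# The target, phases, and thresholds are illustrative placeholders.
config:
  target: "https://staging.example.com"  # hypothetical staging endpoint
  phases:
    - duration: 120     # 2-minute warm-up
      arrivalRate: 5    # 5 new virtual users arrive per second
    - duration: 600     # 10-minute ramp
      arrivalRate: 5
      rampTo: 50        # climb to 50 new virtual users per second
  ensure:               # v1-style thresholds; newer Artillery versions use the ensure plugin
    p95: 250            # fail the run if 95th-percentile latency exceeds 250 ms
    maxErrorRate: 1     # or if more than 1% of requests fail
scenarios:
  - name: "Browse and add to cart"
    flow:
      - get:
          url: "/products"
      - post:
          url: "/cart"
          json:
            productId: 42
            quantity: 1
```

Because the test lives in version control alongside the application, it can evolve with every release – which is exactly what the next steps call for.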

As it stands, your company probably does some testing on those apps and services deployed to the Kubernetes cluster. However, the current state of testing might not be capable of achieving the agility you’re looking for. Why?

  • In your quest to move to an agile DevOps approach for deploying and delivering new capabilities, you have yet to evolve your processes to manage performance at the level you desire.
  • Your team probably doesn’t do regular and frequent performance testing.
  • Your performance testing efforts are owned by a separate QA or ops team instead of by the development team. This runs counter to the DevOps philosophy of “You build it, you run it.”
  • All of your performance testing is created and run manually.
  • You primarily use a reactive approach to performance testing (such as relying on Application Performance Management software), which, by design, puts you behind the curve.
  • Infrequent performance testing cannot keep up with how quickly your deployments are evolving.

So, how do you take your current state of performance testing to the next level? You must make performance testing a regular part of your deployment release process. How do you do that?

  • Increase the frequency of performance testing to match your deployment frequency – weekly, or even daily.
  • Shift performance testing left so that it begins much earlier in the software development cycle.
  • Leverage Performance Testing as Code to automate execution of performance tests.
  • Integrate performance testing into your CI/CD workflow and make it a quality gate that must be passed before releasing (a sketch of such a gate follows this list).
  • Migrate performance testing to your dev team. By doing this, you extend the DevOps “You build it, you run it” mantra to “You build it, you own performance.”
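
To see how that quality gate can work in practice, here is a minimal sketch in GitHub Actions syntax. The workflow name, file paths, and Node version are assumptions for illustration; any CI system that treats a non-zero exit code as a failure works the same way.

```yaml
# .github/workflows/perf-gate.yml – illustrative sketch; names and paths are assumptions.
name: performance-gate
on: [pull_request]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Run the load test defined earlier. Artillery exits non-zero when an
      # "ensure" threshold is breached, which fails this job and blocks the release.
      - name: Run performance test
        run: npx artillery run perf/checkout-flow.yaml
```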

Aspects to consider

  • People: DevOps culture, dev ownership; QA or ops teams become enablers, i.e., they become responsible for building the self-service capabilities that empower dev teams to incorporate performance testing.
  • Process: Move toward agile performance testing approaches to match DevOps and agile development practices.
  • Technology: Adopt automated testing capabilities and Performance Testing as Code so you can integrate testing into your CI/CD process.

StormForge Performance Testing can make this process considerably easier for you.

Step 2

Optimize it.

Optimize your applications for performance, stability, and cost-efficiency

Now that you’re doing regular performance testing, what’s the next step?

Optimization.

Why? Because while performance testing will tell you how your application will perform under load, it can’t tell you what to do if there is an issue. Stopping at performance testing alone is a costly mistake, as you would fail to gain actionable insights for your cloud-native applications that could:

  • Prevent downtime.
  • Improve performance.
  • Save on cloud hosting costs.

The idea behind optimization stems from the fact that there are myriad ways to deploy your application. Every app has a number of interrelated configuration settings. For apps running on Kubernetes, two of the most important are:

Requests

Definitions of the minimum amount of a particular resource (e.g. CPU and memory) that your application needs to function properly.

Limits

The maximum amount of a resource that your app is allowed to consume.

There are other, application-specific settings as well – JVM heap size, garbage collection settings, thread management – the list goes on and on.

Each of these settings affects how well the application runs, how stable it is, and how much it costs to run. Do the math – multiply even a handful of settings by the range of values each can take – and the number of possible configurations quickly becomes astronomical.
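
To ground this, here is an excerpt of a Kubernetes Deployment manifest showing where these settings live. The service name, image, and values are hypothetical starting points, not recommendations – finding the right values is exactly the optimization problem described below.

```yaml
# deployment.yaml – excerpt showing the settings discussed above.
# Name, image, and all values are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
    spec:
      containers:
        - name: checkout-service
          image: registry.example.com/checkout-service:1.4.2  # hypothetical image
          resources:
            requests:        # the minimum the scheduler reserves for this container
              cpu: "500m"
              memory: "512Mi"
            limits:          # hard caps; exceeding the memory limit gets the pod OOM-killed
              cpu: "1"
              memory: "1Gi"
          env:
            - name: JAVA_TOOL_OPTIONS  # application-specific tuning (JVM heap size, GC)
              value: "-Xms256m -Xmx768m -XX:+UseG1GC"
```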

For example, your developers have created and tested an app to be deployed to a Kubernetes cluster. They provide their best guess at all the configuration settings discussed above. Prior to deployment, everything tests out. The app is deployed to the cluster without issue. You have liftoff. At this point the developers forget about the app and move on to their next challenge.

As that application runs early in its lifecycle, everything goes as planned. Not surprising, because you tested it and the results are bearing out in production.

But somewhere along the line, something goes wrong.

Customers and clients start hitting that application or service more and more frequently until something breaks. As the system scales, one of the components goes off the rails and a pod fails. Or maybe everything works fine, but it turns out you’ve drastically over-provisioned and cloud costs are 50% higher than originally budgeted.

Why did this happen?

While you may be able to identify and fix the cause of a particular issue, the systemic root cause is harder to get at. With the almost infinite number of configuration settings described earlier, configuring your application for optimal performance, stability, and cost-efficiency becomes incredibly complex. Humans don’t have the mental capacity to predict the myriad outcomes associated with all of these parameters, a situation made even more complex as demand grows and your system scales.

To address the root cause, you must add optimization to your pre-deployment arsenal. But how? With a near-infinite number of possible settings, trial-and-error tuning is clearly going to be incredibly time-consuming and ineffective. Luckily, this is just the type of problem that machine learning excels at solving.

Machine learning applies a scientific approach to arriving at the best settings for balancing performance, cost, and stability for your application. It considers your goals (as measured by the performance testing put in place during Step 1), tries different configurations, and intelligently homes in on the optimal parameters for meeting those goals. A “Rapid Experimentation Engine” combines machine learning-generated trials with performance testing to optimize far more efficiently than a human could alone.

With optimization, you can now release more confidently and reduce the risks of poor performance, downtime, or costly over-provisioning.
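
To make the experiment-driven approach concrete, here is a sketch of what such an experiment definition could look like. This is a hypothetical schema for illustration only – it is not StormForge’s actual API – but it captures the three ingredients: tunable parameters, business-aligned objectives, and a load test that scores each trial.

```yaml
# experiment.yaml – a HYPOTHETICAL experiment definition, for illustration only.
# This is not StormForge's actual API; it simply shows the shape of the problem.
experiment:
  name: checkout-service-tuning
  parameters:                 # the knobs machine learning is allowed to turn
    - name: cpu
      min: 250m
      max: 2000m
    - name: memory
      min: 256Mi
      max: 2Gi
    - name: jvmHeap
      min: 128m
      max: 1536m
  objectives:                 # the goals, measured via the Step 1 performance test
    - name: p95-latency
      goal: minimize
    - name: cloud-cost
      goal: minimize
  trialTemplate:
    loadTest: perf/checkout-flow.yaml  # each candidate configuration is scored with this test
    maxTrials: 40             # the engine homes in on optimal settings within a trial budget
```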

Aspects to consider

  • People: Developers must understand the importance of optimization, beyond basing a deployment on configurations set by someone else. Application owners must establish and clearly communicate business goals.
  • Process: Optimization incorporated as part of the pre-deployment process.
  • Technology: A “Rapid Experimentation Engine” that combines machine-learning powered optimization with performance testing.

StormForge Optimization can help make this a reality.

Step 3

Automate it.

Automate performance testing and optimization to drive continuous optimization

At this point you’ve realized that, to proactively manage Kubernetes testing and optimization, automation has to play a key part. To get there, combine the automated performance testing of Step 1 with the optimization of Step 2 to create an automatic, continuous optimization process built into your CI/CD workflow.

With your dynamic, constantly changing Kubernetes environment, it’s not enough to test and optimize one time, or even regularly but infrequently. Just as you are building continuous integration and continuous delivery (CI/CD) pipelines, testing and optimization also need to be continuous processes.

When turning performance testing and optimization into a continuous, automatic process, here are a few important things to keep in mind:

Start with the end in mind.

Establish SLAs that are aligned to business goals and design testing and optimization scenarios around these.

Test often.

With the increased frequency of releases that accompany the move to a DevOps culture, it’s critical that performance testing occurs frequently, ideally with every release.

Test early.

Incorporate performance testing as early as possible in the software lifecycle. According to The Standish Group, for every stage you delay in finding and fixing an issue, it will cost 10 times more in terms of time and money.

Emulate the real world.

As a starting point for your performance tests, you want to simulate real-world production traffic as closely as possible. Consider performance testing tools built on an open workload model to more closely emulate real-world traffic. There are also tools available that record and replay production traffic for use in performance testing.

But remember, performance testing alone isn’t enough. At this stage, it’s about optimizing apps continuously as well. Performance testing and optimization work hand in hand as part of a continuous optimization process. Both parts can be automated and incorporated into your release workflow to ensure optimal performance, reliability, and cost-efficiency before apps are deployed.
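
As an illustration of what “continuous” can mean in practice, here is a sketch of a scheduled pipeline, again in GitHub Actions syntax, that re-runs the Step 1 load test and kicks off a new Step 2 optimization experiment on a nightly cadence. The schedule and the optimization script are placeholders for your own tooling.

```yaml
# .github/workflows/continuous-optimization.yml – illustrative sketch only.
# The schedule and the script below are placeholders for your own tooling.
name: continuous-optimization
on:
  schedule:
    - cron: "0 2 * * *"      # every night at 02:00 UTC
  workflow_dispatch: {}      # allow on-demand runs as well

jobs:
  test-and-optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Step 1: re-run the performance test against the current build
      - name: Run performance test
        run: npx artillery run perf/checkout-flow.yaml
      # Step 2: launch a fresh optimization experiment using the new results
      # (hypothetical script – wire in your own optimization tooling here)
      - name: Kick off optimization experiment
        run: ./scripts/run-optimization.sh --experiment checkout-service-tuning
```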

Aspects to consider

  • People: Extend the concept of “you build it, you run it” to include performance. If you build it, you own its performance. Break down siloed QA teams and bring them together with development.
  • Process: Build continuous performance testing and optimization into your CI/CD process as a quality gate. Establish performance-based SLAs aligned with business goals to use as testing criteria.
  • Technology: Automated, machine learning-powered performance testing and optimization integrated into your DevOps toolchain. Use record and replay tools to capture real production traffic as a starting point for performance tests.

StormForge can help you build performance testing and optimization into your CI/CD workflow.

Conclusion

Performance testing and optimization can help deliver flawless app performance without over-provisioning and with far less trial and error. By adopting a performance culture (that is, treating performance as a systematic, proactive process), you’ll wind up with deployments better able to handle the rise and fall of demand, and to do so with remarkable reliability.

The value of moving from reactive to proactive, automated performance management is clear:

  • Faster innovation, as development efforts become focused on building and delivering new capabilities instead of troubleshooting performance issues after they’ve reached production.
  • Improved team productivity, greater job satisfaction, less burnout, and less turnover as teams are empowered to fully own their applications throughout the lifecycle.
  • Reduced business risk resulting from failed deployments and poorly performing applications.
  • Reduced cloud costs through more efficient use of resources and elimination of over-provisioning.
  • Improved user experience, resulting in improvements to customer satisfaction, brand perception, and revenues.

By using StormForge, a large online travel company identified:

50% cost-efficiency improvement

To make this a reality, reach out to StormForge and find out how to start closing the Application Performance Gap with automated testing and optimization.