After a lot of hard work, we’ve finally deployed our Kubernetes cluster, and our application is live. But our work isn’t over yet! We have a few important concerns to address in the immediate post-launch period, often called Day 2 concerns, to ensure that our infrastructure is resilient and gives our users a great experience.

As we address each concern, we should also be asking ourselves: are we confident we’re only paying for what we need?

Let’s look at how to monitor, scale, and optimize our clusters so they’re healthy and efficient in the post-deployment period and throughout the rest of their life cycles. We’ll go over some common challenges we face immediately after deployment, then discuss why observability helps address these concerns. Then, we’ll explore how we can use the data our clusters provide before considering how we can go beyond individual actionable insights to make all our data insightful.

Chapter 1: Day 2 Challenges

We’d like to know that our clusters can comfortably handle our applications as they mature and scale. We’d also like to know that we’re prepared for sudden success — watching a cluster limp off the starting line in the face of unprecedented user traffic is a painful missed opportunity. So, we tend to over-prepare and over-provision, a costly and inefficient exercise in maintaining unused resources.

Our first concern should be ensuring we’ve prepared our infrastructure to scale as efficiently as possible.


An application’s user traffic is often inconsistent after it goes live, so our cluster workloads are likewise unpredictable. In addition to performing poorly — or potentially not at all — an inefficient Kubernetes cluster can become very costly very quickly.

Kubernetes follows strict rules to manage container memory and CPU time. A container that exceeds its memory limit is terminated with an out-of-memory (OOM) error, and a container that reaches its CPU limit gets throttled.

Under-provisioning our cluster causes our release to fail in one of two primary ways. If we’re hosting our cluster on-premises, adding capacity to meet spikes in demand is a challenge, so we risk losing customers as they struggle with poor performance. If we’re using a managed Kubernetes service, we face a grim choice: sit out the peak in demand and hope our customers are patient or hope the bill for capacity bursting from our provider is manageable.

Under-provisioning causes performance and availability risk.

Miscalculating in the other direction isn’t much better. Whether we host it ourselves or rely on a provider, a cluster using just a portion of its capacity costs just as much as a cluster allocating all its resources.
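To put a number on that waste, here is a minimal sketch of the arithmetic. The function name and the figures are hypothetical, and it assumes the bill is flat regardless of how much capacity is allocated.

```python
def idle_capacity_cost(monthly_cost: float, allocated_fraction: float) -> float:
    """Return the part of the monthly bill that pays for unallocated capacity."""
    if not 0.0 <= allocated_fraction <= 1.0:
        raise ValueError("allocated_fraction must be between 0 and 1")
    return monthly_cost * (1.0 - allocated_fraction)


# Hypothetical cluster: 10,000 per month with a quarter of its capacity allocated.
print(idle_capacity_cost(10_000.0, 0.25))  # 7500.0
```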

Over-provisioning is costly and wastes cloud resources.

More concerningly, it can be difficult to get an accurate view of the state of our cluster. Unlike traditional, monolithic systems, a Kubernetes cluster often contains components and systems that generate data to different standards or use different protocols. Many of these components are continuously destroyed and replaced, so data in our analyses might not necessarily reflect our cluster’s current state.

Especially because Kubernetes provides so many opportunities to make ephemeral changes, it’s important for the long-term health of our clusters to make sure that we can account for inevitable configuration drift.

Configuration Drift

Configuration drift occurs when the true state of our infrastructure deviates from the configurations we defined. This usually happens when we make undocumented changes to our infrastructure.

There are some valid reasons why we might skip best practices. For example, it’s not always practical to wait for approval when providing temporary access or adding emergency capacity. There are also more than a few improper reasons that, realistically, won’t surprise engineers with any amount of experience in a production environment.

The truth is that configuration drift is unavoidable in any infrastructure. If we don’t address it at some point, the steady accumulation of inefficiencies and errors is almost guaranteed to snowball into a pattern of serious incidents.

Critically, it’s a self-concealing problem, burying the root causes of glitches in a haystack of loose ends and creating security blind spots that make our app more vulnerable to bad actors.

Chapter 2: Observability: An Important First Step

So, we have an idea of how we expect our cluster to perform on day 2, and we understand that the metrics we relied upon in development are only a starting point. Together, these challenges demand that we make good observability a central praxis in our deployments.

Observability lets us infer the internal state of our cluster from the outputs it produces, making its health and performance more available for analysis. Good observability gives us a foundation with which to build a scalable, lean, performant system. The right metrics at the right time let us optimize and ensure every component has reasonable headroom without incurring unnecessary costs.

Furthermore, it makes the debugging process much simpler. We can trace problems to their sources and understand their total impact on the cluster’s performance, making the haystack transparent and pointing right to the needle. Having the right metrics gives us a view of what is happening to the cluster at high and low levels and makes it easy to proactively identify vulnerable containers or suspicious traffic.

For our observability to be actionable, however, our engineers need to know what to monitor, where to observe it, and how to set up alerts and respond to them. In short, it’s not enough to know our system’s internal state — we must know what to do about it.

Chapter 3: Beyond Observability

A Kubernetes cluster generates a massive amount of data. This data can be overwhelming if we don’t know what to look for. We can set alerts for interruptions to basic operations or imminent problems and still get lost in the haystack.

Doing anything meaningful with all this data requires a way to analyze, correlate, and contextualize it to figure out how we can best act on it.

To move toward effective actionability, we need an efficient and robust monitoring solution that detects unusual events and sends alerts in real time. It should also help us make sense of past performance by detecting important trends for us to investigate further. For example, it might correlate a pattern of failed connections with user traffic from a specific geographic region.
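As a rough illustration of what detecting an unusual event can mean, the sketch below flags a sample that sits far from the recent average, measured in standard deviations (a z-score check). This is a deliberately simple stand-in rather than how any particular monitoring product works. The function name, the threshold, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]  # a flat series: any change is unusual
    return abs(latest - mean(history)) / spread > threshold


# Hypothetical failed-connection counts per minute.
recent = [120, 118, 125, 122, 119, 121]
print(is_anomalous(recent, 123))
print(is_anomalous(recent, 190))
```

A real alerting rule would also need to account for seasonality and trends, which a fixed window like this ignores.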

As a first step, we can implement some of these strategies ourselves using integrated tools in Kubernetes.

Scaling Appropriately

We should pay attention to specific metrics that directly correlate to the states of pods and the cluster as a whole. Kubernetes provides some simple tools to optimize the cost and performance of our cluster as it scales, but they aren’t all well-suited for generating actionable data.

For example, the kube-state-metrics (KSM) service generates metrics related to deployments, nodes, and pods by listening to the Kubernetes API server. Still, it doesn’t provide more than general guidelines for resource allocation based on the cluster’s number of nodes and running pods.

We can get more useful observability data from the containers themselves, which generate application logs, and from the metrics API, which provides additional cluster resource metrics. For example, we can directly measure the compute resources allocated by our nodes to better judge how to adjust the quantity or size of nodes in our cluster. Or, we can compare container resource consumption against our requests, limits, and quotas to determine how much headroom we have.
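The headroom comparison described above can be sketched in a few lines. This is a minimal example that assumes we have already collected one container’s usage, request, and limit as Kubernetes quantity strings. The parsers cover only the common suffixes, and the helper names and sample values are hypothetical.

```python
# Common Kubernetes memory suffixes (a simplified subset).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' or '1G' into bytes."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def headroom(usage: float, request: float, limit: float) -> dict[str, float]:
    """Compare usage with the request and report the fraction of the limit still free."""
    return {
        "request_utilization": usage / request,
        "limit_headroom": 1 - usage / limit,
    }


# Hypothetical container: using 180m CPU against a 250m request and a 500m limit.
cpu = headroom(parse_cpu("180m"), parse_cpu("250m"), parse_cpu("500m"))
print({name: round(value, 2) for name, value in cpu.items()})
```

A request utilization well below 1 suggests the request is oversized, while a limit headroom near 0 suggests the container is close to being throttled or OOM-killed.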

Application-specific metrics are also crucial. The number of users accessing the app and any specific conversion metrics are important in understanding user traffic spikes and their impact on resource utilization — and each is helpful for determining the maximum number of replicas to maintain in our Horizontal Pod Autoscaler, or HPA.

The Horizontal Pod Autoscaler

We can adjust resource allocation to match demand by scaling our cluster either horizontally or vertically:

  • We scale horizontally by adjusting the number of pods to match our workload to demand.
  • To scale vertically, we adjust resource allocation for new pods or recreate existing pods with the updated allocations. Updating running pods without restarting them depends on in-place resizing, which was still in development at the time of writing.

Horizontal scaling is flexible and easy to automate, and it’s a natural first step toward minimizing resource waste. Kubernetes automates resource usage monitoring and applies changes using the HorizontalPodAutoscaler (HPA) controller.

Our cluster engineer defines target resource levels and specifies the minimum and maximum numbers of replicas the HPA should maintain to meet demand. We can also optionally provide custom metrics for it to process, such as network traffic.

The HPA is implemented as a control loop running at discrete intervals. Once during each interval, the controller compares resource metrics against our defined targets.
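The comparison the controller makes in each interval can be approximated as follows. This sketch uses the scaling formula given in the Kubernetes documentation, desired = ceil(current replicas × current metric / target metric), together with the default 10% tolerance. It leaves out details such as missing metrics, pods that are not yet ready, and stabilization windows, and the function name is our own.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Return the replica count a simplified HPA loop would ask for."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: do nothing
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the bounds we configured.
    return max(min_replicas, min(max_replicas, desired))


# Target of 50% CPU, between 1 and 10 replicas.
print(desired_replicas(4, 80, 50, 1, 10))   # 7
print(desired_replicas(4, 20, 50, 1, 10))   # 2
print(desired_replicas(4, 52, 50, 1, 10))   # 4
print(desired_replicas(8, 100, 50, 1, 10))  # 10
```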

Implementing a HorizontalPodAutoscaler is simple, requiring only a single command:

kubectl autoscale deployment hpa-demo-deployment --cpu-percent=50 --min=1 --max=10

This HPA maintains a minimum of 1 replica and a maximum of 10 replicas. If average CPU utilization across the pods exceeds 50% of their requested CPU, it deploys more pods to bring the average back toward 50%. When the workload decreases, the number of running pods will decrease accordingly.

HPA can also set targets based on the resource usage of a specific container type across a set of pods. This lets us scale based on how important an application is. For example, we can prioritize autoscaling around the needs of an application handling transactions to keep it performant but allow the kube-scheduler to delay scheduling instances of an application that handles non-essential computations.
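If we would rather keep the autoscaler in version control than create it imperatively, we can generate an equivalent manifest. The sketch below builds an autoscaling/v2 HorizontalPodAutoscaler as a Python dictionary and prints it as JSON, which kubectl apply -f accepts. Passing a container name switches the metric from the whole pod to a single container. The build_hpa helper and the payments container name are hypothetical.

```python
import json
from typing import Optional


def build_hpa(
    deployment: str,
    cpu_percent: int,
    min_replicas: int,
    max_replicas: int,
    container: Optional[str] = None,
) -> dict:
    """Build an autoscaling/v2 HPA manifest targeting average CPU utilization."""
    target = {"type": "Utilization", "averageUtilization": cpu_percent}
    if container is None:
        # Utilization averaged over every container in each pod.
        metric = {"type": "Resource", "resource": {"name": "cpu", "target": target}}
    else:
        # Utilization of one named container across the pods.
        metric = {
            "type": "ContainerResource",
            "containerResource": {"name": "cpu", "container": container, "target": target},
        }
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [metric],
        },
    }


# Same settings as the kubectl autoscale command above.
print(json.dumps(build_hpa("hpa-demo-deployment", 50, 1, 10), indent=2))

# Hypothetical variant that scales on a single container named "payments".
per_container = build_hpa("hpa-demo-deployment", 50, 1, 10, container="payments")
```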

HPA Limitations

While the HPA is an important step toward matching resource allocation to demand, it’s important to recognize its limitations as well.

  • HPA is only as good as its configuration. As we’ve seen, deploying an HPA is simple, but it requires us to specify the target utilization, which determines when pods are added or removed. Determining the best target utilization is not intuitive or obvious, and choosing the wrong configuration will result in waste or risk.
  • If our pods aren’t right-sized, then HPA is just replicating inefficient pods. The HPA will add and remove pods, but if those pods aren’t configured to run efficiently, then we’re just multiplying that inefficiency every time we add a pod. Vertical autoscaling is important for addressing this.
  • HPA and VPA can’t be used together by default. The Vertical Pod Autoscaler (VPA) handles vertical autoscaling, but by default it and the HPA both scale based on the same metric (CPU utilization). This results in thrashing, where one autoscaler is adding resources to maintain a particular target utilization while the other autoscaler is removing resources to do the same. We can scale based on custom metrics, but this adds another layer of complexity and uncertainty.

Managing Configuration Drift

Observability tools also provide an important step towards managing the configuration drift associated with scaling. Manually tracking every change and its potential implications for the rest of our infrastructure is a difficult and error-prone task, so we need tools that can track versioning and dependencies for our software and the behaviors of ingress controllers. They should also manage certificates and DNS concerns.

We can also limit drift by recording a configuration baseline and setting policies in an exclusion list that defines which resources a configuration may be applied to. The list should also contain policies describing how configurations can be altered, so that changes are applied uniformly and trackably across the cluster.
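Checking live state against a baseline comes down to a structured diff. The sketch below is a minimal version that assumes both the baseline and the live configuration are available as nested dictionaries, for example parsed from manifests. The find_drift helper, the ignore parameter, and the sample data are hypothetical.

```python
MISSING = "<missing>"


def find_drift(baseline: dict, live: dict, ignore: tuple = (), path: str = "") -> list:
    """Return (field path, baseline value, live value) for every field that differs."""
    drift = []
    for key in sorted(set(baseline) | set(live)):
        here = f"{path}.{key}" if path else key
        if any(here == skipped or here.startswith(skipped + ".") for skipped in ignore):
            continue  # fields we expect to change, such as server-populated status
        want = baseline.get(key, MISSING)
        have = live.get(key, MISSING)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, ignore, here))
        elif want != have:
            drift.append((here, want, have))
    return drift


# Hypothetical baseline and live state for one deployment.
baseline = {"spec": {"replicas": 3, "template": {"image": "shop:1.4.2"}}}
live = {
    "spec": {"replicas": 5, "template": {"image": "shop:1.4.2", "env": {"DEBUG": "true"}}},
    "status": {"readyReplicas": 5},
}

for field, want, have in find_drift(baseline, live, ignore=("status",)):
    print(f"{field}: baseline={want!r} live={have!r}")
```

Here the manually scaled replica count and the undocumented DEBUG variable are reported, while the status block is skipped.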

We can verify that we are following security best practices by using Kubescape. This open-source tool is effective in risk analysis, RBAC visualization, and image vulnerability scanning.

Cost Visibility

We also need a way to quickly discern whether we’re incurring costs in line with our pre-deployment projections. Although we could assign a DevOps engineer to monitor everything manually, we’ll be better off using a cloud cost management tool, which analyzes a Kubernetes cluster and recommends adjustments for cost reduction. Some observability tools, like Datadog, have built cloud cost management into their platform. Standalone cloud cost management tools are also available.

If we use a managed Kubernetes service, we should also ensure we use the most suitable offering for our organization’s needs. Our cluster operator should understand the cost and performance implications of any potential adjustment to get a rough idea of how different scaling scenarios might play out. We can approximate some of these changes using the cost estimation tools that many providers offer.

An excellent monitoring tool to start with is Prometheus. It is an open-source tool that integrates easily into any Kubernetes cluster. The best way to use it is to combine it with the open-source data visualization platform Grafana.

Prometheus uses service discovery to find pods, nodes, and services and collect their metrics. It retrieves these metrics over HTTP, so we must expose the cluster metrics on an endpoint. Its Kubernetes monitoring stack includes a few important components that help us treat our data more insightfully:

  • Prometheus server
  • Grafana
  • Alertmanager
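To make the scraping model concrete, the sketch below exposes a counter over HTTP in the Prometheus text exposition format using only the Python standard library. In practice we would more likely use an official client library. The metric name, the route labels, and port 8000 are assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters that the application would increment.
REQUESTS_TOTAL = {"checkout": 0, "search": 0}


def render_metrics(counts: dict) -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by route.",
        "# TYPE app_requests_total counter",
    ]
    for route, value in sorted(counts.items()):
        lines.append(f'app_requests_total{{route="{route}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()


print(render_metrics({"checkout": 3, "search": 12}), end="")
```

Calling serve() starts the endpoint, and a Prometheus scrape job pointed at that port on the /metrics path would then collect the counter at each scrape interval.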

ML-Based Analysis of Observability Data

Observability and cost management tools are mainly focused on providing visibility. They can show us what’s happening in our environment and help to identify and prioritize opportunities for improvement. But what to do with that information is left to us. And with mountains of data to sift through, those actions are not intuitive or obvious.

This is where machine learning excels – analyzing large amounts of data, understanding which variables have the greatest impact on outcomes like resource usage and performance, and recommending updates to those variables to improve the outcomes.

With machine learning, we can make our autoscaling work better by making intelligent decisions on when to scale horizontally and vertically based on actual usage data.

Chapter 4: StormForge for Actionability and Efficiency

StormForge Optimize Live is built for optimizing Kubernetes clusters in a production environment. It uses machine learning to analyze observability data and recommend configurations that result in peak performance and efficiency.

Getting started with Optimize Live requires just a few simple steps.

  1. We configure Optimize Live by selecting an application, and then connecting it to our observability tools. We can define additional parameters at this point, such as the level of risk tolerance and the frequency of recommendations.
  2. The StormForge machine learning algorithm then takes over, analyzing actual usage data from our observability solution.
  3. At the frequency specified during configuration, StormForge then recommends updates to pod CPU and memory (in other words, vertical sizing) and also target utilization for our HPA if one is detected.
  4. Recommendations can be implemented automatically or manually deployed.
  5. This cycle continues. Essentially, Optimize Live lets us put our scaling on autopilot to keep things running efficiently on a continuous basis.

StormForge Optimize Live also offers a Grafana dashboard to help visualize our optimization efforts by outlining changes in performance and cost.

The StormForge platform works with any certified Kubernetes distribution.


A well-architected Kubernetes deployment can provide a fluid, seamless user experience on a manageable budget. However, achieving a healthy and efficient Kubernetes cluster requires strict attention to how our cluster’s components interact.

We should plan for strong observability practices immediately after or even during cluster deployment. The data it provides is indispensable for visibility related to security, scaling, and optimization. We also need to be proactive in our approach to scaling. For example, it’s best not to wait until we experience a spike in user traffic to start evaluating scaling strategies.

We can use open-source observability tools like Prometheus or those provided by Kubernetes, but we’ll need more specialized, intelligent help to guarantee efficiency along with peak performance and a great user experience. StormForge Optimize Live offers a powerful tool for optimizing cluster efficiency and performance. It applies machine learning to recommend configurations that help our engineers navigate the massive volume of data produced by Kubernetes and focus on strategic, business-level objectives.

Contact StormForge today to learn more about how to plan for a highly scalable day 2 Kubernetes deployment for your app.
