Software design patterns were popularized in the 1990s by the authors of the influential computer science book Design Patterns: Elements of Reusable Object-Oriented Software. Although the book focuses on software development, design patterns can be used to address many IT engineering challenges, including designing Kubernetes infrastructures.

So what are design patterns? A design pattern is an established, general solution to a common problem. Design patterns evolve from the collective wisdom of experienced practitioners in a given field and provide a template for best practices.

Antipatterns can be thought of as the opposite of design patterns: they are common pitfalls that initially appear to be good solutions but prove to be ineffective and are often counterproductive.

This paper describes two patterns and three antipatterns that apply to scalability in Kubernetes environments. To learn about additional Kubernetes design patterns, consider reading Kubernetes Patterns by Bilgin Ibryam and Roland Huß. To explore more common antipatterns, consider reading Optimizing Java: Practical Techniques for Improving JVM Application Performance by Benjamin J. Evans, James Gough, and Chris Newland.

Patterns for Scalability

Among the biggest architectural advantages of Kubernetes is the ability to run your applications at scale to meet dynamic business needs. In practice, however, scaling applications in Kubernetes can be challenging for a couple of reasons:

  • There is inherent resource contention in a multi-tenant environment. Bilgin Ibryam and Roland Huß call out this challenge in their book: “…in an environment with shared resources among a large number of processes with different priorities, the only way for a successful coexistence is to know the demands of every process in advance.”
  • There are different aspects of application scaling within a Kubernetes environment. This requires addressing the application’s runtime configuration, scaling the application pod vertically and horizontally, scaling the underlying cluster as appropriate, and knowing when to do which tasks — and in what combination.

Two design patterns have emerged to address these two challenges, respectively: Predictable Demands and Elastic Scaling.

Predictable Demands

The Predictable Demands pattern dictates that containerized apps and services declare their runtime resource requirements and dependencies. This means that each running application’s pods must be configured with accurate runtime dependencies, resource profiles, and — possibly — pod priorities. Doing so enables Kubernetes to schedule pods efficiently across nodes and ensure that every application has the resources it needs to run effectively.

  • Runtime dependencies are resources such as persistent storage, secrets, and configmaps. The details about these types of resources tend to be readily known as they are uncovered through the natural design and development phases of the application. As a result, explicitly specifying these runtime dependencies is a straightforward configuration task.
  • Resource profiles define the requests and limits for CPU, memory, and ephemeral storage of a pod. Custom extended resources can be defined as well. (For more information on these extended resources, please refer to the Kubernetes extended resources documentation.) While CPU is a compressible resource (usage attempts that exceed specified limits will simply be throttled), memory and ephemeral storage are not. Usage attempts that exceed specified limits for incompressible resources will result in process termination and/or pod eviction, leading to outages. A trusted method for establishing the correct request and limit amounts is therefore crucial, not just for performance and scalability but also for uptime.
  • Pod priorities determine each pod’s relative importance and, consequently, the order Kubernetes will use to schedule the pods. In addition, if Kubernetes is unable to schedule a high-priority pod, it will begin evicting lower-priority pods to free up resources. Improper prioritization can lead to unpredictable behavior across multiple applications.
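The three declarations above come together in a single pod spec. A minimal sketch, assuming a hypothetical application (the names, image, and resource amounts are placeholders, and the business-critical PriorityClass would need to be defined separately):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                        # hypothetical name
spec:
  priorityClassName: business-critical  # pod priority (assumes this class exists)
  containers:
  - name: app
    image: example.com/demo-app:1.0     # placeholder image
    env:
    - name: DB_PASSWORD                 # runtime dependency: a secret
      valueFrom:
        secretKeyRef:
          name: demo-db-credentials
          key: password
    resources:
      requests:                         # resource profile: what the scheduler reserves
        cpu: 250m
        memory: 256Mi
      limits:                           # exceeding the memory limit terminates the process
        cpu: 500m
        memory: 256Mi
    volumeMounts:
    - name: config
      mountPath: /etc/demo
  volumes:
  - name: config
    configMap:                          # runtime dependency: a configmap
      name: demo-app-config
```

Note that the runtime dependencies (secret and configmap) are straightforward to fill in, while the resource amounts and priority class are exactly the values that must be discovered through observation, as discussed next.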

Unlike runtime dependencies, neither resource profiles nor pod priorities are configuration parameters discovered during the design and development phases of an application. They are discovered only by observing utilization under regular usage loads or through trial-and-error experimentation with various usage scenarios.

A word of caution: attempting to shortcut this process leads to one or more of the antipatterns discussed below and can create future headaches.

Elastic Scaling

Most Kubernetes veterans address application scalability using built-in vertical and/or horizontal pod scaling. However, there are additional aspects to consider when scaling our apps. If we stop with just these two techniques, we risk falling into the Missing the Bigger Picture antipattern.

The Elastic Scaling pattern dictates that our apps and microservices be configured to run as efficiently as possible within their pods, so that Kubernetes can respond effectively to fluctuating demand across all apps in the cluster by automatically adding or removing optimized application pods and cluster resources. The pattern provides a comprehensive approach to scalability that includes more than just vertical and horizontal pod scaling:

  • Application Tuning — tuning the runtime settings (threads, heap, JIT, etc.) within the pod
  • Vertical Pod Scaling — increasing/decreasing resources of the pod itself
  • Horizontal Pod Scaling — adding or removing pods
  • Cluster Scaling — adding or removing cluster nodes

Application Tuning

Adding more compute resources is the most obvious way to scale, but if we don’t first tune what’s running in the pod, we will scale wasted resources along with the application. This approach is inefficient, erodes our cluster capacity, and causes us to pay for more infrastructure than we need. Application tuning resolves this problem by tailoring the resources running inside the pod to meet the specific demands faced by the app in production scenarios.
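For a JVM-based service, for example, tuning typically means aligning the runtime’s settings with the pod’s resource profile. A hedged sketch of a container spec fragment (the image, flag choices, and values are illustrative assumptions, not recommendations):

```yaml
# Fragment of a container spec: size the JVM to what the pod actually
# requests, so added replicas don't multiply wasted headroom.
containers:
- name: app
  image: example.com/demo-app:1.0   # placeholder image
  env:
  # Heap sized below the container memory limit; processor count matched
  # to the CPU request rather than the node's core count; GC choice found
  # through experimentation.
  - name: JAVA_TOOL_OPTIONS
    value: "-Xms384m -Xmx384m -XX:ActiveProcessorCount=2 -XX:+UseG1GC"
  resources:
    requests:
      cpu: "2"
      memory: 512Mi
    limits:
      memory: 512Mi
```

Finding the right combination of such settings for a given workload is precisely the experimentation described below.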

Application tuning is the most granular scaling step, a crucial first step in the Elastic Scaling pattern, and the source of the greatest efficiency benefits. Unfortunately, it is also the most frequently bypassed step, because doing it properly is tedious: it relies on experiments, extensive testing, and a lot of tinkering.

Application tuning is typically a manual chore with a lot of guesswork. This tedium and uncertainty are what typically lead us down the path toward nearly all the antipatterns that we will cover shortly. The good news is that application tuning does not have to be a tiresome, manual process — there is a better way!

With the advent of machine learning (ML)-based tooling and automation, that tedium and uncertainty are replaced with quickly gathered data points that guide us to the optimal tuning configurations. We’ll discuss this in more detail below.

Vertical Pod Scaling

Once the application and its runtime have been properly tuned, the next step is to configure and manage the resource requests and limits allocated to the application’s pod(s) while they are running. The Vertical Pod Autoscaler (VPA) is usually investigated to handle this aspect. We say “investigated” because we usually discover that:

  • Scaling pods vertically requires our pods to be restarted (though work is underway in the Kubernetes community to allow live vertical pod scaling), a scenario that requires proper planning for our stateful apps.
  • Recommendations made by the VPA don’t provide enough efficiency gains to justify the inconvenience of restarting pods.
  • The VPA and the HPA do not usually work well together. VPA and HPA scaling are triggered by the same resources, which can cause race conditions between vertical and horizontal scaling events.

Once these VPA nuances are uncovered, the tendency is to favor HPA over VPA. This is unfortunate, as there are likely to be cases where scaling up is better than scaling out. What if ML could be used to help us address these existing shortcomings? Read on to learn more.
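For reference, the VPA is configured through its own custom resource (the VPA is installed separately and is not part of core Kubernetes; the target name here is hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  updatePolicy:
    # "Off" produces recommendations only; "Auto" applies them by evicting
    # and restarting pods -- the restart caveat noted above.
    updateMode: "Off"
```

Running the VPA in recommendation-only mode like this is a common compromise: it surfaces right-sizing data without triggering restarts or racing the HPA.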

Horizontal Pod Scaling

The Horizontal Pod Autoscaler (HPA) was designed to automatically add or remove our application pods in response to changes in load. The HPA compares pod resource utilization against the scaling thresholds we specify for it, using either built-in standard metrics — like CPU and memory usage — or custom metrics. It then scales our pod replica counts accordingly.
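As a concrete sketch, a minimal HPA definition scaling a hypothetical Deployment on CPU utilization might look like this (names and thresholds are placeholders):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 10              # always set a sensible ceiling
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # scale out when average CPU crosses 70%
```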

However, the HPA isn’t a foolproof, single scaling solution:

  • It still relies on us to manually set the definitions by which it operates.
  • Once the operating definition is set, it does not adjust to ongoing load changes.
  • If we do not specify a maximum pod count, the HPA will scale an application until all cluster resources are consumed.

In short, the HPA can’t be considered a complete scalability solution because it can’t intelligently determine or implement configuration changes based on an application’s ongoing behavior. This is another ripe area for ML-driven assistance.

Cluster Node Scaling

If we believe our application behavior and resource requirements are likely to fluctuate a lot, we can use cluster-level autoscaling to provision additional nodes to meet our pod scheduling demands. Rather than dividing and allocating our limited hardware — especially when we need more incompressible resources like memory — it’s often more beneficial to simply provide a higher resource ceiling across the entire cluster.

If we’re hosting our cluster in a public cloud, we can use the Cluster Autoscaler to provision additional resources by adding nodes to the cluster. The Cluster Autoscaler regularly checks for unschedulable pods. If any are found to be unschedulable due to a lack of resources, it will add nodes to the cluster pool.
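The Cluster Autoscaler is typically configured through flags on its own deployment. A fragment of its container spec (cloud-provider flags vary; the node group name and bounds here are hypothetical):

```yaml
# Fragment of the Cluster Autoscaler's container spec.
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=2:10:demo-node-group           # min:max:node-group-name
  - --scan-interval=10s                    # how often to check for unschedulable pods
  - --scale-down-utilization-threshold=0.5 # remove nodes below 50% utilization
```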

If we wish to deploy our cluster on-premises, we can also use a cluster management API, such as those provided by vSphere or OpenStack. However, many of these cluster autoscaling solutions are metric-based and don’t consider the persistence of system-critical Kubernetes pods. They also tend to be inefficient in general, scheduling pods and provisioning idle nodes based on capacity usage rather than actual cluster needs.

Antipatterns for Scalability

As touched on previously, antipatterns represent common, ineffective responses to technical challenges. They offer attractive solutions when time or resources are constrained. However, they also introduce or exacerbate problems and have a negative effect overall. Let’s discuss some common antipatterns that we come across when designing for scalability in Kubernetes: Distracted by the Simple, Tuning by Folklore, and Missing the Bigger Picture.

Distracted by the Simple

This antipattern manifests when we target only the simplest or easiest-to-change parts of a system rather than analyzing and diagnosing the whole system. Plucking this “low-hanging fruit” can be deceptive because it often seems like we’re making real progress. The reality is that we’re choosing not to optimize the parts of the system we aren’t comfortable with, and, even if we effect real change, it’s usually only a local optimum.

For example, let’s say we decide to configure our cluster to autoscale so that our application remains highly available during a half-hour morning spike in traffic. There’s a chance that more efficiently provisioned pods and some network traffic analysis would provide a comparable performance boost, but that requires a deeper analysis of our resource profiles. Instead, we end up paying our cloud provider for a two-hour block of cluster resources, which sits underutilized for the majority of that time. The price-performance tradeoff is poor. Alternatively, we may assign a high priority to pods we deem critical, but find that our aggregate service performance suffers because other pods are evicted at disproportionate rates.

Distracted by the Simple is often a defensive response to scaling challenges that stretch beyond a team’s comfort zone or to challenges that are thought to be tedious and difficult to solve. We can address this antipattern by ensuring that the team obtains a level of understanding necessary to scale each part of an application and is comfortable iterating through various application tunables to understand how each one affects various performance characteristics.

Tuning by Folklore

The availability of information on the internet makes it easy to fall prey to this antipattern. Leading responses on Stack Overflow and similar sites are often popular because they offer easily digestible recommendations that deliver immediate benefits with minimal effort. As more users discover the fix, their enthusiasm can create a legend. As with the Distracted by the Simple antipattern, Tuning by Folklore often feels productive at first. The solutions can seem deceptively simple, and they work!

Unfortunately, as with any legend, much of the truth is masked by a lack of context — and misinformation today is magnified by search engine rankings. Even when the folklore fix works for the specific component versions we use and in the specific environment to which we deploy, the solutions are rarely robust or efficient.

Missing the Bigger Picture

Missing the Bigger Picture is one of the most pervasive antipatterns in siloed or small teams. Developers tend to focus on individual settings or components, often relying on benchmarks to inform their configurations without examining the system more holistically.

This antipattern is a product of specialization and of the human tendency to see patterns where there may be none. A single person is unlikely to have the knowledge to examine an entire system, so they focus on what they know and are more likely to attribute differences in effects to the variables they can control.

Even when optimizing small parts of a system produces measurable results, it’s almost always more efficient to consider the whole system instead. In a system as complex as a Kubernetes cluster, maximizing the performance of any one component is unlikely to increase service quality to the degree we’re looking for. In fact, unless we have decisively pinpointed a performance bottleneck, focusing on individual components is likely to produce diminishing returns. It’s best to examine our system holistically to catch interactions and emergent effects that aren’t evident at smaller scales.

Where Machine Learning Fits In

Now that we know which design patterns to use to scale our apps in Kubernetes, how do we address the more challenging parts of these patterns without taking shortcuts and possibly falling into the antipatterns we want to avoid?

As discussed above, the major drivers that prompt these design pattern shortcuts are:

Tedium — Any series of repetitive steps that quickly bores us, even though it may yield useful results and data. Often, the solution to a problem is to gather sufficient data necessary to drive the right decisions. However, the tedious data-gathering is cut short precisely because it is boring, and it is human nature to do what we can to avoid boredom.

Time-intensive tasks — Closely tied to tedium are tasks that are considered time-intensive. They are invariably scrutinized and often trimmed back if it’s expected that the time spent on the activity is not worth the results and insights produced. Again, if we curtail a series of data-gathering tasks before the results are clear, we risk making the wrong decisions.

Difficult data analysis — Even if we have gathered sufficient data, the human brain is only capable of drawing accurate conclusions from data derived from four or fewer variables [3]. Keep in mind that Kubernetes scalability is multi-dimensional and can easily involve ten or more variables.

Ongoing analysis — The above tedious, time-intensive, or difficult tasks need to be performed on an ongoing basis for us to adequately respond to the dynamic nature of our applications. Given these conditions, it’s understandable that over-provisioning resources is often viewed as a satisfactory solution.

Automation solutions are available today to offload tedious and time-intensive tasks, including gathering the data necessary to make informed decisions. However, this type of automation is not powerful enough to overcome our difficulty in analyzing data with many variables.

Machine learning becomes crucial here, as it augments our abilities — analyzing data in a manner that we humans simply cannot. When combined with automation, ML performs this difficult data analysis on an ongoing basis. It efficiently addresses the dynamic nature of our applications by continually adjusting settings and making recommendations based on analysis that we would simply miss.



Design patterns have been an important part of application development, but they should not be confined to software development alone. They provide established methodologies for solving engineering problems in specific contexts and should be applied when building and managing our modern-day infrastructures and applications.

When designing our scalable Kubernetes-based applications, we should rely on the Predictable Demands and Elastic Scaling design patterns. With Predictable Demands, we ensure that our apps declare their resource and runtime requirements so that Kubernetes can schedule and run them efficiently and reliably. But specifying our resource requirements is merely the starting point for scheduling and scaling our apps. Elastic Scaling is the pattern that ensures our apps are configured to operate at their best before they are scheduled in the cluster, and then scale up and/or out, as appropriate, to meet dynamic user demands.

Though some of these design pattern requirements can seem complex and time-consuming, we must not try to shortcut them, as this can lead us down the path of antipatterns. These shortsighted antipatterns will cost us time, money, and opportunity, and include those that:

  • Focus only on remedies that address the “lowest-hanging fruit” (while ignoring the tougher challenges we must eventually face).
  • Leverage “folklore” performance tips without appropriately weighing their applicability for our applications.
  • Focus on only single parts of our systems without considering the bigger picture.

The good news is that we do not have to shy away from the more challenging parts of design patterns in order to effectively scale our apps. Automated, ML-based solutions can be used today to leverage built-in Kubernetes capabilities to train our learning models and help us by:

  • Improving our understanding of how applications behave in their specific contexts and uncovering what their predictable demands should be
  • Recommending how they should elastically scale for efficient operation and maximum uptime

To learn more about these types of solutions, request a demo to see how ML helps you easily apply these crucial design patterns when scaling your Kubernetes applications.


  1. Kubernetes Patterns – Bilgin Ibryam and Roland Huß
  2. Optimizing Java: Practical Techniques for Improving JVM Application Performance – Benjamin J. Evans, James Gough, and Chris Newland
  3. “How Many Variables Can Humans Process?” – Graeme S. Halford and Rosemary Baker (University of Queensland, Brisbane, Australia), and Julie E. McCredden and John D. Bain (Griffith University, Brisbane, Australia)

Explore StormForge

Discover how StormForge can help automatically improve application performance and cost efficiency in cloud native environments. 

See a Demo | Start Trial