Advanced Autoscaling in Kubernetes with KEDA

The Kubernetes Horizontal Pod Autoscaler (HPA) is a foundational autoscaling component that can be enhanced when used with Kubernetes Event-Driven Autoscaling (KEDA). An open-source project, KEDA builds on the HPA to provide significantly more flexibility, easy-to-use options for various metrics out of the box, and the important ability to scale applications to zero.

KEDA configures the HPA to manage pod activity effectively, ensuring responsiveness remains consistent while accommodating dynamic environments. This makes KEDA particularly useful for workloads that see fluctuating traffic or spikes due to events.

KEDA offers a more nuanced approach to autoscaling that aligns with modern, event-driven application demands. KEDA was designed to extend the horizontal autoscaling capabilities of Kubernetes, enabling precise and more efficient scaling decisions. Its scaling actions are based on various event sources and metrics, addressing the critical challenges that HPA faces in diverse operational contexts.

Summary of Essential KEDA Kubernetes Concepts

Concept | Description
HPA challenges | The HPA typically requires configuring custom metric servers and adapters for advanced cases. It often falls short during unpredictable, high-traffic events. Advanced solutions like KEDA address these gaps by enabling dynamic, event-driven scaling that goes beyond traditional CPU and memory metrics.
What is KEDA, and how is it significant? | KEDA is an open-source, lightweight tool that expands the horizontal scaling capabilities of Kubernetes through event-driven automation. It enables precise scaling based on diverse external triggers, optimizing resource use and cost efficiency in fluctuating traffic environments.
KEDA architecture and components | KEDA’s architecture integrates seamlessly into Kubernetes, leveraging a series of components that enhance its native autoscaling capabilities. Key components include scalers, metrics adapters, and custom resource definitions such as ScaledObjects and ScaledJobs, enabling sophisticated, event-driven scaling.
Benefits of using KEDA for autoscaling | KEDA’s autoscaling benefits include the ability to scale applications down to zero replicas, which optimizes resource use and cost efficiency. Additionally, it supports flexible scheduling through cron triggers and extensibility via custom scalers, enhancing compatibility with a variety of event sources and DevOps tools.
Practical applications and integration | KEDA’s ScaledObjects and ScaledJobs enable efficient scaling of diverse Kubernetes workloads, from deployments and stateful sets to batch jobs. By leveraging custom triggers tailored to different metrics and event sources, KEDA provides flexible, dynamic scaling for optimized performance and resource management.


Kubernetes Horizontal Autoscaling Challenges

The HPA primarily scales applications based on CPU and memory metrics by adjusting the number of replicas based on demand. While the HPA supports scaling with other metrics, such as network traffic or custom indicators, configuring these is complex and not user-friendly. It’s crucial to understand that while you can set the HPA to use these advanced metrics, the setup requires careful configuration and a strong grasp of Kubernetes internals. This traditional approach also relies on maintaining static limits, potentially leading to resource overprovisioning or shortages during unexpected demand spikes, which affects both application performance and cost efficiency. 

There is a growing need for more sophisticated scaling solutions capable of responding to external events and a broader array of metrics. For example, while scaling on CPU utilization might be easy, it’s often not well correlated with the actual load on the application, so replicas don’t actually get added or removed when they need to be. 
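For reference, a CPU-based HPA looks roughly like the minimal sketch below. The Deployment name order-processor, the HPA name, the replica bounds, and the 70% utilization target are assumptions chosen for illustration.

```yaml
# Illustrative CPU-based HPA; names and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1                   # by default the HPA keeps at least one replica running
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70% of requests
```

If load arrives as a growing queue backlog while CPU stays low, an HPA configured this way has no signal to act on.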

In scenarios like flash sales or viral social media events, where traffic surges are sudden and unpredictable, the HPA struggles to scale effectively. These instances require a more responsive scaling mechanism, which is where an event-driven solution like KEDA comes into play. KEDA is designed to address these specific challenges by enabling Kubernetes to react not just to changes in CPU or memory metrics, but also to a multitude of external events, thereby providing a more fluid and efficient scaling response.

What is KEDA and How is it Significant?

KEDA is a lightweight, open-source tool that enhances Kubernetes’ horizontal autoscaling capabilities by integrating event-driven scaling, allowing applications to respond more dynamically to real-world demands. KEDA functions as a bridge between Kubernetes workloads and various event sources, enabling more efficient resource management.

KEDA allows for scaling based on a multitude of external triggers, such as messages in a queue, workload in databases, or events in a stream. This capability extends beyond conventional CPU and memory metrics, facilitating scaling actions that are precisely aligned with actual usage patterns and demand spikes. By doing so, KEDA optimizes resource utilization and cost efficiency, which makes it crucial for managing cloud-native applications in variable traffic environments.

KEDA Architecture and Components

The core of KEDA’s functionality lies in its architecture, which includes custom resource definitions (CRDs) such as ScaledObjects and ScaledJobs. These CRDs allow developers to define scaling rules based on external events, seamlessly integrating with existing Kubernetes ecosystems. By leveraging a wide range of supported event sources, KEDA empowers developers to build more responsive and cost-effective applications, representing a significant advancement in cloud-native application scaling.

How KEDA works in conjunction with the HPA and external event sources. (source)

KEDA distinguishes itself from Kubernetes’ Horizontal Pod Autoscaler (HPA) by introducing two new crucial components: 

  • External triggers (scalers) whose data KEDA exposes to the HPA through its metrics server
  • Extensive authentication options for external metrics sources

These features facilitate direct interaction with a broad array of event sources, streamlining the autoscaling process. Unlike HPA, which may require additional setup and customization for similar capabilities, KEDA’s built-in functionalities offer a ready-to-use solution for event-driven scaling.

Consider the use case of scaling based on messages in an AWS SQS queue. With the HPA alone, this scenario involves running an adapter like k8s-cloudwatch-adapter, which reads the queue metrics from CloudWatch and exposes them to the HPA through the Kubernetes external metrics API. KEDA simplifies this by directly supporting AWS SQS as an event source, allowing for straightforward setup without the overhead of deploying and maintaining a separate adapter.

Metrics Server Options

KEDA does not operate its own standalone metrics server like the default Kubernetes Metrics Server; instead, it acts as a metrics adapter. KEDA introduces an additional layer that allows the Kubernetes HPA to scale applications based on a variety of external metrics not typically available to the HPA. This includes metrics from various event sources, which KEDA fetches and exposes via the external.metrics.k8s.io API. The metrics can be queried using kubectl commands that are targeted toward this custom API endpoint provided by KEDA.
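To make the adapter role concrete, the HPA that KEDA creates for a ScaledObject is shaped roughly like the sketch below. The resource names, the metric name, and the target value are illustrative assumptions; KEDA generates the real ones, so inspect the HPA in your own cluster rather than relying on these exact strings.

```yaml
# Approximate shape of a KEDA-managed HPA (illustrative, not copied from a cluster).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-sqs-scaledobject      # KEDA's default naming: keda-hpa-{scaled-object-name}
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor              # assumed target workload
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: External                   # served by KEDA through external.metrics.k8s.io
      external:
        metric:
          name: s0-aws-sqs-order-queue # assumed name; the real one is generated by KEDA
          selector:
            matchLabels:
              scaledobject.keda.sh/name: sqs-scaledobject
        target:
          type: AverageValue
          averageValue: "10"           # target number of messages per replica
```

The HPA itself is unchanged Kubernetes machinery; KEDA only supplies the external metric it consumes.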

An illustration of KEDA’s dynamic scaling capabilities across various metrics

To support a wide array of use cases, KEDA’s metrics sources can be grouped into three primary categories based on their nature:

  • For cloud-based services, KEDA supports AWS CloudWatch, AWS Kinesis, SQS, DynamoDB, Azure App Insights, Azure Blob Storage, and much more, allowing seamless integration with major cloud providers. 
  • Open-source databases such as Prometheus, Elasticsearch, Cassandra, Redis, Graphite, InfluxDB, MongoDB, MySQL, and others offer flexibility for developers preferring open-source solutions for monitoring and event metrics. 
  • Additionally, integration with major observability vendors like DataDog and NewRelic ensures that KEDA can fit into a variety of monitoring and alerting ecosystems, providing a comprehensive solution for application autoscaling.

Authentication Mechanism Options

KEDA employs distinct authentication mechanisms tailored to each category of metrics sources: 

  • For cloud services like AWS and Azure, it uses native cloud provider authentication methods, such as IAM roles for AWS and Managed Identity for Azure, ensuring secure access to services like CloudWatch, SQS, or Azure Blob Storage. 
  • When connecting to open-source databases, such as Prometheus or MongoDB, KEDA can leverage service-specific credentials or integrate with existing Kubernetes secrets for secure communication. 
  • For observability platforms like DataDog or NewRelic, KEDA uses API keys or similar authentication tokens.

All cases are managed securely within the Kubernetes environment, facilitating a seamless and secure connection to these external metrics sources. KEDA provides a few secure patterns to manage authentication flows:

  • Configure authentication per ScaledObject
  • Reuse of per-namespace credentials or delegated authentication with TriggerAuthentication
  • Reuse of global credentials with ClusterTriggerAuthentication
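The namespace-scoped pattern can be sketched as follows, assuming a Prometheus endpoint protected by a bearer token. The Secret name prometheus-token, its key, the workload and resource names, and the query are hypothetical; check the documentation of the scaler you use for the parameters it accepts.

```yaml
# Reusable credentials for triggers in the "default" namespace (names are assumptions).
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: prometheus-auth
  namespace: default
spec:
  secretTargetRef:
    - parameter: bearerToken         # scaler parameter to populate
      name: prometheus-token         # hypothetical Kubernetes Secret
      key: token                     # key inside that Secret
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: api-server                 # hypothetical Deployment
  triggers:
    - type: prometheus
      metadata:
        serverAddress: "http://prometheus-server"
        query: "sum(rate(http_requests_total[5m]))"
        threshold: "100"
        authModes: "bearer"
      authenticationRef:
        name: prometheus-auth        # points at the TriggerAuthentication above
```

To share credentials across namespaces, the same structure applies with a ClusterTriggerAuthentication, referenced by adding kind: ClusterTriggerAuthentication under authenticationRef.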

Benefits of Using KEDA for Autoscaling

KEDA enhances Kubernetes autoscaling with a few notable features: 

  • The capability for applications to scale to zero replicas
  • Scaling jobs and scaling with a cron trigger 
  • Extensibility through custom scalers, which broadens its compatibility with a variety of event sources and DevOps tools 

Scaling Down to Zero Replicas

KEDA’s scale-down-to-zero functionality manages resource allocation efficiently by deactivating idle pods when no active tasks are detected. KEDA monitors the configured event sources and adjusts the pod count to zero when there is no demand, conserving resources when they are not needed.

KEDA handles scaling to zero itself, bypassing the typical behavior of the HPA that it manages. When no events are detected that would trigger scaling, KEDA scales the deployment down to zero replicas. This is in contrast to the usual HPA behavior, where at least one replica typically remains active. Once events occur that meet the defined triggers, KEDA scales the workload from zero back to one replica, and the HPA takes over scaling from there as needed.

This capability offers superior resource efficiency compared to the HPA, which only scales down to a non-zero configured minimum pod count. Because the HPA does not scale down completely, there is always some level of resource consumption, even when it’s unnecessary. KEDA significantly reduces costs and optimizes cloud resource management by eliminating unnecessary resource usage when idle.

An example of this functionality can be seen in event-driven applications, such as those processing queue messages. A system using KEDA can monitor a message queue and scale to multiple pods when messages are detected, scaling back down to zero when the queue is empty. This ensures that resources are only used when absolutely necessary, exemplifying KEDA’s efficient approach to scaling in modern cloud architectures.
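A minimal sketch of such a queue-driven workload is shown here, assuming an SQS queue and a Deployment named order-processor. The names, the queue URL, and the thresholds are placeholders.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler          # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: order-processor            # assumed Deployment to scale
  minReplicaCount: 0                 # allow scaling all the way to zero
  maxReplicaCount: 20
  cooldownPeriod: 300                # seconds to wait after the last active trigger before scaling to zero
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/order-queue"
        awsRegion: "us-east-1"
        queueLength: "10"            # target number of messages per replica
```

A longer cooldownPeriod trades some idle cost for fewer cold starts when messages arrive in bursts.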

Custom Scalers and Cron Triggers

Two distinct features that enhance KEDA’s flexibility in autoscaling are its ability to integrate custom scalers and its scheduling capabilities using a cron trigger.

KEDA allows the implementation of custom scalers, providing full control over the metrics that drive autoscaling decisions. While version 2 of the HPA can indeed scale on any custom metric, setting this up can be challenging and complex. In contrast, KEDA offers a wide range of out-of-the-box scaling options that simplify integration with various metrics sources. This makes KEDA particularly valuable for environments needing to scale based on non-standard or external metrics efficiently.   
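As a rough sketch, a custom scaler is typically a separate gRPC service that a ScaledObject references through the external trigger type. The service address and the backlogThreshold key below are hypothetical; the custom scaler defines which metadata keys it understands.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: custom-scaler-example        # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: order-processor            # assumed Deployment to scale
  triggers:
    - type: external
      metadata:
        scalerAddress: "my-custom-scaler.default.svc.cluster.local:9090"  # hypothetical gRPC endpoint
        backlogThreshold: "50"       # hypothetical key; metadata is passed through to the custom scaler
```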

KEDA also introduces scheduling capabilities, such as scaling with a cron trigger, which adds another layer of flexibility. This feature allows users to define scaling actions at specific times, accommodating predictable workload variations, such as increased load during business hours or special events. For instance, an e-commerce platform could schedule additional resources in anticipation of high traffic during a promotional campaign, ensuring optimal performance when it matters most.
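A business-hours schedule could be sketched like this, assuming a Deployment named storefront; the time zone, schedule, and replica counts are illustrative values.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler        # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: storefront                 # hypothetical Deployment
  minReplicaCount: 2                 # baseline outside the scheduled window
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: "America/New_York" # IANA time zone name
        start: "0 8 * * 1-5"         # window opens at 08:00, Monday to Friday
        end: "0 18 * * 1-5"          # window closes at 18:00
        desiredReplicas: "10"        # replica count held during the window
```

A cron trigger can be combined with event-based triggers in the same ScaledObject, so the schedule sets a floor while queue or metric triggers handle unexpected load.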

Practical Applications and Integration

Before we look at a scenario to implement, it’s useful to understand how KEDA lets you specify the Kubernetes Deployment or StatefulSet to scale based on specific triggers. This functionality is implemented using the ScaledObject custom resource definition (CRD).

The specification below uses a ScaledObject to define how KEDA should scale your application and what the triggers are:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"     # Optional. Use to transfer an existing HPA ownership to this ScaledObject
    validations.keda.sh/hpa-ownership: "true"               # Optional. Use to disable HPA ownership validation on this ScaledObject
    autoscaling.keda.sh/paused: "true"                      # Optional. Use to pause autoscaling of objects explicitly
spec:
  scaleTargetRef:
    apiVersion:    {api-version-of-target-resource}         # Optional. Default: apps/v1
    kind:          {kind-of-target-resource}                # Optional. Default: Deployment
    name:          {name-of-target-resource}                # Mandatory. Must be in the same namespace as the ScaledObject
    envSourceContainerName: {container-name}                # Optional. Default: .spec.template.spec.containers[0]
  pollingInterval:  30                                      # Optional. Default: 30 seconds
  cooldownPeriod:   300                                     # Optional. Default: 300 seconds
  idleReplicaCount: 0                                       # Optional. Default: ignored, must be less than minReplicaCount
  minReplicaCount:  1                                       # Optional. Default: 0
  maxReplicaCount:  100                                     # Optional. Default: 100
  fallback:                                                 # Optional. Section to specify fallback options
    failureThreshold: 3                                     # Mandatory if fallback section is included
    replicas: 6                                             # Mandatory if fallback section is included
  advanced:                                                 # Optional. Section to specify advanced options
    restoreToOriginalReplicaCount: true/false               # Optional. Default: false
    horizontalPodAutoscalerConfig:                          # Optional. Section to specify HPA related options
      name: {name-of-hpa-resource}                          # Optional. Default: keda-hpa-{scaled-object-name}
      behavior:                                             # Optional. Use to modify HPA's scaling behavior
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
  triggers:
  # {list of triggers to activate scaling of the target resource}

More details on the spec can be found in the KEDA documentation. Let’s review how to use it in a real-world scenario.

Implementing Autoscaling with KEDA

Let’s imagine a scenario where a data processing company is handling various workloads across different systems. The primary workload is a web application hosted on a Kubernetes deployment that monitors incoming orders through an AWS SQS queue. A second workload involves a StatefulSet maintaining customer analytics data, which needs to process incoming Kafka streams to keep dashboards up to date. Finally, for performance metrics and reporting, a batch job generates regular reports using Prometheus metrics that evaluate application health.

For this scenario, we will use a combination of ScaledObjects with standard Kubernetes workloads as well as ScaledJobs. KEDA’s ScaledObjects enable flexible scaling across various workloads, including Deployments and StatefulSets, while ScaledJobs handle batch jobs.

With custom triggers, applications can be scaled efficiently based on event-driven metrics, such as message queue lengths or specific database conditions. This adaptability ensures optimal resource management for a wide range of architectures. Let’s review what the manifests would look like for such a scenario.

Here is how to install KEDA in the Kubernetes cluster with Helm:

# Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts

# Update your Helm repository
helm repo update

# Install KEDA into your Kubernetes cluster
helm install keda kedacore/keda --namespace keda --create-namespace

Deployment (AWS SQS)

  • IAM role: For accessing SQS

First, ensure your IAM role for the Kubernetes service account is set up with the right trust relationship. 

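Save the following trust policy to a file named trust-policy.json, replacing the account ID (123456789012) and YOUR_OIDC_ID with the values for your cluster’s OIDC provider:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/YOUR_OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/YOUR_OIDC_ID:sub": "system:serviceaccount:default:keda-sqs-sa"
        }
      }
    }
  ]
}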

Then, create the IAM role using the AWS CLI:

aws iam create-role --role-name KedaSQSRole --assume-role-policy-document file://trust-policy.json

Attach the AWS managed AmazonSQSReadOnlyAccess policy to this role:

aws iam attach-role-policy --role-name KedaSQSRole --policy-arn arn:aws:iam::aws:policy/AmazonSQSReadOnlyAccess

  • Workload: A web app that processes orders coming through an SQS queue
  • ServiceAccount: Using the created IAM role
  • ScaledObject:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-sqs-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/KedaSQSRole"  # ARN of the IAM role created above
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      serviceAccountName: keda-sqs-sa
      containers:
        - name: order-container
          image: order-processor-image
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: order-processor
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/order-queue"
        awsRegion: "us-east-1"
        queueLength: "10"  # target number of messages per replica

StatefulSet (Kafka)

  • Workload: A StatefulSet that processes Kafka streams for customer analytics
  • ScaledObject:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: analytics-processor
  namespace: default
spec:
  selector:
    matchLabels:
      app: analytics-processor
  serviceName: "analytics-service"
  template:
    metadata:
      labels:
        app: analytics-processor
    spec:
      containers:
        - name: analytics-container
          image: analytics-processor-image
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    kind: StatefulSet  # required here because the default target kind is Deployment
    name: analytics-processor
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: "kafka-broker1:9092,kafka-broker2:9092"
        topic: "customer-analytics"
        consumerGroup: "analytics-consumer-group"

ScaledJob (Prometheus)

Using a ScaledJob differs from the combination of a standard workload and a ScaledObject in that it allows for the scaling of batch jobs specifically, providing built-in parallelism and completions directly in the job configuration. This setup simplifies the management of repetitive or data-processing tasks that require dynamic scaling and completion control, making it ideal for batch processing or cron-like jobs.

  • Workload: A batch job generating health reports using Prometheus metrics
  • ScaledJob:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: health-report-job
  namespace: default
spec:
  jobTargetRef:
    parallelism: 2
    completions: 2
    template:
      metadata:
        labels:
          app: health-report-job
      spec:
        containers:
          - name: health-container
            image: health-report-image
        restartPolicy: Never
  triggers:
    - type: prometheus
      metadata:
        serverAddress: "http://prometheus-server"
        metricName: "http_requests_total"
        threshold: "100"
        query: "sum(rate(http_requests_total[5m]))"

This scenario setup demonstrates how KEDA’s ScaledObjects and ScaledJobs can be used to scale a diverse set of workloads dynamically and efficiently. 

How StormForge and KEDA Play Together

StormForge’s machine learning complements KEDA’s event-driven autoscaling by providing precise recommendations for CPU and memory allocations based on real-world application usage data. While KEDA ensures horizontal scalability through a broad set of triggers and event-driven metrics, StormForge offers tailored vertical scaling adjustments, removing guesswork and ensuring optimal resource allocation. Together, KEDA and StormForge create a comprehensive scaling strategy that improves performance and cost efficiency, covering both horizontal and vertical scaling needs for dynamic Kubernetes workloads.

StormForge could be applied to the data processing scenario described above to provide precise recommendations for optimal resource allocation in each workload:

  • For the web application scaling via AWS SQS, StormForge would analyze historical usage data and suggest CPU/memory adjustments, ensuring smooth scaling without overprovisioning.  
  • In the StatefulSet and batch job cases, StormForge’s machine learning-based recommendations would refine resource usage, aligning KEDA’s event-driven scaling triggers with the most efficient vertical scaling configurations, ultimately optimizing performance and reducing overall infrastructure costs.

Conclusion: KEDA is a Crucial Tool in a Comprehensive Scaling Strategy

KEDA has markedly advanced Kubernetes autoscaling, addressing the complexities of dynamic, cloud-native environments through event-driven horizontal scaling. Its integration simplifies scaling based on real-time data, enhancing application responsiveness and efficiency. 

As KEDA evolves, it’s poised for further enhancements, potentially incorporating AI and ML for more predictive scaling capabilities. In tandem with solutions like StormForge and Karpenter, KEDA is crucial in a comprehensive scaling strategy. This ensures that Kubernetes resource management is more efficient and aligns with the demands of modern applications, setting the stage for future innovations in cloud infrastructure.
