Scaling

Scaling is the process of adjusting the capacity of your Event Streams cluster to handle changing workloads. You can scale your cluster in two ways:

  • Manual scaling: You can scale your Event Streams components by modifying their configuration. For more information, see manual scaling.
  • Automatic scaling: In Event Streams 12.2.0 and later, you can enable auto-scaling, which is supported only for Kafka brokers.

If you want partitions to rebalance automatically whenever brokers are added or removed, enable auto-rebalancing before scaling your brokers. After scaling, you can verify partition reassignment to ensure that new brokers received partition assignments and that workloads are balanced.

Before you scale your Event Streams cluster, review your current configuration. The prerequisite guidance provides examples of production configurations on which you can base your deployment. To verify that your configuration meets your requirements, test the system with a workload that is representative of the expected throughput. For this purpose, Event Streams provides a workload generator application to test different message loads.

If this testing shows that your system does not have the capacity needed for the workload, whether the symptoms are excessive lag or delays, or more severe errors such as OutOfMemory errors, you can incrementally apply the increases detailed in the following sections, re-testing after each change to identify a configuration that meets your specific requirements.

A performance report based on example case studies is also available to provide guidance for setting these values.

Note: Although the testing for the report was based on Apache Kafka version 2.3.0, the performance numbers are broadly applicable to current versions of Kafka as well.

Modifying the settings

Scaling operations require updates to the EventStreams custom resource, where broker counts, resource allocations, and related configuration are defined under the spec.strimziOverrides property. For more information about modifying these settings, see modifying installation.
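
For example, one way to edit the custom resource in place is to use kubectl edit (the resource name eventstreams and the placeholders are assumptions; you can also apply an updated YAML file, as described in modifying installation):

kubectl edit eventstreams <instance-name> -n <namespace>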

Manual scaling for Event Streams components

You can scale the number of Kafka brokers, modify CPU, memory, and disk resources for broker nodes, adjust resource settings for supporting components, or change JVM options for brokers to meet workload requirements.

The following sections describe the available manual scaling options.

Increase the number of Kafka brokers in the cluster

The number of Kafka brokers is defined in the EventStreams custom resource in the spec.strimziOverrides.nodePools section.

You can scale your Kafka cluster by either:

  • Increasing the number of brokers in an existing pool by updating the replicas value.
  • Adding a new Kafka node pool by defining a new entry under spec.strimziOverrides.nodePools and specifying a unique name and the desired number of replicas.

For example, to configure Event Streams to use 6 Kafka brokers in a pool:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    nodePools:
      - name: kafka
        replicas: 6
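
To use the second option, add another entry to the nodePools list. The following sketch is illustrative only: the pool name kafka-2, the replica count, and the broker role are example values, and other settings such as storage are omitted:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    nodePools:
      - name: kafka
        replicas: 6
      - name: kafka-2
        replicas: 3
        roles:
          - broker

If you want to control which broker IDs are assigned to the new pool, you can also set the eventstreams.ibm.com/next-node-ids annotation on the pool, as shown in the storage migration example later in this section.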

Note: Controller node pools cannot be scaled or modified dynamically in KRaft mode. For more information, see KRaft limitations.

For more information about scaling up Kafka node pools, see the Strimzi documentation.

Increase the CPU request or limit settings for the Kafka brokers

The CPU settings for the Kafka brokers are defined in the EventStreams custom resource in the requests and limits properties under spec.strimziOverrides.kafka.resources (for all brokers in the Kafka cluster) or under spec.strimziOverrides.nodePools[].resources (for brokers within a node pool).

Example 1: Configure the same CPU settings for all node pools

To configure Event Streams Kafka brokers in all pools to have a CPU request set to 2 CPUs and limit set to 4 CPUs:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    kafka:
      # ...
      resources:
        requests:
          cpu: 2000m
        limits:
          cpu: 4000m
      # ...

This configuration applies the same CPU settings across all pools. A description of the syntax for these values can be found in the Kubernetes documentation.

Example 2: Configure different CPU settings for specific node pools

If you need to set different resource configurations for individual Kafka node pools, you can define the resource properties under spec.strimziOverrides.nodePools[].resources. For example:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    nodePools:
      - name: kafka
        resources:
          requests:
            cpu: 2000m
          limits:
            cpu: 4000m
        # ...
      - name: kafka-2
        resources:
          requests:
            cpu: 4000m
          limits:
            cpu: 8000m
        # ...

Increase the memory request or limit settings for the Kafka brokers

The memory settings for the Kafka broker nodes are defined in the EventStreams custom resource in the requests and limits properties under spec.strimziOverrides.nodePools[].resources, based on the assigned roles.

Alternatively, to apply the same memory settings to all Kafka nodes, you can configure them globally under spec.strimziOverrides.kafka.resources in the EventStreams custom resource. All Kafka node pools will automatically inherit these settings unless overridden at the node pool level.

For example, to configure a pool of Kafka brokers to have a memory request set to 4GB and limit set to 8GB:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    kafka:
      # ...
    nodePools:
      - name: kafka
        roles:
          - broker
        resources:
          requests:
            memory: 4096Mi
          limits:
            memory: 8192Mi

The syntax for these values can be found in the Kubernetes documentation.

Modifying the resources available to supporting components

The resource settings for each supporting component are defined in the EventStreams custom resource under the corresponding component key, in the requests and limits properties under spec.<component>.resources. For example, to configure the Apicurio Registry to have a memory request set to 4GB and limit set to 8GB:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  apicurioRegistry:
    # ...
    resources:
      requests:
        memory: 4096Mi
      limits:
        memory: 8192Mi

The syntax for these values can be found in the Kubernetes documentation.

Increase the disk space available to each Kafka broker

The Kafka brokers need sufficient storage to meet the retention requirements for all of the topics in the cluster. Disk space requirements grow with longer retention periods for messages, increased message sizes and additional topic partitions.

The amount of storage made available to brokers in Kafka node pools is defined at the time of installation in the EventStreams custom resource in the spec.strimziOverrides.nodePools[].storage.size property. For example:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    nodePools:
      - name: kafka
        storage:
          # ...
          size: 100Gi

Modifying the JVM settings for Kafka brokers

If you have specific requirements, you can modify the JVM settings for the Kafka brokers.

Note: Take care when modifying these settings as changes can have an impact on the functioning of the product.

Note: Only a selected subset of the available JVM options can be configured.

JVM settings for all Kafka brokers are defined in the EventStreams custom resource in the spec.strimziOverrides.kafka.jvmOptions property. Alternatively, you can configure JVM settings separately for each Kafka node pool by defining jvmOptions in the spec.strimziOverrides.nodePools[] entries.

For example, to set JVM options for all Kafka brokers:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    kafka:
      # ...
      jvmOptions:
        -Xms: 4096m
        -Xmx: 4096m
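
To set JVM options for a single node pool instead, you can define jvmOptions at the node pool level. The following is a sketch only, assuming a node pool named kafka and the same heap values as the previous example:

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    nodePools:
      - name: kafka
        jvmOptions:
          -Xms: 4096m
          -Xmx: 4096m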

Auto-scaling Kafka brokers

In Event Streams 12.2.0 and later, you can configure Event Streams to automatically scale Kafka brokers by creating a Kubernetes HorizontalPodAutoscaler (HPA). After you create an HPA resource for Kafka brokers, the HPA scales the number of broker pods based on the metrics that you specify in the resource (for example, CPU or memory utilization).

Auto-scaling handles traffic spikes by automatically increasing capacity when demand increases, or by reducing resource usage during low-load periods. It removes the need to manually adjust broker replicas and works with Cruise Control to redistribute partitions when brokers are added or removed.

Important: Auto-scaling is supported only for Kafka broker node pools. Node pools that have a controller role, including dual-role node pools, cannot be auto-scaled.

Prerequisites

Before you begin, ensure the following:

  • Cruise Control is enabled (see the sketch after this list for a minimal example of enabling it).
  • Optional: Consider enabling tiered storage, which reduces the amount of data moved during rebalancing, and can improve performance when brokers are added or removed.
  • Auto-rebalancing is enabled. Ensure that all required topics and partitions exist in the cluster so that partitions are distributed correctly. You can verify this by running the kubectl get kafkatopic -n <namespace> command.
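
If Cruise Control is not yet enabled, the following minimal sketch enables it with default settings by adding a cruiseControl entry under strimziOverrides (additional Cruise Control configuration, if your environment needs it, is omitted):

apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    cruiseControl: {}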

Enabling auto-scaling for Kafka brokers

After you enable auto-rebalancing, you can configure an HPA for Kafka broker node pools to automatically scale based on resource usage.

Complete the following steps to enable auto-scaling for Kafka brokers:

  1. Add the following annotation to the broker node pool configuration in your EventStreams custom resource to enable HPA support:

    strimziOverrides:
      nodePools:
        - name: kafka
          replicas: 3
          annotations:
            "eventstreams.ibm.com/hpa-enabled": "true"
    

    Apply your updated EventStreams custom resource. The Event Streams operator no longer manages the replica count for this broker node pool and the HPA takes control of scaling.
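
    For example, if you maintain the EventStreams custom resource in a YAML file (the file name es-prod.yaml is an assumption), you can apply the change by running:

    kubectl apply -f es-prod.yaml -n <namespace>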

  2. Create a HorizontalPodAutoscaler resource for the broker node pool and define your scaling rule.

    For example, the following configuration scales brokers when their average CPU utilization exceeds 90%.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: kafka-hpa
    spec:
      scaleTargetRef:
        apiVersion: eventstreams.ibm.com/v1beta2
        kind: KafkaNodePool
        name: <node-pool-name>
      minReplicas: 3
      maxReplicas: 5
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 90
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Pods
              value: 2
              periodSeconds: 300
        scaleUp:
          stabilizationWindowSeconds: 300
          policies:
            - type: Pods
              value: 2
              periodSeconds: 300
    

    Apply the HorizontalPodAutoscaler resource. After it is created, the HPA monitors the broker node pool and scales brokers automatically based on the defined rules.
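
    For example, if you saved the resource in a file (the file name kafka-hpa.yaml is an assumption), apply it by running:

    kubectl apply -f kafka-hpa.yaml -n <namespace>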

    In this example, when the average CPU utilization across broker pods exceeds the averageUtilization threshold (set to 90 percent), the HPA scales up the number of broker pods, up to the specified maxReplicas (5).

Verifying auto-scaling

After auto-scaling is enabled, you can verify whether scaling occurred and the new broker pods received partition assignments.

  • Run the following command to check the HPA status:

    kubectl get hpa -n <namespace>
    

    Example output:

    NAME        REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
    kafka-hpa   KafkaNodePool/kafka    cpu: 100%/90%   3         5         5          5h8m
    

    In this example, the HPA output indicates that the CPU usage exceeded the target, and the brokers scaled up to 5 pods, confirming that auto-scaling happened.

  • Optional: Check the current broker pods. Run the following command to see how many broker pods are currently running after auto-scaling:

    kubectl get pods -l eventstreams.ibm.com/broker-role=true -n <namespace>
    

    Example output:

    NAME                         READY   STATUS    RESTARTS   AGE
    min-prod-scram-kafka-3       1/1     Running   0          5h33m
    min-prod-scram-kafka-4       1/1     Running   0          5h33m
    min-prod-scram-kafka-5       1/1     Running   0          5h33m
    min-prod-scram-kafka-6       1/1     Running   0          1h30m
    min-prod-scram-kafka-7       1/1     Running   0          1h30m
    
  • Verify partition reassignment to ensure that new brokers received partitions after scaling.

Configuring auto-rebalancing for Kafka brokers

Auto-rebalancing ensures that partitions remain evenly distributed across brokers as the cluster scales. Cruise Control uses KafkaRebalance custom resources as templates to determine how partitions should be redistributed when brokers are added or removed.

Complete the following steps to configure Cruise Control to automatically rebalance the cluster during scaling:

  1. Create KafkaRebalance templates that define the optimization goals Cruise Control must use when adding and removing brokers. For example:

    apiVersion: eventstreams.ibm.com/v1beta2
    kind: KafkaRebalance
    metadata:
      name: rebalance-template
      annotations:
        eventstreams.ibm.com/rebalance-template: "true"
    spec:
      skipHardGoalCheck: true
      goals:
        - LeaderReplicaDistributionGoal
        - LeaderBytesInDistributionGoal
        - DiskUsageDistributionGoal
        - CpuUsageDistributionGoal
        - ReplicaDistributionGoal
        - NetworkInboundUsageDistributionGoal
        - NetworkOutboundUsageDistributionGoal    
    
  2. Enable auto-rebalancing by specifying the KafkaRebalance templates in the autoRebalance section of your EventStreams custom resource under the appropriate modes:
    apiVersion: eventstreams.ibm.com/v1beta2
    kind: EventStreams
    metadata:
      name: prod-3
    spec:
      strimziOverrides:
        cruiseControl:
          # ...
          autoRebalance:
            - mode: add-brokers
              template:
                name: rebalance-template
            - mode: remove-brokers
              template:
                name: rebalance-template
    
  3. After enabling auto-rebalance, verify that your Kafka pods have rolled and the Cruise Control pod is running.

    kubectl get kafka <instance-name> -o jsonpath='{.metadata.generation} {.status.observedGeneration}'
    

    If the metadata.generation and status.observedGeneration values are the same, the Kafka broker pods and the Cruise Control pod have rolled.

If you want to check the current state of the auto-rebalance operation after scaling, run the following command:

kubectl get kafka <instance-name> -o jsonpath='{.status.autoRebalance.state}'

The state of the operation can be one of the following values:

  • Idle: When no auto-rebalance operation is running.
  • RebalanceOnScaleDown: When the auto-rebalance operation for scaling down the brokers is running.
  • RebalanceOnScaleUp: When the auto-rebalance operation for scaling the brokers up is running.

Verifying partition reassignment

After brokers are added or removed, verify that partitions have been redistributed across the updated set of brokers.

After you initialize the Event Streams CLI, run the following command to verify that new brokers received partition assignments after scaling:

kubectl es topic <topic_name> | awk '/Partition details.../,/Configuration parameters.../'

Example output:

Partition ID   Leader   Replicas        In-sync
0              3        [3,4,5]         [3,4,5]
1              6        [6,7,8]         [6,7,8]
2              4        [4,5,7]         [4,5,7]

This command lists the broker IDs in the Leader, Replicas, and In-sync columns. The new broker IDs shown in the Replicas and In-sync columns confirm that auto-rebalancing happened after scaling.

Migrating Kafka broker storage

You can migrate the storage used by your Kafka brokers without creating a new cluster. By using Kafka node pools and Cruise Control, you can move broker workloads from one storage configuration to another while maintaining cluster availability and data integrity.

Important: The migration involves moving large amounts of data between brokers. Plan the rebalance operation carefully, as the rebalance process adds workload to the cluster and might temporarily impact cluster performance.

Typically, you migrate Kafka broker storage when you want to update or optimize the existing storage configuration. For example, you might want to:

  • Migrate to a faster or more cost-efficient storage class.
  • Resize or replace existing broker disks.

Before you begin, ensure that:

  • Node pools that have a broker role are configured in your cluster.
  • Cruise Control is enabled.

The following steps describe how to migrate broker storage without recreating your Kafka cluster.

  1. Update your EventStreams custom resource to add a new Kafka node pool with the required storage configuration.

    For example, the following configuration includes an existing node pool (brokers-gp2) with the gp2-ebs storage class and a new node pool (brokers-gp3) with the gp3-ebs storage class:

    apiVersion: eventstreams.ibm.com/v1beta2
    kind: EventStreams
    metadata:
      name: my-cluster
    spec:
      strimziOverrides:
        nodePools:
          - name: brokers-gp2
            roles:
              - broker
            replicas: 3
            storage:
              type: persistent-claim
              size: 1Ti
              class: gp2-ebs
          - name: brokers-gp3
            annotations:
              eventstreams.ibm.com/next-node-ids: '[5-10]'
            roles:
              - broker
            replicas: 3
            storage:
              type: persistent-claim
              size: 1Ti
              class: gp3-ebs
    

    Apply the configuration and wait for the new broker pods to become ready. Event Streams creates the new broker pods alongside the existing ones that use the previous storage.

    Note: This step adds new Kafka broker pods to the cluster. Ensure that your cluster has sufficient resources to run the new and existing pods in parallel.

  2. When the new brokers are ready, define a KafkaRebalance custom resource (for example, kafkarebalance.yaml) to instruct Cruise Control to migrate partition replicas from the previous brokers to the new node pool. For example:

    apiVersion: eventstreams.ibm.com/v1beta2
    kind: KafkaRebalance
    metadata:
      name: migrate-storage
      labels:
        eventstreams.ibm.com/cluster: my-cluster
      annotations:
        eventstreams.ibm.com/rebalance-auto-approval: "true"
    spec:
      mode: remove-brokers
      brokers: [0, 1, 2]   
    

    The values 0, 1, 2 in the brokers field represent the IDs of the previous broker nodes that are being removed from the cluster. You can find the IDs of the previous brokers to remove in the nodePools[*].nodeIds section of the EventStreams custom resource of your cluster.
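
    For example, one way to view the node IDs that are reported for each pool (the resource name eventstreams and the grep filter are assumptions; you can also inspect the full YAML output directly) is:

    kubectl get eventstreams <instance-name> -n <namespace> -o yaml | grep -A 3 nodeIds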

  3. Run the following command to apply the KafkaRebalance configuration and start the migration.

    kubectl apply -f kafkarebalance.yaml
    

    Cruise Control automatically moves partition replicas from previous brokers to the new brokers in the new node pool.

    For more information about how Cruise Control optimizes Kafka clusters and handles rebalancing, see optimizing Kafka clusters with Cruise Control.

  4. Run the following command to check the Cruise Control rebalance status and verify migration progress:

    kubectl get kafkarebalances
    

    When the status shows Ready, Cruise Control has completed the migration and redistributed all partition replicas to the new brokers. For example:

    NAME               CLUSTER        STATUS
    migrate-storage    my-cluster     Ready
    
  5. After the migration completes and the cluster is stable, remove the previous broker node pool from your EventStreams custom resource.

    For example, remove the definition of the previous broker node pool (brokers-gp2):

    apiVersion: eventstreams.ibm.com/v1beta2
    kind: EventStreams
    metadata:
      name: my-cluster
    spec:
      strimziOverrides:
        nodePools:
          - name: brokers-gp2
            roles:
              - broker
            replicas: 3
            storage:
              type: persistent-claim
              size: 1Ti
              class: gp2-ebs
    

    After removal, the updated configuration includes only the new node pool (brokers-gp3):

    apiVersion: eventstreams.ibm.com/v1beta2
    kind: EventStreams
    metadata:
      name: my-cluster
    spec:
      strimziOverrides:
        nodePools:
          - name: brokers-gp3
            roles:
              - broker
            replicas: 3
            storage:
              type: persistent-claim
              size: 1Ti
              class: gp3-ebs
    
  6. Apply the updated configuration to remove the previous broker nodes from the cluster.

    Note: The corresponding KafkaNodePool custom resource is not automatically deleted when the node pool is removed from the EventStreams custom resource. Run the following command to delete it manually:

    kubectl delete kafkanodepool <node-pool-name> -n <namespace>