Scaling is the process of adjusting the capacity of your Event Streams cluster to handle changing workloads. You can scale your cluster in two ways:
- Manual scaling: You can scale your Event Streams components by modifying their configuration. For more information, see manual scaling.
- Automatic scaling: In Event Streams 12.2.0 and later, you can enable auto-scaling, which is supported only for Kafka brokers.
If you want partitions to rebalance automatically whenever brokers are added or removed, enable auto-rebalancing before scaling your brokers. After scaling, you can verify partition reassignment to ensure that new brokers received partition assignments and that workloads are balanced.
Before you scale your Event Streams cluster, review your current configuration. The prerequisite guidance gives examples of production configurations on which you can base your deployment. To verify that your configuration meets your requirements, test the system with a workload that is representative of the expected throughput. For this purpose, Event Streams provides a workload generator application to test different message loads.
If testing shows that your system lacks the capacity for the workload, whether the symptom is excessive lag, delays, or more severe failures such as OutOfMemory errors, make the increases described in the following sections incrementally, and retest after each change until you identify a configuration that meets your requirements.
A performance report based on example case studies is also available to provide guidance for setting these values.
Note: Although the testing for the report was based on Apache Kafka version 2.3.0, the performance numbers are broadly applicable to current versions of Kafka as well.
Modifying the settings
Scaling operations require updates to the EventStreams custom resource, where broker counts, resource allocations, and related configuration are defined under the spec.strimziOverrides property. For more information about modifying these settings, see modifying installation.
Manual scaling for Event Streams components
You can scale the number of Kafka brokers, modify CPU, memory, and disk resources for broker nodes, adjust resource settings for supporting components, or change JVM options for brokers to meet workload requirements.
The following sections describe the available manual scaling options.
Increase the number of Kafka brokers in the cluster
The number of Kafka brokers is defined in the EventStreams custom resource in the spec.strimziOverrides.nodePools section.
You can scale your Kafka cluster by either:
- Increasing the number of brokers in an existing pool by updating the replicas value.
- Adding a new Kafka node pool by defining a new entry under spec.strimziOverrides.nodePools and specifying a unique name and the desired number of replicas.
For example, to configure Event Streams to use 6 Kafka brokers in a pool:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    nodePools:
      - name: kafka
        replicas: 6
Note: Controller node pools cannot be scaled or modified dynamically in KRaft mode. For more information, see KRaft limitations.
For more information about scaling up Kafka node pools, see the Strimzi documentation.
Increase the CPU request or limit settings for the Kafka brokers
The CPU settings for the Kafka brokers are defined in the EventStreams custom resource in the requests and limits properties under spec.strimziOverrides.kafka.resources (for all brokers in the Kafka cluster) or under spec.strimziOverrides.nodePools[].resources (for brokers within a node pool).
Example 1: Configure the same CPU settings for all node pools
To configure Event Streams Kafka brokers in all pools to have a CPU request set to 2 CPUs and limit set to 4 CPUs:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    kafka:
      # ...
      resources:
        requests:
          cpu: 2000m
        limits:
          cpu: 4000m
      # ...
This configuration applies the same CPU settings across all pools. A description of the syntax for these values can be found in the Kubernetes documentation.
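As an illustration of the quantity syntax, the following Python sketch converts CPU values such as 2000m into whole CPUs and checks that a request does not exceed its limit. The function names are hypothetical helpers for this example and are not part of Event Streams or Kubernetes.

```python
from decimal import Decimal


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity such as '2000m' or '2' to CPUs."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The 'm' suffix means millicores: 1000m equals one CPU.
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def request_within_limit(request: str, limit: str) -> bool:
    """Return True when the CPU request does not exceed the CPU limit."""
    return parse_cpu(request) <= parse_cpu(limit)


if __name__ == "__main__":
    # Values taken from the example above: request 2000m, limit 4000m.
    print(parse_cpu("2000m"), parse_cpu("4000m"))
    print(request_within_limit("2000m", "4000m"))
```

A check like this can be run against a custom resource before it is applied, because Kubernetes rejects a container whose request is greater than its limit.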
Example 2: Configure different CPU settings for specific node pools
If you need to set different resource configurations for individual Kafka node pools, you can define the resource properties under spec.strimziOverrides.nodePools[].resources. For example:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    nodePools:
      - name: kafka
        resources:
          requests:
            cpu: 2000m
          limits:
            cpu: 4000m
        # ...
      - name: kafka-2
        resources:
          requests:
            cpu: 4000m
          limits:
            cpu: 8000m
        # ...
Increase the memory request or limit settings for the Kafka brokers
The memory settings for the Kafka broker nodes are defined in the EventStreams custom resource in the requests and limits properties under spec.strimziOverrides.nodePools[].resources, based on the assigned roles.
Alternatively, to apply the same memory settings to all Kafka nodes, you can configure them globally under spec.strimziOverrides.kafka.resources in the EventStreams custom resource. All Kafka node pools will automatically inherit these settings unless overridden at the node pool level.
For example, to configure a pool of Kafka brokers to have a memory request set to 4 GiB and limit set to 8 GiB:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    kafka:
      # ...
    nodePools:
      - name: kafka
        roles:
          - broker
        resources:
          requests:
            memory: 4096Mi
          limits:
            memory: 8192Mi
The syntax for these values can be found in the Kubernetes documentation.
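To make the memory units concrete, the following Python sketch converts Kubernetes memory quantities to bytes. It is a simplified illustration that handles only integer values with the common binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes. The helper name is hypothetical.

```python
# Multipliers for the binary and decimal suffixes used in memory quantities.
SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '4096Mi' to a number of bytes."""
    quantity = quantity.strip()
    # Try two-letter suffixes before one-letter suffixes.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)  # no suffix: the value is already in bytes


if __name__ == "__main__":
    print(parse_memory("4096Mi") == parse_memory("4Gi"))
    print(parse_memory("8192Mi") == parse_memory("8Gi"))
```

Note that the binary and decimal suffixes differ: 4096Mi is exactly 4Gi, whereas 4G (decimal) is a smaller amount.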
Modifying the resources available to supporting components
The resource settings for each supporting component are defined in the EventStreams custom resource under the key for that component, in the requests and limits properties under spec.<component>.resources.
For example, to configure the Apicurio Registry to have a memory request set to 4 GiB and limit set to 8 GiB:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  apicurioRegistry:
    # ...
    resources:
      requests:
        memory: 4096Mi
      limits:
        memory: 8192Mi
The syntax for these values can be found in the Kubernetes documentation.
Increase the disk space available to each Kafka broker
The Kafka brokers need sufficient storage to meet the retention requirements for all of the topics in the cluster. Disk space requirements grow with longer retention periods for messages, increased message sizes and additional topic partitions.
The amount of storage made available to brokers in Kafka node pools is defined at the time of installation in the EventStreams custom resource in the spec.strimziOverrides.nodePools[].storage.size property. For example:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    nodePools:
      - name: kafka
        storage:
          # ...
          size: 100Gi
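When choosing a storage size, a rough estimate can help as a starting point. The following Python sketch assumes that disk usage is approximately the write rate multiplied by the retention period and the replication factor, spread evenly across brokers, plus some headroom. The formula, the function name, and the 30 percent default headroom are illustrative assumptions, not IBM guidance. Validate any result with the workload testing described earlier.

```python
def storage_per_broker_gib(
    write_mib_per_s: float,
    retention_hours: float,
    replication_factor: int,
    brokers: int,
    headroom: float = 0.3,
) -> float:
    """Estimate the storage needed on each broker, in GiB.

    Assumes an even spread of partitions and a steady write rate.
    """
    # Total data retained across the cluster, including all replicas.
    total_mib = write_mib_per_s * retention_hours * 3600 * replication_factor
    per_broker_mib = total_mib / brokers
    # Add headroom so that brokers do not run at full disk capacity.
    return per_broker_mib * (1 + headroom) / 1024


if __name__ == "__main__":
    # Hypothetical workload: 5 MiB/s, 24-hour retention, 3 replicas, 6 brokers.
    print(round(storage_per_broker_gib(5, 24, 3, 6), 1))
```

Longer retention, larger messages, or additional partitions increase the write rate or retention terms in this estimate, which is why the requirement grows with them.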
Modifying the JVM settings for Kafka brokers
If you have specific requirements, you can modify the JVM settings for the Kafka brokers.
Note: Take care when modifying these settings as changes can have an impact on the functioning of the product.
Note: Only a selected subset of the available JVM options can be configured.
JVM settings that apply to all Kafka brokers are defined in the EventStreams custom resource in the spec.strimziOverrides.kafka.jvmOptions property. Alternatively, you can configure the JVM settings separately for each Kafka node pool by using spec.strimziOverrides.nodePools[].jvmOptions.
For example, to set JVM options for all Kafka brokers:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # ...
    kafka:
      # ...
      jvmOptions:
        -Xms: 4096m
        -Xmx: 4096m
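Because the JVM heap must fit inside the memory limit of the broker container, it can be useful to sanity-check the two values together. The following Python sketch converts a JVM size such as 4096m to bytes and compares it with a memory limit. The 75 percent ceiling is an assumption chosen for illustration to leave room for non-heap memory, not a documented Event Streams rule, and the function names are hypothetical.

```python
# JVM size suffixes are binary multiples: k = 1024, m = 1024^2, g = 1024^3.
JVM_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def jvm_size_bytes(value: str) -> int:
    """Convert a JVM size such as '4096m' or '4g' to bytes."""
    value = value.strip()
    unit = value[-1].lower()
    if unit in JVM_UNITS:
        return int(value[:-1]) * JVM_UNITS[unit]
    return int(value)  # no suffix: the value is in bytes


def heap_fits(xmx: str, memory_limit_bytes: int, max_fraction: float = 0.75) -> bool:
    """Return True when the maximum heap stays under a share of the limit.

    max_fraction is an illustrative assumption, not a product requirement.
    """
    return jvm_size_bytes(xmx) <= memory_limit_bytes * max_fraction


if __name__ == "__main__":
    eight_gib = 8 * 1024**3
    print(heap_fits("4096m", eight_gib))
```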
Auto-scaling Kafka brokers
In Event Streams 12.2.0 and later, you can configure Event Streams to automatically scale Kafka brokers by creating a Kubernetes HorizontalPodAutoscaler (HPA). After you create an HPA custom resource for Kafka brokers, the HPA scales the number of broker pods based on the metrics that you specify in the custom resource (for example, CPU or memory utilization).
Auto-scaling handles traffic spikes by automatically increasing capacity when demand increases, or by reducing resource usage during low-load periods. It removes the need to manually adjust broker replicas and works with Cruise Control to redistribute partitions when brokers are added or removed.
Important: Auto-scaling is supported only for Kafka broker node pools. Node pools that have a controller role, including dual-role node pools, cannot be auto-scaled.
Prerequisites
Before you begin, ensure the following:
- Cruise Control is enabled.
- Optional: Consider enabling tiered storage, which reduces the amount of data moved during rebalancing, and can improve performance when brokers are added or removed.
- Auto-rebalancing is enabled. Ensure that all required topics and partitions exist in the cluster so that partitions are distributed correctly. You can verify this by running the kubectl get kafkatopic -n <namespace> command.
Enabling auto-scaling for Kafka brokers
After you enable auto-rebalancing, you can configure an HPA for Kafka broker node pools to automatically scale based on resource usage.
Complete the following steps to enable auto-scaling for Kafka brokers:
1. Add the following annotation to the broker node pool configuration in your EventStreams custom resource to enable HPA support:

   strimziOverrides:
     kafka:
       # ...
     nodePools:
       - name: kafka
         replicas: 3
         annotations:
           "eventstreams.ibm.com/hpa-enabled": "true"

   Apply your updated EventStreams custom resource. The Event Streams operator no longer manages the replica count for this broker node pool, and the HPA takes control of scaling.

2. Create a HorizontalPodAutoscaler custom resource for the broker node pool and define your scaling rule.

   For example, the following configuration scales brokers when their average CPU utilization exceeds 90%.

   apiVersion: autoscaling/v2
   kind: HorizontalPodAutoscaler
   metadata:
     name: kafka-hpa
   spec:
     scaleTargetRef:
       apiVersion: eventstreams.ibm.com/v1beta2
       kind: KafkaNodePool
       name: <node-pool-name>
     minReplicas: 3
     maxReplicas: 5
     metrics:
       - type: Resource
         resource:
           name: cpu
           target:
             type: Utilization
             averageUtilization: 90
     behavior:
       scaleDown:
         stabilizationWindowSeconds: 300
         policies:
           - type: Pods
             value: 2
             periodSeconds: 300
       scaleUp:
         stabilizationWindowSeconds: 300
         policies:
           - type: Pods
             value: 2
             periodSeconds: 300

   Apply the HorizontalPodAutoscaler custom resource. After it is created, the HPA monitors the broker node pool and scales brokers automatically based on the defined rules.

   In this example, after the average CPU utilization across broker pods exceeds the averageUtilization threshold (set to 90 percent), the HPA scales up the number of pods, up to the specified maxReplicas (5).
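To see how a utilization target translates into a replica count, the following Python sketch applies the scaling formula described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of current to target utilization, rounded up, and then clamped to minReplicas and maxReplicas. It is a simplified model that includes the default 10 percent tolerance but ignores the behavior policies and stabilization windows, which further limit how quickly pods are added or removed.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Approximate the replica count that an HPA would aim for."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result never goes below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 brokers at 100% average CPU against a 90% target, limits 3 to 5.
    print(desired_replicas(3, 100, 90, 3, 5))
    # A heavier spike is capped at maxReplicas.
    print(desired_replicas(3, 180, 90, 3, 5))
```

With the example policy of 2 pods per 300 seconds, a jump from 3 to 5 brokers can happen in a single scaling period, whereas larger ranges would be reached in several steps.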
Verifying auto-scaling
After auto-scaling is enabled, you can verify whether scaling occurred and the new broker pods received partition assignments.
1. Run the following command to check the HPA status:

   kubectl get hpa -n <namespace>

   Example output:

   NAME        REFERENCE             TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
   kafka-hpa   KafkaNodePool/kafka   cpu: 100%/90%   3         5         5          5h8m

   In this example, the HPA output indicates that the CPU usage exceeded the target, and the brokers scaled up to 5 pods, confirming that auto-scaling happened.

2. Optional: Check the current broker pods. Run the following command to see how many broker pods are currently running after auto-scaling:

   kubectl get pods -l eventstreams.ibm.com/broker-role=true -n <namespace>

   Example output:

   NAME                     READY   STATUS    RESTARTS   AGE
   min-prod-scram-kafka-3   1/1     Running   0          5h33m
   min-prod-scram-kafka-4   1/1     Running   0          5h33m
   min-prod-scram-kafka-5   1/1     Running   0          5h33m
   min-prod-scram-kafka-6   1/1     Running   0          1h30m
   min-prod-scram-kafka-7   1/1     Running   0          1h30m

3. Verify partition reassignment to ensure that new brokers received partitions after scaling.
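If you prefer to check the HPA status from a script, you can request the resource as JSON (for example, with kubectl get hpa kafka-hpa -o json) and inspect its fields. The following Python sketch summarizes such a document. The sample dictionary is invented for illustration and mirrors the example output above. The function name is hypothetical.

```python
def summarize_hpa(hpa: dict) -> dict:
    """Summarize replica counts and CPU utilization from an HPA document."""
    spec = hpa["spec"]
    status = hpa.get("status", {})
    current = status.get("currentReplicas", 0)
    desired = status.get("desiredReplicas", current)
    cpu = None
    for metric in status.get("currentMetrics") or []:
        resource = metric.get("resource", {})
        if metric.get("type") == "Resource" and resource.get("name") == "cpu":
            cpu = resource.get("current", {}).get("averageUtilization")
    return {
        "current": current,
        "desired": desired,
        "cpu_utilization": cpu,
        "scaling_in_progress": current != desired,
        "at_max": current >= spec["maxReplicas"],
    }


# Invented sample that matches the example output: 5 replicas, CPU at 100%.
SAMPLE = {
    "spec": {"minReplicas": 3, "maxReplicas": 5},
    "status": {
        "currentReplicas": 5,
        "desiredReplicas": 5,
        "currentMetrics": [
            {
                "type": "Resource",
                "resource": {"name": "cpu", "current": {"averageUtilization": 100}},
            }
        ],
    },
}

if __name__ == "__main__":
    print(summarize_hpa(SAMPLE))
```

When at_max is true and utilization remains above the target, the HPA cannot add more brokers, which might indicate that maxReplicas or the broker resources need to be reviewed.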
Configuring auto-rebalancing for Kafka brokers
Auto-rebalancing ensures that partitions remain evenly distributed across brokers as the cluster scales. Cruise Control uses KafkaRebalance custom resources as templates to determine how partitions should be redistributed when brokers are added or removed.
Complete the following steps to configure Cruise Control to automatically rebalance the cluster during scaling:
1. Create KafkaRebalance templates that define the optimization goals Cruise Control must use when adding and removing brokers. For example:

   apiVersion: eventstreams.ibm.com/v1beta2
   kind: KafkaRebalance
   metadata:
     name: rebalance-template
     annotations:
       eventstreams.ibm.com/rebalance-template: "true"
   spec:
     skipHardGoalCheck: true
     goals:
       - LeaderReplicaDistributionGoal
       - LeaderBytesInDistributionGoal
       - DiskUsageDistributionGoal
       - CpuUsageDistributionGoal
       - ReplicaDistributionGoal
       - NetworkInboundUsageDistributionGoal
       - NetworkOutboundUsageDistributionGoal

2. Enable auto-rebalancing by specifying the KafkaRebalance templates in the autoRebalance section of your EventStreams custom resource under the appropriate modes:

   apiVersion: eventstreams.ibm.com/v1beta2
   kind: EventStreams
   metadata:
     name: prod-3
   spec:
     strimziOverrides:
       cruiseControl:
         # ...
         autoRebalance:
           - mode: add-brokers
             template:
               name: rebalance-template
           - mode: remove-brokers
             template:
               name: rebalance-template

3. After enabling auto-rebalance, verify that your Kafka pods have rolled and the Cruise Control pod is running.

   kubectl get kafka <instance-name> -o jsonpath='{.metadata.generation} {.status.observedGeneration}'

   If the metadata.generation and status.observedGeneration values are the same, the Kafka broker pods and the Cruise Control pod have rolled.
If you want to check the current state of the auto-rebalance operation after scaling, run the following command:
kubectl get kafka <instance-name> -o jsonpath='{.status.autoRebalance.state}'
The state of the operation can be one of the following values:
- Idle: When no auto-rebalance operation is running.
- RebalanceOnScaleDown: When the auto-rebalance operation for scaling down the brokers is running.
- RebalanceOnScaleUp: When the auto-rebalance operation for scaling the brokers up is running.
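The two checks described in this section can be combined in a script. The following Python sketch takes the Kafka resource as a parsed JSON document (for example, from kubectl get kafka <instance-name> -o json) and reports whether the latest change has been reconciled and what the auto-rebalance state is. The function names and the sample document are hypothetical.

```python
def rebalance_status(kafka: dict) -> tuple:
    """Return (reconciled, state) for a Kafka resource document."""
    generation = kafka["metadata"]["generation"]
    status = kafka.get("status", {})
    # Matching values mean the latest spec has been processed.
    reconciled = status.get("observedGeneration") == generation
    state = status.get("autoRebalance", {}).get("state")
    return reconciled, state


def is_settled(kafka: dict) -> bool:
    """True when the change is reconciled and no auto-rebalance is running."""
    reconciled, state = rebalance_status(kafka)
    return reconciled and state == "Idle"


# Invented sample: a scale-up has been reconciled and is still rebalancing.
SAMPLE = {
    "metadata": {"generation": 7},
    "status": {
        "observedGeneration": 7,
        "autoRebalance": {"state": "RebalanceOnScaleUp"},
    },
}

if __name__ == "__main__":
    print(rebalance_status(SAMPLE))
    print(is_settled(SAMPLE))
```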
Verifying partition reassignment
After brokers are added or removed, verify that partitions have been redistributed across the updated set of brokers.
After you initialize the Event Streams CLI, run the following command to verify that new brokers received partition assignments after scaling:
kubectl es topic <topic_name> | awk '/Partition details.../,/Configuration parameters.../'
Example output:
Partition ID   Leader   Replicas   In-sync
0              3        [3,4,5]    [3,4,5]
1              6        [6,7,8]    [6,7,8]
2              4        [4,5,7]    [4,5,7]
This command lists the broker IDs in the Leader, Replicas, and In-sync columns. The new broker IDs shown in the Replicas and In-sync columns confirm that auto-rebalancing happened after scaling.
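For clusters with many topics, checking this output by eye becomes tedious. The following Python sketch parses rows in the format of the example output and lists any brokers that hold no replicas for the topic. It assumes that the output has exactly the four columns shown, and the function names are hypothetical.

```python
import re

# Matches rows such as "0   3   [3,4,5]   [3,4,5]".
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+\[([\d,\s]*)\]\s+\[([\d,\s]*)\]\s*$")


def _ids(text: str) -> list:
    """Convert '3,4,5' to [3, 4, 5]."""
    return [int(part) for part in text.split(",") if part.strip()]


def parse_partitions(output: str) -> list:
    """Parse partition rows and skip the header and any other lines."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "partition": int(match[1]),
                "leader": int(match[2]),
                "replicas": _ids(match[3]),
                "in_sync": _ids(match[4]),
            })
    return rows


def brokers_without_replicas(rows: list, broker_ids: list) -> list:
    """Return the given broker IDs that appear in no replica list."""
    used = {broker for row in rows for broker in row["replicas"]}
    return sorted(set(broker_ids) - used)


EXAMPLE = """Partition ID Leader Replicas In-sync
0 3 [3,4,5] [3,4,5]
1 6 [6,7,8] [6,7,8]
2 4 [4,5,7] [4,5,7]
"""

if __name__ == "__main__":
    rows = parse_partitions(EXAMPLE)
    # An empty list means every listed broker holds at least one replica.
    print(brokers_without_replicas(rows, [6, 7, 8]))
```

A broker that is missing from one topic is not necessarily a problem, because Cruise Control balances the cluster as a whole rather than each topic individually.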
Migrating Kafka broker storage
You can migrate the storage used by your Kafka brokers without creating a new cluster. By using Kafka node pools and Cruise Control, you can move broker workloads from one storage configuration to another while maintaining cluster availability and data integrity.
Important: The migration involves moving large amounts of data between brokers. Plan the rebalance operation carefully, because the rebalance process adds workload to the cluster and might temporarily affect cluster performance.
Typically, you migrate Kafka broker storage when you want to update or optimize the existing storage configuration. For example, you might want to:
- Migrate to a faster or more cost-efficient storage class.
- Resize or replace existing broker disks.
Before you begin, ensure that:
- Node pools that have a broker role are configured in your cluster.
- Cruise Control is enabled.
The following steps describe how to migrate broker storage without recreating your Kafka cluster.
1. Update your EventStreams custom resource to add a new Kafka node pool with the required storage configuration.

   For example, the following configuration includes an existing node pool (brokers-gp2) with the gp2-ebs storage class and a new node pool (brokers-gp3) with the gp3-ebs storage class:

   apiVersion: eventstreams.ibm.com/v1beta2
   kind: EventStreams
   metadata:
     name: my-cluster
   spec:
     strimziOverrides:
       nodePools:
         - name: brokers-gp2
           roles:
             - broker
           replicas: 3
           storage:
             type: persistent-claim
             size: 1Ti
             class: gp2-ebs
         - name: brokers-gp3
           annotations:
             eventstreams.ibm.com/next-node-ids: '[5-10]'
           roles:
             - broker
           replicas: 3
           storage:
             type: persistent-claim
             size: 1Ti
             class: gp3-ebs

   Apply the configuration and wait for the new broker pods to become ready. Event Streams creates the new broker pods alongside the existing ones that use the previous storage.

   Note: This step adds new Kafka broker pods to the cluster. Ensure that your cluster has sufficient resources to run the new and existing pods in parallel.

2. When the new brokers are ready, define a KafkaRebalance custom resource (for example, kafkarebalance.yaml) to instruct Cruise Control to migrate partition replicas from the previous brokers to the new node pool. For example:

   apiVersion: eventstreams.ibm.com/v1beta2
   kind: KafkaRebalance
   metadata:
     name: migrate-storage
     labels:
       eventstreams.ibm.com/cluster: my-cluster
     annotations:
       eventstreams.ibm.com/rebalance-auto-approval: "true"
   spec:
     mode: remove-brokers
     brokers: [0, 1, 2]

   The values 0, 1, 2 in the brokers field represent the IDs of the previous broker nodes that are being removed from the cluster. You can find the IDs of the previous brokers to remove in the nodePools[*].nodeIds section of the EventStreams custom resource of your cluster.

3. Run the following command to apply the KafkaRebalance configuration and start the migration.

   kubectl apply -f kafkarebalance.yaml

   Cruise Control automatically moves partition replicas from previous brokers to the new brokers in the new node pool.

   For more information about how Cruise Control optimizes Kafka clusters and handles rebalancing, see optimizing Kafka clusters with Cruise Control.

4. Run the following command to check the Cruise Control rebalance status and verify migration progress:

   kubectl get kafkarebalances

   When the status shows Ready, Cruise Control has completed the migration and redistributed all partition replicas to the new brokers. For example:

   Name              Cluster      Status
   migrate-storage   my-cluster   Ready

5. After the migration completes and the cluster is stable, remove the previous broker node pool from your EventStreams custom resource.

   For example, remove the definition of the previous broker node pool (brokers-gp2):

   apiVersion: eventstreams.ibm.com/v1beta2
   kind: EventStreams
   metadata:
     name: my-cluster
   spec:
     strimziOverrides:
       nodePools:
         - name: brokers-gp2
           roles:
             - broker
           replicas: 3
           storage:
             type: persistent-claim
             size: 1Ti
             class: gp2-ebs

   After removal, the updated configuration must include only the new node pool (brokers-gp3):

   apiVersion: eventstreams.ibm.com/v1beta2
   kind: EventStreams
   metadata:
     name: my-cluster
   spec:
     strimziOverrides:
       nodePools:
         - name: brokers-gp3
           roles:
             - broker
           replicas: 3
           storage:
             type: persistent-claim
             size: 1Ti
             class: gp3-ebs

6. Apply the updated configuration to remove the previous broker nodes from the cluster.

   Note: The corresponding KafkaNodePool custom resource is not automatically deleted when the node pool is removed from the EventStreams custom resource. Run the following command to delete it manually:

   kubectl delete kafkanodepool <node-pool-name> -n <namespace>
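Before removing the previous node pool, you might want an extra check that no partition replicas remain on the previous brokers. The following Python sketch shows one way to express that check. It assumes that you have already collected replica assignments into a dictionary that maps a topic-partition label to its list of replica broker IDs. That structure, the function name, and the sample data are hypothetical.

```python
def partitions_on_brokers(assignments: dict, broker_ids: list) -> list:
    """Return the topic-partitions that still have a replica on the given brokers."""
    previous = set(broker_ids)
    return sorted(
        label
        for label, replicas in assignments.items()
        if previous & set(replicas)
    )


# Invented sample: one partition still has a replica on previous broker 2.
ASSIGNMENTS = {
    "orders-0": [5, 6, 7],
    "orders-1": [6, 7, 2],
    "payments-0": [7, 5, 6],
}

if __name__ == "__main__":
    remaining = partitions_on_brokers(ASSIGNMENTS, [0, 1, 2])
    if remaining:
        print("Replicas remain on previous brokers:", remaining)
    else:
        print("No replicas remain on the previous brokers.")
```

An empty result for brokers 0, 1, and 2 is consistent with the KafkaRebalance resource reporting Ready. A non-empty result suggests waiting or investigating before you remove the node pool.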