Monitoring Kafka cluster health

Monitoring the health of your Kafka cluster helps to verify that your operations are running smoothly. The Event Streams UI includes a preconfigured dashboard that monitors Kafka data.

Event Streams also provides a number of ways to export metrics from your Kafka brokers to external monitoring and logging applications. These metrics are useful indicators of the health of the cluster, and can provide warnings of potential problems. The following sections provide an overview of the available options.

For information about the health of your topics, check the producer activity dashboard.

JMX Exporter

You can use Event Streams to collect JMX metrics from Kafka brokers, ZooKeeper nodes, and Kafka Connect nodes, and export them to Prometheus.

For an example of how to configure the JMX Exporter, see configuring the JMX Exporter.
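
As a rough sketch of what that configuration can look like, the following fragment enables the Prometheus JMX Exporter for the Kafka brokers through the EventStreams custom resource. It assumes that the Strimzi-style metricsConfig setting is available under spec.strimziOverrides.kafka, and that a ConfigMap containing the exporter rules already exists in the same namespace. The ConfigMap name metrics-config and the key kafka-metrics-config.yaml are placeholders and are not taken from this page, so check the linked configuration topic for the exact settings for your version.

```yaml
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    kafka:
      # Assumption: Strimzi-style metrics configuration for the brokers
      metricsConfig:
        type: jmxPrometheusExporter
        valueFrom:
          configMapKeyRef:
            name: metrics-config             # placeholder ConfigMap name
            key: kafka-metrics-config.yaml   # placeholder key holding the exporter rules
```

ZooKeeper nodes would typically need an equivalent metricsConfig entry of their own, again pointing at a ConfigMap key that holds the relevant exporter rules.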

Kafka Exporter

You can use the Kafka Exporter in Event Streams to export metrics to Prometheus that are otherwise only accessible through the Kafka command-line tools. This enables you to collect topic metrics such as consumer group lag.

For an example of how to configure a Kafka Exporter, see configuring the Kafka Exporter.
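
The following fragment is a minimal sketch of enabling the Kafka Exporter in the EventStreams custom resource. It assumes that the Strimzi-style kafkaExporter setting is available under spec.strimziOverrides. The regular expressions shown match every topic and every consumer group, and are illustrative values only. Narrow them if you want to collect metrics for a subset.

```yaml
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  strimziOverrides:
    # Assumption: Strimzi-style Kafka Exporter configuration
    kafkaExporter:
      topicRegex: ".*"   # illustrative: export metrics for all topics
      groupRegex: ".*"   # illustrative: export lag for all consumer groups
```

Matching everything is convenient for a first look, but on clusters with many topics or consumer groups a narrower expression keeps the number of exported time series manageable.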

JmxTrans (deprecated)

JmxTrans can be used to push JMX metrics from Kafka brokers to external applications or databases. For more information, see configuring JmxTrans.

Note: Support for JmxTrans is deprecated in Event Streams version 11.1.5 and later because the JmxTrans tool is no longer maintained. For more information, see the Strimzi proposal about deprecating JmxTrans.

Grafana

You can use dashboards in the Grafana service to monitor the health and performance of the Kafka clusters in your Event Streams instance.

Viewing installed Grafana dashboards

To view the Event Streams Grafana dashboards, follow these steps:

  1. Log in to your IBM Cloud Pak foundational services management console as an administrator. For more information, see the IBM Cloud Pak foundational services documentation.
  2. Navigate to the IBM Cloud Pak foundational services console homepage.
  3. Click the hamburger icon in the top left.
  4. Expand Monitor Health.
  5. Click Monitoring in the expanded menu to open the Grafana homepage.
  6. Click the user icon in the bottom left corner to open the user profile page.
  7. In the Organizations table, find the namespace where you installed the Event Streams monitoringdashboard custom resource, and switch the user profile to that namespace.
  8. Hover over Dashboards on the left and click Manage.
  9. In the Dashboard table, click the dashboard that you want to view.

Ensure you select your namespace, cluster name, and other filters at the top of the dashboard to view the required information.

Kibana

You can create dashboards in the Kibana service that is provided by OpenShift Container Platform cluster logging. Use the dashboards to monitor for specific errors in the logs of your Event Streams instance, and to set up alerts that trigger when a number of errors occur over a period of time.

To install the Event Streams Kibana dashboards, follow these steps:

  1. Ensure you have cluster logging installed.
  2. Download the JSON file that includes the example Kibana dashboards for Event Streams from GitHub.

  3. Navigate to the Kibana homepage on your cluster.

    For IBM Cloud Pak foundational services: Click the hamburger icon in the top left and then expand Monitor Health. Then click Logging to open the Kibana homepage.

    For OpenShift Container Platform cluster logging stack: Log in to the OpenShift Container Platform web console using your login credentials. Then follow the instructions to navigate to cluster logging’s Kibana homepage.

  4. Click Management in the navigation on the left.
  5. Click Index patterns.
  6. Click Create index pattern.
  7. Enter app* in the Index pattern field, and click Next step.
  8. Select @timestamp from the Time Filter field name list, and click Create index pattern.
  9. Click Saved Objects.
  10. Click the Import icon and navigate to the JSON file you downloaded earlier that includes the example Kibana dashboards for Event Streams.
  11. If an Index Pattern Conflicts warning is displayed, select the app* index pattern from the New index pattern list for each conflict, then click Confirm all changes.
  12. Click Dashboard in the navigation on the left to view the downloaded dashboards.

IBM Instana

Instana is an observability tool that can be used to monitor your Event Streams deployment.

Instana also offers Kafka-centric monitoring that can provide useful insights into the performance and the health of your Kafka cluster.

For information about installing and configuring an Instana host agent on the Red Hat OpenShift Container Platform, see the Instana documentation.

After the host agent is installed, Instana can monitor all aspects of an Event Streams instance with no extra configuration required.

Note: You might receive the following error message in the Instana dashboards when you check monitoring metrics for the Event Streams UI container:

Monitoring issue: nodejs_collector_not_installed

The @instana/collector package is not installed in this Node.js application, or the @instana/collector package cannot announce itself to the host agent, for example due to networking issues.

If you require monitoring of the Event Streams UI, you can enable Instana to monitor the UI by setting the following in the EventStreams custom resource:

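  apiVersion: eventstreams.ibm.com/v1beta2
  kind: EventStreams
  # ...
  spec:
    # ...
    adminUI:
      env:
        # Pass the IP address of the node to the UI container so that
        # the collector can reach the Instana host agent on that node.
        - name: INSTANA_AGENT_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP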

Other monitoring tools

You can also use external monitoring tools to monitor the deployed Event Streams Kafka cluster.

Viewing the preconfigured dashboard

To get an overview of the cluster health, you can view a selection of metrics on the Event Streams Monitoring dashboard.

  1. Log in to your Event Streams UI as an administrator from a supported web browser (see how to determine the login URL for your Event Streams UI).
  2. Click Monitoring in the primary navigation. A dashboard is displayed with overview charts for messages, partitions, and replicas.
  3. Select 1 hour, 1 day, 1 week, or 1 month to view data for different time periods.