Monitoring the health of your Kafka cluster helps to verify that your operations are running smoothly. The Event Streams UI includes a preconfigured dashboard that monitors Kafka data.
Event Streams also provides a number of ways to export metrics from your Kafka brokers to external monitoring and logging applications. These metrics are useful indicators of the health of the cluster, and can provide warnings of potential problems. You can use any monitoring solution that is compatible with Prometheus and JMX formats to collect, store, visualize, and set up alerts based on metrics provided by Event Streams. The following sections provide an overview of the available options.
For information about the health of your topics, check the producer activity dashboard.
JMX Exporter
You can use Event Streams to collect JMX metrics from Kafka brokers, ZooKeeper nodes, and Kafka Connect nodes, and export them to Prometheus.
For an example of how to configure the JMX Exporter, see configuring the JMX Exporter.
Kafka Exporter
You can use Event Streams to export Kafka metrics to Prometheus that are otherwise only accessible through the Kafka command-line tools. Exporting these metrics allows topic metrics such as consumer group lag to be collected.
For an example of how to configure the Kafka Exporter, see configuring the Kafka Exporter.
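To make the consumer group lag metric concrete, the following minimal sketch shows how lag is conventionally derived: for each partition, the log end offset minus the offset last committed by the consumer group. This is an illustration only, not how the Kafka Exporter is implemented. The function names `partition_lag` and `total_lag` are hypothetical, and treating a partition with no committed offset as offset 0 is a simplifying assumption.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return the lag for each partition of a topic.

    end_offsets: partition number -> log end offset (next offset to be written).
    committed_offsets: partition number -> offset last committed by the group.
    """
    lag = {}
    for partition, end_offset in end_offsets.items():
        # Assumption: a partition with no committed offset is treated as offset 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end offset never produces a negative lag.
        lag[partition] = max(end_offset - committed, 0)
    return lag


def total_lag(end_offsets: Dict[int, int],
              committed_offsets: Dict[int, int]) -> int:
    """Return the lag summed across all partitions."""
    return sum(partition_lag(end_offsets, committed_offsets).values())
```

For example, with end offsets `{0: 100, 1: 50}` and committed offsets `{0: 90, 1: 50}`, partition 0 has a lag of 10 messages and partition 1 has caught up. A lag that keeps growing typically indicates that consumers are not keeping pace with producers.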
Grafana
You can use Grafana and configure the example Grafana dashboards to monitor the health and performance of your Kafka clusters by collecting and displaying metrics from Prometheus.
Kibana
Kibana might be included with OpenShift and other Kubernetes platforms, or you can install Kibana externally. You can use the example Kibana dashboards to monitor for specific errors in the logs, and set up alerts that are triggered when a specified number of errors occur over a period of time in your Event Streams instance.
To install the Event Streams Kibana dashboards, download the JSON file that includes the example Kibana dashboards for Event Streams from GitHub, and configure your dashboard based on your platform.
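The alert condition described above (a number of errors occurring over a period of time) can be thought of as a sliding-window count. The following sketch illustrates that logic only. It does not use or represent the Kibana alerting API, the class name `ErrorRateAlert` is hypothetical, and it assumes that error timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque


class ErrorRateAlert:
    """Signal an alert when `threshold` errors fall within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record_error(self, timestamp: float) -> bool:
        """Record an error at `timestamp` (in seconds).

        Returns True if the number of errors inside the window has
        reached the threshold. Assumes timestamps are non-decreasing.
        """
        self._timestamps.append(timestamp)
        # Discard errors that are older than the start of the window.
        window_start = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= window_start:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

With a threshold of 3 errors and a 60-second window, errors logged at 0, 10, and 20 seconds trigger the alert on the third error, whereas a single later error at 100 seconds does not.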
IBM Instana
Instana is an observability tool that can be used to monitor your Event Streams deployment.
Instana also offers Kafka-centric monitoring that can provide useful insights into the performance and the health of your Kafka cluster.
For information about installing and configuring an Instana host agent on the Red Hat OpenShift Container Platform, see the Instana documentation for OpenShift. To use Instana on other Kubernetes platforms, see the Instana documentation for Kubernetes.
After installation, Instana can monitor all aspects of an Event Streams instance with no extra configuration required.
Note: You might receive the following error message in the Instana dashboards when you check monitoring metrics for the Event Streams UI container:
Monitoring issue: nodejs_collector_not_installed
The @instana/collector package is not installed in this Node.js application, or the @instana/collector package cannot announce itself to the host agent, for example due to networking issues.
If you require monitoring of the Event Streams UI, you can enable Instana to monitor the UI by setting the following in the EventStreams custom resource:
apiVersion: eventstreams.ibm.com/v1beta2
kind: EventStreams
# ...
spec:
  # ...
  adminUI:
    env:
      - name: INSTANA_AGENT_HOST
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
Other Monitoring Tools
You can also use external monitoring tools to monitor the deployed Event Streams Kafka cluster.
Viewing the preconfigured dashboard
To get an overview of the cluster health, you can view a selection of metrics on the Event Streams Monitoring dashboard.
Important: Ensure that you enable the Monitoring dashboard by following the instructions in the post-upgrade tasks before accessing the dashboard.
- Log in to your Event Streams UI as an administrator from a supported web browser (see how to determine the login URL for your Event Streams UI).
- Click Monitoring in the primary navigation. A dashboard is displayed with overview charts for messages, partitions, and replicas.
- Select 1 hour, 1 day, 1 week, or 1 month to view data for different time periods.