
Monitoring OpenShift and Cloud Paks

Monitoring of Cloud Pak for Data builds on the OpenShift Monitoring framework. The observations generated by Cloud Pak for Data are pushed to the OpenShift Monitoring Prometheus endpoint, which allows (external) monitoring tools to combine the observations from the OpenShift platform and Cloud Pak for Data from a single source.

[Figure: Monitoring overview]

OpenShift monitoring

To deploy Cloud Pak for Data monitors, it is mandatory to also enable OpenShift monitoring. OpenShift monitoring is activated via the openshift_monitoring object.

openshift_monitoring:
- openshift_cluster_name: pluto-01
  user_workload: enabled
  remote_rewrite_url: http://www.example.com:1234/receive
  retention_period: 15d
  pvc_storage_class: ibmc-vpc-block-retain-general-purpose
  pvc_storage_size_gb: 100
  grafana_operator: enabled
  grafana_project: grafana
  labels:
    cluster_name: pluto-01
| Property | Description | Mandatory | Allowed values |
|---|---|---|---|
| user_workload | Allow pushing Prometheus metrics to OpenShift (must be set to True for monitoring to work) | Yes | True, False |
| pvc_storage_class | Storage class used to keep persistent monitoring data | No | Valid storage class |
| pvc_storage_size_gb | Size of the PVC holding the monitoring data, in GB | Yes, if pvc_storage_class is set | |
| remote_rewrite_url | Set this value to redirect metrics to a remote Prometheus instance | No | |
| retention_period | How long to retain monitoring data, expressed as a number followed by s (seconds), m (minutes), h (hours), d (days), w (weeks) or y (years). Default is 15d | Yes | |
| labels | Additional labels to be added to the metrics | No | |
| grafana_operator | Enable the Grafana community operator | No | False (default), True |
| grafana_project | Project in which to install the Grafana operator, if enabled | Yes, if grafana_operator is enabled | |
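The dependencies between these properties (for example, pvc_storage_size_gb only being required when pvc_storage_class is set) can be checked before running the Deployer. The sketch below is a hypothetical pre-flight check written from the table, not part of the Cloud Pak Deployer; the function names are illustrative, and because the sample configuration uses `enabled` where the table lists True/False, it accepts both spellings.

```python
import re

# Hypothetical pre-flight check for one openshift_monitoring entry, following the
# rules in the property table. It is not part of the Cloud Pak Deployer.
RETENTION_PATTERN = re.compile(r"[0-9]+[smhdwy]")  # e.g. "15d", "12h", "2w"


def is_enabled(value) -> bool:
    # The sample configuration uses "enabled" while the table lists True/False,
    # so this sketch accepts both spellings.
    return str(value).lower() in ("true", "enabled")


def validate_openshift_monitoring(entry: dict) -> list:
    """Return a list of problems found in one openshift_monitoring entry."""
    problems = []
    if "user_workload" not in entry:
        problems.append("user_workload is mandatory")
    # Only the format is checked here, since the table also states a 15d default.
    retention = entry.get("retention_period")
    if retention is not None and not RETENTION_PATTERN.fullmatch(str(retention)):
        problems.append("retention_period must be a number followed by s, m, h, d, w or y")
    if entry.get("pvc_storage_class") and "pvc_storage_size_gb" not in entry:
        problems.append("pvc_storage_size_gb is required when pvc_storage_class is set")
    if is_enabled(entry.get("grafana_operator", False)) and not entry.get("grafana_project"):
        problems.append("grafana_project is required when grafana_operator is enabled")
    return problems


config = {
    "openshift_cluster_name": "pluto-01",
    "user_workload": "enabled",
    "retention_period": "15d",
    "pvc_storage_class": "ibmc-vpc-block-retain-general-purpose",
    "pvc_storage_size_gb": 100,
    "grafana_operator": "enabled",
    "grafana_project": "grafana",
}
print(validate_openshift_monitoring(config))  # []
```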

Note: Labels must be specified as a YAML mapping in which each line is a key-value pair. The labels are added to the prometheus key of the user-workload-monitoring-config ConfigMap and to the prometheusK8s key of the cluster-monitoring-config ConfigMap.
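To make the note concrete, the sketch below shows where a labels record such as `cluster_name: pluto-01` could end up in the two ConfigMaps. It assumes the labels are written to the externalLabels setting that OpenShift monitoring supports under both keys; the exact content the Deployer generates is not shown in this document and may include further settings. `label_config` is a hypothetical helper.

```python
import json


# Sketch, assuming the labels land under externalLabels of the prometheus key
# (user workload monitoring) and the prometheusK8s key (cluster monitoring).
def label_config(labels: dict) -> dict:
    return {
        "user-workload-monitoring-config": {"prometheus": {"externalLabels": dict(labels)}},
        "cluster-monitoring-config": {"prometheusK8s": {"externalLabels": dict(labels)}},
    }


configs = label_config({"cluster_name": "pluto-01"})
print(json.dumps(configs["cluster-monitoring-config"]))
# {"prometheusK8s": {"externalLabels": {"cluster_name": "pluto-01"}}}
```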

Note: When the Grafana operator is enabled, you can build your own Grafana dashboards based on the metrics collected by Prometheus. When installed, Grafana creates a local admin user with user name root and password secret. Grafana can be accessed using the OpenShift route that is created in the project specified by grafana_project.

Cloud Pak for Data monitoring

The observations of Cloud Pak for Data are generated by the zen-watchdog component, which is part of the cpd_platform cartridge and is therefore available on each Cloud Pak for Data instance. The zen-watchdog installation includes a set of monitors that focus on the technical deployment of Cloud Pak for Data (e.g. running pods and bound Persistent Volume Claims (PVCs)).

Additional monitors that focus more on the operational usage of Cloud Pak for Data can be deployed as well. These monitors are maintained in a separate Git repository, which can be accessed at IBM/cp4d-monitors. The Cloud Pak Deployer can deploy these monitors using the Cloud Pak for Data zen-watchdog monitor framework. This allows custom monitors to be added to zen-watchdog, making them visible in the Cloud Pak for Data metrics.

[Figure: Cloud Pak for Data Monitors Overview]

The cp4d_monitors capability of the Cloud Pak Deployer implements the following:

  • Create a Cloud Pak for Data ServiceMonitor endpoint that forwards zen-watchdog monitor events to OpenShift cluster monitoring
  • Create source repository authentication secrets (optional, if monitors are pulled from a secured repository)
  • Create target container registry authentication secrets (optional, if monitor images are pushed to a secured container registry)
  • Deploy custom monitors, which are added to the zen-watchdog monitor framework

For custom monitors to be deployed, OpenShift user-workload monitoring must be enabled, as described in the OpenShift monitoring section.

The Cloud Pak for Data monitors are specified in a cp4d_monitors definition.

cp4d_monitors:
- name: cp4d-monitor-set-1
  cp4d_instance: zen-45
  openshift_cluster_name: pluto-01
  default_monitor_source_repo: https://github.com/IBM/cp4d-monitors
  #default_monitor_source_token_secret: monitors_source_repo_secret
  #default_monitor_target_cr: de.icr.io/monitorrepo  
  #default_monitor_target_cr_user_secret: monitors_target_cr_username
  #default_monitor_target_cr_password_secret: monitors_target_cr_password
  # List of monitors
  monitors:
  - name: cp4dplatformcognosconnectionsinfo
    context: cp4d-cognos-connections-info
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformcognostaskinfo
    context: cp4d-cognos-task-info
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformglobalconnections
    context: cp4d-platform-global-connections
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformwatsonstudiojobinfo
    context: cp4d-watsonstudio-job-info
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformwatsonstudiojobscheduleinfo
    context: cp4d-watsonstudio-job-schedule-info
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformwatsonstudioruntimeusage
    context: cp4d-watsonstudio-runtime-usage
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformwatsonknowledgecataloginfo
    context: cp4d-wkc-info
    label: latest
    schedule: "*/15 * * * *"
  - name: cp4dplatformwmldeploymentspaceinfo
    context: cp4d-wml-deployment-space-info
    label: latest  
    schedule: "*/15 * * * *"
  - name: cp4dplatformwmldeploymentspacejobinfo
    context: cp4d-wml-deployment-space-job-info
    label: latest
    schedule: "*/15 * * * *"

Each cp4d_monitors entry contains a set of default settings that apply to all monitors in its monitors list. These defaults can be overridden per monitor if needed.
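The override rule can be pictured as a simple lookup: a per-monitor `monitor_<x>` value wins, otherwise the set-level `default_monitor_<x>` value is used. The sketch below is a hypothetical illustration of that rule, not the Deployer's implementation; it only covers a subset of the overridable properties, and `effective_settings` and `OVERRIDABLE` are illustrative names.

```python
# Hypothetical sketch: resolve the effective settings of one monitor.
# A per-monitor monitor_<x> value wins over the set-level default_monitor_<x> value.
# Only a subset of the overridable properties is shown.
OVERRIDABLE = ["source_repo", "source_token_secret", "target_cr", "target_cr_user_secret"]


def effective_settings(monitor_set: dict, monitor: dict) -> dict:
    settings = {}
    for key in OVERRIDABLE:
        value = monitor.get(f"monitor_{key}", monitor_set.get(f"default_monitor_{key}"))
        if value is not None:  # e.g. no target_cr at all: internal registry is used
            settings[key] = value
    settings["context"] = monitor["context"]
    settings["label"] = monitor.get("label", "latest")  # label defaults to 'latest'
    settings["schedule"] = monitor["schedule"]
    return settings


monitor_set = {"name": "cp4d-monitor-set-1",
               "default_monitor_source_repo": "https://github.com/IBM/cp4d-monitors"}
monitor = {"name": "cp4dplatformglobalconnections",
           "context": "cp4d-platform-global-connections",
           "schedule": "*/15 * * * *"}
print(effective_settings(monitor_set, monitor)["source_repo"])  # https://github.com/IBM/cp4d-monitors
print(effective_settings(monitor_set, monitor)["label"])  # latest
```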

| Property | Description | Mandatory | Allowed values |
|---|---|---|---|
| name | The name of the monitor set | Yes | Lowercase RFC 1123 subdomain |
| cp4d_instance | The OpenShift project (namespace) in which the Cloud Pak for Data instance resides | Yes | |
| openshift_cluster_name | The OpenShift cluster name | Yes | |
| default_monitor_source_repo | The default repository location of all monitors in the monitors section | No | |
| default_monitor_source_token_secret | The name of the default repository access token secret; must be available in the vault | No | |
| default_monitor_target_cr | The default target container registry (cr) to which the monitor image is pushed. When omitted, the OpenShift internal registry is used | No | |
| default_monitor_target_cr_user_secret | The name of the default secret holding the target container registry user name used to push the monitor image; must be available in the vault | No | |
| default_monitor_target_cr_password_secret | The name of the default secret holding the target container registry password used to push the monitor image; must be available in the vault | No | |
| monitors | List of monitors | Yes | |
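Both the monitor set name and each monitor name must be a lowercase RFC 1123 subdomain. The sketch below checks a name against the rules Kubernetes applies to such names (lowercase alphanumerics, `-` and `.`, starting and ending with an alphanumeric character, at most 253 characters); `is_rfc1123_subdomain` is an illustrative helper, not something the Deployer exposes.

```python
import re

# One dot-separated label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character.
_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
_SUBDOMAIN = re.compile(rf"{_LABEL}(\.{_LABEL})*")


def is_rfc1123_subdomain(name: str) -> bool:
    """True if name is a lowercase RFC 1123 subdomain of at most 253 characters."""
    return len(name) <= 253 and _SUBDOMAIN.fullmatch(name) is not None


print(is_rfc1123_subdomain("cp4d-monitor-set-1"))  # True
print(is_rfc1123_subdomain("CP4D_Monitors"))  # False
```

Note that an underscore is not allowed, which is why the example monitor names are written as one lowercase word.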

For each entry in the monitors list, the following settings are specified:

| Property | Description | Mandatory | Allowed values |
|---|---|---|---|
| name | The name of the monitor entry | Yes | Lowercase RFC 1123 subdomain |
| monitor_source_repo | Overrides default_monitor_source_repo for this monitor | No | |
| monitor_source_token_secret | Overrides default_monitor_source_token_secret for this monitor | No | |
| monitor_target_cr | Overrides default_monitor_target_cr for this monitor | No | |
| monitor_target_cr_user_secret | Overrides default_monitor_target_cr_user_secret for this monitor | No | |
| monitor_target_cr_user_password | Overrides default_monitor_target_cr_password_secret for this monitor | No | |
| context | The context of the monitor in the source repository (subfolder name) | Yes | |
| label | The label of the pushed image; defaults to latest | No | |
| schedule | The cron schedule of the generated Cloud Pak for Data monitor cronjob | Yes | |
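The schedule uses the standard five-field cron syntax (minute, hour, day of month, month, day of week), so `"*/15 * * * *"` in the example runs each monitor every 15 minutes. The sketch below expands a single cron field to show how the `*/15` step works; `expand_cron_field` is a hypothetical helper that handles only `*`, ranges, lists and steps, not names such as `MON`.

```python
def expand_cron_field(field: str, low: int, high: int) -> list:
    """Expand one cron field (subset of the syntax: *, a-b, a,b and /step)."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            end = high if step_text else start  # "5/15" means 5, 20, 35, ...
        values.update(range(start, end + 1, step))
    return sorted(values)


minute, hour, day_of_month, month, day_of_week = "*/15 * * * *".split()
print(expand_cron_field(minute, 0, 59))  # [0, 15, 30, 45]
```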

Each monitor has a set of event_types, which contain the observations generated by the monitor. These event types are retrieved directly from the GitHub repository: each context is expected to contain a file called event_types.yml. During deployment of the monitor, this file is retrieved and used to populate the monitor's event_types.
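As a rough illustration of that lookup, the sketch below derives the raw-file URL of event_types.yml for a monitor context. It assumes a github.com repository, a `main` branch and that the context is a top-level folder of the repository; none of these details are stated above, and `event_types_url` is a hypothetical helper rather than the Deployer's code.

```python
from urllib.parse import urlparse


# Hypothetical helper; the branch name and folder layout are assumptions.
def event_types_url(source_repo: str, context: str, branch: str = "main") -> str:
    parsed = urlparse(source_repo)
    if parsed.netloc != "github.com":
        raise ValueError("this sketch only handles github.com repositories")
    owner_repo = parsed.path.strip("/").removesuffix(".git")
    return f"https://raw.githubusercontent.com/{owner_repo}/{branch}/{context}/event_types.yml"


print(event_types_url("https://github.com/IBM/cp4d-monitors", "cp4d-platform-global-connections"))
# https://raw.githubusercontent.com/IBM/cp4d-monitors/main/cp4d-platform-global-connections/event_types.yml
```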

If the Deployer runs when the monitor is already deployed, the following process is used:

  • The build process is restarted to ensure the latest image of the monitor is used.
  • The monitor's current configuration is compared with the configuration generated by the Deployer. If they are identical, the monitor's configuration is left as-is; if they differ, the monitor's configuration is rebuilt and the monitor is redeployed.
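The decision for an already deployed monitor can be summarized in a few lines. The sketch below is a hypothetical rendering of the process described above; the action names are illustrative and are not Deployer output.

```python
# Hypothetical sketch of the redeploy decision for a monitor that is already
# deployed. The action names are illustrative.
def plan_monitor_update(current: dict, desired: dict) -> list:
    actions = ["restart-build"]  # always rebuild so the latest monitor image is used
    # Dict equality ignores key order, so only real differences trigger a redeploy.
    if current != desired:
        actions.append("rebuild-config-and-redeploy")
    return actions


current = {"schedule": "*/15 * * * *", "label": "latest"}
print(plan_monitor_update(current, {"label": "latest", "schedule": "*/15 * * * *"}))
# ['restart-build']
print(plan_monitor_update(current, {**current, "schedule": "*/30 * * * *"}))
# ['restart-build', 'rebuild-config-and-redeploy']
```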

Example monitor - global platform connections

This monitor counts the number of global platform connections and, for each global platform connection, executes a test to check whether the connection can still be established.
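The logic of this monitor can be sketched as follows: count the connections, then record 1 or 0 per connection depending on the test result. This is an illustration only; `test_connection` is a hypothetical stand-in for the Cloud Pak for Data call the real monitor uses, and the connection names are taken from the sample output in this section.

```python
from typing import Callable, Iterable


# Illustrative sketch of the global connections monitor logic.
# test_connection is a hypothetical stand-in for the real connection test.
def collect_connection_metrics(
    connections: Iterable[str], test_connection: Callable[[str], bool]
) -> dict:
    names = list(connections)
    return {
        "global_connections_count": len(names),
        # 1 = test connection succeeded, 0 = test connection failed
        "global_connection_valid": {name: int(test_connection(name)) for name in names},
    }


results = {"Cognos MetaStore Connection": True, "Cognos non-shared": False}
print(collect_connection_metrics(results, results.get))
# {'global_connections_count': 2, 'global_connection_valid': {'Cognos MetaStore Connection': 1, 'Cognos non-shared': 0}}
```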

Generated metrics

Once the monitor is deployed, the following metrics are available in IBM Cloud Pak for Data.

[Figure: Overview Events and Alerts]

On the Platform Management Events page, the following entries are added:

  • Cloud Pak for Data Global Connections Count
  • Global Connection - <Global Connection Name> (one for each connection)

Using the IBM Cloud Pak for Data Prometheus endpoint

https://<CP4D-BASE-URL>/zen/metrics

The monitor generates two types of metrics:

  • global_connections_count
    Provides the number of available connections
  • global_connection_valid
    For each connection, a test action is performed, with the result:
    • 1 (test connection succeeded)
    • 0 (test connection failed)
# HELP global_connections_count 
# TYPE global_connections_count gauge
global_connections_count{event_type="global_connections_count",monitor_type="cp4d_platform_global_connections",reference="Cloud Pak for Data Global Connections Count"} 2

# HELP global_connection_valid 
# TYPE global_connection_valid gauge
global_connection_valid{event_type="global_connection_valid",monitor_type="cp4d_platform_global_connections",reference="Cognos MetaStore Connection"} 1
global_connection_valid{event_type="global_connection_valid",monitor_type="cp4d_platform_global_connections",reference="Cognos non-shared"} 0
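Output like the sample above follows the Prometheus text exposition format, so it can be processed with a small parser, for example to list the connections whose test failed. The sketch below is a minimal parser for simple sample lines only (no escaped quotes, no timestamps); `parse_samples` is an illustrative helper, and for production use an established Prometheus client library would be the safer choice.

```python
import re

# Minimal parser for simple sample lines of the Prometheus text exposition format.
# It skips # HELP / # TYPE comments and does not handle escaped quotes or timestamps.
SAMPLE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$")
LABEL = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<value>[^"]*)"')


def parse_samples(text: str) -> list:
    """Return (metric name, labels dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match is None:
            continue
        labels = {m["key"]: m["value"] for m in LABEL.finditer(match["labels"] or "")}
        samples.append((match["name"], labels, float(match["value"])))
    return samples


text = "\n".join([
    'global_connection_valid{event_type="global_connection_valid",'
    'monitor_type="cp4d_platform_global_connections",reference="Cognos MetaStore Connection"} 1',
    'global_connection_valid{event_type="global_connection_valid",'
    'monitor_type="cp4d_platform_global_connections",reference="Cognos non-shared"} 0',
])
failed = [labels["reference"] for name, labels, value in parse_samples(text)
          if name == "global_connection_valid" and value == 0]
print(failed)  # ['Cognos non-shared']
```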

Zen Watchdog metrics (used in platform management events):

  • watchdog_cp4d_platform_global_connections_global_connections_count
  • watchdog_cp4d_platform_global_connections_global_connection_valid (one for each connection)

Zen Watchdog metrics can have the following values:

  • 2 (info)
  • 1 (warning)
  • 0 (critical)

# HELP watchdog_cp4d_platform_global_connections_global_connection_valid 
# TYPE watchdog_cp4d_platform_global_connections_global_connection_valid gauge
watchdog_cp4d_platform_global_connections_global_connection_valid{event_type="global_connection_valid",monitor_type="cp4d_platform_global_connections",reference="Cognos MetaStore Connection"} 2
watchdog_cp4d_platform_global_connections_global_connection_valid{event_type="global_connection_valid",monitor_type="cp4d_platform_global_connections",reference="Cognos non-shared"} 1

# HELP watchdog_cp4d_platform_global_connections_global_connections_count 
# TYPE watchdog_cp4d_platform_global_connections_global_connections_count gauge
watchdog_cp4d_platform_global_connections_global_connections_count{event_type="global_connections_count",monitor_type="cp4d_platform_global_connections",reference="Cloud Pak for Data Global Connections Count"} 2
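Comparing the two samples, a successful connection test (global_connection_valid 1) appears as watchdog value 2 (info), and a failed test (0) appears as 1 (warning). The sketch below encodes that correspondence; it is inferred from the sample output only, the real mapping is decided by the monitor and may differ, and `watchdog_value_for_connection_test` is a hypothetical name.

```python
# Zen Watchdog severity levels, as listed in this section.
SEVERITY = {2: "info", 1: "warning", 0: "critical"}


# Inferred from the sample output only: test succeeded (1) -> 2 (info),
# test failed (0) -> 1 (warning).
def watchdog_value_for_connection_test(valid: int) -> int:
    return 2 if valid == 1 else 1


for name, valid in {"Cognos MetaStore Connection": 1, "Cognos non-shared": 0}.items():
    level = watchdog_value_for_connection_test(valid)
    print(f"{name}: {level} ({SEVERITY[level]})")
# Cognos MetaStore Connection: 2 (info)
# Cognos non-shared: 1 (warning)
```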