Manual Deployment of Watson Machine Learning Deployment Space Job Information
This page will go through all manual steps to deploy the Watson Machine Learning Deployment Space Job Information monitor, and in addition to delete it.
The following pre-requisites are assumed:
- IBM Cloud Pak for Data is successfully deployed
- (Optional) Prometheus is configured. Refer to setup OpenShift Prometheus and Cloud Pak for Data ServiceMonitor for instructions
This manual deployment will be based on:
- Source of the Monitor is located in a Git Repository
- The Image will be pushed to the internal OpenShift Image Registry
Deploy Watson Machine Learning Deployment Space Job Information
Build the Monitor Image and push to the registry
export CP4D_PROJECT=<CP4D_PROJECT>export OPENSHIFT_IMAGE_REGISTRY=image-registry.openshift-image-registry.svc:5000/${CP4D_PROJECT}oc new-build https://github.com/IBM/cp4d-monitors \--context-dir cp4d-wml-deployment-space-job-info \--name cp4d-platform-wml-deployment-space-job-info \--to ${OPENSHIFT_IMAGE_REGISTRY}/cp4d-platform-wml-deployment-space-job-info:latest \--to-docker=true \--namespace ${CP4D_PROJECT}
Wait for the build to complete successfully
export CP4D_PROJECT=<CP4D_PROJECT>oc wait -n ${CP4D_PROJECT} --for=condition=Complete build/cp4d-platform-wml-deployment-space-job-info-1 --timeout=300s
or to monitor the build process:
export CP4D_PROJECT=<CP4D_PROJECT>oc logs -n ${CP4D_PROJECT} build/cp4d-platform-wml-deployment-space-job-info-1 -f
Ensure the build finishes with the message Push successful
Create the Cloud Pak for Data zen-watchdog monitor configuration
export CP4D_PROJECT=<CP4D_PROJECT>export OPENSHIFT_IMAGE_REGISTRY=image-registry.openshift-image-registry.svc:5000/${CP4D_PROJECT}export ICPDATA_ADDON_VERSION=$(oc get po -l component=zen-watcher -o jsonpath='{.items[0].metadata.annotations.productVersion}')cat << EOF | oc apply -f -kind: ConfigMapapiVersion: v1metadata:name: zen-alert-cp4d-platform-wml-deployment-space-job-info-monitor-extension
Note: Once the ConfigMap above is created, the zen-watcher pod will detect it. Please check the log of zen-watcher pod for details. For example:
export CP4D_PROJECT=<CP4D_PROJECT>export ZEN_WATCHER_POD=$(oc get po -l component=zen-watcher -o custom-columns=CONTAINER:.metadata.name --no-headers)oc logs ${ZEN_WATCHER_POD}time="2022-05-31 05:25:59" level=info msg=processConfigData event="adding extensions from zen-alert-cp4d-platform-wml-deployment-space-job-info-monitor-extension to the database"time="2022-05-31 05:25:59" level=info msg=CleanUpStaleExtensions event="upgrade extensions: removing stale extensions from zen-alert-cp4d-platform-wml-deployment-space-job-info-monitor-extension to the database"time="2022-05-31 05:25:59" level=info msg=processExtensionHandler event="processing action: update for extension" extension_name=zen_alert_cp4d_platform_wml_deployment_space_job_infotime="2022-05-31 05:25:59" level=info msg=watchConfigMap event="config zen-alert-cp4d-platform-wml-deployment-space-job-info-monitor-extension added"
Wait for zen-watchdog to create cronjob
Get the watchdog-alert-monitoring-cronjob cronjob details of Cloud Pak for Data
export CP4D_PROJECT=<CP4D_PROJECT>oc get cronjob watchdog-alert-monitoring-cronjob -n ${CP4D_PROJECT}NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGEwatchdog-alert-monitoring-cronjob */10 * * * * False 0 3m46s 7d3h
This cronjob must run in order for the Watson Studio Job cronjob to be created. Optionally the schedule can be changed to trigger its execution. The pod zen-watchdog can be monitored for any error messages:
export CP4D_PROJECT=<CP4D_PROJECT>export ZEN_WATCHDOG_POD=$(oc get po -n ${CP4D_PROJECT} -l component=zen-watchdog -o custom-columns=CONTAINER:.metadata.name --no-headers)oc logs -n ${CP4D_PROJECT} ${ZEN_WATCHDOG_POD} -f
The new monitor cronjob is created:
export CP4D_PROJECT=<CP4D_PROJECT>oc get cronjob -n ${CP4D_PROJECT}NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGEcp4dplatformwmldeploymentspacejobinfo-cronjob */15 * * * * False 0 <none> 3s
Most monitors require access to the Cloud Pak for Data /user-home folder to cache information. To test whether this mount point is already present on the monitor use the following command:
export CP4D_PROJECT=<CP4D_PROJECT>export CP4D_CRONJOB=cp4dplatformwmldeploymentspacejobinfo-cronjoboc set volume -n ${CP4D_PROJECT} cronjobs/${CP4D_CRONJOB} | grep "mounted at /user-home" | wc -l
If the result is 0
, patch the cronjob using the following command:
oc patch cronjob -n ${CP4D_PROJECT} ${CP4D_CRONJOB} \--type=json \--patch '[{"op": "add","path": "/spec/jobTemplate/spec/template/spec/containers/0/volumeMounts/-","value": {"name": "user-home-mount","mountPath": "/user-home"}}]'
Based on the schedule the cronjob will be executed. This will create a pod, which can be monitored:
export CP4D_PROJECT=<CP4D_PROJECT>oc logs -n ${CP4D_PROJECT} <PODNAME>
Rebuilding the image
When changes are applied to the monitor, restarting the Build Config will re-build and push the image to the image registry. No other changed are required. The next time the cronjob is executed, the new version of the monitor image will be used
export CP4D_PROJECT=<CP4D_PROJECT>export CP4D_CRONJOB=cp4dplatformwmldeploymentspacejobinfo-cronjoboc start-build -n ${CP4D_PROJECT} cp4d-platform-wml-deployment-space-job-info
Monitor the build using (use -2, -3 etc, based on the created build by the previous command):
export CP4D_PROJECT=<CP4D_PROJECT>oc logs -n ${CP4D_PROJECT} build/cp4d-platform-wml-deployment-space-job-info-2 -f
Patch the cronjob so it will Always pull the image to ensure it will fetch the latest version once triggered
oc patch cronjob -n ${CP4D_PROJECT} ${CP4D_CRONJOB} \--type=json \--patch '[{"op":"replace","path":"/spec/jobTemplate/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'
Remove the Monitor
Use the following commands to delete the monitor
export CP4D_PROJECT=<CP4D_PROJECT>export CP4D_CRONJOB=cp4dplatformwmldeploymentspacejobinfo-cronjoboc delete bc -n ${CP4D_PROJECT} cp4d-platform-wml-deployment-space-job-infooc delete cm -n ${CP4D_PROJECT} zen-alert-cp4d-platform-wml-deployment-space-job-info-monitor-extensionoc delete cronjob -n ${CP4D_PROJECT} ${CP4D_CRONJOB}oc delete cm -n ${CP4D_PROJECT} cp4d-monitor-configuration
Note: Please only delete the configmap cp4d-monitor-configuration if no other monitors are deployed. The configmap cp4d-monitor-configuration is generated by monitor. When monitor is deployed and scheduled to run, it will create the configmap with its default values.
"cp4d-job-last-refresh" : "0""cp4d-job-refresh-interval-minutes": "120""cp4d-project-last-refresh": "0""cp4d-project-refresh-interval-minutes": "240""cp4d-space-last-refresh": "0""cp4d-space-refresh-interval-minutes": "120""cp4d-wkc-last-refresh": "0""cp4d-wkc-refresh-interval-minutes": "120"
Reset Cloud Pak for Data metrics configuration and influxdb
If, during development, the zen-watchdog is unable to process events because of an incorrect configuration or naming convention, using the following steps to reset the zen-watchdog and its influxdb
export CP4D_PROJECT=<CP4D_PROJECT>oc project ${CP4D_PROJECT}oc exec -it zen-metastoredb-0 /bin/bashmkdir /tmp/certscp -Lr /certs/* /tmp/certscd /tmp && chmod 0600 ./certs/*cd /cockroach
Wait for the cronjobs to be re-created
Acquire the Password for influxdb, and copy it.
oc extract secret/dsx-influxdb-auth --keys=influxdb-password --to=-
Delete the influxdb entries
oc exec -it dsx-influxdb-0 bashinflux -ssl -unsafeSslauth<enter>admin#Delete the eventsuse WATCHDOG;drop measurement events;