Watson OpenScale
After you deploy a machine learning model, the work doesn’t stop. To guarantee that the model functions in production as expected, you must have a plan for monitoring the deployed model and updating it as needed. As part of your end-to-end MLOps process, consider IBM Watson OpenScale to evaluate your model deployments and make sure they are fair, accurate, and performing to your standards.
When OpenScale is installed or provisioned as part of your Cloud Pak suite, you can provide the details for a deployment and then run scheduled evaluations that measure the dimensions you configure against the thresholds you set. For example, to test whether predicted outcomes are fair across age groups, you can configure the Fairness monitor to evaluate the outcomes for a monitored group, such as young adults, and compare them to those of the age group most likely to receive favorable results. If the results deviate by more than the threshold you specify, you receive an alert that the results require attention. The dimensions you can test are:
- Fairness: Configure a monitor for fairness to check if your model produces biased results for different groups, like gender or race. Set thresholds to measure predictions for a monitored group compared to a reference group.
- Quality: Configure a monitor for quality to assess your model's performance based on labeled test data. Set quality thresholds to track when a metric value falls outside an acceptable range.
- Drift: Configure a monitor for drift to ensure your deployments are up-to-date and consistent. Use feature importance to determine the impact of feature drift on your model.
- Explainability: Configure explainability settings to understand which features influence your model's predictions. Different methods like SHAP and LIME are available to suit your needs.
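Monitors can be configured from the OpenScale dashboard or programmatically. As a rough illustration, the sketch below shows how a fairness monitor could be attached to a deployment with the ibm-watson-openscale Python SDK, following the pattern used in the public example Notebooks. The API key, data mart ID, subscription ID, feature name, age ranges, and class labels are placeholders, and parameter names may differ between SDK versions.

```python
# Minimal sketch (placeholder values): configure a fairness monitor for an
# existing OpenScale subscription. On Cloud Pak for Data you would use a
# CloudPakForDataAuthenticator and pass a service_url instead.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import Target
from ibm_watson_openscale.supporting_classes.enums import TargetTypes

API_KEY = "<your IBM Cloud API key>"      # placeholder
DATA_MART_ID = "<data mart id>"           # placeholder
SUBSCRIPTION_ID = "<subscription id>"     # placeholder

wos_client = APIClient(authenticator=IAMAuthenticator(apikey=API_KEY))

# The monitor is attached to a subscription, i.e. a monitored deployment.
target = Target(target_type=TargetTypes.SUBSCRIPTION, target_id=SUBSCRIPTION_ID)

# Compare outcomes for the monitored group (here: ages 18-25) against the
# reference group and alert when the disparity ratio drops below 0.95.
wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.FAIRNESS.ID,
    target=target,
    parameters={
        "features": [
            {"feature": "Age", "majority": [[26, 75]], "minority": [[18, 25]], "threshold": 0.95}
        ],
        "favourable_class": ["No Risk"],      # assumed label values
        "unfavourable_class": ["Risk"],
        "min_records": 100,
    },
    background_mode=False,
)
```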
All of the evaluation results can be reviewed and monitored in a single dashboard, as shown in this example:
For more information, check out the official documentation or the example Notebooks.
In this use case, we use Watson OpenScale to automatically verify the following (see the configuration sketch after this list):
- whether the model is performing at a consistently high accuracy. If performance dips below a threshold we set, an alert is triggered.
- whether the test data produces output that is similar to the training data. If there is a noticeable deviation, we are alerted that it might be time to retrain the model.
- whether the model is discriminating against a particular group. In this case, we look at outcomes for older customers to ensure they are treated fairly compared to the reference group.
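These three checks map onto the Quality, Drift, and Fairness monitors. The sketch below shows one way they could be configured for this use case, reusing the client, target, and placeholder IDs from the previous example; the metric IDs, thresholds, and age ranges are illustrative assumptions rather than values taken from the original setup.

```python
# Sketch, continuing from the previous example (placeholder values).

# 1. Quality: alert when accuracy, computed on labeled feedback data,
#    drops below 0.8.
wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.QUALITY.ID,
    target=target,
    parameters={"min_feedback_data_size": 50},
    thresholds=[{"metric_id": "accuracy", "type": "lower_limit", "value": 0.8}],
    background_mode=False,
)

# 2. Drift: train a drift detection model on the training data and alert
#    when the estimated deviation of scored data exceeds 10%.
wos_client.monitor_instances.create(
    data_mart_id=DATA_MART_ID,
    monitor_definition_id=wos_client.monitor_definitions.MONITORS.DRIFT.ID,
    target=target,
    parameters={"min_samples": 100, "drift_threshold": 0.1, "train_drift_model": True},
    background_mode=False,
)

# 3. Fairness: same call as in the previous sketch, but with older customers
#    as the monitored group, for example:
#    {"feature": "Age", "majority": [[18, 59]], "minority": [[60, 100]], "threshold": 0.95}
```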
Results are automatically checked by Watson OpenScale whenever new data is scored by the model. If one of the checks fails, an alert is sent to the responsible person so that steps can be taken to mitigate the problem. OpenScale also generates visualizations that you can use to diagnose a potential problem, as shown in this example:
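In code, this payload-driven evaluation looks roughly like the following sketch, again based on the patterns in the public example Notebooks. The data set lookup, the record format, and the monitor instance ID are assumptions; feature names and values are made up for illustration.

```python
# Sketch (placeholder values): log a scored request/response pair so that
# OpenScale can evaluate it, then trigger an on-demand evaluation.
from ibm_watson_openscale.supporting_classes.enums import DataSetTypes, TargetTypes
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

FAIRNESS_MONITOR_INSTANCE_ID = "<monitor instance id>"  # placeholder

# Look up the payload-logging data set that belongs to the subscription.
payload_data_set_id = wos_client.data_sets.list(
    type=DataSetTypes.PAYLOAD_LOGGING,
    target_target_id=SUBSCRIPTION_ID,
    target_target_type=TargetTypes.SUBSCRIPTION,
).result.data_sets[0].metadata.id

# Store one scoring payload; fields/values follow the Watson Machine Learning
# scoring format, and the feature names here are invented for the example.
request = {"fields": ["Age", "Sex", "Credit_amount"], "values": [[63, "female", 2500]]}
response = {"fields": ["prediction", "probability"], "values": [["No Risk", [0.8, 0.2]]]}
wos_client.data_sets.store_records(
    data_set_id=payload_data_set_id,
    request_body=[PayloadRecord(request=request, response=response, response_time=120)],
)

# Trigger an evaluation of, for example, the fairness monitor and print the
# latest metric values.
wos_client.monitor_instances.run(
    monitor_instance_id=FAIRNESS_MONITOR_INSTANCE_ID, background_mode=False
)
wos_client.monitor_instances.show_metrics(monitor_instance_id=FAIRNESS_MONITOR_INSTANCE_ID)
```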