ONNX and the IBM Deep Learning Compiler
Deploying ONNX deep learning models on IBM zSystems
ONNX is the Open Neural Network eXchange, an open format for representing machine learning models. You can read more about it on the ONNX project website.
ONNX establishes a streamlined path to take a project from playground to production.
With ONNX, you can start a data science project using the frameworks and libraries of your choosing, including popular frameworks such as PyTorch and TensorFlow. The model can be developed and trained leveraging these frameworks on the training platform of your choice. Once the model is trained and ready to begin the deployment journey, you would export or convert it to the ONNX format.
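For example, a PyTorch model can be exported with PyTorch's built-in torch.onnx.export API. The sketch below uses a torchvision network, file name, input shape, and opset version purely as illustrative placeholders; substitute your own trained model and settings.

```python
import torch
import torchvision

# Placeholder network; substitute your own trained model.
model = torchvision.models.resnet18(weights=None)
model.eval()

# Export traces the model with a sample input of the expected shape.
sample_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    sample_input,
    "resnet18.onnx",           # serialized ONNX graph
    input_names=["input"],
    output_names=["output"],
    opset_version=17,          # choose an opset your target runtime supports
)
```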
Tools such as Netron allow inspection and exploration of an ONNX model. When it comes to running the model, there are various back-ends that can be used to test and serve ONNX models. This includes model compilers such as the IBM Deep Learning Compiler (DLC), which is based on ONNX-MLIR.
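Alongside graphical tools such as Netron, the onnx Python package can load and sanity-check a model programmatically. A minimal sketch (the file name follows the export example above and is only a placeholder):

```python
import onnx

# Load the serialized model and validate it against the ONNX specification.
model = onnx.load("resnet18.onnx")
onnx.checker.check_model(model)

# Summarize the graph: inputs, outputs, node count, and opset imports.
print("inputs: ", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])
print("nodes:  ", len(model.graph.node))
print("opsets: ", [(imp.domain or "ai.onnx", imp.version) for imp in model.opset_import])
```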
The ONNX-MLIR project provides compiler technology to transform a valid Open Neural Network Exchange (ONNX) graph into code that implements the graph with minimum runtime support. It implements the ONNX standard and is based on the underlying LLVM/MLIR compiler technology.
The result of this compiler is a lightweight shared object library that has no dependencies on the frameworks or libraries in which the model was developed and trained. It can easily be used for inference from C++, Java, or Python programs.
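As a sketch of what Python inference against a compiled model can look like: ONNX-MLIR ships a PyRuntime module with an execution-session class that loads the generated .so and runs it on NumPy arrays. The module and class names below match recent ONNX-MLIR releases but may differ by version, and the model path and input shape are placeholders.

```python
import numpy as np
from PyRuntime import OMExecutionSession  # ONNX-MLIR Python runtime must be on PYTHONPATH

# Load the compiled shared library produced from the ONNX model.
session = OMExecutionSession("model.so")

# The signatures describe the expected input/output tensors (returned as JSON strings).
print(session.input_signature())
print(session.output_signature())

# Inputs are passed as a list of NumPy arrays; outputs come back the same way.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run([x])
print(outputs[0].shape)
```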
IBM z16 Integrated Accelerator for AI
IBM Research enhanced the IBM Deep Learning Compiler (DLC) to target the IBM Integrated Accelerator for AI for a variety of ONNX primitives. This support has been contributed to ONNX-MLIR, which is the foundation for the IBM Deep Learning Compiler.
Getting Started with the IBM Z Deep Learning Compiler
The best approach to getting started with ONNX models and the IBM Deep Learning Compiler depends on the IBM zSystems operating system where you plan to run the inference program.
z/OS users can either choose a Machine Learning for z/OS (WMLz) based approach or leverage Linux on Z options; in either case, z/OS Container Extensions will be required to utilize the IBM Z Deep Learning Compiler.
There are two WMLz-based options:
- Machine Learning for z/OS (WMLz), a z/OS product that manages the full model lifecycle and includes numerous features to improve performance for AI models and simplify deployment.
  - Enables you to upload your ONNX model, then compile and deploy it at the push of a button.
  - Supports server-side mini-batching for ONNX/DLC model serving to get the best benefit from the Integrated Accelerator for AI.
  - Infuses AI into z/OS applications either through native CICS, IMS, and Java scoring services or through model server REST endpoints.
- Machine Learning for z/OS Online Scoring Community Edition (OSCE), which is freely available and excels at enabling rapid prototyping and proof-of-concept exercises.
  - Simple install into z/OS Container Extensions (zCX).
  - Enables you to upload your ONNX model, then compile and deploy it at the push of a button.
  - Includes a serving capability that exposes REST endpoints to call from an application.
  - Available on the ibm.com WMLz page: 'Download trial code'.
- Additional resources:
  - If you are interested in trying WMLz OSCE, here is a quick self-directed exercise that demonstrates how to call an ONNX model from a z/OS Java program: Demonstrating a z/OS application calling WMLz OSCE
Linux on Z and LinuxONE users can use the IBM Z Deep Learning Compiler directly to create model programs that can be incorporated into serving environments or applications.
- Available through the IBM Z and LinuxONE Container Registry listed under zDLC.
- Command-line model compiler that produces a .so library with optional Java and Python wrappers (see the compile sketch after this list).
- These model libraries can be leveraged either directly or through open-source AI inference servers such as Triton Inference Server.
- Additional samples:
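As a rough sketch of the command-line flow mentioned above: the IBM Z Deep Learning Compiler follows onnx-mlir conventions, where --EmitLib emits the shared library and --maccel=NNPA targets the Integrated Accelerator for AI. The image path, tag, file names, and exact options below are illustrative assumptions; check the container registry listing and the compiler's help output for the values that apply to your release.

```bash
# Pull the compiler image (illustrative path/tag; confirm in the IBM Z and LinuxONE Container Registry).
docker pull icr.io/ibmz/zdlc:latest

# Compile model.onnx from the current directory into model.so.
# --maccel=NNPA enables code generation for the Integrated Accelerator for AI.
docker run --rm -v "$(pwd)":/workdir icr.io/ibmz/zdlc:latest \
  --EmitLib -O3 --maccel=NNPA -o /workdir/model /workdir/model.onnx
```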
Read our blogs on ONNX for more information: