AI Inference Server
Solution report card
| Criterion | Status |
| --- | --- |
| Runs on IBM i? | ❌ |
| On-prem | ✅ |
| IBM Cloud | ✅ |
| AI capabilities | Generative AI, Agentic AI |
| Commercial support | ✅ |
| Free to try? | ✅ |
| Requirements | |
What is Red Hat AI Inference Server?
Red Hat AI Inference Server (formerly known as vLLM-based inference) is Red Hat’s enterprise-supported solution for serving large language models at production scale. It provides a high-performance, OpenAI-compatible REST API for LLM inference, backed by Red Hat’s support and lifecycle commitments.
Built on the open-source vLLM engine, Red Hat AI Inference Server is optimized for throughput and latency at scale — using techniques like continuous batching and PagedAttention to maximize GPU (or CPU/accelerator) utilization.
Why use it with IBM i?
IBM i applications can call the inference server’s OpenAI-compatible API using standard HTTP, SQL HTTP functions (SYSTOOLS.HTTPPOSTCLOB), or the Db2 for i AI SDK. This means IBM i can consume LLM capabilities from a supported, on-premises inference server without sending data to the cloud.
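As a minimal sketch of that HTTP call, the following Python builds an OpenAI-style chat completion request and posts it to the server. The server URL and model name are placeholder assumptions; the JSON payload shape is the same one you would send from SQL via SYSTOOLS.HTTPPOSTCLOB.

```python
import json
import urllib.request

# Placeholder endpoint: replace with your inference server's host and port.
SERVER_URL = "http://ai-inference.example.com:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask_llm(model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, any existing OpenAI client library can also be pointed at the server by overriding its base URL.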
Key benefits for IBM i scenarios:
- On-premises deployment — Keep sensitive business data within your network
- OpenAI-compatible API — Drop-in compatible with tools and code written for OpenAI; no proprietary lock-in
- IBM Power support — Can run on IBM Power (Linux) infrastructure adjacent to IBM i
- Enterprise support — Red Hat support contract covers the inference server
Deployment
Red Hat AI Inference Server is delivered as a container image, deployable on OpenShift or standalone container runtimes. It can run on IBM Power (via ppc64le container images) on a Linux partition adjacent to IBM i.
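For a standalone container runtime, a run command might look like the sketch below. The image name, tag, model, and GPU device flag are illustrative assumptions only; check the Red Hat Ecosystem Catalog for the actual image reference and supported options for your platform.

```shell
# Illustrative only: image reference and model are assumptions, not verified values.
podman run --rm -p 8000:8000 \
  --device nvidia.com/gpu=all \
  registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest \
  --model ibm-granite/granite-3.1-8b-instruct
```

Once running, the server exposes the OpenAI-compatible API on the published port (8000 here), which IBM i applications can then reach over the network.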
See also: Red Hat OpenShift AI for the broader MLOps platform that includes model training, pipelines, and governance alongside inference.