
Machine Learning

Machine learning (ML) is the practice of training algorithms to make predictions or decisions from data, without being explicitly programmed for each case. IBM i is an ideal starting point for ML because the data is already there — structured, reliable, and stored in Db2 for i — and the business domain context needed to frame a meaningful ML problem already exists in the people and processes that run on the platform.

IBM i systems accumulate years or decades of high-quality transactional data. That history is exactly what ML models need to learn from. Common ML use cases on IBM i include:

  • Fraud and anomaly detection — Identify unusual patterns in financial transactions or order data
  • Demand forecasting — Predict inventory needs based on historical sales and seasonal patterns
  • Predictive maintenance — Anticipate equipment failures using sensor or operational data
  • Customer segmentation — Group customers by behavior to improve targeting and service
  • Churn prediction — Identify customers likely to leave before they do

The key advantage: you don’t have to move your data to get started. Tools like Mapepire, JT400, and the Db2 for i Python SDK let you query Db2 for i directly from Python ML environments.

A typical ML project with IBM i data follows this path:

  1. Access the data — Connect a Python ML environment to Db2 for i
  2. Prepare the data — Clean, transform, and engineer features
  3. Train a model — Use a framework like scikit-learn, XGBoost, or PyTorch
  4. Evaluate the model — Validate accuracy, precision, recall, and fairness
  5. Export the model (if needed) — Package for deployment outside the training environment
  6. Run inference — Score new records against the trained model
  7. Integrate results — Surface predictions back into IBM i applications
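To illustrate step 2, the sketch below uses pandas to derive a DAYS_SINCE_LAST_ORDER column from raw order history. This is the same kind of column the scikit-learn example in this section uses as a feature. The sketch also computes two simple per-customer aggregates. Several things are assumptions made up for the example: the CUSTOMER_ID and ORDER_DATE columns, the sample rows, the ORDER_COUNT and TOTAL_AMOUNT feature names, and the as-of date. With real data, the input would be a DataFrame built from a Db2 for i query.

```python
import pandas as pd

# Hypothetical order history standing in for rows fetched from Db2 for i.
orders = pd.DataFrame({
    "CUSTOMER_ID": [101, 101, 102, 103, 103, 103],
    "ORDER_DATE": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2024-01-20", "2024-02-15", "2024-03-10",
    ]),
    "ORDER_AMOUNT": [250.0, 410.0, 1200.0, 75.0, 90.0, 60.0],
})


def build_customer_features(orders: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Aggregate raw orders into one feature row per customer."""
    as_of_ts = pd.Timestamp(as_of)
    features = orders.groupby("CUSTOMER_ID").agg(
        ORDER_COUNT=("ORDER_DATE", "count"),
        TOTAL_AMOUNT=("ORDER_AMOUNT", "sum"),
        LAST_ORDER=("ORDER_DATE", "max"),
    )
    # Whole days between each customer's most recent order and the as-of date.
    features["DAYS_SINCE_LAST_ORDER"] = (as_of_ts - features["LAST_ORDER"]).dt.days
    return features.drop(columns="LAST_ORDER").reset_index()


features = build_customer_features(orders, as_of="2024-03-31")
print(features)
```

The as_of date is fixed rather than taken from today's date. This keeps the feature reproducible when the model is retrained on the same history.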

Training happens in a Python environment. This can be on IBM i itself, on an adjacent Linux/Power server (via the Python Ecosystem for IBM Power), or in a cloud ML platform such as watsonx.ai or Red Hat OpenShift AI.

A minimal scikit-learn example reading from Db2 for i via Mapepire:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report
from mapepire_python import connect

# Connect to Db2 for i database using Mapepire and fetch data
# The config.ini file should contain connection details (host, user, password, etc.)
with connect("./config.ini") as conn:
    with conn.execute("select * from demo.orders") as cursor:
        results = cursor.fetchall()
df = pd.DataFrame(results['data'])

# ============================================================================
# FEATURE ENGINEERING & DATA PREPARATION
# ============================================================================
# Select features (independent variables) for the model
# Features used: ORDER_AMOUNT, DAYS_SINCE_LAST_ORDER, CUSTOMER_SEGMENT
# .copy() makes X an independent DataFrame, so the encoding step below
# does not trigger pandas' SettingWithCopyWarning
X = df[["ORDER_AMOUNT", "DAYS_SINCE_LAST_ORDER", "CUSTOMER_SEGMENT"]].copy()
# Select target variable
# Binary classification: 1 = churned, 0 = not churned
y = df['CHURNED']
# Encode categorical variable (CUSTOMER_SEGMENT) to numerical values
# LabelEncoder converts text categories (e.g., 'Premium', 'Standard') to integers (0, 1, 2, etc.)
le = LabelEncoder()
X['CUSTOMER_SEGMENT'] = le.fit_transform(X['CUSTOMER_SEGMENT'])

# ============================================================================
# DATA SPLITTING
# ============================================================================
# Split dataset into training (80%) and testing (20%) sets
# - random_state=42: Ensures reproducible splits
# - stratify=y: Maintains the same proportion of churned/non-churned customers in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# ============================================================================
# MODEL TRAINING
# ============================================================================
# Initialize Random Forest Classifier with 100 decision trees
rf_model = RandomForestClassifier(n_estimators=100)
# Train the model on the training data
rf_model.fit(X_train, y_train)

# ============================================================================
# MODEL EVALUATION
# ============================================================================
# Generate predictions on the test set and display classification metrics
# Includes precision, recall, f1-score, and support for each class
print(classification_report(y_test, rf_model.predict(X_test)))
```
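scikit-learn documents LabelEncoder as a transformer for target labels (y), not for input features. A separately saved encoder also has to be kept in sync with the model. One alternative is to wrap a OneHotEncoder and the classifier in a single Pipeline. The encoding then travels with the model, and segments not seen during training are ignored instead of raising an error. The sketch below shows this approach. Its DataFrame is synthetic, with a toy churn rule, and stands in for the demo.orders query result. The names FEATURES and churn_pipeline are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the demo.orders query result (hypothetical values).
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "ORDER_AMOUNT": rng.uniform(50, 2000, n).round(2),
    "DAYS_SINCE_LAST_ORDER": rng.integers(1, 365, n),
    "CUSTOMER_SEGMENT": rng.choice(["Premium", "Standard", "Basic"], n),
})
# Toy labelling rule so the example has a learnable signal.
df["CHURNED"] = (df["DAYS_SINCE_LAST_ORDER"] > 180).astype(int)

FEATURES = ["ORDER_AMOUNT", "DAYS_SINCE_LAST_ORDER", "CUSTOMER_SEGMENT"]

# One-hot encode the segment; unknown segments at scoring time become all zeros.
preprocess = ColumnTransformer(
    [("segment", OneHotEncoder(handle_unknown="ignore"), ["CUSTOMER_SEGMENT"])],
    remainder="passthrough",  # numeric columns pass through unchanged
)
churn_pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["CHURNED"], test_size=0.2, random_state=42, stratify=df["CHURNED"]
)
churn_pipeline.fit(X_train, y_train)
print(classification_report(y_test, churn_pipeline.predict(X_test)))
```

The pipeline contains the encoder, so a single `joblib.dump(churn_pipeline, ...)` call captures everything needed for scoring. At inference time, the pipeline accepts raw CUSTOMER_SEGMENT strings directly.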

Whether you need to export a model depends entirely on your inferencing strategy:

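You DON’T need to export if:

  • Training and inference run in the same Python environment, for example a notebook or batch job that trains the model and scores new records in the same session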

You DO need to export if:

  • Training happens in one environment (e.g., cloud ML platform, data scientist’s workstation) but inference runs elsewhere (e.g., on IBM i, adjacent Linux server, or edge device)
  • You need to version and archive models for compliance, reproducibility, or rollback capability
  • Multiple applications or services need to load the same trained model
  • You’re deploying to a production environment separate from your training environment

When export is necessary, serialize the model using a format appropriate for your inference runtime:

  • ONNX — Open Neural Network Exchange format, supported by many runtimes, including those available through the Python Ecosystem for IBM Power and on Red Hat OpenShift AI. Best for cross-platform deployment and for cases where training and inference use different frameworks.
  • Pickle / joblib — Standard Python serialization, suitable when inference also runs in Python
  • PMML — XML-based format for classical ML models, with broad tooling support

For production-grade model serving with versioning, monitoring, and auto-scaling, consider platforms like Red Hat OpenShift AI, Red Hat AI Inference Server, Wallaroo AI Platform, or watsonx.ai deployments, which handle model packaging and deployment automatically.

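```python
import joblib

# Save the trained model and the fitted label encoder; scoring needs both
joblib.dump(rf_model, "churn_model.pkl")
joblib.dump(le, "label_encoder.pkl")
```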

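If inference will run on IBM i directly, copy the serialized files to the IFS. This includes the model and any fitted preprocessing objects, such as the label encoder. Load them with Python running in PASE, ideally with the same scikit-learn version that was used for training.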
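Files written with pickle or joblib are tied to the library versions that produced them. scikit-learn's model persistence documentation cautions that a model may not load correctly under a different version. One lightweight habit is to store the scikit-learn version alongside the model and check it at load time, as in the sketch below. The sketch uses a small stand-in DecisionTreeClassifier and a temporary directory in place of an IFS path. The dictionary layout with "model" and "sklearn_version" keys is an illustrative assumption, not a joblib convention.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import sklearn
from sklearn.tree import DecisionTreeClassifier

# Small stand-in model; in the training example this would be rf_model.
X = np.array([[250.0, 5], [1200.0, 300], [90.0, 12], [60.0, 240]])
y = np.array([0, 1, 0, 1])
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# A temporary directory stands in for a directory on the IFS.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "churn_model.pkl"
    # Hypothetical artifact layout: the model plus the scikit-learn version that trained it.
    joblib.dump({"model": model, "sklearn_version": sklearn.__version__}, path)

    # Loading side (for example, Python in PASE): check the version before scoring.
    artifact = joblib.load(path)
    if artifact["sklearn_version"] != sklearn.__version__:
        print("Warning: model was saved with scikit-learn", artifact["sklearn_version"])
    restored = artifact["model"]

    # The restored model should reproduce the original probabilities exactly.
    same = np.array_equal(restored.predict_proba(X), model.predict_proba(X))
    print("predictions match:", same)
```

Unpickling can execute arbitrary code, so load these files only from locations you control.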

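Once a model is trained (and exported, if needed), it can score new records. Inference can run in several deployment locations:

  • On IBM i itself, in Python running in PASE
  • On an adjacent Linux/Power server
  • On a managed ML platform such as watsonx.ai or Red Hat OpenShift AI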

REST APIs provide a universal integration pattern that works with all inference deployment options. Whether the model runs on IBM i PASE, an adjacent server, or a managed ML platform, exposing it via HTTP makes it accessible from any IBM i application.

```python
# Load model and score new records
import joblib
import pandas as pd
from mapepire_python import connect

# ============================================================================
# LOAD TRAINED MODEL AND ENCODER
# ============================================================================
# Load the trained model
model = joblib.load("churn_model.pkl")
# Load the label encoder used during training
le = joblib.load("label_encoder.pkl")

# ============================================================================
# FETCH NEW DATA
# ============================================================================
# Connect to Db2 for i via Mapepire and retrieve new records for prediction
with connect("./config.ini") as conn:
    with conn.execute("select * from demo.orders") as cursor:
        results = cursor.fetchall()
df = pd.DataFrame(results['data'])

# ============================================================================
# PREPARE FEATURES FOR PREDICTION
# ============================================================================
# Select the same features used during training (.copy() avoids SettingWithCopyWarning)
X_new = df[["ORDER_AMOUNT", "DAYS_SINCE_LAST_ORDER", "CUSTOMER_SEGMENT"]].copy()
# Encode the categorical variable using the SAVED encoder
# (transform, not fit_transform, so the codes match those used in training)
X_new['CUSTOMER_SEGMENT'] = le.transform(X_new['CUSTOMER_SEGMENT'])

# ============================================================================
# MAKE PREDICTIONS
# ============================================================================
# predict_proba returns one column per class; column 1 is the probability of CHURNED = 1
df["CHURN_SCORE"] = model.predict_proba(X_new)[:, 1]
# Write predictions back to Db2 for i...
```
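In the scoring code above, `le.transform` raises a ValueError when CUSTOMER_SEGMENT contains a value the encoder never saw during training, which stops the whole batch. One defensive option is to encode only known values and flag the rest for separate handling. The sketch below shows this with an illustrative encoder and illustrative segment names. The `encode_known_segments` helper is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for the encoder loaded from label_encoder.pkl; segment names are illustrative.
le = LabelEncoder().fit(["Premium", "Standard"])


def encode_known_segments(segments: pd.Series, encoder: LabelEncoder) -> pd.Series:
    """Encode segments the encoder has seen; unseen values become NaN instead of raising."""
    known = segments.isin(encoder.classes_)
    encoded = pd.Series(np.nan, index=segments.index)
    if known.any():
        encoded.loc[known] = encoder.transform(segments[known])
    return encoded


new_segments = pd.Series(["Standard", "Premium", "Enterprise"])
codes = encode_known_segments(new_segments, le)
needs_review = codes.isna()  # True for segments the model never saw
print(codes.tolist(), needs_review.tolist())
```

Rows flagged in `needs_review` can be excluded from `X_new` before calling `predict_proba`. They can then be reviewed, or scored once the model is retrained with the new segment.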

RPG programs can invoke ML inference in several ways:

Option 1: Call a Python script using UNIXCMD

The UNIXCMD project provides a robust way to execute PASE commands from RPG.

Option 2: Call a REST API using IBM i HTTP APIs

The IBM i QhttpClnt APIs or the SYSTOOLS.HTTPGETCLOB / SYSTOOLS.HTTPPOSTCLOB SQL functions can call a Python scoring service and parse the JSON response.

```sql
-- Call a REST scoring endpoint from SQL
-- SYSTOOLS.HTTPPOSTCLOB takes the HTTP headers as an XML string
SELECT SYSTOOLS.HTTPPOSTCLOB(
         'http://scoreserver:8080/predict',
         '<httpHeader><header name="Content-Type" value="application/json"/></httpHeader>',
         '{"order_amount": 1500, "days_since_last_order": 45, "segment": "B"}'
       ) AS PREDICTION
  FROM SYSIBM.SYSDUMMY1;
```
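The scoring service on the receiving end of that call can be quite small. The sketch below assumes a Python service built only on the standard library's http.server plus pandas. Several details are assumptions: the /predict path, port 8080, the FIELD_MAP key names (chosen to mirror the JSON payload above), and the churn_score response field. The Python documentation notes that http.server is not recommended for production. Treat this as a prototype, or move the same logic behind a production web framework or one of the serving platforms mentioned earlier.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import pandas as pd

# Hypothetical mapping from JSON request keys to the training feature columns.
FIELD_MAP = {
    "order_amount": "ORDER_AMOUNT",
    "days_since_last_order": "DAYS_SINCE_LAST_ORDER",
    "segment": "CUSTOMER_SEGMENT",
}


def make_handler(model):
    """Return a handler class that scores POST /predict requests with `model`.

    `model` must accept a DataFrame with the FIELD_MAP columns, for example a
    scikit-learn Pipeline that encodes CUSTOMER_SEGMENT itself.
    """

    class ScoreHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            try:
                length = int(self.headers.get("Content-Length", 0))
                payload = json.loads(self.rfile.read(length))
                row = pd.DataFrame([{col: payload[key] for key, col in FIELD_MAP.items()}])
                score = float(model.predict_proba(row)[0, 1])
                status, body = 200, {"churn_score": score}
            except (KeyError, TypeError, ValueError) as exc:
                # Missing fields or malformed JSON become a 400 response.
                status, body = 400, {"error": str(exc)}
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args):
            pass  # silence per-request logging for the sketch

    return ScoreHandler


def start_server(model, host="0.0.0.0", port=8080):
    """Serve in a background thread; 0.0.0.0 listens on all interfaces so IBM i can reach it."""
    server = ThreadingHTTPServer((host, port), make_handler(model))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Suppose the model passed to `start_server` is a fitted pipeline that accepts raw segment strings. In that case, the SQL call above receives a small JSON body containing `churn_score`. A bare RandomForest trained on label-encoded segments would instead need the saved encoder applied inside `do_POST`.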

Option 3: Db2 for i User-Defined Functions (UDFs)

For tighter integration, the Db2 for i AI SDK is expected to let a Python-based scoring function be wrapped as an external UDF in Db2 for i. That function could then be called directly in SQL queries, scoring every row in a result set in a single statement. This capability is not yet implemented and will be supported in a future release.