
watsonx.data

Solution report card
Runs on IBM i?
On-prem
IBM Cloud
AI capabilities
Apache Spark
Agentic AI
many more…
Commercial support
Free to try?
Requirements

watsonx.data is IBM’s data lakehouse platform — designed to consolidate data from multiple sources into a single, governed, queryable layer. It combines the flexibility of a data lake with the structure of a data warehouse, using open formats (Apache Iceberg, Parquet) and open query engines (Presto/Trino, Apache Spark) to avoid vendor lock-in.

For IBM i environments, watsonx.data serves as a data federation and analytics layer that can query Db2 for i data alongside data from other systems — cloud object storage, relational databases, streaming platforms — using a single SQL interface.

watsonx.data’s Presto/Trino engine can query Db2 for i directly via the IBM Db2 for i connector, without requiring data migration. This enables cross-system analytics queries that join IBM i data with data from cloud storage, other databases, or Kafka topics.
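As a rough illustration of such a cross-system query: Presto addresses every table by a three-part `catalog.schema.table` name, so a single statement can join a Db2 for i table with, for example, an Iceberg table. The sketch below assumes hypothetical catalog names (`db2i`, `iceberg_data`), schemas, tables, and columns; substitute whatever names your watsonx.data instance assigns when you register each source. A table exposed through a Kafka catalog could be joined the same way. The optional `run_query` helper assumes the open-source `presto-python-client` package and basic authentication over HTTPS; how credentials are issued for your instance is outside this sketch.

```python
"""Sketch: a federated Presto query joining Db2 for i data with an Iceberg table.

All catalog, schema, table, and column names are hypothetical placeholders.
"""


def federated_sales_query(db2i_catalog: str, iceberg_catalog: str, min_total: int) -> str:
    """Build a Presto SQL statement that spans two catalogs via three-part names."""
    # catalog.schema.table naming lets one statement reach both sources.
    return (
        "SELECT c.custno, c.name, SUM(o.amount) AS total "
        f"FROM {db2i_catalog}.salesdta.custmast AS c "
        f"JOIN {iceberg_catalog}.analytics.web_orders AS o "
        "ON o.custno = c.custno "
        "GROUP BY c.custno, c.name "
        f"HAVING SUM(o.amount) > {int(min_total)}"
    )


def run_query(host: str, port: int, user: str, password: str, sql: str):
    """Execute the statement with presto-python-client (assumed to be installed)."""
    import prestodb.auth  # pip install presto-python-client
    import prestodb.dbapi

    conn = prestodb.dbapi.connect(
        host=host,
        port=port,
        user=user,
        http_scheme="https",  # the client refuses basic auth over plain HTTP
        auth=prestodb.auth.BasicAuthentication(user, password),
    )
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(federated_sales_query("db2i", "iceberg_data", 1000))
```

Keeping query construction separate from execution makes the SQL easy to review before it touches either system.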

A managed Spark environment within watsonx.data enables large-scale data processing, feature engineering for ML pipelines, and ETL workflows — all operating over data that includes IBM i Db2 tables.
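One common way to bring a Db2 for i table into a Spark job is the JTOpen (IBM Toolbox for Java) JDBC driver; this is an assumption about setup, not necessarily how watsonx.data's managed Spark is wired to your sources. The sketch assumes the JTOpen jar is on the Spark classpath, and the host, library, table, and column names (`ORDERS`, `CUSTNO`, `AMOUNT`, `ORDDATE`) are hypothetical. It derives a few per-customer features of the kind an ML pipeline might consume.

```python
"""Sketch: load a Db2 for i table into Spark over JDBC and derive simple features.

Assumes the JTOpen driver jar is on the Spark classpath.
Host, library, table, and column names are hypothetical placeholders.
"""


def ibmi_jdbc_options(host: str, library: str, table: str, user: str, password: str) -> dict:
    """Return DataFrameReader options for the JTOpen JDBC driver."""
    return {
        # JTOpen URL form: jdbc:as400://<system>/<default library>
        "url": f"jdbc:as400://{host}/{library}",
        "driver": "com.ibm.as400.access.AS400JDBCDriver",
        # SQL naming (the JTOpen default) uses LIBRARY.TABLE.
        "dbtable": f"{library}.{table}",
        "user": user,
        "password": password,
    }


def customer_features(spark, options: dict):
    """Read the table and aggregate per-customer features (requires pyspark)."""
    from pyspark.sql import functions as F

    orders = spark.read.format("jdbc").options(**options).load()
    return orders.groupBy("CUSTNO").agg(
        F.count(F.lit(1)).alias("order_count"),
        F.sum("AMOUNT").alias("total_amount"),
        F.max("ORDDATE").alias("last_order_date"),
    )
```

The aggregated DataFrame could then be written to an Iceberg table so other engines can reuse the features.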

Tables that watsonx.data registers in Apache Iceberg format can be read by any tool that supports Iceberg and has access to the catalog and the underlying storage. This includes watsonx.ai for model training, BI tools such as Cognos, and external data science environments.
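To illustrate what "any tool that understands the format" can mean in practice, here is a sketch using the open-source PyIceberg library from an external Python environment. It assumes the lakehouse catalog is reachable through an Iceberg REST endpoint and that data files live in S3-compatible object storage; check your instance's documentation for the catalog type and endpoints it actually exposes. The URIs, credentials, namespace, table, and column names are hypothetical.

```python
"""Sketch: read an Iceberg table managed by watsonx.data from external Python code.

Assumes an Iceberg REST catalog endpoint and S3-compatible storage.
Every URI, credential, namespace, table, and column name is hypothetical.
"""


def catalog_properties(rest_uri: str, token: str, s3_endpoint: str,
                       access_key: str, secret_key: str) -> dict:
    """Properties to pass to pyiceberg.catalog.load_catalog."""
    return {
        "type": "rest",
        "uri": rest_uri,
        "token": token,
        # FileIO settings so PyIceberg can read the Parquet data files directly.
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


def read_customer_orders(props: dict, custno: str):
    """Scan one customer's rows into pandas (requires pyiceberg and pandas)."""
    from pyiceberg.catalog import load_catalog
    from pyiceberg.expressions import EqualTo

    catalog = load_catalog("watsonx_data", **props)
    table = catalog.load_table("analytics.web_orders")
    scan = table.scan(
        row_filter=EqualTo("custno", custno),  # pushed down to skip data files
        selected_fields=("custno", "order_date", "amount"),
    )
    return scan.to_pandas()
```

Because the table metadata and Parquet files are open formats, the same table stays queryable from Presto or Spark while this client reads it.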

watsonx.data integrates with IBM’s governance capabilities (via watsonx.governance) to enforce data access policies, track data lineage, and maintain a business glossary — important for regulated industries where IBM i is common.

See "Accessing Db2 from watsonx.data" for step-by-step instructions, and "IBM Cloud Satellite Connector" for connecting an on-premises IBM i system to watsonx.data running in IBM Cloud.