IBM watsonx.data intelligence SDK for Python

Section Contents

IBM watsonx.data intelligence SDK for Python#

The IBM watsonx.data intelligence SDK for Python is a comprehensive toolkit for data intelligence operations, providing modular components for data quality validation, data product management, ODCS generation, and intelligent recommendations.

This SDK is designed with a modular architecture, allowing different teams to contribute specialized functionality while sharing common components like authentication. Currently, the SDK includes:

  • Common Modules: Shared authentication and configuration for all SDK modules

  • DQ Validator: In-memory data quality validation for streaming data, Pandas DataFrames, and PySpark DataFrames

  • DPH Services: Python client for IBM Data Product Hub API

  • ODCS Generator: Generate Open Data Contract Standard files from data catalogs

  • Data Product Recommender: Analyze query logs to identify high-value data products

The IBM watsonx.data intelligence SDK is supported on Python 3.8+.

Key Features#

Data Quality Validation

Comprehensive validation framework with 9 check types, support for array-based records and DataFrames, and integration with IBM Cloud Pak for Data.

Data Product Hub Integration

Complete Python SDK for managing data products, drafts, releases, contract terms, and domains.

ODCS Generation

Automated generation of ODCS v3.1.0 compliant YAML files from Collibra and Informatica catalogs.

Intelligent Recommendations

Query log analysis to identify high-value tables and logical groupings for data product prioritization.

Multi-Environment Authentication

Unified authentication supporting IBM Cloud, AWS Cloud, Government Cloud, and on-premises deployments.

Modular Architecture

Extensible design allowing teams to add new modules while sharing common functionality.

Type Safety

Full type hints throughout the SDK for better IDE support and code quality.