DPH Services Overview#
The Data Product Hub Services module provides a comprehensive Python SDK for managing data products in IBM Data Product Hub.
Architecture#
The module is built on top of the IBM Cloud SDK Core and provides:
Type-safe API: Full type hints for better IDE support
Error handling: Comprehensive exception handling with detailed error messages
Pagination support: Built-in pagination for large result sets
Authentication: Seamless integration with IBM Cloud authentication
Core Components#
Container Management#
Containers are the foundation of Data Product Hub, providing:
Delivery method configurations
Sample data products
Domain structures
Service credentials
Data Products#
Data products represent packaged data assets with:
Metadata and descriptions
Version control
Asset references
Domain associations
Contract terms
Drafts and Releases#
Drafts: Work-in-progress versions that can be edited and updated
Releases: Published versions that are immutable and available for consumption
Contract Terms#
Legal and business terms governing data product usage:
Terms and conditions documents
Service level agreements
Usage restrictions
Compliance requirements
Domains#
Organizational structure for categorizing data products:
Top-level domains (e.g., Finance, Marketing)
Subdomains for finer categorization
Multi-industry domain support
Data Flow#
Initialize Container: Set up the data product hub environment
Create Draft: Define a new data product version
Add Contract Terms: Attach legal and business terms
Publish Draft: Convert draft to a release
Manage Lifecycle: Update, version, or retire releases
Authentication#
The module supports multiple authentication methods:
IAM Authentication (Recommended)
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('your-api-key')
dph_service = DphV1(authenticator=authenticator)
Bearer Token Authentication
from ibm_cloud_sdk_core.authenticators import BearerTokenAuthenticator
authenticator = BearerTokenAuthenticator('your-bearer-token')
dph_service = DphV1(authenticator=authenticator)
Cloud Pak for Data Authentication
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
authenticator = CloudPakForDataAuthenticator(
username='your-username',
password='your-password',
url='https://your-cpd-instance.com'
)
dph_service = DphV1(authenticator=authenticator)
Error Handling#
The SDK uses standard IBM Cloud SDK exceptions:
from ibm_cloud_sdk_core import ApiException
try:
response = dph_service.get_data_product(data_product_id='invalid-id')
except ApiException as e:
print(f"Error Code: {e.code}")
print(f"Error Message: {e.message}")
print(f"HTTP Status: {e.http_response.status_code}")
Common error codes:
400: Bad Request - Invalid parameters
401: Unauthorized - Authentication failed
403: Forbidden - Insufficient permissions
404: Not Found - Resource doesn’t exist
409: Conflict - Resource already exists
500: Internal Server Error - Service error
Best Practices#
Use Pagination for Large Datasets
# Use pager for automatic pagination
pager = dph_service.list_data_products_with_pager(limit=50)
for page in pager:
for product in page['data_products']:
process_product(product)
Implement Retry Logic
import time
from ibm_cloud_sdk_core import ApiException
def create_with_retry(dph_service, drafts, max_retries=3):
for attempt in range(max_retries):
try:
return dph_service.create_data_product(drafts=drafts)
except ApiException as e:
if e.code == 429 or e.code >= 500: # Rate limit or server error
if attempt < max_retries - 1:
time.sleep(2 ** attempt) # Exponential backoff
continue
raise
Validate Before Publishing
def validate_draft(draft):
"""Validate draft before publishing"""
required_fields = ['name', 'version', 'asset', 'domain']
for field in required_fields:
if field not in draft or not draft[field]:
raise ValueError(f"Missing required field: {field}")
return True
Use JSON Patch for Updates
# Efficient updates using JSON Patch
patch_operations = [
{'op': 'replace', 'path': '/description', 'value': 'Updated description'},
{'op': 'add', 'path': '/tags/-', 'value': 'new-tag'}
]
dph_service.update_data_product(
data_product_id=product_id,
json_patch_instructions=patch_operations
)
Performance Considerations#
Batch Operations
When creating multiple data products, consider batching to reduce API calls:
# Create multiple drafts in a single data product
dph_service.create_data_product(
drafts=[draft1, draft2, draft3]
)
Caching
Cache frequently accessed data to reduce API calls:
from functools import lru_cache
@lru_cache(maxsize=100)
def get_domain_cached(domain_id):
return dph_service.get_domain(domain_id=domain_id)
Parallel Processing
Use concurrent requests for independent operations:
from concurrent.futures import ThreadPoolExecutor
def get_product(product_id):
return dph_service.get_data_product(data_product_id=product_id)
with ThreadPoolExecutor(max_workers=5) as executor:
products = list(executor.map(get_product, product_ids))
Requirements#
Python 3.8 or higher
ibm-cloud-sdk-core >= 3.16.7
requests >= 2.32.4
python-dateutil >= 2.5.3
See Also#
Usage Guide - Detailed usage guide
Examples - Code examples
DPH Services API - API reference