Best Practices¶
Overview¶
The ibm-watsonx-ai Python SDK provides convenient access to IBM watsonx.ai services, including foundation models, training, deployment, and inference. It is designed to accelerate AI development and enable interaction with foundation models via a Pythonic interface.
Best Practices¶
Use the Latest Version
Always install the latest version of the SDK to benefit from new features, performance improvements, and security patches.
pip install --upgrade ibm-watsonx-ai
Authentication
There are various configurations for authenticating with the ibm-watsonx-ai SDK. To get started, review the examples provided for both IBM watsonx.ai for IBM Cloud and IBM watsonx.ai software solutions.
Note
When using the SaaS solution, ensure you use the endpoint dedicated to the region where your project or space is created. This is crucial for proper functionality and to avoid any potential issues related to regional restrictions or performance.
Check available LLM models
The set of available LLM models differs between SaaS regions and CPD installations. To quickly check which models are available, use the following Enums:
models with support for chat:
api_client.foundation_models.ChatModels
Note
You can use the show() method on the above Enum to get the key-value pairs of available chat models.
api_client.foundation_models.ChatModels.show()
The Enums can be easily converted to a list:
list(api_client.foundation_models.ChatModels)
models with support for text generation:
api_client.foundation_models.TextModels
embedding models:
api_client.foundation_models.EmbeddingModels
models that can be used for reranking:
api_client.foundation_models.RerankModels
models that can be used for time series forecasting:
api_client.foundation_models.TimeSeriesModels
Use ModelInference and TSModelInference Interfaces Efficiently
When interacting with foundation models, initialize the appropriate client once and reuse it across inference calls to avoid redundant setup.
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference
client = APIClient(credentials, project_id="your_project_id")
model_granite = ModelInference(
    model_id="ibm/granite-3-3-8b-instruct",
    api_client=client,
)
model_llama = ModelInference(
    model_id="meta-llama/llama-3-3-70b-instruct",
    api_client=client,
)
When calling models in a loop, initialize ModelInference outside the loop to avoid sending redundant requests and introducing delays on every iteration.
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference
client = APIClient(credentials, project_id="your_project_id")
model_granite = ModelInference(
    model_id="ibm/granite-3-3-8b-instruct",
    api_client=client,
)
responses = []
for messages in list_of_chat_messages:
    # no class initializations in the loop
    responses.append(model_granite.chat(messages))
Use extras installation for RAG
The RAG modules in ibm_watsonx_ai.foundation_models.extensions.rag and the document reader ibm_watsonx_ai.data_loaders.datasets.documents.DocumentsIterableDataset require additional packages. All required dependencies can be installed with the [rag] extra:
!pip install -U ibm-watsonx-ai[rag]
Follow Rate Limits and Quotas
Respect service usage limits to avoid throttling or denial of service. Implement retry logic where appropriate.
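A minimal retry helper with exponential backoff and jitter can absorb transient throttling. The sketch below is illustrative only: flaky_request is a hypothetical stub standing in for a real SDK call (such as ModelInference.chat), and RuntimeError stands in for the SDK's ApiRequestFailure.

```python
import random
import time

def call_with_retries(request_fn, max_attempts=5, base_delay=1.0):
    """Call request_fn, retrying with exponential backoff and jitter
    when the call fails with a (presumed transient) error."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RuntimeError:  # substitute the SDK's API error class here
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff: base, 2x, 4x, ... plus a little jitter
            # so parallel clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Stub standing in for an SDK call: fails twice, then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

result = call_with_retries(flaky_request, base_delay=0.01)
```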
Look for information in the documentation
Refer to the comprehensive SDK docs for examples, API references, and configuration guidance:
This documentation - use the "search" box if needed
Enable Logging for Debugging
Use Python's built-in logging module to trace SDK activity, especially during development and troubleshooting.
import logging
logging.basicConfig(level=logging.DEBUG)
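DEBUG on the root logger can be noisy. One option, sketched below, is to keep the root logger at WARNING and enable DEBUG only for the SDK; the logger name "ibm_watsonx_ai" is an assumption based on the package name, so adjust it if the SDK logs under a different name.

```python
import logging

# Keep the root logger quiet, but enable DEBUG output for the SDK only.
logging.basicConfig(level=logging.WARNING)

# Assumed logger name based on the package name "ibm_watsonx_ai".
sdk_logger = logging.getLogger("ibm_watsonx_ai")
sdk_logger.setLevel(logging.DEBUG)
```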
Catch ApiRequestFailure exception
Catch exceptions related to API requests because network operations are inherently unreliable. Catching them allows you to:
Avoid application crashes
Provide meaningful error messages
Implement fallback logic
Control flow
import logging

from ibm_watsonx_ai.wml_client_error import ApiRequestFailure

logger = logging.getLogger(__name__)

try:
    deployment_details = client.deployments.create(model_asset_id, meta_props)
except ApiRequestFailure as e:
    # Handle API request failure without breaking the application.
    # Logging provides useful context for debugging.
    logger.debug(f"API request failed with status code {e.response.status_code}, details: {e}")
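One form of fallback logic is to route a failed request to a backup model. The sketch below shows only the control-flow pattern: primary and backup are hypothetical stubs standing in for ModelInference.chat on two different models, and ApiRequestFailureStub stands in for the SDK's ApiRequestFailure.

```python
class ApiRequestFailureStub(Exception):
    """Stand-in for ibm_watsonx_ai.wml_client_error.ApiRequestFailure."""

def chat_with_fallback(primary_chat, backup_chat, messages):
    # Try the preferred model first; fall back to a backup
    # model if the API request fails.
    try:
        return primary_chat(messages)
    except ApiRequestFailureStub:
        return backup_chat(messages)

# Stubs standing in for chat calls on two different models.
def primary(messages):
    raise ApiRequestFailureStub("primary model unavailable")

def backup(messages):
    return {"model": "backup", "echo": messages}

result = chat_with_fallback(primary, backup, [{"role": "user", "content": "hi"}])
```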
Configure httpx.Client or httpx.AsyncClient for better performance
By default, httpx manages connection pooling automatically. However, explicitly providing your own httpx.Limits or httpx.Timeout configuration is often a better choice because it allows you to control resource usage and improve application stability under load.
Details about APIClient with httpx configuration can be found here: Configuring the HTTP Client
Sample with limits configured:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.utils.utils import HttpClientConfig
import httpx
limits = httpx.Limits(
    max_connections=5,
)
timeout = httpx.Timeout(7)
http_config = HttpClientConfig(timeout=timeout, limits=limits)
client = APIClient(credentials, httpx_client=http_config, async_httpx_client=http_config)
Use asynchronous methods
Note
The APIClient allows the user to operate in both synchronous and asynchronous applications.
If you need to speed up your application, we suggest using asynchronous methods:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference
client = APIClient(credentials, project_id="your_project_id")
model = ModelInference(
    model_id="ibm/granite-3-3-8b-instruct",
    api_client=client,
)
messages = [
    {"role": "user", "content": "What is 1 + 1"},
]
response = await model.achat(messages=messages)
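The main benefit of asynchronous methods appears when many independent requests run concurrently, for example with asyncio.gather. The sketch below shows only the pattern: achat_stub is a hypothetical stand-in for ModelInference.achat that simulates a network round trip.

```python
import asyncio

async def achat_stub(messages):
    # Stand-in for model.achat: simulate a network round trip.
    await asyncio.sleep(0.01)
    return {"echo": messages}

async def main():
    batches = [
        [{"role": "user", "content": f"What is {i} + 1"}] for i in range(5)
    ]
    # Fire all requests concurrently instead of awaiting them one by one.
    return await asyncio.gather(*(achat_stub(b) for b in batches))

responses = asyncio.run(main())
```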
Change space/project after promotion
We distinguish two types of working environments: project and space.
A project is a collaborative workspace where you work with data and other assets to achieve a specific goal. A space, on the other hand, is used to deploy various assets and manage your deployments.
Set up your client with the appropriate project or space, for example when you want to read data via DataConnection:
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.helpers import DataConnection
client = APIClient(credentials, project_id="your_project_id")
data_connection = DataConnection(data_asset_id="your_asset_id")
data_connection.set_client(client)
data = data_connection.read()
After creating a deployment or promoting a resource (such as a notebook, model, or other asset) from a project to a space, you must switch the working environment to the corresponding space_id in order to access it.
# publish asset from project to space
promoted_asset_id = client.spaces.promote(
    "your_asset_id",
    source_project_id="your_project_id",
    target_space_id="your_space_id",
)
client.set.default_space(space_id="your_space_id")
data_connection = DataConnection(data_asset_id=promoted_asset_id)
data_connection.set_client(client)
Avoid setting the verify flag directly on the client
Instead of adding verify to APIClient, pass it to the Credentials object:
from ibm_watsonx_ai import Credentials
credentials = Credentials(verify=...)
Avoid using uid
We suggest using the latest naming convention, i.e. just id instead of uid.
For example, the get_uid and get_job_uid methods are deprecated; use the recommended get_id / get_job_id instead.
client.deployments.get_id(deployment_details)
Using in multithreaded applications
When using APIClient
in a multi-threaded environment, ensure that the client is initialized only once.
from concurrent.futures import ThreadPoolExecutor

from ibm_watsonx_ai import APIClient

client = APIClient(credentials, project_id="your_project_id")
payload = [(deployment_id, scoring_payload)]
with ThreadPoolExecutor(max_workers=n) as executor:
    responses = list(executor.map(lambda args: client.deployments.score(*args), payload))
In this example, the same APIClient instance (client) is shared across threads.
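The pattern can be illustrated end to end with a stub: SharedClientStub below is a hypothetical stand-in for a single APIClient whose score method replaces client.deployments.score, so the example is runnable without a live service.

```python
from concurrent.futures import ThreadPoolExecutor

class SharedClientStub:
    """Stand-in for a single, shared APIClient instance."""

    def score(self, deployment_id, scoring_payload):
        # A real client would issue an HTTP request here.
        return {"deployment": deployment_id, "input": scoring_payload}

client = SharedClientStub()
payloads = [(f"deployment-{i}", {"values": [i]}) for i in range(4)]

with ThreadPoolExecutor(max_workers=2) as executor:
    # Every task uses the same client object; only one is ever created.
    responses = list(executor.map(lambda args: client.score(*args), payloads))
```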
Customize HTTPX logging with event hooks
HTTPX allows "event hooks" to be registered on the client; these are callbacks invoked whenever a specific type of event occurs, such as sending a request or receiving a response.
def log_request(request):
    print(f"Request event hook: {request.method} {request.url} - Waiting for response")

def log_response(response):
    request = response.request
    print(f"Response event hook: {request.method} {request.url} - Status {response.status_code}")
client = APIClient(credentials, project_id="your_project_id")
client.httpx_client.event_hooks['request'] = [log_request]
client.httpx_client.event_hooks['response'] = [log_response]
More details can be found in the official httpx documentation: Event Hooks
Access the token via the client
Note
Never hardcode tokens directly in source code or notebooks. Store them in environment variables or use a secure secrets manager.
Most methods in the ibm_watsonx_ai library require authentication to access secured APIs or private resources. An authentication token is used to securely identify the user or application making the request.
If you have an initialized APIClient, you can easily access its token:
from ibm_watsonx_ai import APIClient
client = APIClient(credentials, project_id="your_project_id")
token = client.token