Best Practices

Overview

The ibm-watsonx-ai Python SDK provides convenient access to IBM watsonx.ai services, including foundation models, training, deployment, and inference. It is designed to accelerate AI development and enable interaction with foundation models via a Pythonic interface.

Best Practices

  1. Use the Latest Version

Always install the latest version of the SDK to benefit from new features, performance improvements, and security patches.

pip install --upgrade ibm-watsonx-ai
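
To confirm which version is installed, you can query the package metadata with the standard library:

from importlib.metadata import version

print(version("ibm-watsonx-ai"))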
  2. Authentication

The ibm-watsonx-ai SDK supports several authentication configurations. To get started, review the examples provided for both IBM watsonx.ai for IBM Cloud and IBM watsonx.ai software solutions.

Note

When using the SaaS solution, ensure you use the endpoint dedicated to the region where your project or space is created. This is crucial for proper functionality and to avoid any potential issues related to regional restrictions or performance.
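
For example, authenticating against IBM Cloud with an API key might look as follows (the Dallas endpoint below is illustrative; substitute the URL of your region and your own key):

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # endpoint of the region hosting your project or space
    api_key="your_api_key",  # in practice, read this from an environment variable or secrets manager
)
client = APIClient(credentials, project_id="your_project_id")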

  3. Check available LLM models

The set of available LLM models may differ between SaaS regions and CPD installations. To quickly check which models are available, use the following Enums:

  • models with support for chat:

api_client.foundation_models.ChatModels

Note

You can use the show() method in the above Enum to get the key-value pairs of available chat models.

api_client.foundation_models.ChatModels.show()

The Enums can be easily converted to a list:

list(api_client.foundation_models.ChatModels)
  • models with support for text generation:

api_client.foundation_models.TextModels
  • embedding models:

api_client.foundation_models.EmbeddingModels
  • models that can be used for reranking:

api_client.foundation_models.RerankModels
  • models that can be used for time series forecasting:

api_client.foundation_models.TimeSeriesModels
  4. Use ModelInference and TSModelInference Interfaces Efficiently

When interacting with foundation models, initialize the appropriate client once and reuse it across inference calls to avoid redundant setup.

from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference

client = APIClient(credentials, project_id="your_project_id")

model_granite = ModelInference(
                    model_id="ibm/granite-3-3-8b-instruct",
                    api_client=client)

model_llama = ModelInference(
                    model_id="meta-llama/llama-3-3-70b-instruct",
                    api_client=client)

When calling models in a loop, keep the initialization of ModelInference outside the loop to avoid sending redundant setup requests on each iteration and the delays they introduce.

from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference

client = APIClient(credentials, project_id="your_project_id")

model_granite = ModelInference(
                    model_id="ibm/granite-3-3-8b-instruct",
                    api_client=client)


responses = []
for messages in list_of_chat_messages:
    # no class initializations in the loop
    responses.append(model_granite.chat(messages))
  5. Use extras installation for RAG

The RAG modules (ibm_watsonx_ai.foundation_models.extensions.rag) and the document reader (ibm_watsonx_ai.data_loaders.datasets.documents.DocumentsIterableDataset) require additional packages. All required dependencies can be installed with the [rag] extra:

!pip install -U ibm-watsonx-ai[rag]

  6. Follow Rate Limits and Quotas

Respect service usage limits to avoid throttling or denial of service. Implement retry logic where appropriate, for example with exponential backoff as sketched below.
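
A minimal sketch of such retry logic, reusing model_granite from the earlier example (the with_retries helper, the chat_messages placeholder, and the attempt count and delays are illustrative, not SDK defaults; only retry calls that are safe to repeat):

import time

from ibm_watsonx_ai.wml_client_error import ApiRequestFailure

def with_retries(fn, max_attempts=3, base_delay=1.0):
    # Retry a callable on API failures, doubling the delay after each failed attempt.
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiRequestFailure:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

response = with_retries(lambda: model_granite.chat(chat_messages))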

  7. Look for information in the documentation

Refer to the comprehensive SDK documentation for examples, API references, and configuration guidance.

  8. Enable Logging for Debugging

Use Python’s built-in logging module to trace SDK activity, especially during development and troubleshooting.

import logging
logging.basicConfig(level=logging.DEBUG)
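
Root-level DEBUG logging can be very noisy. To narrow it down, you can keep the root level higher and lower the level only for the SDK's logger (assuming the SDK names its loggers after its module path, which is the usual convention but an assumption here):

import logging

logging.basicConfig(level=logging.WARNING)
# Assumption: the SDK's loggers are named after its module path.
logging.getLogger("ibm_watsonx_ai").setLevel(logging.DEBUG)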
    
  9. Catch ApiRequestFailure exception

Catch exceptions related to API requests because network operations are inherently unreliable.

This allows you to:

  1. Avoid application crashes

  2. Provide meaningful error messages

  3. Implement fallback logic

  4. Control the application flow

import logging

from ibm_watsonx_ai.wml_client_error import ApiRequestFailure

logger = logging.getLogger(__name__)

try:
    deployment_details = client.deployments.create(model_asset_id, meta_props)
except ApiRequestFailure as e:
    # Handle API request failure without breaking the application.
    # Logging provides useful context for debugging.
    logger.debug(f"API request failed with status code {e.response.status_code}, details: {e}")
  10. Configure httpx.Client or httpx.AsyncClient for better performance

By default, httpx manages connection pooling automatically. However, explicitly providing your own httpx.Limits or httpx.Timeout configuration is often a better choice because it allows you to control resource usage and improve application stability under load.

Details about configuring the APIClient with httpx can be found in Configuring the HTTP Client.

Sample with connection limits and a timeout:

from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.utils.utils import HttpClientConfig
import httpx

limits = httpx.Limits(
    max_connections=5
)
timeout = httpx.Timeout(7)
http_config = HttpClientConfig(timeout=timeout, limits=limits)
client = APIClient(credentials, httpx_client=http_config, async_httpx_client=http_config)
  11. Use asynchronous methods

Note

The APIClient allows the user to operate in synchronous and asynchronous applications.

If you need to speed up your application, we suggest using asynchronous methods:

from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference

client = APIClient(credentials, project_id="your_project_id")

model = ModelInference(
    model_id="ibm/granite-3-3-8b-instruct",
    api_client=client,
)

messages = [
    {"role": "user", "content": "What is 1 + 1"},
]
response = await model.achat(messages=messages)
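
Note that await must run inside a coroutine (or a notebook with top-level await support). The real speed-up comes from issuing several requests concurrently, for example with asyncio.gather. A minimal sketch reusing the model instance above (chat_concurrently and the sample messages are illustrative):

import asyncio

async def chat_concurrently():
    batches = [
        [{"role": "user", "content": "What is 1 + 1?"}],
        [{"role": "user", "content": "What is 2 + 2?"}],
    ]
    # Send both chat requests concurrently instead of one after another.
    return await asyncio.gather(*(model.achat(messages=m) for m in batches))

responses = asyncio.run(chat_concurrently())  # in a notebook, use: await chat_concurrently()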
  12. Change space/project after promotion

We distinguish two types of working environments: project and space.

A project is a collaborative workspace where you work with data and other assets to achieve a specific goal. A space, on the other hand, is used to deploy various assets and manage your deployments.

Set up your client with the specified project or space if, for example, you want to read data via DataConnection:

from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.helpers import DataConnection

client = APIClient(credentials, project_id="your_project_id")

data_connection = DataConnection(data_asset_id="your_asset_id")
data_connection.set_client(client)

data = data_connection.read()

After creating a deployment or promoting a resource (such as a notebook, model, or other asset) from a project to a space, you must switch the working environment to the corresponding space_id in order to access it.

# promote asset from project to space
promoted_asset_id = client.spaces.promote(
    "your_asset_id",
    source_project_id="your_project_id",
    target_space_id="your_space_id",
)

client.set.default_space(space_id="your_space_id")

data_connection = DataConnection(data_asset_id=promoted_asset_id)
data_connection.set_client(client)
  13. Avoid setting the verify flag directly on the client

Instead of passing verify to APIClient, pass it to the Credentials object:

from ibm_watsonx_ai import Credentials

credentials = Credentials(verify=...)
  14. Avoid using uid

We suggest using the latest naming convention, i.e. id instead of uid. For example, the get_uid and get_job_uid methods are deprecated; use the recommended get_id / get_job_id instead.

client.deployments.get_id(deployment_details)
  15. Use the client in multithreaded applications

When using APIClient in a multi-threaded environment, ensure that the client is initialized only once.

from concurrent.futures import ThreadPoolExecutor

from ibm_watsonx_ai import APIClient

client = APIClient(credentials, project_id="your_project_id")

payload = [(deployment_id, scoring_payload)]

with ThreadPoolExecutor(max_workers=n) as executor:
    responses = list(executor.map(lambda args: client.deployments.score(*args), payload))

In this example, the same APIClient instance (client) is shared across threads.

  16. Customize HTTPX logging with event hooks

HTTPX allows “event hooks” to be registered on the client, so you can observe and log each request and response as it occurs.

def log_request(request):
    print(f"Request event hook: {request.method} {request.url} - Waiting for response")

def log_response(response):
    request = response.request
    print(f"Response event hook: {request.method} {request.url} - Status {response.status_code}")

from ibm_watsonx_ai import APIClient

client = APIClient(credentials, project_id="your_project_id")

client.httpx_client.event_hooks['request'] = [log_request]
client.httpx_client.event_hooks['response'] = [log_response]

More details can be found in the official HTTPX documentation: Event Hooks

  17. Access the token via the client

Note

Never hardcode tokens directly in source code or notebooks. Store them in environment variables or use a secure secrets manager.

Most methods in the ibm_watsonx_ai library require authentication to access secured APIs or private resources. An authentication token is used to securely identify the user or application making the request.

If you have an initialized APIClient, you can easily access its token:

from ibm_watsonx_ai import APIClient

client = APIClient(credentials, project_id="your_project_id")

token = client.token
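
The token can then be used as a standard Bearer token, for example when calling a REST endpoint directly (the endpoint path and version value below are illustrative; consult the watsonx.ai API reference for real endpoints):

import httpx

headers = {"Authorization": f"Bearer {client.token}"}
response = httpx.get(
    f"{credentials.url}/ml/v4/deployments",  # illustrative endpoint
    params={"version": "2024-05-01", "space_id": "your_space_id"},
    headers=headers,
)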