Best Practices
==============

Overview
--------

The ``ibm-watsonx-ai`` Python SDK provides convenient access to IBM watsonx.ai services, including foundation models, training, deployment, and inference. It is designed to accelerate AI development and enable interaction with foundation models via a Pythonic interface.

Best Practices
--------------

1. **Use the Latest Version**

   Always install the latest version of the SDK to benefit from new features, performance improvements, and security patches.

   .. code-block:: bash

      pip install --upgrade ibm-watsonx-ai

2. **Authentication**

   There are various configurations for authenticating with the ``ibm-watsonx-ai`` SDK. To get started, review the examples provided for both :ref:`IBM watsonx.ai for IBM Cloud` and :ref:`IBM watsonx.ai software` solutions.

   .. note::

      When using the SaaS solution, ensure you use the endpoint dedicated to the region where your project or space is created. This is crucial for proper functionality and helps you avoid issues related to regional restrictions or performance.

3. **Check available LLM models**

   The set of available LLM models may differ between SaaS regions and CPD installations. To quickly check which models are available, the following Enums can be used:

   - models with support for chat:

     .. code-block:: python

        api_client.foundation_models.ChatModels

     .. note::

        You can use the ``show()`` method on the above Enum to get the key-value pairs of available chat models.

        .. code-block:: python

           api_client.foundation_models.ChatModels.show()

     The Enums can easily be converted to a list:

     .. code-block:: python

        list(api_client.foundation_models.ChatModels)

   - models with support for text generation:

     .. code-block:: python

        api_client.foundation_models.TextModels

   - embedding models:

     .. code-block:: python

        api_client.foundation_models.EmbeddingModels

   - models that can be used for reranking:

     .. code-block:: python

        api_client.foundation_models.RerankModels

   - models that can be used for time series forecasting:

     .. code-block:: python

        api_client.foundation_models.TimeSeriesModels
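Since the Enums above are regular Python Enums, the usual iteration and membership patterns apply to them. A minimal sketch of guarding model selection on availability, using a hypothetical stand-in Enum (in real code, the values come dynamically from ``api_client.foundation_models.ChatModels``):

```python
from enum import Enum


# Stand-in for api_client.foundation_models.ChatModels; the real Enum is
# populated from the models actually available in your region/installation.
class ChatModels(str, Enum):
    GRANITE_3_3_8B_INSTRUCT = "ibm/granite-3-3-8b-instruct"
    LLAMA_3_3_70B_INSTRUCT = "meta-llama/llama-3-3-70b-instruct"


# Convert the Enum to a plain list of model ids:
available = [model.value for model in ChatModels]

# Check availability before creating a ModelInference instance:
model_id = "ibm/granite-3-3-8b-instruct"
if model_id in available:
    print(f"{model_id} is available")
```

Checking membership first avoids a failed inference call when a model id is not offered in the current region or installation.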
4. **Use ModelInference and TSModelInference Interfaces Efficiently**

   When interacting with foundation models, initialize the appropriate client once and reuse it across inference calls to avoid redundant setup.

   .. code-block:: python

      from ibm_watsonx_ai import APIClient
      from ibm_watsonx_ai.foundation_models import ModelInference

      client = APIClient(credentials, project_id="your_project_id")

      model_granite = ModelInference(
          model_id="ibm/granite-3-3-8b-instruct",
          api_client=client)

      model_llama = ModelInference(
          model_id="meta-llama/llama-3-3-70b-instruct",
          api_client=client)

   When calling models in a loop, keep the initialization of ``ModelInference`` outside the loop to limit the number of requests sent per iteration and avoid unnecessary delays.

   .. code-block:: python

      from ibm_watsonx_ai import APIClient
      from ibm_watsonx_ai.foundation_models import ModelInference

      client = APIClient(credentials, project_id="your_project_id")

      model_granite = ModelInference(
          model_id="ibm/granite-3-3-8b-instruct",
          api_client=client)

      responses = []
      for messages in list_of_chat_messages:
          # no class initializations in the loop
          responses.append(model_granite.chat(messages))

5. **Use extras installation for RAG**

   The RAG module ``ibm_watsonx_ai.foundation_models.extensions.rag`` and the document reader ``ibm_watsonx_ai.data_loaders.datasets.documents.DocumentsIterableDataset`` require additional packages. All required dependencies can be installed with the ``[rag]`` extra:

   .. code-block:: bash

      pip install -U ibm-watsonx-ai[rag]

6. **Follow Rate Limits and Quotas**

   Respect service usage limits to avoid throttling or denial of service. Implement retry logic where appropriate.
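A minimal, SDK-independent sketch of such retry logic with exponential backoff and jitter (the helper and the simulated flaky call are illustrative, not part of ``ibm-watsonx-ai``):

```python
import random
import time


def call_with_retries(fn, max_retries=3, base_delay=1.0):
    """Retry fn with exponential backoff and jitter on failure."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # exhausted retries; propagate the last error
            # Exponential backoff: base, 2x, 4x, ... plus random jitter
            # to avoid synchronized retries from many clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)


# Example: a flaky operation that succeeds on the third call.
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated 429 Too Many Requests")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
```

In production code you would typically narrow the ``except`` clause to the specific transient errors you expect (for example ``ApiRequestFailure`` with a 429 status) rather than catching every exception.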
7. **Look for information in the documentation**

   Refer to the comprehensive SDK docs for examples, API references, and configuration guidance:

   - `This documentation `_ - use the "search" window if needed
   - `API documentation `_

8. **Enable Logging for Debugging**

   Use Python's built-in ``logging`` module to trace SDK activity, especially during development and troubleshooting.

   .. code-block:: python

      import logging

      logging.basicConfig(level=logging.DEBUG)

9. **Catch ApiRequestFailure exception**

   Catch exceptions related to API requests because network operations are inherently unreliable. This way, you can:

   1. Avoid application crashes
   2. Provide meaningful error messages
   3. Implement fallback logic
   4. Control flow

   .. code-block:: python

      from ibm_watsonx_ai.wml_client_error import ApiRequestFailure

      try:
          deployment_details = client.deployments.create(model_asset_id, meta_props)
      except ApiRequestFailure as e:
          # Handle API request failure without breaking the application.
          # Logging provides useful context for debugging.
          logger.debug(f"API request failed with status code {e.response.status_code}, details: {e}")

10. **Configure httpx.Client or httpx.AsyncClient for better performance**

    By default, ``httpx`` manages connection pooling automatically. However, explicitly providing your own ``httpx.Limits`` or ``httpx.Timeout`` configuration is often a better choice because it allows you to control resource usage and improve application stability under load. Details about ``APIClient`` with ``httpx`` configuration can be found here: `Configuring the HTTP Client `_

    Sample with limits:

    .. code-block:: python

       import httpx

       from ibm_watsonx_ai import APIClient
       from ibm_watsonx_ai.utils.utils import HttpClientConfig

       limits = httpx.Limits(max_connections=5)
       timeout = httpx.Timeout(7)
       http_config = HttpClientConfig(timeout=timeout, limits=limits)

       client = APIClient(credentials, httpx_client=http_config, async_httpx_client=http_config)

11. **Use asynchronous methods**
    .. note::

       The ``APIClient`` allows the user to operate in synchronous and asynchronous applications.

    If you need to speed up your application, we suggest using asynchronous methods:

    .. code-block:: python

       from ibm_watsonx_ai import APIClient
       from ibm_watsonx_ai.foundation_models import ModelInference

       client = APIClient(credentials, project_id="your_project_id")

       model = ModelInference(
           model_id="ibm/granite-3-3-8b-instruct",
           api_client=client,
       )

       messages = [
           {"role": "user", "content": "What is 1 + 1"},
       ]

       response = await model.achat(messages=messages)

12. **Change space/project after promotion**

    We distinguish two types of working environments: projects and spaces. A project is a collaborative workspace where you work with data and other assets to achieve a specific goal. A space, on the other hand, is used to deploy assets and manage your deployments.

    Set up your client with the appropriate project/space if, for example, you want to gather data via ``DataConnection``:

    .. code-block:: python

       from ibm_watsonx_ai import APIClient
       from ibm_watsonx_ai.helpers import DataConnection

       client = APIClient(credentials, project_id="your_project_id")

       data_connection = DataConnection(data_asset_id="your_asset_id")
       data_connection.set_client(client)

       data = data_connection.read()

    After creating a deployment or promoting a resource (such as a notebook, model, or other asset) from a project to a space, you must switch the working environment to the corresponding ``space_id`` in order to access it.

    .. code-block:: python

       # publish asset from project to space
       promoted_asset_id = client.spaces.promote(
           "your_asset_id",
           source_project_id="your_project_id",
           target_space_id="your_space_id",
       )

       client.set.default_space(space_id="your_space_id")

       data_connection = DataConnection(data_asset_id=promoted_asset_id)
       data_connection.set_client(client)

13. **Avoid setting the verify flag directly on the client**

    Instead of adding ``verify`` to ``APIClient``, pass it to the ``Credentials`` object:
    .. code-block:: python

       from ibm_watsonx_ai import Credentials

       credentials = Credentials(verify=...)

14. **Avoid using uid**

    We suggest using the latest naming convention, i.e. just ``id`` instead of ``uid``. For example, the ``get_uid`` and ``get_job_uid`` methods are deprecated; use the recommended ``get_id`` / ``get_job_id`` instead.

    .. code-block:: python

       client.deployments.get_id(deployment_details)

15. **Using in multithreaded applications**

    When using ``APIClient`` in a multi-threaded environment, ensure that the client is initialized only once.

    .. code-block:: python

       from concurrent.futures import ThreadPoolExecutor

       from ibm_watsonx_ai import APIClient

       client = APIClient(credentials, project_id="your_project_id")

       payload = [(deployment_id, scoring_payload)]

       with ThreadPoolExecutor(max_workers=n) as executor:
           responses = list(executor.map(lambda args: client.deployments.score(*args), payload))

    In this example, the same ``APIClient`` instance (``client``) is shared across threads.

16. **Customize HTTPX logging with event hooks**

    Because ``httpx`` allows "event hooks" to be registered on the client, you can observe every request and response as it occurs.

    .. code-block:: python

       def log_request(request):
           print(f"Request event hook: {request.method} {request.url} - Waiting for response")

       def log_response(response):
           request = response.request
           print(f"Response event hook: {request.method} {request.url} - Status {response.status_code}")

       client = APIClient(credentials, project_id="your_project_id")
       client.httpx_client.event_hooks['request'] = [log_request]
       client.httpx_client.event_hooks['response'] = [log_response]

    More details in the official documentation: `Event Hooks `_

17. **Access the token via the client**

    .. note::

       **Never hardcode** tokens directly in source code or notebooks. Store them in environment variables or use a secure secrets manager.

    Most methods in the ``ibm_watsonx_ai`` library require authentication to access secured APIs or private resources.
    An authentication token is used to securely identify the user or application making the request. If you have an initialized ``APIClient``, you can easily access its token:

    .. code-block:: python

       from ibm_watsonx_ai import APIClient

       client = APIClient(credentials, project_id="your_project_id")

       token = client.token
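Following the note above, a minimal sketch of reading an API key from an environment variable instead of hardcoding it (the variable name ``WATSONX_APIKEY`` is an arbitrary choice, not an SDK convention):

```python
import os

# Simulate a pre-set variable for this sketch; in practice, export it in your
# shell or configure it through your secrets manager.
os.environ.setdefault("WATSONX_APIKEY", "dummy-key-for-illustration")

api_key = os.environ.get("WATSONX_APIKEY")
if api_key is None:
    # Fail fast with a clear message rather than sending an empty key.
    raise RuntimeError("WATSONX_APIKEY is not set")

# The key can then be passed to Credentials, e.g.:
# credentials = Credentials(url="...", api_key=api_key)
```

This keeps secrets out of notebooks and version control, and makes rotating the key a deployment concern rather than a code change.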