Deploy on Demand

IBM watsonx.ai makes a curated collection of foundation models available for you to deploy on-demand on dedicated hardware for the exclusive use of your organization. By using this approach, you can access the capabilities of these foundation models without provisioning or managing your own serving infrastructure. Foundation models that you deploy on-demand are hosted in a dedicated deployment space, where you can use them for inferencing.

IBM watsonx.ai for IBM Cloud

This section shows how to create task credentials, store and deploy a model, and use the ModelInference module with the created deployment on IBM watsonx.ai for IBM Cloud.

Initialize an APIClient object

Initialize an APIClient object if needed. For more details about supported APIClient initialization, see Setup.

from ibm_watsonx_ai import APIClient, Credentials

# Replace the URL and API key with values for your environment (see Setup).
credentials = Credentials(url="https://us-south.ml.cloud.ibm.com", api_key="***")

client = APIClient(credentials, project_id=project_id)
# or: client = APIClient(credentials, space_id=space_id)

Add Task Credentials

Warning

Task credentials are required to create a deployment on IBM watsonx.ai for IBM Cloud. If you have not added them yet, do so before you deploy.

With task credentials, you can deploy a curated foundation model and avoid token expiration issues. For more details, see Adding task credentials.

To list available task credentials, use the list method:

client.task_credentials.list()

If the list is empty, you can create new task credentials with the store method:

client.task_credentials.store()

To get details of available task credentials, use the get_details method:

client.task_credentials.get_details()

Store the model in the service repository

To store a model as an asset in the repository, you must first define the model metadata.

metadata = {
    client.repository.ModelMetaNames.NAME: "curated FM asset",
    client.repository.ModelMetaNames.TYPE: client.repository.ModelAssetTypes.CURATED_FOUNDATION_MODEL_1_0,
}

Store the model

After creating the metadata, you can store the model by using client.repository.store_model().

stored_model_details = client.repository.store_model(model='ibm/granite-13b-chat-v2-curated', meta_props=metadata)

To get the ID of the stored model asset, use the details returned by store_model():

stored_model_asset_id = client.repository.get_model_id(stored_model_details)

To list all stored curated foundation models, filter them by framework type:

client.repository.list(framework_filter='curated_foundation_model_1.0')

Deploy the curated foundation model

To create a new deployment of a curated foundation model, define a meta_props dictionary with the deployment metadata. Specify the deployment NAME and DESCRIPTION fields.

Only online deployments are supported, so the ONLINE field is required.

Optional: At this stage, you can overwrite model parameters by passing a dictionary with new parameter values in the FOUNDATION_MODEL field, as shown in the sketch after the deployment example below.

You also need to provide the stored_model_asset_id to create the deployment.

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "curated_fm_deployment",
    client.deployments.ConfigurationMetaNames.DESCRIPTION: "Testing deployment using curated foundation model",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: "test_curated_fm_01"
}
deployment_details = client.deployments.create(stored_model_asset_id, meta_props)
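
If you want to overwrite model parameters (the optional step mentioned above), extend the same meta_props dictionary with the FOUNDATION_MODEL field before calling create. The following is a minimal sketch; the parameter name and value are placeholders, so check which parameters your chosen model actually supports.

# Placeholder override: consult the model documentation for supported parameters.
meta_props[client.deployments.ConfigurationMetaNames.FOUNDATION_MODEL] = {
    "max_sequence_length": 4096,
}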

When the deployment is created, the create method returns a dictionary with the deployment details, which you can use to retrieve the deployment_id.

deployment_id = client.deployments.get_id(deployment_details)
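
If you want to confirm that the deployment is ready before inferencing, you can fetch its details with get_details. The sketch below assumes the state is reported under the entity.status.state path of the returned dictionary.

deployment_details = client.deployments.get_details(deployment_id)

# Assumption: the deployment state is exposed under entity -> status -> state.
state = deployment_details.get("entity", {}).get("status", {}).get("state")
print(state)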

You can list all existing deployments in the current space or project with the list method:

client.deployments.list()

Work with deployments

For information on working with foundation model deployments, see Models / ModelInference for Deployments.
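
For example, a minimal inference sketch with the deployment created above could look like the following; the prompt text is a placeholder, and you may want to pass additional generation parameters.

from ibm_watsonx_ai.foundation_models import ModelInference

# Point ModelInference at the on-demand deployment instead of a shared model.
deployed_model = ModelInference(
    deployment_id=deployment_id,
    api_client=client,
)

response = deployed_model.generate_text(prompt="What is a foundation model?")
print(response)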