Model Gateway (BETA)

Note

Model Gateway is currently in beta and is available only on IBM watsonx.ai for IBM Cloud. Breaking API changes may be introduced in the future.

Model Gateway provides a proxy for inference requests to multiple model providers. It offers simple model usage with load balancing.

Gateway

class ibm_watsonx_ai.gateway.Gateway(*, credentials=None, verify=None, api_client=None)[source]

Model Gateway class.

Providers

class ibm_watsonx_ai.gateway.providers.Providers(api_client)[source]

Model Gateway providers class.

create(provider, name, data=None, secret_crn_id=None)[source]

Create provider in Model Gateway.

Parameters:
  • provider (str) – provider name

  • name (str) – name of provider for display

  • data (dict, optional) – data required to connect to provider api

  • secret_crn_id (str, optional) – CRN of the secret for the given provider in Secrets Manager

Returns:

provider details

Return type:

dict

delete(provider_id)[source]

Delete provider.

Parameters:

provider_id (str) – unique provider ID

Returns:

status (“SUCCESS” if succeeded)

Return type:

str

get_available_models_details(provider_id)[source]

Get available models details for given provider.

Parameters:

provider_id (str) – unique provider ID

Returns:

details of available models for provider

Return type:

dict

get_details(provider_id=None)[source]

Get provider/providers details:
  • provider_id is set - details for the given provider are returned

  • provider_id is None - details for all providers are returned

Parameters:

provider_id (str, optional) – unique provider ID

Returns:

provider/providers details

Return type:

dict

static get_id(provider_details)[source]

Get provider ID from provider details.

Parameters:

provider_details (dict) – details of the provider in Model Gateway

Returns:

unique provider ID

Return type:

str

list()[source]

List providers.

Returns:

dataframe with providers details

Return type:

pandas.DataFrame

list_available_models(provider_id)[source]

List available models for provider.

Parameters:

provider_id (str) – unique provider ID

Returns:

dataframe with available models details

Return type:

pandas.DataFrame

Models

class ibm_watsonx_ai.gateway.models.Models(api_client)[source]

Model Gateway models class.

create(provider_id, model, alias=None, metadata=None)[source]

Register model in Model Gateway.

Parameters:
  • provider_id (str) – unique provider ID obtained from provider details

  • model (str) – model name as supported by provider

  • alias (str, optional) – alias for the registered model; it can later be used as the model name in embeddings or text/chat completions calls

  • metadata (dict, optional) – additional metadata which can be added for the model

Returns:

model details

Return type:

dict
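A minimal sketch of how create and get_id might be combined to register a model and keep its ID. It assumes the Models instance is reachable as gateway.models; that attribute name is an assumption and is not shown in this reference. The helper name register_model is hypothetical.

```python
def register_model(gateway, provider_id, model, alias=None):
    """Register a provider model in Model Gateway and return its ID.

    Assumption: `gateway.models` is the Models instance (attribute name not
    confirmed by this reference). `register_model` is a hypothetical helper.
    """
    # create() returns the model details as a dict
    details = gateway.models.create(provider_id=provider_id, model=model, alias=alias)
    # get_id() is a static method that extracts the unique model ID
    return gateway.models.get_id(details)
```

The returned ID is the value expected by delete and by the model rate limit methods.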

delete(model_id)[source]

Unregister model from Model Gateway.

Parameters:

model_id (str) – unique model ID obtained from model details

Returns:

status (“SUCCESS” if succeeded)

Return type:

str

get_details(*, model_id=None, provider_id=None)[source]

Get details of model or models:
  • model_id is set - details for a single model are returned; provider_id, if set, is ignored

  • provider_id is set, model_id is None - details for all models of the given provider are returned

  • both model_id and provider_id are None - details of all models are returned

Parameters:
  • model_id (str, optional) – unique model ID

  • provider_id (str, optional) – unique provider ID, ignored if model_id is set

Returns:

details of model/models

Return type:

dict
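The precedence between model_id and provider_id described for get_details can be restated as a small decision function. This is only an illustration of the documented rules, not the library's implementation; the function name get_details_scope and the returned labels are hypothetical.

```python
def get_details_scope(model_id=None, provider_id=None):
    """Describe which models a get_details call would cover.

    Hypothetical helper that mirrors the documented precedence only.
    """
    if model_id is not None:
        # model_id wins; provider_id is ignored in this case
        return "single model"
    if provider_id is not None:
        return "all models of one provider"
    return "all models"
```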

static get_id(model_details)[source]

Get model ID from model details.

Parameters:

model_details (dict) – details of the model registered in Model Gateway

Returns:

unique model ID

Return type:

str

list(provider_id=None)[source]

List models registered in Model Gateway. The list can be filtered by provider_id.

Parameters:

provider_id (str, optional) – ID of provider added into Model Gateway

Returns:

dataframe containing list results

Return type:

pandas.DataFrame

Policies

class ibm_watsonx_ai.gateway.policies.Policies(api_client)[source]

Model Gateway policies class.

create(action, resource, subject, effect=None)[source]

Create policy.

Parameters:
  • action (str) – action for policy

  • resource (str) – resource for policy

  • subject (str) – subject for policy

  • effect (str, optional) – effect for policy

delete(policy_id)[source]

Delete policy.

Parameters:

policy_id (str) – ID of policy

Returns:

status (“SUCCESS” if succeeded)

Return type:

str

get_details()[source]

Get policies details.

Returns:

policies details

Return type:

dict

static get_id(policy_details)[source]

Get policy ID from policy details.

Parameters:

policy_details (dict) – details of the policy for asset registered in Model Gateway

Returns:

unique policy ID

Return type:

str

list()[source]

List policies.

Returns:

dataframe with policies details

Return type:

pandas.DataFrame

RateLimits

class ibm_watsonx_ai.gateway.rate_limits.RateLimitSettings[source]

Model Gateway rate limit settings.

Parameters:
  • amount (int) – number of tokens refilled into the bucket each interval

  • capacity (int) – maximum number of tokens (requests) the bucket can hold

  • duration (str) – refill interval, formatted as a Go duration string (see https://pkg.go.dev/time#ParseDuration)
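To make the three settings concrete, here is a minimal local simulation of a generic token bucket driven by amount, capacity and duration. It is a sketch under assumptions: the gateway's actual algorithm is not shown in this reference, the bucket is assumed to start full, and refills are assumed to happen in whole intervals. The names parse_go_duration and TokenBucket are hypothetical, and the parser handles only non-negative Go duration strings.

```python
import re

# Seconds per Go duration unit
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Convert a non-negative Go duration string such as "1m30s" to seconds."""
    if text == "0":
        return 0.0
    pos, total = 0, 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total


class TokenBucket:
    """Generic token bucket; assumed to start full (not confirmed by the source)."""

    def __init__(self, amount, capacity, duration):
        self.amount = amount
        self.capacity = capacity
        self.interval = parse_go_duration(duration)
        if self.interval <= 0:
            raise ValueError("duration must be positive")
        self.tokens = capacity
        self.last_refill = 0.0

    def allow(self, now, cost=1):
        """Return True and consume `cost` tokens if enough are available at time `now`."""
        # Add `amount` tokens for every whole interval elapsed, capped at `capacity`
        intervals = int((now - self.last_refill) // self.interval)
        if intervals > 0:
            self.tokens = min(self.capacity, self.tokens + intervals * self.amount)
            self.last_refill += intervals * self.interval
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With amount=2, capacity=3 and duration="1s", a burst of three requests is accepted at once, and after that roughly two requests per second are accepted on average.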

class ibm_watsonx_ai.gateway.rate_limits.RateLimits(api_client)[source]

Model Gateway rate limits class.

create_for_model(model_id, *, request=None, token=None)[source]

Create rate limit for model in Model Gateway.

Parameters:
  • model_id (str) – ID of the Model Gateway model

  • request (RateLimitSettings, optional) – request rate limiting settings

  • token (RateLimitSettings, optional) – token rate limiting settings

Returns:

rate limit details

Return type:

dict
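A short sketch of creating a request rate limit for a model and keeping the resulting ID. It assumes the RateLimits instance is reachable as gateway.rate_limits (an assumption, not shown in this reference) and that the caller has already built a RateLimitSettings object. The helper name limit_model_requests is hypothetical.

```python
def limit_model_requests(gateway, model_id, request_settings):
    """Create a request rate limit for a model and return the rate limit ID.

    Assumption: `gateway.rate_limits` is the RateLimits instance.
    `request_settings` is expected to be a RateLimitSettings object.
    """
    # request and token are keyword-only arguments of create_for_model
    details = gateway.rate_limits.create_for_model(model_id, request=request_settings)
    # get_id() extracts the ID needed later by update_for_model() or delete()
    return gateway.rate_limits.get_id(details)
```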

create_for_provider(provider_id, *, request=None, token=None)[source]

Create rate limit for provider in Model Gateway.

Parameters:
  • provider_id (str) – ID of the Model Gateway provider

  • request (RateLimitSettings, optional) – request rate limiting settings

  • token (RateLimitSettings, optional) – token rate limiting settings

Returns:

rate limit details

Return type:

dict

create_for_tenant(*, request=None, token=None)[source]

Create rate limit for tenant in Model Gateway.

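Parameters:
  • request (RateLimitSettings, optional) – request rate limiting settings

  • token (RateLimitSettings, optional) – token rate limiting settings

Returns: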

rate limit details

Return type:

dict

delete(rate_limit_id)[source]

Delete rate limit from Model Gateway.

Parameters:

rate_limit_id (str) – ID of the rate limit

Returns:

status “SUCCESS” if deletion is successful

Return type:

Literal[“SUCCESS”]

Raises:

ApiRequestFailure if deletion failed

get_details(*, rate_limit_id=None)[source]

Get details of rate limits. If rate_limit_id is specified, only the details of that rate limit are returned.

Parameters:

rate_limit_id (str, optional) – ID of the rate limit

Returns:

details of all rate limits, or of a single rate limit if rate_limit_id is specified

Return type:

dict

static get_id(rate_limit_details)[source]

Get rate limit ID from rate limit details.

Parameters:

rate_limit_details (dict) – details of the rate limit

Returns:

ID of the rate limit

Return type:

str

list()[source]

List rate limits registered in Model Gateway.

Returns:

dataframe containing list results

Return type:

pandas.DataFrame

update_for_model(rate_limit_id, model_id, *, request=None, token=None)[source]

Update rate limit for model in Model Gateway.

Parameters:
  • rate_limit_id (str) – ID of the rate limit

  • model_id (str) – ID of the Model Gateway model

  • request (RateLimitSettings, optional) – request rate limiting settings

  • token (RateLimitSettings, optional) – token rate limiting settings

Returns:

rate limit details

Return type:

dict

update_for_provider(rate_limit_id, provider_id, *, request=None, token=None)[source]

Update rate limit for provider in Model Gateway.

Parameters:
  • rate_limit_id (str) – ID of the rate limit

  • provider_id (str) – ID of the Model Gateway provider

  • request (RateLimitSettings, optional) – request rate limiting settings

  • token (RateLimitSettings, optional) – token rate limiting settings

Returns:

rate limit details

Return type:

dict

update_for_tenant(rate_limit_id, *, request=None, token=None)[source]

Update rate limit for tenant in Model Gateway.

Parameters:
  • rate_limit_id (str) – ID of the rate limit

  • request (RateLimitSettings, optional) – request rate limiting settings

  • token (RateLimitSettings, optional) – token rate limiting settings

Returns:

rate limit details

Return type:

dict

Get rate limit details for model requests

To get the details of a request that failed because of rate limits, wrap the call in try-except and catch the ApiRequestFailure exception. The caught exception has a response property, which is the underlying httpx.Response instance. From that instance you can read the response headers, which contain information about the rate limit.

from ibm_watsonx_ai.wml_client_error import ApiRequestFailure

try:
    response = gateway.completions.create(
        model_id, "The default voltage provided in USB is "
    )
except ApiRequestFailure as exc:
    # exc.response is the underlying httpx.Response
    error_response = exc.response
    # Keep only the rate limit headers as a name -> value dict
    rate_limit_headers = {
        name: value
        for name, value in error_response.headers.items()
        if name.startswith("x-ratelimit-")
    }
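The header filtering above can be factored into a small reusable helper. The sketch below is one possible shape; it accepts any mapping or iterable of (name, value) pairs, so it can be tried with a plain dict instead of a real response. The function name rate_limit_headers is hypothetical, and the header name in the usage lines is a placeholder, not a header documented here.

```python
def rate_limit_headers(headers, prefix="x-ratelimit-"):
    """Return headers whose name starts with `prefix`, keyed by lowercased name.

    `headers` may be a mapping (for example response.headers) or an iterable
    of (name, value) pairs. Matching is case-insensitive.
    """
    items = headers.items() if hasattr(headers, "items") else headers
    return {
        name.lower(): value
        for name, value in items
        if name.lower().startswith(prefix)
    }


# Usage with a stand-in dict; "X-RateLimit-Example" is a placeholder name
sample = {"X-RateLimit-Example": "1", "Content-Type": "application/json"}
filtered = rate_limit_headers(sample)
```

Inside the except block, the same call would be rate_limit_headers(exc.response.headers).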