Model Gateway (BETA)
Note
Model Gateway is currently in beta and is available only on IBM watsonx.ai for IBM Cloud. Breaking API changes may be introduced in the future.
Model Gateway acts as a proxy for inference requests to multiple model providers. It simplifies model usage and provides load balancing.
Gateway
Providers
- class ibm_watsonx_ai.gateway.providers.Providers(api_client)
Model Gateway providers class.
- create(provider, name, data=None, secret_crn_id=None)
Create provider in Model Gateway.
- Parameters:
provider (str) – provider name
name (str) – name of provider for display
data (dict, optional) – data required to connect to the provider API
secret_crn_id (str, optional) – CRN of the secret for the given provider in Secrets Manager
- Returns:
provider details
- Return type:
dict
- delete(provider_id)
Delete provider.
- Parameters:
provider_id (str) – unique provider ID
- Returns:
status (“SUCCESS” if succeeded)
- Return type:
str
- get_available_models_details(provider_id)
Get available models details for given provider.
- Parameters:
provider_id (str) – unique provider ID
- Returns:
details of available models for provider
- Return type:
dict
- get_details(provider_id=None)
Get provider/providers details:
  - provider_id is set: details for the given provider are returned
  - provider_id is None: details for all providers are returned
- Parameters:
provider_id (str, optional) – unique provider ID
- Returns:
provider/providers details
- Return type:
dict
- static get_id(provider_details)
Get provider ID from provider details.
- Parameters:
provider_details (dict) – details of the provider in Model Gateway
- Returns:
unique provider ID
- Return type:
str
Models
- class ibm_watsonx_ai.gateway.models.Models(api_client)
Model Gateway models class.
- create(provider_id, model, alias=None, metadata=None)
Register model in Model Gateway.
- Parameters:
provider_id (str) – unique provider ID obtained from provider details
model (str) – model name as supported by provider
alias (str, optional) – alias for registered model, can be used later as model name during embeddings or text/chat completions calls
metadata (dict, optional) – additional metadata which can be added for the model
- Returns:
model details
- Return type:
dict
- delete(model_id)
Unregister model from Model Gateway.
- Parameters:
model_id (str) – unique model ID obtained from model details
- Returns:
status (“SUCCESS” if succeeded)
- Return type:
str
- get_details(*, model_id=None, provider_id=None)
Get details of model or models:
  - model_id is set: details for a single model are returned; provider_id, if set, is ignored
  - provider_id is set, model_id is None: details for all models of the given provider are returned
  - both model_id and provider_id are None: details for all models are returned
- Parameters:
model_id (str, optional) – unique model ID
provider_id (str, optional) – unique provider ID, ignored if model_id is set
- Returns:
details of model/models
- Return type:
dict
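The calls documented above chain naturally: create a provider, read its ID, then register a model under it. The following is a minimal sketch, assuming `providers` and `models` are already-constructed `Providers` and `Models` instances. The helper name `register_model` is hypothetical and not part of the library.

```python
def register_model(providers, models, *, provider, name, model, data=None, alias=None):
    """Create a provider, then register one of its models (hypothetical helper).

    `providers` and `models` are assumed to expose the create/get_id methods
    documented for the Providers and Models classes.
    """
    # Providers.create returns the provider details as a dict.
    provider_details = providers.create(provider=provider, name=name, data=data)
    # Providers.get_id extracts the unique provider ID from those details.
    provider_id = providers.get_id(provider_details)
    # Models.create registers the model under that provider and returns model details.
    return models.create(provider_id=provider_id, model=model, alias=alias)
```

Passing the two objects explicitly keeps the helper independent of how the gateway client is constructed, which this section does not describe.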
Policies
- class ibm_watsonx_ai.gateway.policies.Policies(api_client)
Model Gateway policies class.
- create(action, resource, subject, effect=None)
Create policy.
- Parameters:
action (str) – action for policy
resource (str) – resource for policy
subject (str) – subject for policy
effect (str, optional) – effect for policy
- delete(policy_id)
Delete policy.
- Parameters:
policy_id (str) – ID of policy
- Returns:
status (“SUCCESS” if succeeded)
- Return type:
str
RateLimits
- class ibm_watsonx_ai.gateway.rate_limits.RateLimitSettings
Model Gateway rate limit settings.
- Parameters:
amount (int) – amount is the number of tokens refilled into the bucket each interval
capacity (int) – capacity is the maximum number of tokens (requests) the bucket can hold
duration (str) – duration is the refill interval, formatted as a Go duration string (for more information please see: https://pkg.go.dev/time#ParseDuration)
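The three settings describe a token bucket. The sketch below is a local illustration of how amount, capacity, and duration interact, not the gateway's implementation. It assumes the bucket starts full and refills in whole intervals, and it uses a simplified parser for Go duration strings (no sign and no bare "0"). The names `parse_go_duration` and `TokenBucket` are hypothetical.

```python
import re

# Seconds per Go duration unit.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One "<number><unit>" component, e.g. "1.5h" or "300ms".
_PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text: str) -> float:
    """Convert an unsigned Go duration string such as "1m30s" to seconds."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total


class TokenBucket:
    """Illustrative bucket: `amount` tokens are added every `duration`, up to `capacity`."""

    def __init__(self, amount: int, capacity: int, duration: str):
        self.amount = amount
        self.capacity = capacity
        self.interval = parse_go_duration(duration)
        if self.interval <= 0:
            raise ValueError("duration must be positive")
        self.tokens = capacity  # assumption: the bucket starts full
        self.last_refill = 0.0

    def allow(self, now: float, cost: int = 1) -> bool:
        """Return True and consume `cost` tokens if enough are available at time `now`."""
        intervals = int((now - self.last_refill) // self.interval)
        if intervals > 0:
            self.tokens = min(self.capacity, self.tokens + intervals * self.amount)
            self.last_refill += intervals * self.interval
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With amount=2, capacity=3, and duration="1s", a burst of three requests is accepted immediately, after which roughly two requests per second pass.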
- class ibm_watsonx_ai.gateway.rate_limits.RateLimits(api_client)
Model Gateway rate limits class.
- create_for_model(model_id, *, request=None, token=None)
Create rate limit for model in Model Gateway.
- Parameters:
model_id (str) – ID of the Model Gateway model
request (RateLimitSettings, optional) – request rate limiting settings
token (RateLimitSettings, optional) – token rate limiting settings
- Returns:
rate limit details
- Return type:
dict
- create_for_provider(provider_id, *, request=None, token=None)
Create rate limit for provider in Model Gateway.
- Parameters:
provider_id (str) – ID of the Model Gateway provider
request (RateLimitSettings, optional) – request rate limiting settings
token (RateLimitSettings, optional) – token rate limiting settings
- Returns:
rate limit details
- Return type:
dict
- create_for_tenant(*, request=None, token=None)
Create rate limit for tenant in Model Gateway.
- Parameters:
request (RateLimitSettings, optional) – request rate limiting settings
token (RateLimitSettings, optional) – token rate limiting settings
- Returns:
rate limit details
- Return type:
dict
- delete(rate_limit_id)
Delete rate limit from Model Gateway.
- Parameters:
rate_limit_id (str) – ID of the rate limit
- Returns:
status “SUCCESS” if deletion is successful
- Return type:
Literal[“SUCCESS”]
- Raises:
ApiRequestFailure if deletion failed
- get_details(*, rate_limit_id=None)
Get details of rate limits. If rate_limit_id is specified, returns details of that rate limit.
- Parameters:
rate_limit_id (str, optional) – ID of the rate limit
- Returns:
details of rate limits or rate limit if rate_limit_id is specified
- Return type:
dict
- static get_id(rate_limit_details)
Get rate limit ID from rate limit details.
- Parameters:
rate_limit_details (dict) – details of the rate limit
- Returns:
ID of the rate limit
- Return type:
str
- list()
List rate limits registered in Model Gateway.
- Returns:
dataframe containing list results
- Return type:
pandas.DataFrame
- update_for_model(rate_limit_id, model_id, *, request=None, token=None)
Update rate limit for model in Model Gateway.
- Parameters:
rate_limit_id (str) – ID of the rate limit
model_id (str) – ID of the Model Gateway model
request (RateLimitSettings, optional) – request rate limiting settings
token (RateLimitSettings, optional) – token rate limiting settings
- Returns:
rate limit details
- Return type:
dict
- update_for_provider(rate_limit_id, provider_id, *, request=None, token=None)
Update rate limit for provider in Model Gateway.
- Parameters:
rate_limit_id (str) – ID of the rate limit
provider_id (str) – ID of the Model Gateway provider
request (RateLimitSettings, optional) – request rate limiting settings
token (RateLimitSettings, optional) – token rate limiting settings
- Returns:
rate limit details
- Return type:
dict
- update_for_tenant(rate_limit_id, *, request=None, token=None)
Update rate limit for tenant in Model Gateway.
- Parameters:
rate_limit_id (str) – ID of the rate limit
request (RateLimitSettings, optional) – request rate limiting settings
token (RateLimitSettings, optional) – token rate limiting settings
- Returns:
rate limit details
- Return type:
dict
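As an illustration of how `create_for_model` and `get_id` combine, the sketch below applies the same settings to several models and records the resulting rate limit IDs. It assumes `rate_limits` is a `RateLimits` instance and that `request` and `token` are `RateLimitSettings` objects or None. The helper name `limit_models` is hypothetical.

```python
def limit_models(rate_limits, model_ids, *, request=None, token=None):
    """Create one rate limit per model and map model ID -> rate limit ID.

    Hypothetical helper: `rate_limits` is assumed to expose the
    create_for_model and get_id methods documented for RateLimits.
    """
    created = {}
    for model_id in model_ids:
        # create_for_model returns the rate limit details as a dict.
        details = rate_limits.create_for_model(model_id, request=request, token=token)
        # get_id extracts the rate limit ID, needed later for update_* and delete.
        created[model_id] = rate_limits.get_id(details)
    return created
```

Keeping the returned IDs is useful because the update and delete methods are keyed by rate limit ID rather than by model ID.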
Get rate limit details for model requests
To inspect a request that failed because of rate limits, wrap the call in try-except and catch the ApiRequestFailure exception.
The caught exception has a response property, which is the underlying httpx.Response instance.
From that instance, you can read the response headers, which contain information about the rate limit.
from ibm_watsonx_ai.wml_client_error import ApiRequestFailure

try:
    response = gateway.completions.create(
        model_id, "The default voltage provided in USB is "
    )
except ApiRequestFailure as exc:
    error_response = exc.response
    # Keep only the headers that describe the rate limit
    rate_limit_headers = {
        name: value
        for name, value in error_response.headers.items()
        if name.startswith("x-ratelimit-")
    }
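The header filtering from the example above can be factored into a small reusable function. This is a sketch that works on any mapping of header names to values, so it can be tried without a live response. The function name `extract_rate_limit_headers` is hypothetical, and the specific header name in the usage note is made up for illustration; only the `x-ratelimit-` prefix comes from the example.

```python
from collections.abc import Mapping


def extract_rate_limit_headers(
    headers: Mapping[str, str], prefix: str = "x-ratelimit-"
) -> dict[str, str]:
    """Return only the headers whose name starts with `prefix` (case-insensitive).

    Names are lowercased in the result so lookups do not depend on the
    casing used by the server.
    """
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }
```

For instance, passing `{"X-RateLimit-Example": "0", "content-type": "application/json"}` keeps only the first entry, under the key `x-ratelimit-example`.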