ModelInference¶

class ibm_watsonx_ai.foundation_models.inference.ModelInference(*, model_id=None, deployment_id=None, params=None, credentials=None, project_id=None, space_id=None, verify=None, api_client=None, validate=True, persistent_connection=True, max_retries=None, delay_time=None, retry_status_codes=None)[source]¶

Bases: BaseModelInference

Instantiate the model interface.

Hint

To use the ModelInference class with LangChain, use the WatsonxLLM wrapper.

Parameters:

model_id (str, optional) – type of model to use
deployment_id (str, optional) – ID of tuned model’s deployment
credentials (Credentials or dict, optional) – credentials for the Watson Machine Learning instance
params (dict, TextGenParameters, TextChatParameters, optional) – parameters to use during request generation
project_id (str, optional) – ID of the Watson Studio project
space_id (str, optional) – ID of the Watson Studio space
verify (bool or str, optional) –
You can pass one of the following as verify:
- the path to a CA_BUNDLE file
- the path of directory with certificates of trusted CAs
- True - default path to truststore will be taken
- False - no verification will be made
api_client (APIClient, optional) – initialized APIClient object with a set project ID or space ID. If passed, credentials and project_id/space_id are not required.
validate (bool, optional) – Model ID validation, defaults to True
persistent_connection (bool, optional) – Whether to keep persistent connection when evaluating generate, generate_text or tokenize methods. This parameter is only applicable for the mentioned methods when the prompt is a str type. To close the connection, run model.close_persistent_connection(), defaults to True. Added in 1.1.2.
max_retries (int, optional) – number of retries performed when request was not successful and status code is in retry_status_codes, defaults to 10
delay_time (float, optional) – delay time to retry request, factor in exponential backoff formula: wx_delay_time * pow(2.0, attempt), defaults to 0.5s
retry_status_codes (list[int], optional) – list of status codes which will be considered for retry mechanism, defaults to [429, 503, 504, 520]

Note

You must provide one of these parameters: [model_id, deployment_id]
When the credentials parameter is passed, you must provide one of these parameters: [project_id, space_id].

Hint

You can copy the project_id from the Project’s Manage tab (Project -> Manage -> General -> Details).

Example:

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes, DecodingMethods

# To display example params enter
GenParams().get_example_values()

generate_params = {
    GenParams.MAX_NEW_TOKENS: 25
}

model_inference = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    params=generate_params,
    credentials=Credentials(
        api_key = IAM_API_KEY,
        url = "https://us-south.ml.cloud.ibm.com"),
    project_id="*****"
    )

from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai import Credentials

deployment_inference = ModelInference(
    deployment_id="<ID of deployed model>",
    credentials=Credentials(
        api_key = IAM_API_KEY,
        url = "https://us-south.ml.cloud.ibm.com"),
    project_id="*****"
    )

async achat(messages, params=None, tools=None, tool_choice=None, tool_choice_option=None, context=None)[source]¶

Given a list of messages comprising a conversation with a chat model in an asynchronous manner.

Parameters:

messages (list[dict]) – The messages for this chat session.
params (dict, TextChatParameters, optional) – meta props for chat generation, use ibm_watsonx_ai.foundation_models.schema.TextChatParameters.show()
tools (list) – Tool functions that can be called with the response.
tool_choice (dict, optional) – Specifying a particular tool via {“type”: “function”, “function”: {“name”: “my_function”}} forces the model to call that tool.
tool_choice_option (Literal["none", "auto"], optional) – Tool choice option
context (str, optional) – context variable can be present in chat system_prompt or chat messages content fields and are identified by sentence ‘{{ context }}’. Supported only with deployment_id, defaults to None.

Returns:

scoring result containing generated chat content.

Return type:

dict

Example:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
]
generated_response = await model.achat(messages=messages)

# Print all response
print(generated_response)

# Print only content
print(generated_response['choices'][0]['message']['content'])

async achat_stream(messages, params=None, tools=None, tool_choice=None, tool_choice_option=None, context=None)[source]¶

Given a list of messages comprising a conversation, the model will return a response in stream in an asynchronous manner.

Parameters:

messages (list[dict]) – The messages for this chat session.
params (dict, TextChatParameters, optional) – meta props for chat generation, use ibm_watsonx_ai.foundation_models.schema.TextChatParameters.show()
tools (list) – Tool functions that can be called with the response.
tool_choice (dict, optional) – Specifying a particular tool via {“type”: “function”, “function”: {“name”: “my_function”}} forces the model to call that tool.
tool_choice_option (Literal["none", "auto"], optional) – Tool choice option
context (str, optional) – context variable can be present in chat system_prompt or chat messages content fields and are identified by sentence ‘{{ context }}’. Supported only with deployment_id, defaults to None.

Returns:

scoring result containing generated chat content.

Return type:

AsyncGenerator

Example:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
]
generated_response = await model.achat_stream(messages=messages)

async for chunk in generated_response:
    if chunk['choices']:
        print(chunk['choices'][0]['delta'].get('content', ''), end='', flush=True)

async aclose_persistent_connection()[source]¶: Only applicable if persistent_connection was set to True in the ModelInference initialization.

async agenerate(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, validate_prompt_variables=True)[source]¶

Generate a response in an asynchronous manner.

Parameters:

prompt (str | None, optional) – prompt string, defaults to None
params (dict | None, optional) – MetaProps for text generation, use ibm_watsonx_ai.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames, defaults to None
guardrails (bool, optional) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggle on for both prompt and generated text, defaults to False If HAP is detected, then the HAPDetectionWarning is issued
guardrails_hap_params (dict | None, optional) – MetaProps for HAP moderations, use ibm_watsonx_ai.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
validate_prompt_variables (bool, optional) – If True, the prompt variables provided in params are validated with the ones in the Prompt Template Asset. This parameter is only applicable in a Prompt Template Asset deployment scenario and should not be changed for different cases, defaults to True

Returns:

raw response that contains the generated content

Return type:

dict

async agenerate_stream(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, validate_prompt_variables=True)[source]¶

Generates a stream as agenerate_stream after getting a text prompt as input and parameters for the selected model (model_id). For prompt template deployment, prompt should be None.

Parameters:

params (dict, TextGenParameters, optional) – MetaProps for text generation, use ibm_watsonx_ai.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
prompt (str, optional) – prompt string, defaults to None
guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggle on for both prompt and generated text, defaults to False If HAP is detected, then the HAPDetectionWarning is issued
guardrails_hap_params (dict) – MetaProps for HAP moderations, use ibm_watsonx_ai.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
validate_prompt_variables (bool) – If True, the prompt variables provided in params are validated with the ones in the Prompt Template Asset. This parameter is only applicable in a Prompt Template Asset deployment scenario and should not be changed for different cases, defaults to True

Returns:

scoring result that contains the generated content

Return type:

AsyncGenerator

Note

By default, only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:

import warnings
from ibm_watsonx_ai.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)

Example:

q = "Write an epigram about the sun"
generated_response = await model_inference.agenerate_stream(prompt=q)

async for chunk in generated_response:
    print(chunk, end='', flush=True)

chat(messages, params=None, tools=None, tool_choice=None, tool_choice_option=None, context=None)[source]¶

Given a list of messages comprising a conversation, the model will return a response.

Parameters:

messages (list[dict]) – The messages for this chat session.
params (dict, TextChatParameters, optional) – meta props for chat generation, use ibm_watsonx_ai.foundation_models.schema.TextChatParameters.show()
tools (list) – Tool functions that can be called with the response.
tool_choice (dict, optional) – Specifying a particular tool via {“type”: “function”, “function”: {“name”: “my_function”}} forces the model to call that tool.
tool_choice_option (Literal["none", "auto"], optional) – Tool choice option
context (str, optional) – context variable can be present in chat system_prompt or chat messages content fields and are identified by sentence ‘{{ context }}’. Supported only with deployment_id, defaults to None.

Returns:

scoring result containing generated chat content.

Return type:

dict

Example:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
]
generated_response = model.chat(messages=messages)

# Print all response
print(generated_response)

# Print only content
print(generated_response['choices'][0]['message']['content'])

chat_stream(messages, params=None, tools=None, tool_choice=None, tool_choice_option=None, context=None)[source]¶

Given a list of messages comprising a conversation, the model will return a response in stream.

Parameters:

messages (list[dict]) – The messages for this chat session.
params (dict, TextChatParameters, optional) – meta props for chat generation, use ibm_watsonx_ai.foundation_models.schema.TextChatParameters.show()
tools (list) – Tool functions that can be called with the response.
tool_choice (dict, optional) – Specifying a particular tool via {“type”: “function”, “function”: {“name”: “my_function”}} forces the model to call that tool.
tool_choice_option (Literal["none", "auto"], optional) – Tool choice option
context (str, optional) – context variable can be present in chat system_prompt or chat messages content fields and are identified by sentence ‘{{ context }}’. Supported only with deployment_id, defaults to None.

Returns:

scoring result containing generated chat content.

Return type:

generator

Example:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
]
generated_response = model.chat_stream(messages=messages)

for chunk in generated_response:
    if chunk['choices']:
        print(chunk['choices'][0]['delta'].get('content', ''), end='', flush=True)

close_persistent_connection()[source]¶: Only applicable if persistent_connection was set to True in ModelInference initialization. Calling this method closes the current httpx.Client and recreates a new httpx.Client with default values: timeout: httpx.Timeout(timeout=30 * 60, connect=10) limit: httpx.Limits(max_connections=10, max_keepalive_connections=10, keepalive_expiry=HTTPX_KEEPALIVE_EXPIRY)

generate(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, concurrency_limit=8, async_mode=False, validate_prompt_variables=True)[source]¶

Generates a completion text as generated_text after getting a text prompt as input and parameters for the selected model (model_id) or deployment (deployment_id). For prompt template deployment, prompt should be None.

Parameters:

params (dict, TextGenParameters, optional) – MetaProps for text generation, use ibm_watsonx_ai.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
concurrency_limit (int) – number of requests to be sent in parallel, max is 10
prompt ((str | list | None), optional) – prompt string or list of strings. If list of strings is passed, requests will be managed in parallel with the rate of concurency_limit, defaults to None
guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggle on for both prompt and generated text, defaults to False
guardrails_hap_params (dict) – MetaProps for HAP moderations, use ibm_watsonx_ai.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
async_mode (bool) – If True, yields results asynchronously (using a generator). In this case, both prompt and generated text will be concatenated in the final response - under generated_text, defaults to False
validate_prompt_variables (bool, optional) – If True and ModelInference instance has been initialized with validate=True, prompt variables provided in params are validated with the ones in the Prompt Template Asset. This parameter is only applicable in a Prompt Template Asset deployment scenario and should not be changed for different cases, defaults to True

Returns:

scoring result the contains the generated content

Return type:

dict

Example:

q = "What is 1 + 1?"
generated_response = model_inference.generate(prompt=q)
print(generated_response['results'][0]['generated_text'])

generate_text(prompt=None, params=None, raw_response=False, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, concurrency_limit=8, validate_prompt_variables=True)[source]¶

Generates a completion text as generated_text after getting a text prompt as input and parameters for the selected model (model_id). For prompt template deployment, prompt should be None.

Parameters:

params (dict, TextGenParameters, optional) – MetaProps for text generation, use ibm_watsonx_ai.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
concurrency_limit (int) – number of requests to be sent in parallel, max is 10
prompt ((str | list | None), optional) – prompt string or list of strings. If list of strings is passed, requests will be managed in parallel with the rate of concurency_limit, defaults to None
guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggle on for both prompt and generated text, defaults to False If HAP is detected, then the HAPDetectionWarning is issued
guardrails_hap_params (dict) – MetaProps for HAP moderations, use ibm_watsonx_ai.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
raw_response (bool, optional) – returns the whole response object
validate_prompt_variables (bool) – If True, the prompt variables provided in params are validated with the ones in the Prompt Template Asset. This parameter is only applicable in a Prompt Template Asset deployment scenario and should not be changed for different cases, defaults to True

Returns:

generated content

Return type:

str | list | dict

Note

By default, only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:

import warnings
from ibm_watsonx_ai.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)

Example:

q = "What is 1 + 1?"
generated_text = model_inference.generate_text(prompt=q)
print(generated_text)

generate_text_stream(prompt=None, params=None, raw_response=False, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, validate_prompt_variables=True)[source]¶

Generates a streamed text as generate_text_stream after getting a text prompt as input and parameters for the selected model (model_id). For prompt template deployment, prompt should be None.

Parameters:

params (dict, TextGenParameters, optional) – MetaProps for text generation, use ibm_watsonx_ai.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
prompt (str, optional) – prompt string, defaults to None
raw_response (bool, optional) – yields the whole response object
guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggle on for both prompt and generated text, defaults to False If HAP is detected, then the HAPDetectionWarning is issued
guardrails_hap_params (dict) – MetaProps for HAP moderations, use ibm_watsonx_ai.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
validate_prompt_variables (bool) – If True, the prompt variables provided in params are validated with the ones in the Prompt Template Asset. This parameter is only applicable in a Prompt Template Asset deployment scenario and should not be changed for different cases, defaults to True

Returns:

scoring result that contains the generated content

Return type:

generator

Note

By default, only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:

import warnings
from ibm_watsonx_ai.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)

Example:

q = "Write an epigram about the sun"
generated_response = model_inference.generate_text_stream(prompt=q)

for chunk in generated_response:
    print(chunk, end='', flush=True)

get_details()[source]¶

Get the details of a model interface

Returns:: details of the model or deployment
Return type:: dict

Example:

model_inference.get_details()

get_identifying_params()[source]¶: Represent Model Inference’s setup in dictionary

set_api_client(api_client)[source]¶

Set or refresh the APIClient object associated with ModelInference object.

Parameters:: api_client (APIClient, optional) – initialized APIClient object with a set project ID or space ID.

Example:

api_client = APIClient(credentials=..., space_id=...)
model_inference.set_api_client(api_client=api_client)

to_langchain()[source]¶

Returns:: WatsonxLLM wrapper for watsonx foundation models
Return type:: WatsonxLLM

Example:

from langchain import PromptTemplate
from langchain.chains import LLMChain
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

flan_ul2_model = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    credentials=Credentials(
                        api_key = IAM_API_KEY,
                        url = "https://us-south.ml.cloud.ibm.com"),
    project_id="*****"
    )

prompt_template = "What color is the {flower}?"

llm_chain = LLMChain(llm=flan_ul2_model.to_langchain(), prompt=PromptTemplate.from_template(prompt_template))
llm_chain.invoke('sunflower')

from langchain import PromptTemplate
from langchain.chains import LLMChain
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

deployed_model = ModelInference(
    deployment_id="<ID of deployed model>",
    credentials=Credentials(
                        api_key = IAM_API_KEY,
                        url = "https://us-south.ml.cloud.ibm.com"),
    space_id="*****"
    )

prompt_template = "What color is the {car}?"

llm_chain = LLMChain(llm=deployed_model.to_langchain(), prompt=PromptTemplate.from_template(prompt_template))
llm_chain.invoke('sunflower')

tokenize(prompt, return_tokens=False)[source]¶

The text tokenize operation allows you to check the conversion of provided input to tokens for a given model. It splits text into words or sub-words, which then are converted to IDs through a look-up table (vocabulary). Tokenization allows the model to have a reasonable vocabulary size.

Note

The tokenization method is available only for base models and is not supported for deployments.

Parameters:

prompt (str, optional) – prompt string, defaults to None
return_tokens (bool) – parameter for text tokenization, defaults to False

Returns:

result of tokenizing the input string

Return type:

dict

Example:

q = "Write an epigram about the moon"
tokenized_response = model_inference.tokenize(prompt=q, return_tokens=True)
print(tokenized_response["result"])

Enums¶

class TextModels¶

Bases: StrEnum

This represents a dynamically generated Enum for Foundation Models.

Example of getting TextModels:

# GET TextModels ENUM
client.foundation_models.TextModels

# PRINT dict of Enums
client.foundation_models.TextModels.show()

Example Output:

{'GRANITE_13B_CHAT_V2': 'ibm/granite-13b-chat-v2',
'GRANITE_13B_INSTRUCT_V2': 'ibm/granite-13b-instruct-v2',
...
'LLAMA_2_13B_CHAT': 'meta-llama/llama-2-13b-chat',
'LLAMA_2_70B_CHAT': 'meta-llama/llama-2-70b-chat',
'LLAMA_3_70B_INSTRUCT': 'meta-llama/llama-3-70b-instruct',
'MIXTRAL_8X7B_INSTRUCT_V01': 'mistralai/mixtral-8x7b-instruct-v01'}

Example of initialising ModelInference with TextModels Enum:

from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=client.foundation_models.TextModels.GRANITE_13B_INSTRUCT_V2,
    credentials=Credentials(...),
    project_id=project_id,
)

class ChatModels¶

Bases: StrEnum

This represents a dynamically generated Enum for Chat Models.

Example of getting ChatModels:

# GET ChatModels ENUM
client.foundation_models.ChatModels

# PRINT dict of Enums
client.foundation_models.ChatModels.show()

Example Output:

{'GRANITE_20B_CODE_INSTRUCT': 'ibm/granite-20b-code-instruct',
'GRANITE_3_8B_INSTRUCT': 'ibm/granite-3-8b-instruct',
...
'LLAMA_3_3_70B_INSTRUCT': 'meta-llama/llama-3-3-70b-instruct',
'LLAMA_3_405B_INSTRUCT': 'meta-llama/llama-3-405b-instruct',
'MISTRAL_LARGE': 'mistralai/mistral-large',
'MIXTRAL_8X7B_INSTRUCT_V01': 'mistralai/mixtral-8x7b-instruct-v01'}

Example of initialising ModelInference with ChatModels Enum:

from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=client.foundation_models.ChatModels.GRANITE_3_8B_INSTRUCT,
    credentials=Credentials(...),
    project_id=project_id,
)