ModelInference#

class ibm_watson_machine_learning.foundation_models.inference.ModelInference(*, model_id=None, deployment_id=None, params=None, credentials=None, project_id=None, space_id=None, verify=None, api_client=None)[source]#

Bases: BaseModelInference

Instantiate the model interface.

Hint

To use the ModelInference class with LangChain, use the WatsonxLLM wrapper.

Parameters:
  • model_id (str, optional) – the type of model to use

  • deployment_id (str, optional) – ID of tuned model’s deployment

  • credentials (dict, optional) – credentials for the Watson Machine Learning instance

  • params (dict, optional) – parameters to use during generate requests

  • project_id (str, optional) – ID of the Watson Studio project

  • space_id (str, optional) – ID of the Watson Studio space

  • verify (bool or str, optional) –

    As verify, the user can pass one of the following:

    • the path to a CA_BUNDLE file

    • the path of a directory with certificates of trusted CAs

    • True - the default path to the truststore will be used

    • False - no verification will be made

  • api_client (APIClient, optional) – initialized APIClient object with a set project or space ID. If passed, credentials and project_id/space_id are not required (see the APIClient example after the code samples below).

Note

One of these parameters is required: [model_id, deployment_id]

Note

One of these parameters is required when the credentials parameter is passed: [project_id, space_id]

Hint

You can copy the project_id from Project’s Manage tab (Project -> Manage -> General -> Details).

Example

from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes, DecodingMethods

# To display example params enter
GenParams().get_example_values()

generate_params = {
    GenParams.MAX_NEW_TOKENS: 25
}

model_inference = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    params=generate_params,
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    project_id="*****"
    )
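
# Inference against a deployed model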
from ibm_watson_machine_learning.foundation_models import ModelInference

deployment_inference = ModelInference(
    deployment_id="<ID of deployed model>",
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    project_id="*****"
    )
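
Alternatively, initialization can go through an already configured APIClient instead of raw credentials (see the api_client parameter above). A minimal sketch, assuming the client is created from the same credentials dict and has a default project set:

from ibm_watson_machine_learning import APIClient
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

# Create the client and set a default project (a default space works as well)
client = APIClient({
    "apikey": "***",
    "url": "https://us-south.ml.cloud.ibm.com"
})
client.set.default_project("*****")

# credentials and project_id/space_id can be omitted when api_client is passed
client_based_inference = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    api_client=client
)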
generate(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, concurrency_limit=10, async_mode=False)[source]#

Given a text prompt and parameters as input, the selected model (model_id) or deployment (deployment_id) generates a completion text as generated_text. For a prompt template deployment, prompt should be None.

Parameters:
  • params (dict) – meta props for text generation, use ibm_watson_machine_learning.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames

  • concurrency_limit (int) – number of requests that will be sent in parallel, max is 10

  • prompt ((str | list | None), optional) – the prompt string or list of strings. If a list of strings is passed, the requests are managed in parallel at the rate of concurrency_limit, defaults to None

  • guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggled on for both the prompt and the generated text, defaults to False

  • guardrails_hap_params (dict) – meta props for HAP moderations, use ibm_watson_machine_learning.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames

  • async_mode (bool) – If True, results are yielded asynchronously (using a generator). In this case, the prompt and the generated text are concatenated in the final response under generated_text, defaults to False

Returns:

scoring result containing generated content

Return type:

dict

Example

q = "What is 1 + 1?"
generated_response = model_inference.generate(prompt=q)
print(generated_response['results'][0]['generated_text'])
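
A list of prompts is sent as parallel requests, throttled by concurrency_limit. A sketch, assuming the results come back as a list of scoring dicts in the same order as the prompts:

questions = [
    "What is 1 + 1?",
    "What is the capital of Italy?"
]

# Requests are sent in parallel, at most 5 at a time
responses = model_inference.generate(prompt=questions, concurrency_limit=5)

for response in responses:
    print(response['results'][0]['generated_text'])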
generate_text(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, raw_response=False, concurrency_limit=10)[source]#

Given a text prompt and parameters as input, the selected model (model_id) generates a completion text as generated_text. For a prompt template deployment, prompt should be None.

Parameters:
  • params (dict) – meta props for text generation, use ibm_watson_machine_learning.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames

  • concurrency_limit (int) – number of requests that will be sent in parallel, max is 10

  • prompt ((str | list | None), optional) – the prompt string or list of strings. If a list of strings is passed, the requests are managed in parallel at the rate of concurrency_limit, defaults to None

  • guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggled on for both the prompt and the generated text, defaults to False. If HAP is detected, a HAPDetectionWarning is issued

  • guardrails_hap_params (dict) – meta props for HAP moderations, use ibm_watson_machine_learning.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames

  • raw_response (bool, optional) – return the whole response object

Returns:

generated content

Return type:

str

Note

By default, only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:

import warnings
from ibm_watson_machine_learning.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)

Example

q = "What is 1 + 1?"
generated_text = model_inference.generate_text(prompt=q)
print(generated_text)
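
With raw_response=True the call returns the whole response object rather than the plain string. A sketch combining it with the HAP guardrail, assuming the raw object has the same shape as the generate() result:

q = "What is 1 + 1?"

# guardrails=True enables HAP detection; raw_response=True keeps the full response dict
raw = model_inference.generate_text(prompt=q, guardrails=True, raw_response=True)

print(raw['results'][0]['generated_text'])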
generate_text_stream(prompt=None, params=None, raw_response=False, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None)[source]#

Given a text prompt and parameters as input, the selected model (model_id) streams the generated text chunk by chunk. For a prompt template deployment, prompt should be None.

Parameters:
  • params (dict) – meta props for text generation, use ibm_watson_machine_learning.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames

  • prompt (str, optional) – the prompt string, defaults to None

  • raw_response (bool, optional) – yields the whole response object

  • guardrails (bool) – If True, the detection filter for potentially hateful, abusive, and/or profane language (HAP) is toggled on for both the prompt and the generated text, defaults to False. If HAP is detected, a HAPDetectionWarning is issued

  • guardrails_hap_params (dict) – meta props for HAP moderations, use ibm_watson_machine_learning.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames

Returns:

scoring result containing generated content

Return type:

generator

Note

By default, only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:

import warnings
from ibm_watson_machine_learning.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)

Example

q = "Write an epigram about the sun"
generated_response = model_inference.generate_text_stream(prompt=q)

for chunk in generated_response:
    print(chunk, end='')
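
Generation parameters are passed to the stream in the same way as to generate(). A short sketch that caps the number of streamed tokens:

from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

stream_params = {
    GenParams.MAX_NEW_TOKENS: 50
}

q = "Write an epigram about the sun"
for chunk in model_inference.generate_text_stream(prompt=q, params=stream_params):
    print(chunk, end='')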
get_details()[source]#

Get the details of the model interface.

Returns:

details of model or deployment

Return type:

dict

Example

model_inference.get_details()
get_identifying_params()[source]#

Represent the Model Inference's setup as a dictionary.
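
Example

# Returns a dictionary describing this ModelInference setup
model_inference.get_identifying_params()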

to_langchain()[source]#
Returns:

WatsonxLLM wrapper for watsonx foundation models

Return type:

WatsonxLLM

Example

from langchain import PromptTemplate
from langchain.chains import LLMChain
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

flan_ul2_model = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    project_id="*****"
    )

prompt_template = "What color is the {flower}?"

llm_chain = LLMChain(llm=flan_ul2_model.to_langchain(), prompt=PromptTemplate.from_template(prompt_template))
llm_chain('sunflower')
from langchain import PromptTemplate
from langchain.chains import LLMChain
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

deployed_model = ModelInference(
    deployment_id="<ID of deployed model>",
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    space_id="*****"
    )

prompt_template = "What color is the {car}?"

llm_chain = LLMChain(llm=deployed_model.to_langchain(), prompt=PromptTemplate.from_template(prompt_template))
llm_chain('taxi')
tokenize(prompt=None, return_tokens=False)[source]#

The text tokenize operation allows you to check the conversion of the provided input to tokens for a given model. It splits text into words or sub-words, which are then converted to IDs through a look-up table (vocabulary). Tokenization allows the model to have a reasonable vocabulary size.

Note

This method is not supported for deployments; it is available only for base models.

Parameters:
  • prompt (str, optional) – the prompt string, defaults to None

  • return_tokens (bool) – whether to include the list of tokens in the result, defaults to False

Returns:

the result of tokenizing the input string.

Return type:

dict

Example

q = "Write an epigram about the moon"
tokenized_response = model_inference.tokenize(prompt=q, return_tokens=True)
print(tokenized_response["result"])