ModelInference#
- class ibm_watson_machine_learning.foundation_models.inference.ModelInference(*, model_id=None, deployment_id=None, params=None, credentials=None, project_id=None, space_id=None, verify=None, api_client=None)[source]#
Bases: BaseModelInference
Instantiate the model interface.
Hint
To use the ModelInference class with LangChain, use the WatsonxLLM wrapper.
- Parameters:
model_id (str, optional) – the type of model to use
deployment_id (str, optional) – ID of tuned model’s deployment
credentials (dict, optional) – credentials to Watson Machine Learning instance
params (dict, optional) – parameters to use during generate requests
project_id (str, optional) – ID of the Watson Studio project
space_id (str, optional) – ID of the Watson Studio space
verify (bool or str, optional) –
you can pass one of the following as verify:
the path to a CA_BUNDLE file
the path of a directory with certificates of trusted CAs
True - the default path to the truststore will be used
False - no verification will be made
api_client (APIClient, optional) – initialized APIClient object with a set project ID or space ID. If passed, credentials and project_id/space_id are not required (see the api_client sketch after the examples below).
Note
One of these parameters is required: [model_id, deployment_id].
Note
One of these parameters is required: [project_id, space_id] when the credentials parameter is passed.
Hint
You can copy the project_id from Project’s Manage tab (Project -> Manage -> General -> Details).
Example
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes, DecodingMethods

# To display example params enter
GenParams().get_example_values()

generate_params = {
    GenParams.MAX_NEW_TOKENS: 25
}

model_inference = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    params=generate_params,
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    project_id="*****"
)
from ibm_watson_machine_learning.foundation_models import ModelInference

deployment_inference = ModelInference(
    deployment_id="<ID of deployed model>",
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    project_id="*****"
)
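As referenced in the api_client parameter description, here is a minimal sketch of reusing an already-initialized APIClient instead of passing credentials directly; the credential and project values below are placeholders.

from ibm_watson_machine_learning import APIClient
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

# Placeholder credentials; initialize the client and set a default project.
client = APIClient({
    "apikey": "***",
    "url": "https://us-south.ml.cloud.ibm.com"
})
client.set.default_project("*****")

# With api_client passed, credentials and project_id/space_id are not required.
model_inference = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    api_client=client
)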
- generate(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, concurrency_limit=10, async_mode=False)[source]#
Given a text prompt as input and parameters, the selected model (model_id) or deployment (deployment_id) generates completion text returned as generated_text. For a prompt template deployment, prompt should be None.
- Parameters:
params (dict) – meta props for text generation, use ibm_watson_machine_learning.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
concurrency_limit (int) – number of requests that will be sent in parallel, max is 10
prompt ((str | list | None), optional) – the prompt string or list of strings. If a list of strings is passed, requests are managed in parallel at the rate of concurrency_limit (see the batch sketch after the example below), defaults to None
guardrails (bool) – if True, the hateful, abusive, and/or profane language (HAP) detection filter is toggled on for both the prompt and the generated text, defaults to False
guardrails_hap_params (dict) – meta props for HAP moderations, use ibm_watson_machine_learning.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
async_mode (bool) – if True, results are yielded asynchronously (using a generator). In this case both the prompt and the generated text are concatenated in the final response under generated_text, defaults to False
- Returns:
scoring result containing generated content
- Return type:
dict
Example
q = "What is 1 + 1?" generated_response = model_inference.generate(prompt=q) print(generated_response['results'][0]['generated_text'])
- generate_text(prompt=None, params=None, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None, raw_response=False, concurrency_limit=10)[source]#
Given a text prompt as input and parameters, the selected model (model_id) generates completion text returned as generated_text. For a prompt template deployment, prompt should be None.
- Parameters:
params (dict) – meta props for text generation, use ibm_watson_machine_learning.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
concurrency_limit (int) – number of requests that will be sent in parallel, max is 10
prompt ((str | list | None), optional) – the prompt string or list of strings. If a list of strings is passed, requests are managed in parallel at the rate of concurrency_limit, defaults to None
guardrails (bool) – if True, the hateful, abusive, and/or profane language (HAP) detection filter is toggled on for both the prompt and the generated text, defaults to False. If HAP is detected, a HAPDetectionWarning is issued (see the guardrails sketch after the example below)
guardrails_hap_params (dict) – meta props for HAP moderations, use ibm_watson_machine_learning.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
raw_response (bool, optional) – return the whole response object
- Returns:
generated content
- Return type:
str
Note
By default only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:
import warnings
from ibm_watson_machine_learning.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)
Example
q = "What is 1 + 1?" generated_text = model_inference.generate_text(prompt=q) print(generated_text)
- generate_text_stream(prompt=None, params=None, raw_response=False, guardrails=False, guardrails_hap_params=None, guardrails_pii_params=None)[source]#
Given a text prompt as input and parameters, the selected model (model_id) generates text as a stream. For a prompt template deployment, prompt should be None.
- Parameters:
params (dict) – meta props for text generation, use ibm_watson_machine_learning.metanames.GenTextParamsMetaNames().show() to view the list of MetaNames
prompt (str, optional) – the prompt string, defaults to None
raw_response (bool, optional) – yields the whole response object (see the raw_response sketch after the example below)
guardrails (bool) – if True, the hateful, abusive, and/or profane language (HAP) detection filter is toggled on for both the prompt and the generated text, defaults to False. If HAP is detected, a HAPDetectionWarning is issued
guardrails_hap_params (dict) – meta props for HAP moderations, use ibm_watson_machine_learning.metanames.GenTextModerationsMetaNames().show() to view the list of MetaNames
- Returns:
scoring result containing generated content
- Return type:
generator
Note
By default only the first occurrence of HAPDetectionWarning is displayed. To enable printing all warnings of this category, use:
import warnings
from ibm_watson_machine_learning.foundation_models.utils import HAPDetectionWarning

warnings.filterwarnings("always", category=HAPDetectionWarning)
Example
q = "Write an epigram about the sun" generated_response = model_inference.generate_text_stream(prompt=q) for chunk in generated_response: print(chunk, end='')
- get_details()[source]#
Get the model interface's details.
- Returns:
details of model or deployment
- Return type:
dict
Example
model_inference.get_details()
- to_langchain()[source]#
- Returns:
WatsonxLLM wrapper for watsonx foundation models
- Return type:
WatsonxLLM
Example
from langchain import PromptTemplate
from langchain.chains import LLMChain
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

flan_ul2_model = ModelInference(
    model_id=ModelTypes.FLAN_UL2,
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    project_id="*****"
)

prompt_template = "What color is the {flower}?"

llm_chain = LLMChain(llm=flan_ul2_model.to_langchain(), prompt=PromptTemplate.from_template(prompt_template))
llm_chain('sunflower')
from langchain import PromptTemplate
from langchain.chains import LLMChain
from ibm_watson_machine_learning.foundation_models import ModelInference
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

deployed_model = ModelInference(
    deployment_id="<ID of deployed model>",
    credentials={
        "apikey": "***",
        "url": "https://us-south.ml.cloud.ibm.com"
    },
    space_id="*****"
)

prompt_template = "What color is the {car}?"

llm_chain = LLMChain(llm=deployed_model.to_langchain(), prompt=PromptTemplate.from_template(prompt_template))
llm_chain('sunflower')
- tokenize(prompt=None, return_tokens=False)[source]#
The text tokenize operation lets you check how the provided input is converted to tokens for a given model. It splits text into words or sub-words, which are then converted to IDs through a look-up table (vocabulary). Tokenization allows the model to have a reasonable vocabulary size.
Note
This method is not supported for deployments; it is available only for base models.
- Parameters:
prompt (str, optional) – the prompt string, defaults to None
return_tokens (bool) – whether to return the tokens themselves along with the token count, defaults to False
- Returns:
the result of tokenizing the input string.
- Return type:
dict
Example
q = "Write an epigram about the moon" tokenized_response = model_inference.tokenize(prompt=q, return_tokens=True) print(tokenized_response["result"])