ModelInference for Deployments¶
This section shows how to use the ModelInference module with a created deployment.
You can infer text in one of two ways:
- the deployments module
- the ModelInference module
Infer text with deployments¶
You can directly query generate_text using the deployments module.
client.deployments.generate_text(
    prompt="Example prompt",
    deployment_id=deployment_id
)
Creating ModelInference instance¶
Start by defining the parameters. They will later be used by the module.
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
generate_params = {
    GenParams.MAX_NEW_TOKENS: 25,
    GenParams.STOP_SEQUENCES: ["\n"]
}
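The parameters are an ordinary dictionary, so a per-call variant can be derived from the defaults without mutating them. The following is a minimal sketch: the plain string keys stand in for the GenParams constants (assumed here to resolve to "max_new_tokens" and "stop_sequences") so that the snippet runs without the library installed, and with_overrides is a hypothetical helper, not part of ibm_watsonx_ai.

```python
from typing import Any, Dict

# Stand-ins for GenParams.MAX_NEW_TOKENS and GenParams.STOP_SEQUENCES
# (assumed string values, used so the sketch needs no installed library).
MAX_NEW_TOKENS = "max_new_tokens"
STOP_SEQUENCES = "stop_sequences"

default_params: Dict[str, Any] = {MAX_NEW_TOKENS: 25, STOP_SEQUENCES: ["\n"]}


def with_overrides(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a new dict with the overrides applied; defaults stay untouched."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical variant that allows a longer completion.
longer_params = with_overrides(default_params, max_new_tokens=100)
```

A dictionary built this way could be passed wherever the section passes params, for example to generate().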
Create the ModelInference instance by passing either credentials together with project_id or space_id, or the previously initialized APIClient (see APIClient initialization).
from ibm_watsonx_ai.foundation_models import ModelInference
deployed_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    credentials=credentials,
    project_id=project_id
)
# OR
deployed_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)
You can directly query generate_text using the ModelInference object.
deployed_model.generate_text(prompt="Example prompt")
Generate methods¶
A detailed explanation of the available generate methods, with their exact parameters, can be found in the ModelInference class.
With the previously created deployed_model object, you can generate a text stream (a generator) by calling the generate_text_stream() method.
for token in deployed_model.generate_text_stream(prompt=input_prompt):
    print(token, end="")
'$10 Powerchill Leggings'
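Because generate_text_stream() yields the text piece by piece, the complete response is only available once the generator is exhausted. The following is a minimal sketch of printing the chunks as they arrive while also keeping the full text. Here fake_stream is a hypothetical stand-in for deployed_model.generate_text_stream(prompt=input_prompt) so that the snippet runs without a deployment, and its chunk boundaries are invented for illustration.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for deployed_model.generate_text_stream(...)."""
    # Chunk boundaries are made up; a real stream decides its own.
    yield from ["$10", " Power", "chill", " Leggings"]


def collect_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a real deployment, the generator returned by generate_text_stream() would be passed to collect_stream in place of fake_stream().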
To receive a more detailed result, use generate().
details = deployed_model.generate(prompt=input_prompt, params=generate_params)
print(details)
{
    'model_id': 'google/flan-t5-xl',
    'created_at': '2023-11-17T15:32:57.401Z',
    'results': [
        {
            'generated_text': '$10 Powerchill Leggings',
            'generated_token_count': 8,
            'input_token_count': 73,
            'stop_reason': 'eos_token'
        }
    ],
    'system': {'warnings': []}
}
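The dictionary returned by generate() can be unpacked with ordinary dictionary access. The following is a minimal sketch, assuming the response has the shape printed above. summarize_response is a hypothetical helper, not part of ibm_watsonx_ai, and the sample dictionary is copied from that output.

```python
from typing import Any, Dict, List


def summarize_response(details: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the text, token counts, and stop reason of each result."""
    summary = []
    for result in details.get("results", []):
        summary.append(
            {
                "text": result["generated_text"],
                "tokens_in": result.get("input_token_count"),
                "tokens_out": result.get("generated_token_count"),
                "stop_reason": result.get("stop_reason"),
            }
        )
    return summary


# Sample response copied from the output shown above.
details = {
    "model_id": "google/flan-t5-xl",
    "created_at": "2023-11-17T15:32:57.401Z",
    "results": [
        {
            "generated_text": "$10 Powerchill Leggings",
            "generated_token_count": 8,
            "input_token_count": 73,
            "stop_reason": "eos_token",
        }
    ],
    "system": {"warnings": []},
}

summary = summarize_response(details)
print(summary[0]["text"])
```

Checking stop_reason in this way is one option for telling a completed generation apart from one cut short by a limit such as MAX_NEW_TOKENS.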