ModelInference for Deployments

This section shows how to use the ModelInference module with a created deployment.

You can infer text or chat with a deployed model in one of two ways: by calling the deployments module directly, or by creating a ModelInference instance for the deployment.

Infer text with deployments

You can call generate_text or generate_text_stream directly from the deployments module.

input_prompt = "Example prompt"

client.deployments.generate_text(
    prompt=input_prompt,
    deployment_id=deployment_id)

# OR

for chunk in client.deployments.generate_text_stream(deployment_id=deployment_id, prompt=input_prompt):
    print(chunk, end="", flush=True)
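
Both methods also accept an optional params dictionary to control generation for a single request. A minimal sketch, assuming your SDK version exposes the params argument on these calls:

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

# Assumption: generate_text accepts a per-request params dict in your SDK version
client.deployments.generate_text(
    prompt=input_prompt,
    deployment_id=deployment_id,
    params={GenParams.MAX_NEW_TOKENS: 50})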

Chat with deployments

You can directly call chat or chat_stream using the deployments module.

messages = [
    {
        "role": "user",
        "content": "How are you?"
    }
]
chat_response = client.deployments.chat(deployment_id=deployment_id, messages=messages)
print(chat_response["choices"][0]["message"]["content"])

# OR

for chunk in client.deployments.chat_stream(deployment_id=deployment_id, messages=messages):
    if chunk["choices"]:
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)

Creating a ModelInference instance

Start by defining the generation parameters; they will be used by the module later.

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

generate_params = {
    GenParams.MAX_NEW_TOKENS: 25,
    GenParams.STOP_SEQUENCES: ["\n"]
}
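
If you have not set up authentication yet, the following sketch shows one way to create the credentials and APIClient referenced below (the url and api_key values are placeholders for your own endpoint and IBM Cloud API key):

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # placeholder: your service endpoint
    api_key="***",                            # placeholder: your IBM Cloud API key
)

client = APIClient(credentials, space_id=space_id)  # or project_id=project_id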

Create a ModelInference object by passing credentials together with a project_id or space_id, or by passing a previously initialized APIClient (see APIClient initialization).

from ibm_watsonx_ai.foundation_models import ModelInference

deployed_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    credentials=credentials,
    project_id=project_id
)

# OR

deployed_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)

A detailed explanation of the available methods and their exact parameters can be found in the ModelInference class.

You can call generate_text directly using the ModelInference object.

deployed_model.generate_text(prompt="Example prompt")
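
Parameters passed at call time take precedence over the instance-level generate_params defined earlier. A short sketch, assuming a per-call params argument is available in your SDK version:

# Overrides the instance-level MAX_NEW_TOKENS of 25 for this call only
deployed_model.generate_text(
    prompt="Example prompt",
    params={GenParams.MAX_NEW_TOKENS: 50}
)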

You can also chat using the same instance.

messages = [
    {
        "role": "user",
        "content": "How are you?"
    }
]
deployed_model.chat(messages=messages)

Generate methods

With the previously created deployed_model object, you can generate a text stream (a generator) using the generate_text_stream() method.

for chunk in deployed_model.generate_text_stream(prompt=input_prompt):
    print(chunk, end="", flush=True)
'$10 Powerchill Leggings'

You can also receive a more detailed result with generate().

details = deployed_model.generate(prompt=input_prompt, params=generate_params)
print(details)
{
    'model_id': 'google/flan-t5-xl',
    'created_at': '2023-11-17T15:32:57.401Z',
    'results': [
        {
            'generated_text': '$10 Powerchill Leggings',
            'generated_token_count': 8,
            'input_token_count': 73,
            'stop_reason': 'eos_token'
        }
    ],
    'system': {'warnings': []}
}
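
generate() can also score a batch of prompts in a single call; with list input it returns one result dictionary per prompt, in order. A sketch, assuming your SDK version accepts a list for the prompt argument:

prompts = ["First example prompt", "Second example prompt"]
batch_details = deployed_model.generate(prompt=prompts)

# One result dictionary per input prompt, in the same order
for result in batch_details:
    print(result["results"][0]["generated_text"])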

Chat methods

You can chat with the previously created deployed_model object using the chat() method.

messages = [
    {
        "role": "user",
        "content": "How are you?"
    }
]
chat_response = deployed_model.chat(messages=messages)
print(chat_response["choices"][0]["message"]["content"])
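
The messages list is not limited to a single turn; you can include earlier turns and a system instruction. A sketch, assuming the deployed model accepts system messages:

messages = [
    {"role": "system", "content": "You are a concise assistant."},  # assumption: model supports the system role
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm doing well, thank you!"},
    {"role": "user", "content": "Summarize our conversation so far."}
]
chat_response = deployed_model.chat(messages=messages)
print(chat_response["choices"][0]["message"]["content"])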

It is also possible to chat in streaming mode (a generator) using the same instance and the chat_stream() method.

messages = [
    {
        "role": "user",
        "content": "How are you?"
    }
]
chat_stream_response = deployed_model.chat_stream(messages=messages)

for chunk in chat_stream_response:
    if chunk["choices"]:
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)