ModelInference for Deployments¶
This section shows how to use the ModelInference module with a created deployment.
You can infer text or chat with a deployed model in one of two ways:
- the deployments module
- the ModelInference module
Infer text with deployments¶
You can directly call generate_text or generate_text_stream using the deployments module.
client.deployments.generate_text(prompt="Example prompt", deployment_id=deployment_id)

# OR

for chunk in client.deployments.generate_text_stream(
    deployment_id=deployment_id, prompt="Example prompt"
):
    print(chunk, end="", flush=True)
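Generation parameters can also be supplied per request. A minimal sketch, assuming the deployments methods accept a params dictionary built from GenTextParamsMetaNames (introduced below):

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

# Assumption: the ``params`` argument is honored per request.
text = client.deployments.generate_text(
    deployment_id=deployment_id,
    prompt="Example prompt",
    params={GenParams.MAX_NEW_TOKENS: 50},
)
print(text)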
Chat with deployments¶
You can directly call chat or chat_stream using the deployments module.
messages = [{"role": "user", "content": "How are you?"}]

chat_response = client.deployments.chat(deployment_id=deployment_id, messages=messages)
print(chat_response["choices"][0]["message"]["content"])

# OR

for chunk in client.deployments.chat_stream(
    deployment_id=deployment_id, messages=messages
):
    if chunk["choices"]:
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
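The messages list follows the usual role/content chat format, so a system prompt and earlier turns can be included to carry conversation context. A short sketch (the message texts are illustrative):

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And its population?"},  # follow-up resolved from context
]
chat_response = client.deployments.chat(deployment_id=deployment_id, messages=messages)
print(chat_response["choices"][0]["message"]["content"])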
Creating a ModelInference instance¶
Start by defining the generation parameters, which the module will use later.
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
generate_params = {GenParams.MAX_NEW_TOKENS: 25, GenParams.STOP_SEQUENCES: ["\n"]}
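GenTextParamsMetaNames exposes further controls such as the decoding method and temperature. A slightly fuller sketch (the values are illustrative, not recommendations):

generate_params = {
    GenParams.DECODING_METHOD: "sample",  # "greedy" disables sampling
    GenParams.MAX_NEW_TOKENS: 25,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.TEMPERATURE: 0.7,
    GenParams.STOP_SEQUENCES: ["\n"],
}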
Create a ModelInference instance by using credentials and project_id / space_id, or the previously initialized
APIClient (see APIClient initialization).
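If you have neither yet, a minimal setup sketch (the url, api_key, and project_id values are placeholders for your own):

from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # your region's endpoint
    api_key="***",                            # your IBM Cloud API key
)
client = APIClient(credentials, project_id=project_id)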
from ibm_watsonx_ai.foundation_models import ModelInference
deployed_model = ModelInference(
deployment_id=deployment_id,
params=generate_params,
credentials=credentials,
project_id=project_id,
)
# OR
deployed_model = ModelInference(
deployment_id=deployment_id, params=generate_params, api_client=client
)
A detailed explanation of the available methods with their exact parameters can be found in the ModelInference class.
You can directly call generate_text using the ModelInference object.
deployed_model.generate_text(prompt="Example prompt")
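Parameters can also be passed at call time. A short sketch, assuming call-time params take precedence over the instance-level generate_params for that request:

deployed_model.generate_text(
    prompt="Example prompt",
    params={GenParams.MAX_NEW_TOKENS: 50},  # assumption: overrides instance params for this call
)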
You can also chat using the same instance.
messages = [{"role": "user", "content": "How are you?"}]
deployed_model.chat(messages=messages)
Generate methods¶
With the previously created deployed_model object, you can generate a text stream (a generator) using the
generate_text_stream() method.
# input_prompt is a prompt string defined earlier in your notebook or script
for chunk in deployed_model.generate_text_stream(prompt=input_prompt):
    print(chunk, end="", flush=True)

"$10 Powerchill Leggings"
You can also receive a more detailed result with generate().
details = deployed_model.generate(prompt=input_prompt, params=generate_params)
print(details)
{
    "model_id": "google/flan-t5-xl",
    "created_at": "2023-11-17T15:32:57.401Z",
    "results": [
        {
            "generated_text": "$10 Powerchill Leggings",
            "generated_token_count": 8,
            "input_token_count": 73,
            "stop_reason": "eos_token",
        }
    ],
    "system": {"warnings": []},
}
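The fields of interest can be read straight out of the returned dictionary:

result = details["results"][0]
print(result["generated_text"])         # "$10 Powerchill Leggings"
print(result["generated_token_count"])  # 8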
Chat methods¶
You can chat with the previously created deployed_model object using the chat() method.
messages = [{"role": "user", "content": "How are you?"}]
chat_response = deployed_model.chat(messages=messages)
print(chat_response["choices"][0]["message"]["content"])
It is also possible to chat in streaming mode (as a generator) using the same instance and the chat_stream() method.
messages = [{"role": "user", "content": "How are you?"}]

chat_stream_response = deployed_model.chat_stream(messages=messages)

for chunk in chat_stream_response:
    if chunk["choices"]:
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
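The stream is a generator and is consumed once, so if you also need the complete reply, accumulate the deltas while printing. A short sketch:

full_reply = ""
for chunk in deployed_model.chat_stream(messages=messages):
    if chunk["choices"]:
        delta = chunk["choices"][0]["delta"].get("content", "")
        full_reply += delta
        print(delta, end="", flush=True)
print()
print(f"Full reply: {full_reply!r}")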