Tuned Model Inference¶
This section shows how to deploy a model and use the ModelInference
class with the created deployment.
You can query generate_text in one of two ways:
- the deployments module
- the ModelInference module
Working with deployments¶
This section describes methods for working with deployments. Before you begin, create a client and set the project_id or space_id.
from ibm_watsonx_ai import APIClient
client = APIClient(credentials)
client.set.default_project("7ac03029-8bdd-4d5f-a561-2c4fd1e40705")
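If you work in a deployment space rather than a project, you can set a default space in the same way (the space ID below is a placeholder):

client.set.default_space("<your_space_id>")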
To create a deployment with specific parameters, run the following code. For a prompt-tuned model:
from datetime import datetime
model_id = prompt_tuner.get_model_id()
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "PT DEPLOYMENT SDK - project",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"pt_sdk_deployment_{datetime.now().strftime('%Y_%m_%d_%H%M%S')}"
}
deployment_details = client.deployments.create(model_id, meta_props)
For a fine-tuned model, additionally specify a hardware specification:

from datetime import datetime
model_id = fine_tuner.get_model_id()
hw_spec_id = client.hardware_specifications.get_id_by_name("WX-S")
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "FT DEPLOYMENT SDK - project",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"ft_sdk_deployment_{datetime.now().strftime('%Y_%m_%d_%H%M%S')}",
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
        "id": hw_spec_id,
        "num_nodes": 1
    }
}
deployment_details = client.deployments.create(model_id, meta_props)
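The hardware specification name ("WX-S" above) depends on your environment. To check which specifications are available, you can list them first:

client.hardware_specifications.list()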
To get a deployment_id from the deployment details, use the id from metadata.
deployment_id = deployment_details['metadata']['id']
print(deployment_id)
'7091629c-f88a-4e90-b7f0-4f414aec9c3a'
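Alternatively, the deployments module provides a get_id helper that extracts the ID from the details dictionary for you:

deployment_id = client.deployments.get_id(deployment_details)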
You can directly query generate_text using the deployments module.
client.deployments.generate_text(
    prompt="Example prompt",
    deployment_id=deployment_id)
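You can also pass generation parameters with the request. A minimal sketch, assuming the params argument of deployments.generate_text accepts the same dictionary format as the ModelInference parameters defined below:

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

response = client.deployments.generate_text(
    prompt="Example prompt",
    deployment_id=deployment_id,
    params={GenParams.MAX_NEW_TOKENS: 25})
print(response)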
Creating a ModelInference instance¶
Start by defining the generation parameters that the module will later use.
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
generate_params = {
    GenParams.MAX_NEW_TOKENS: 25,
    GenParams.STOP_SEQUENCES: ["\n"]
}
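To check which generation parameters are available, together with example values, you can inspect the metanames class:

print(GenParams().get_example_values())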
Create the ModelInference instance by using credentials and a project_id or space_id, or by using the previously initialized APIClient (see APIClient initialization).
from ibm_watsonx_ai.foundation_models import ModelInference
tuned_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    credentials=credentials,
    project_id=project_id
)

# OR

tuned_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)
You can directly query generate_text using the ModelInference object.
tuned_model.generate_text(prompt="Example prompt")
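You can also override parameters for a single call (a sketch; the per-call params are applied to that request in place of the values set at construction, using the GenParams alias imported earlier):

tuned_model.generate_text(
    prompt="Example prompt",
    params={GenParams.MAX_NEW_TOKENS: 50})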
Importing data¶
To use ModelInference, you may need example data. The following code downloads a sample data set:
import os

import pandas as pd
import wget

filename = 'car_rental_prompt_tuning_testing_data.json'
url = "https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/prompt_tuning/car_rental_prompt_tuning_testing_data.json"

if not os.path.isfile(filename):
    wget.download(url)

data = pd.read_json(filename)
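The data set provides input prompts and output satisfaction labels (the columns used in the next section). To preview the first rows:

print(data.head())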
Analyzing satisfaction¶
Note
The satisfaction analysis was performed for a specific example (car rental) and might not work for other data sets.
To analyze satisfaction, prepare a batch of prompts, calculate the accuracy of the tuned model, and compare it with the base model.
prompts = list(data.input)
satisfaction = list(data.output)
prompts_batch = list(prompts)
Calculate the accuracy of the base model:
from sklearn.metrics import accuracy_score, f1_score
base_model = ModelInference(
    model_id='google/flan-t5-xl',
    params=generate_params,
    api_client=client
)
base_model_results = base_model.generate_text(prompt=prompts_batch)
print(f'base model accuracy_score: {accuracy_score(satisfaction, [int(x) for x in base_model_results])}, base model f1_score: {f1_score(satisfaction, [int(x) for x in base_model_results])}')
'base model accuracy_score: 0.965034965034965, base model f1_score: 0.9765258215962441'
Calculate the accuracy of the tuned model:
tuned_model_results = tuned_model.generate_text(prompt=prompts_batch)
print(f'accuracy_score: {accuracy_score(satisfaction, [int(x) for x in tuned_model_results])}, f1_score: {f1_score(satisfaction, [int(x) for x in tuned_model_results])}')
'accuracy_score: 0.972027972027972, f1_score: 0.9811320754716981'
Generate methods¶
A detailed explanation of the available generate methods, with their exact parameters, can be found in the ModelInference class.
With the previously created tuned_model object, you can generate a text stream (a generator) by using the generate_text_stream() method, where input_prompt is a prompt string that you define.
for token in tuned_model.generate_text_stream(prompt=input_prompt):
    print(token, end="")
'$10 Powerchill Leggings'
Get a more detailed result with generate().
details = tuned_model.generate(prompt=input_prompt, params=generate_params)
print(details)
{
    'model_id': 'google/flan-t5-xl',
    'created_at': '2023-11-17T15:32:57.401Z',
    'results': [
        {
            'generated_text': '$10 Powerchill Leggings',
            'generated_token_count': 8,
            'input_token_count': 73,
            'stop_reason': 'eos_token'
        }
    ],
    'system': {'warnings': []}
}
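To extract only the generated text from these details:

print(details['results'][0]['generated_text'])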