Tuned Model Inference#

This section shows how to deploy a tuned model and use the ModelInference class with the created deployment.

There are two ways to query generate_text: using the deployments module or using the ModelInference module.

Working with deployments#

This section describes the methods for working with deployments. First, create a client and set the project_id or space_id.

from ibm_watson_machine_learning import APIClient

client = APIClient(credentials)
client.set.default_project(project_id)  # or: client.set.default_space(space_id)

To create a deployment with specific parameters, run the following lines.

from datetime import datetime

model_id = prompt_tuner.get_model_id()

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "PT DEPLOYMENT SDK - project",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: f"pt_sdk_deployment_{datetime.utcnow().strftime('%Y_%m_%d_%H%M%S')}"
}
deployment_details = client.deployments.create(model_id, meta_props)
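
Note that datetime.utcnow() is deprecated as of Python 3.12. A minimal sketch of building the same timestamped serving name from a timezone-aware UTC time is shown here; the helper name make_serving_name is hypothetical and not part of the SDK.

```python
from datetime import datetime, timezone


def make_serving_name(prefix="pt_sdk_deployment", now=None):
    """Build a serving name such as pt_sdk_deployment_2023_11_17_153257.

    Hypothetical helper, not part of the SDK.
    """
    # Use a timezone-aware UTC timestamp instead of the deprecated utcnow().
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.strftime('%Y_%m_%d_%H%M%S')}"


serving_name = make_serving_name()
print(serving_name)
```

The resulting string can be passed as the SERVING_NAME value in meta_props.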

To get the deployment_id from the details, use the id field from metadata.

deployment_id = deployment_details['metadata']['id']

You can directly query generate_text using the deployments module.

client.deployments.generate_text(
    prompt="Example prompt",
    deployment_id=deployment_id
)

Creating ModelInference instance#

First, it is recommended to define the generation parameters (used later by the module).

from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

generate_params = {
    GenParams.MAX_NEW_TOKENS: 25,
    GenParams.STOP_SEQUENCES: ["\n"]
}

Create the ModelInference itself, using credentials and project_id / space_id or the previously initialized APIClient (see APIClient initialization).

from ibm_watson_machine_learning.foundation_models import ModelInference

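tuned_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    credentials=credentials,
    project_id=project_id
)

# OR

tuned_model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)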

You can directly query generate_text using the ModelInference object.

tuned_model.generate_text(prompt="Example prompt")

Importing data#

To use ModelInference, some example data may be needed.

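import os
import urllib.request

import pandas as pd

filename = 'car_rental_prompt_tuning_testing_data.json'

url = "https://raw.github.com/IBM/watson-machine-learning-samples/master/cloud/data/prompt_tuning/car_rental_prompt_tuning_testing_data.json"
if not os.path.isfile(filename):
    # Download the sample data only if it is not already present locally.
    urllib.request.urlretrieve(url, filename)

data = pd.read_json(filename)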

Analyzing satisfaction#


Note: The satisfaction analysis was performed for a specific example (car rental) and may not work for other data sets.

To analyze satisfaction, prepare a batch of prompts, calculate the accuracy of the tuned model, and compare it with the base model.

prompts = list(data.input)
satisfaction = list(data.output)
prompts_batch = ["\n".join([prompt]) for prompt in prompts]

Calculate the accuracy of the base model:

from sklearn.metrics import accuracy_score, f1_score

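base_model = ModelInference(
    model_id='google/flan-t5-xl',  # assumed: the base model ID reported in the generate() output
    params=generate_params,
    api_client=client
)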
base_model_results = base_model.generate_text(prompt=prompts_batch)
print(f'base model accuracy_score: {accuracy_score(satisfaction, [int(x) for x in base_model_results])}, base model f1_score: {f1_score(satisfaction, [int(x) for x in base_model_results])}')
'base model accuracy_score: 0.965034965034965, base model f1_score: 0.9765258215962441'

Calculate the accuracy of the tuned model:

tuned_model_results = tuned_model.generate_text(prompt=prompts_batch)
print(f'accuracy_score: {accuracy_score(satisfaction, [int(x) for x in tuned_model_results])}, f1_score: {f1_score(satisfaction, [int(x) for x in tuned_model_results])}')
'accuracy_score: 0.972027972027972, f1_score: 0.9811320754716981'
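
The int(x) conversions above raise ValueError if the model returns anything other than a bare integer. The following is a rough sketch of a more tolerant evaluation in plain Python. It assumes binary labels where 1 is the positive class, which the snippets above suggest but do not state. The helpers parse_label and binary_scores are hypothetical, the sample data is made up, and unparseable outputs are mapped to a sentinel so that they count as wrong predictions.

```python
def parse_label(text, default=-1):
    """Convert generated text to an int label; return default if it is not an integer."""
    try:
        return int(text.strip())
    except ValueError:
        return default


def binary_scores(y_true, y_pred, positive=1):
    """Return (accuracy, f1) for binary labels, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up labels and raw model outputs, for illustration only.
labels = [1, 0, 1, 1]
raw_outputs = ["1", " 0\n", "maybe", "1"]

predictions = [parse_label(x) for x in raw_outputs]
accuracy, f1 = binary_scores(labels, predictions)
print(f"accuracy_score: {accuracy}, f1_score: {f1}")
```

Counting unparseable outputs as errors keeps the comparison between the base and tuned models fair, since neither model is rewarded for producing free-form text.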

Generate methods#

A detailed explanation of the available generate methods, with their exact parameters, can be found in the ModelInference class documentation.

With the previously created tuned_model object, you can generate a text stream (generator) using the generate_text_stream() method.

# input_prompt is a prompt string defined beforehand (not shown in this section)
for token in tuned_model.generate_text_stream(prompt=input_prompt):
    print(token, end="")
'$10 Powerchill Leggings'
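
Because generate_text_stream() returns a generator, the chunks can also be collected into the full text while they are printed. The following is a minimal sketch that uses a stand-in generator (fake_stream, hypothetical) in place of the real call so that it runs without a deployment; the chunk boundaries are made up.

```python
def fake_stream():
    """Stand-in for tuned_model.generate_text_stream(prompt=...); chunks are made up."""
    yield from ["$10", " Power", "chill", " Leggings"]


chunks = []
for chunk in fake_stream():
    print(chunk, end="", flush=True)  # show the text as it arrives
    chunks.append(chunk)
print()

# Join the collected chunks to recover the complete generated text.
full_text = "".join(chunks)
```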

You can also receive a more detailed result with generate().

details = tuned_model.generate(prompt=input_prompt, params=generate_params)
print(details)
{
    'model_id': 'google/flan-t5-xl',
    'created_at': '2023-11-17T15:32:57.401Z',
    'results': [
        {
            'generated_text': '$10 Powerchill Leggings',
            'generated_token_count': 8,
            'input_token_count': 73,
            'stop_reason': 'eos_token'
        }
    ],
    'system': {'warnings': []}
}
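
To work with a response like this, it can help to pull out the individual fields. The following is a small sketch, assuming the response has the shape shown in the example output; the helper first_result is hypothetical, and the dictionary is copied from that output so that the snippet runs without a deployment.

```python
# Response copied from the example output above.
details = {
    "model_id": "google/flan-t5-xl",
    "created_at": "2023-11-17T15:32:57.401Z",
    "results": [
        {
            "generated_text": "$10 Powerchill Leggings",
            "generated_token_count": 8,
            "input_token_count": 73,
            "stop_reason": "eos_token",
        }
    ],
    "system": {"warnings": []},
}


def first_result(response):
    """Return the first entry of response['results'], or raise if there is none."""
    results = response.get("results") or []
    if not results:
        raise ValueError("response contains no results")
    return results[0]


result = first_result(details)
generated_text = result["generated_text"]
# Total tokens processed for this request: prompt tokens plus generated tokens.
total_tokens = result["input_token_count"] + result["generated_token_count"]
print(generated_text, result["stop_reason"], total_tokens)
```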