V2 Migration Guide#
On November 22nd, 2023, version 2 of the API was announced with the following changes:
- New interface design that can support more modalities (code, image, audio) and tasks (chat, classification, transcription). We have worked on the interfaces together with watsonx.ai and OpenShift AI.
- Closer alignment with watsonx.ai, so incubating new features in BAM is technically feasible.
- Unified experience across REST API, SDKs, CLI and docs.
- New SDK design that allows extensibility and reusability across teams and is more intuitive.
- Ability to introduce minor breaking changes without affecting users.
This new design approach let us rewrite the SDK from scratch and align it more closely with the new API. In V2, we have introduced the following features:
- Unified method naming (no more `generate`, `generate_as_completed`, `generate_async` and so on).
- The SDK is always up to date with the latest available API version.
- Concurrency and rate limiting are handled automatically for all endpoints without any additional settings. However, one can explicitly set a custom concurrency limit (generate / embeddings) or batch size (tokenization).
- An implementation exists for every endpoint that exists on the API (generation limits, generation feedback, prompts, moderations, …).
- Overall speed is improved by re-using HTTP clients, improving concurrency handling and utilising API processing power.
- Request/output pydantic data models are generated automatically (data models are always up to date).
- The BAM API is the new default environment.
What has changed?#
- All responses that used to contain a `results` field of object type have had the field renamed to `result`.
- The `totalCount` param in paginated responses is renamed to `total_count`.
- Most methods return the whole response instead of one of its subfields.
- When you see dedicated classes for parameters like `TextGenerationParameters` in the examples, you can always pass a dictionary, which will be converted to the class under the hood; the same applies to the enums (see the sketch after this list).
- Errors are raised immediately instead of being swallowed (this can be changed via an appropriate parameter).
- The `Credentials` class throws when an invalid endpoint is passed.
- The majority of schemas were renamed (`GenerateParams` -> `TextGenerationParameters`, …); for instance, if you work with the text generation service (`client.text.generation.create`), all schemas can be found in `genai.text.generation`, and this analogy applies to every other service.
- The `tqdm` package has been removed, as we think it should not be part of the core layer. One can easily use it by wrapping the given SDK function.
- The `Model` class has been replaced with a more general `Client`, an entry point for all services.
- The `Options` class has been removed, as every parameter is unpacked at the method level.
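For example, here is a minimal sketch of the dictionary shortcut mentioned above; it assumes the same call shape as the text generation examples below:
from genai import Client, Credentials

client = Client(credentials=Credentials.from_env())
# A plain dict is converted into TextGenerationParameters under the hood;
# enum-valued fields likewise accept their plain string values.
for response in client.text.generation.create(
    model_id="google/flan-ul2",
    inputs=["What is IBM?"],
    parameters={"max_new_tokens": 10},
):
    print(response.results[0].generated_text)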
Text Generation#
How to replace `generate` / `generate_as_completed`?#
Old way:
from genai import Credentials, Model
from genai.schemas import GenerateParams
credentials = Credentials.from_env()
parameters = GenerateParams(max_new_tokens=10)
model = Model("google/flan-ul2", params=parameters, credentials=credentials)
results = model.generate(["What is IBM?"]) # or model.generate_as_completed(["What is IBM?"])
print(f"Generated Text: {results[0].generated_text}")
New way:
from genai import Credentials, Client
from genai.text.generation import TextGenerationParameters
credentials = Credentials.from_env()
parameters = TextGenerationParameters(max_new_tokens=10)
client = Client(credentials=credentials)
responses = list(client.text.generation.create(model_id="google/flan-ul2", inputs=["What is IBM?"], parameters=parameters))
print(f"Generated Text: {responses[0].results[0].generated_text}")
You can see that the newer way requires a bit more typing, but you can now retrieve top-level information such as `id`, `created_at`, and more.
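A minimal sketch of reading those top-level fields (the `id` and `created_at` names follow the description above):
from genai import Client, Credentials
from genai.text.generation import TextGenerationParameters

client = Client(credentials=Credentials.from_env())
for response in client.text.generation.create(
    model_id="google/flan-ul2",
    inputs=["What is IBM?"],
    parameters=TextGenerationParameters(max_new_tokens=10),
):
    print(response.id)  # top-level request identifier
    print(response.created_at)  # top-level creation timestamp
    print(response.results[0].generated_text)  # the generated text itself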
Streaming#
Old way:
from genai import Credentials, Model
from genai.schemas import GenerateParams
credentials = Credentials.from_env()
parameters = GenerateParams(streaming=True, max_new_tokens=30)
model = Model("google/flan-ul2", params=parameters, credentials=credentials)
for response in model.generate(["What is IBM?"], raw_response=True):
print(response)
New way:
from genai import Credentials, Client
from genai.text.generation import TextGenerationParameters
credentials = Credentials.from_env()
parameters = TextGenerationParameters(max_new_tokens=30)
client = Client(credentials=credentials)
for response in client.text.generation.create_stream(model_id="google/flan-ul2", input="What is IBM?", parameters=parameters):
print(response)
Notes
- The `streaming` parameter is replaced by the dedicated `create_stream` method.
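If you prefer the concatenated output over the raw chunks, here is a minimal sketch; it assumes each streamed chunk carries its piece of text in `results[0].generated_text`, mirroring the non-streaming response shape:
from genai import Client, Credentials
from genai.text.generation import TextGenerationParameters

client = Client(credentials=Credentials.from_env())
chunks = []
for response in client.text.generation.create_stream(
    model_id="google/flan-ul2",
    input="What is IBM?",
    parameters=TextGenerationParameters(max_new_tokens=30),
):
    for result in response.results or []:  # some chunks may carry no results
        if result.generated_text:
            chunks.append(result.generated_text)
print("".join(chunks))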
How to replace `generate_async`?#
The old `generate_async` method worked by sending multiple requests asynchronously (it spawned a new thread and ran an event loop). This is now the default behaviour of the `create` method in `GenerationService` (`client.text.generation.create`).
from tqdm.auto import tqdm
from genai import Client, Credentials
credentials = Credentials.from_env()
client = Client(credentials=credentials)
prompts = ["Prompt A", "Prompt B", "..."]
for response in tqdm(
total=len(prompts),
desc="Progress",
unit=" inputs",
iterable=client.text.generation.create(
model_id="google/flan-ul2",
inputs=prompts
)
):
print(f"Response ID: {response.id}")
print(response.results)
Notes
- The `max_concurrency_limit` / `callback` parameters are now located under the `execution_options` parameter (see the sketch after this list).
- The `options` parameter has been removed; every possible request parameter is now a parameter of the function. For instance, in the previous version `prompt_id` had to be part of the `options` parameter; now `prompt_id` is a standalone function parameter.
- Results are now automatically in order (`ordered=True`); the old behaviour was `ordered=False`.
- `throw_on_error` is set to `True` by default (the old behaviour was `False` by default). In the case of `True`, you will never receive `None` as a response.
- The `return_raw_response` parameter was removed; the raw response is now returned automatically (this is why you need to write `response.results[0].generated_text` instead of `response.generated_text`; although it may seem more complex, it is more robust because you will never lose any information contained at the top level).
- The `tqdm` progress bar together with the `hide_progressbar` property has been removed; you now have to use `tqdm` on your own (see the example above).
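A minimal sketch of both relocated parameters; it assumes `CreateExecutionOptions` is importable from `genai.text.generation`, analogous to the tokenization import shown below:
from genai import Client, Credentials
from genai.text.generation import CreateExecutionOptions


def on_finished(response):
    # Invoked for every completed input, as the old callback parameter was.
    print(f"Finished request {response.id}")


client = Client(credentials=Credentials.from_env())
for response in client.text.generation.create(
    model_id="google/flan-ul2",
    inputs=["Prompt A", "Prompt B"],
    execution_options=CreateExecutionOptions(
        concurrency_limit=5,  # replaces the old max_concurrency_limit
        callback=on_finished,  # replaces the old callback parameter
    ),
):
    print(response.results[0].generated_text)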
Tokenization#
Similarly to the generation-related unification, the tokenization service provides a single `create` method, which does the heavy lifting for you. With the new API, we have decided to remove constraints on the number of input items; however, HTTP payload size limits and rate limiting still apply, and the new SDK takes care of them by dynamically chunking input items based on their byte size and on a user-provided limit (if provided). So it is up to you whether you impose any limitations on the input size.
How to replace `tokenize` / `tokenize_as_completed` / `tokenize_async`?#
Old way:
from genai import Credentials, Model
from genai.schemas import GenerateParams
credentials = Credentials.from_env()
model = Model("google/flan-ul2", params=GenerateParams(max_new_tokens=20), credentials=credentials)
prompts = ["What is IBM?"] * 100
for response in model.tokenize_async(prompts, return_tokens=True, ordered=True):
print(response.results)
New way:
from genai import Client, Credentials
from genai.text.tokenization import TextTokenizationParameters, CreateExecutionOptions, TextTokenizationReturnOptions
credentials = Credentials.from_env()
client = Client(credentials=credentials)
prompts = ["What is IBM?"] * 100
for response in client.text.tokenization.create(
model_id="google/flan-ul2",
input=prompts,
parameters=TextTokenizationParameters(
return_options=TextTokenizationReturnOptions(
tokens=True, # return tokens
)
),
execution_options=CreateExecutionOptions(
ordered=True,
        batch_size=5,  # (optional) every HTTP request will contain at most 5 inputs
        concurrency_limit=10,  # (optional) at most 10 requests will run at the same time
),
):
print(response.results)
Notes
- Results are now ordered by default.
- `throw_on_error` is set to `True` by default (the old behaviour was `False` by default). In the case of `True`, you will never receive `None` as a response.
- The `return_tokens` / `callbacks` parameters are now located under `parameters`.
- `client.text.tokenization.create` returns a generator instead of a list; to work with it as a list, just do `responses = list(client.text.tokenization.create(...))`.
- `stop_reason` enums are changing from `SCREAMING_SNAKE_CASE` to `snake_case` (e.g. `MAX_TOKENS` -> `max_tokens`); you can use the prepared `StopReason` enum.
Models#
Old way:
from genai import Model, Credentials
credentials = Credentials.from_env()
all_models = Model.list(credentials=credentials)
model = Model("google/flan-ul2", credentials=credentials)
detail = model.info() # get info about current model
is_available = model.available() # check if model exists
New way:
from genai import Client, Credentials
credentials = Credentials.from_env()
client = Client(credentials=credentials)
all_models = client.model.list(offset=0, limit=100) # parameters are optional
detail = client.model.retrieve("google/flan-ul2")
is_available = True # model exists otherwise previous line would throw an exception
Notes
- The client throws an exception when a model does not exist instead of returning `None` (see the sketch after this list).
- The client always returns the whole response instead of the response results.
- Pagination has been added.
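A minimal sketch of the availability check under the new behaviour; the exact exception class depends on your SDK version, so a broad except is used here:
from genai import Client, Credentials

client = Client(credentials=Credentials.from_env())
try:
    detail = client.model.retrieve("google/flan-ul2")
    is_available = True
except Exception:  # the client raises instead of returning None for unknown models
    is_available = False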
Files#
Old way:
from genai import Model, Credentials
from genai.services import FileManager
from genai.schemas import FileListParams
credentials = Credentials.from_env()
file_list = FileManager.list_files(credentials=credentials, params=FileListParams(offset=0, limit=5))
file_metadata = FileManager.file_metadata(credentials=credentials, file_id="id")
file_content = FileManager.read_file(credentials=credentials, file_id="id")
uploaded_file = FileManager.upload_file(credentials=credentials, file_path="path_on_your_system", purpose="tune")
FileManager.delete_file(credentials=credentials, file_id="id")
New way:
from genai import Client, Credentials
from genai.file import FilePurpose
credentials = Credentials.from_env()
client = Client(credentials=credentials)
file_list = client.file.list(offset=0, limit=5) # you can pass way more filters
file_metadata = client.file.retrieve("id")
file_content = client.file.read("id")
uploaded_file = client.file.create(file_path="path_on_your_system", purpose=FilePurpose.TUNE) # or just purpose="tune"
client.file.delete("id")
Tunes#
Old way:
from genai import Model, Credentials
from genai.services import TuneManager
from genai.schemas.tunes_params import (
CreateTuneHyperParams,
CreateTuneParams,
DownloadAssetsParams,
TunesListParams,
)
credentials = Credentials.from_env()
tune_list = TuneManager.list_tunes(credentials=credentials, params=TunesListParams(offset=0, limit=5))
tune_methods = TuneManager.get_tune_methods(credentials=credentials)
tune_detail = TuneManager.get_tune(credentials=credentials, tune_id="id")
tune_content = TuneManager.download_tune_assets(credentials=credentials, params=DownloadAssetsParams(id="tune_id", content="encoder"))
upload_tune = TuneManager.create_tune(credentials=credentials, params=CreateTuneParams(model_id="google/flan-ul2", task_id="generation", name="my tuned model", method_id="pt", parameters=CreateTuneHyperParams(...)))
TuneManager.delete_tune(credentials=credentials, tune_id="id")
# or via `Model` class
model = Model("google/flan-ul2", params=None, credentials=credentials)
tuned_model = model.tune(
name="my tuned model",
method="pt",
task="generation",
hyperparameters=CreateTuneHyperParams(...)
)
tuned_model.download(...)
tuned_model.info(...)
tuned_model.delete(...)
New way:
from genai import Client, Credentials
from genai.tune import TuneStatus, TuningType, TuneAssetType
credentials = Credentials.from_env()
client = Client(credentials=credentials)
tune_list = client.tune.list(offset=0, limit=5, status=TuneStatus.COMPLETED) # or just status="completed"
tune_methods = client.tune.types()
tune_detail = client.tune.retrieve("tune_id")
tune_content = client.tune.read(id="tune_id", type=TuneAssetType.LOGS) # or type="logs"
upload_tune = client.tune.create(name="my tuned model", model_id="google/flan-ul2", task_id="generation", tuning_type=TuningType.PROMPT_TUNING) # tuning_type="prompt_tuning"
client.tune.delete("tune_id")
Notes
- `task` is now `task_id`.
- `method_id` is now `tuning_type`; the list of allowable values has changed (use the `TuningType` enum or values from the documentation; accepted values are changing from `pt` and `mpt` to `prompt_tuning` and `multitask_prompt_tuning`).
- `init_method` enums are changing from `SCREAMING_SNAKE_CASE` to `snake_case` (e.g. `RANDOM` -> `random`).
- `status` enums are changing from `SCREAMING_SNAKE_CASE` to `snake_case` (e.g. `COMPLETED` -> `completed`); you can use the prepared `TuneStatus` enum (see the sketch below).
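For example, here is a minimal sketch of checking a tune's state with the new snake_case values; it assumes the status lives under `result.status` of the retrieved detail:
from genai import Client, Credentials
from genai.tune import TuneStatus

client = Client(credentials=Credentials.from_env())
tune_detail = client.tune.retrieve("tune_id")
# Compare against the prepared enum, or simply against the string "completed".
if tune_detail.result.status == TuneStatus.COMPLETED:
    print("Tune has finished")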
Prompt Template (Prompt Pattern)#
The `PromptPattern` class has been removed as it was a local duplication of the API's Prompt Templates (Prompts). Prompt Templates have been replaced by the more general Prompts.
See the following example if you want to create a reusable prompt (prompt with a template).
from genai import Client, Credentials
client = Client(credentials=Credentials.from_env())
# Create prompt
prompt_response = client.prompt.create(
model_id="google/flan-ul2",
name="greet prompt",
input="Hello {{name}}, enjoy your flight to {{destination}}!",
data={"name": "Mr./Mrs.", "destination": "Unknown"}, # optional
)
prompt_id = prompt_response.result.id
# Render prompt via text generation endpoint
generate_response = client.text.generation.create(
prompt_id=prompt_id,
data={
"name": "Alex",
"destination": "London"
}
)
# Response: Hello Alex, enjoy your flight to London!
print(f"Response: {next(generate_response).results[0].generated_text}")
History (Requests History)#
Old way:
from genai.credentials import Credentials
from genai.metadata import Metadata
from genai.schemas.history_params import HistoryParams
metadata = Metadata(Credentials.from_env())
params = HistoryParams(
limit=8,
offset=0,
status="SUCCESS",
origin="API",
)
history_response = metadata.get_history(params)
New way:
from genai import Client, Credentials
from genai.request import RequestStatus, RequestRetrieveOriginParameter
client = Client(credentials=Credentials.from_env())
history_response = client.request.list(
limit=8,
offset=0,
status=RequestStatus.SUCCESS, # or status="success"
origin=RequestRetrieveOriginParameter.API, # or origin="api"
)
Notes
- `status`, `origin` and endpoint enums are changing from `SCREAMING_SNAKE_CASE` to `snake_case` (e.g. `SUCCESS` -> `success`). Feel free to use the prepared Python enums.
- By default, all origins are now returned (as opposed to generate only in v1).
- The response object now includes a `version` field describing the major and minor version of the API used when the request was created (see the sketch after this list).
- Requests made under v1 as well as v2 are returned (while the v1 `/requests` endpoint returns only v1 requests).
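A minimal sketch of reading the new field, continuing from the example above and assuming `version` is exposed on each returned request record:
for request in history_response.results:
    # version describes the API major/minor version the request was created with
    print(request.id, request.version)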
Extensions#
Notes
- `PandasExtension` was removed, because its functionality was replaced by the API's prompt templates.
- Third-party extensions were updated to work with the latest versions of the libraries.
- If you were using local models through a `LocalLLMServer`, you may need to adjust them to the new parameter and return types.