V2 Migration Guide¶
On November 22nd, 2023, the API (v2) was announced with the following changes.
New interface design that can support more modalities (code, image, audio) and tasks (chat, classification, transcription). We have worked on the interfaces together with watsonx.ai and OpenShift AI.
Closer alignment with watsonx.ai, so incubating new features in BAM is technically feasible.
Unified experience across REST API, SDKs, CLI and docs.
New SDK design that allows extensibility and reusability across teams and is more intuitive.
Ability to introduce minor breaking changes without affecting users.
This new design approach let us rewrite the SDK from scratch and align it more closely with the new API. In V2, we have introduced the following features.
Unify method naming (no more generate, generate_as_completed, generate_async and so on).
The SDK is always up to date with the latest available API version.
Automatically handles concurrency and rate-limiting for all endpoints without any additional settings. However, one can explicitly set a custom concurrency limit (generate / embeddings) or batch size (tokenization).
Add implementation for every endpoint that exists on the API (generation limits, generation feedback, prompts, moderations, …).
Improve overall speed by re-using HTTP clients, improving concurrency handling and utilising API processing power.
Automatically generate request/output pydantic data models (data models are always up to date).
BAM API is a new default environment.
What has changed?¶
All responses that used to contain a results field of object type have had the field renamed to result.
The totalCount param in paginated responses has been renamed to total_count.
Most methods return the whole response instead of one of its subfields.
Wherever the examples use dedicated parameter classes such as TextGenerationParameters, you can always pass a dictionary instead; it is converted to the class under the hood. The same applies to enums (see the sketch after this list).
Errors are raised immediately instead of being swallowed (this can be changed via an appropriate parameter).
The Credentials class throws when an invalid endpoint is passed.
The majority of schemas were renamed (GenerateParams -> TextGenerationParameters, …); for instance, if you work with the text generation service (client.text.generation.create), all of its schemas can be found in genai.text.generation. The same analogy applies to every other service.
The tqdm package has been removed, as we think it should not be part of the core layer. One can easily use it by wrapping the given SDK function (see example).
The Model class has been replaced with a more general Client, an entry point for all services.
The Options class has been removed, as every parameter is unpacked at the method level.
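For instance, reusing the text generation call shown later in this guide, the parameters can be passed as a plain dictionary (a minimal sketch; the dictionary keys mirror the TextGenerationParameters fields):
from genai import Client, Credentials

client = Client(credentials=Credentials.from_env())

# the dictionary is converted to TextGenerationParameters under the hood
responses = list(
    client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=["What is IBM?"],
        parameters={"max_new_tokens": 10},
    )
)
print(responses[0].results[0].generated_text)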
Moderations¶
Stigma (ModerationStigma class) has been replaced by Social Bias (ModerationSocialBias class).
Implicit Hate (ModerationImplicitHate class) has been replaced by Social Bias (ModerationSocialBias class).
❌ Old Way
from genai.schema import ModerationHAP
ModerationHAP(input=True, output=True, threshold=0.8)
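The snippet above uses the flat ModerationHAP shape. A minimal sketch of the updated form, assuming the input and output settings now live in dedicated ModerationHAPInput / ModerationHAPOutput classes with an explicit enabled flag:
from genai.schema import ModerationHAP, ModerationHAPInput, ModerationHAPOutput

# assumed new shape: separate objects for input and output moderation
ModerationHAP(
    input=ModerationHAPInput(enabled=True, threshold=0.8),
    output=ModerationHAPOutput(enabled=True, threshold=0.8),
)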
Text Generation¶
Replacing generate/generate_as_completed¶
❌ Old approach
from genai import Credentials, Model
from genai.schemas import GenerateParams
credentials = Credentials.from_env()
parameters = GenerateParams(max_new_tokens=10)
model = Model("google/flan-ul2", params=parameters, credentials=credentials)
results = model.generate(["What is IBM?"]) # or model.generate_as_completed(["What is IBM?"])
print(f"Generated Text: {results[0].generated_text}")
✅ New approach
from genai import Credentials, Client
from genai.schema import TextGenerationParameters, TextGenerationReturnOptions
credentials = Credentials.from_env()
parameters = TextGenerationParameters(
    max_new_tokens=10,
    # optionally request more details in the output:
    return_options=TextGenerationReturnOptions(generated_tokens=True, token_logprobs=True),
)
client = Client(credentials=credentials)
responses = list(
    client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=["What is IBM?"],
        parameters=parameters,
    )
)
print(f"Generated Text: {responses[0].results[0].generated_text}")
You can see that the new way requires more typing, but you can also retrieve top-level information such as id, created_at, and so on.
📝 Notes
The old generate method returns a list of generated responses, whereas the new create method returns a generator.
👉 See more Text Generation Examples.
Replacing stream parameter¶
❌ Old approach
from genai import Credentials, Model
from genai.schemas import GenerateParams
credentials = Credentials.from_env()
parameters = GenerateParams(streaming=True, max_new_tokens=30)
model = Model("google/flan-ul2", params=parameters, credentials=credentials)
for response in model.generate(["What is IBM?"], raw_response=True):
    print(response)
✅ New approach
from genai import Credentials, Client
from genai.schema import TextGenerationParameters
credentials = Credentials.from_env()
parameters = TextGenerationParameters(max_new_tokens=30)
client = Client(credentials=credentials)
for response in client.text.generation.create_stream(model_id="google/flan-ul2", input="What is IBM?", parameters=parameters):
    print(response)
📝 Notes
The stream parameter has been removed; use the create_stream method instead.
👉 See more complex Text Generation Streaming Example.
Replacing generate_async¶
The old generate_async method worked by sending multiple requests asynchronously (it spawned a new thread and ran an event loop). This is now the default behaviour of the create method in GenerationService (client.text.generation.create).
from tqdm.auto import tqdm
from genai import Client, Credentials
credentials = Credentials.from_env()
client = Client(credentials=credentials)
prompts = ["Prompt A", "Prompt B", "..."]
for response in tqdm(
    total=len(prompts),
    desc="Progress",
    unit=" inputs",
    iterable=client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=prompts
    )
):
    print(f"Response ID: {response.id}")
    print(response.results)
📝 Notes
The max_concurrency_limit / callback parameters are now located under the execution_options parameter (see the sketch after these notes).
The options parameter has been removed; every possible request parameter is now a parameter of the function. For instance, in the previous version prompt_id had to be part of the options parameter; now prompt_id is a standalone function parameter.
Results are now automatically in order (ordered=True); the old behaviour was ordered=False.
throw_on_error is set to True by default (the old behaviour was False by default). When it is True, you will never receive None as a response.
The return_raw_response parameter was removed; the raw response is now returned automatically (this is why you need to write response.results[0].generated_text instead of response.generated_text; although it may seem more complex, it is more robust because you never lose any information contained at the top level).
The tqdm progress bar together with the hide_progressbar property has been removed; you now have to use tqdm on your own (see the example above).
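As a sketch of where those options moved, assuming the generation service exposes a CreateExecutionOptions class analogous to the tokenization one used later in this guide:
from genai import Client, Credentials
from genai.text.generation import CreateExecutionOptions  # assumed module path, by analogy with genai.text.tokenization

client = Client(credentials=Credentials.from_env())

for response in client.text.generation.create(
    model_id="google/flan-ul2",
    inputs=["Prompt A", "Prompt B"],
    # concurrency and ordering now live under execution_options (field names assumed)
    execution_options=CreateExecutionOptions(concurrency_limit=5, ordered=True),
):
    print(response.results[0].generated_text)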
👉 See more complex Text Generation Example.
Tokenization¶
Similarly to the unification on the generation side, the tokenization service provides a single create method that does the heavy lifting for you. With the new API, we have removed the constraint on the number of input items; however, HTTP payload size and rate limits still apply, and the new SDK takes care of them by dynamically chunking input items based on their byte size and on the user-provided limit (if provided). It is therefore up to you whether to impose any limit on the input size.
Replacing tokenize / tokenize_as_completed / tokenize_async¶
❌ Old approach
from genai import Credentials, Model
from genai.schemas import GenerateParams
credentials = Credentials.from_env()
model = Model("google/flan-ul2", params=GenerateParams(max_new_tokens=20), credentials=credentials)
prompts = ["What is IBM?"] * 100
for response in model.tokenize_async(prompts, return_tokens=True, ordered=True):
    print(response.results)
✅ New approach
from genai import Client, Credentials
from genai.schema import TextTokenizationParameters, TextTokenizationReturnOptions
from genai.text.tokenization import CreateExecutionOptions
credentials = Credentials.from_env()
client = Client(credentials=credentials)
prompts = ["What is IBM?"] * 100
for response in client.text.tokenization.create(
    model_id="google/flan-ul2",
    input=prompts,
    parameters=TextTokenizationParameters(
       return_options=TextTokenizationReturnOptions(
             tokens=True,  # return tokens
       )
    ),
    execution_options=CreateExecutionOptions(
       ordered=True,
       batch_size=5,  # (optional) every HTTP request will contain at most 5 input items
       concurrency_limit=10,  # (optional) at most 10 requests will run at the same time
    ),
):
    print(response.results)
📝 Notes
Results are now ordered by default.
throw_on_error is set to True by default (the old behaviour was False by default). When it is True, you will never receive None as a response.
The return_tokens / callbacks parameters are now located under parameters.
client.text.tokenization.create returns a generator instead of a list; to work with it as a list, just do responses = list(client.text.tokenization.create(...)).
stop_reason enums are changing from SCREAMING_SNAKE_CASE to snake_case (e.g. MAX_TOKENS -> max_tokens); you can use the prepared StopReason enum.
👉 See Text Tokenization Example.
Models¶
❌ Old approach
from genai import Model, Credentials
credentials = Credentials.from_env()
all_models = Model.list(credentials=credentials)
model = Model("google/flan-ul2", credentials=credentials)
detail = model.info() # get info about current model
is_available = model.available() # check if model exists
✅ New approach
from genai import Client, Credentials
credentials = Credentials.from_env()
client = Client(credentials=credentials)
all_models = client.model.list(offset=0, limit=100) # parameters are optional
detail = client.model.retrieve("google/flan-ul2")
is_available = True # the model exists; otherwise the previous line would have thrown an exception
📝 Notes
The client throws an exception when a model does not exist instead of returning None.
The client always returns the whole response instead of just the response results.
Pagination has been added (see the sketch below).
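A small sketch of paging through the model list, assuming the paginated response exposes the results and total_count fields described above:
from genai import Client, Credentials

client = Client(credentials=Credentials.from_env())

offset, limit = 0, 100
while True:
    response = client.model.list(offset=offset, limit=limit)
    for model in response.results:
        print(model.id)  # attribute name assumed
    offset += limit
    if offset >= response.total_count:
        break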
👉 See Model Example.
Files¶
❌ Old approach
from genai import Model, Credentials
from genai.services import FileManager
from genai.schemas import FileListParams
credentials = Credentials.from_env()
file_list = FileManager.list_files(credentials=credentials, params=FileListParams(offset=0, limit=5))
file_metadata = FileManager.file_metadata(credentials=credentials, file_id="id")
file_content = FileManager.read_file(credentials=credentials, file_id="id")
uploaded_file = FileManager.upload_file(credentials=credentials, file_path="path_on_your_system", purpose="tune")
FileManager.delete_file(credentials=credentials, file_id="id")
✅ New approach
from genai import Client, Credentials
from genai.schema import FilePurpose
credentials = Credentials.from_env()
client = Client(credentials=credentials)
file_list = client.file.list(offset=0, limit=5) # you can pass way more filters
file_metadata = client.file.retrieve("id")
file_content = client.file.read("id")
uploaded_file = client.file.create(file_path="path_on_your_system", purpose=FilePurpose.TUNE) # or just purpose="tune"
client.file.delete("id")
👉 See Files Example.
Tunes¶
❌ Old approach
from genai import Model, Credentials
from genai.services import TuneManager
from genai.schemas.tunes_params import (
    CreateTuneHyperParams,
    CreateTuneParams,
    DownloadAssetsParams,
    TunesListParams,
)
credentials = Credentials.from_env()
tune_list = TuneManager.list_tunes(credentials=credentials, params=TunesListParams(offset=0, limit=5))
tune_methods = TuneManager.get_tune_methods(credentials=credentials)
tune_detail = TuneManager.get_tune(credentials=credentials, tune_id="id")
tune_content = TuneManager.download_tune_assets(credentials=credentials, params=DownloadAssetsParams(id="tune_id", content="encoder"))
upload_tune = TuneManager.create_tune(credentials=credentials, params=CreateTuneParams(model_id="google/flan-ul2", task_id="generation", name="my tuned model", method_id="pt", parameters=CreateTuneHyperParams(...)))
TuneManager.delete_tune(credentials=credentials, tune_id="id")
# or via `Model` class
model =  Model("google/flan-ul2", params=None, credentials=credentials)
tuned_model = model.tune(
    name="my tuned model",
    method="pt",
    task="generation",
    hyperparameters=CreateTuneHyperParams(...)
)
tuned_model.download(...)
tuned_model.info(...)
tuned_model.delete(...)
✅ New approach
from genai import Client, Credentials
from genai.schema import TuneStatus, TuneAssetType
credentials = Credentials.from_env()
client = Client(credentials=credentials)
tune_list = client.tune.list(offset=0, limit=5, status=TuneStatus.COMPLETED) # or just status="completed"
tune_methods = client.tune.types()
tune_detail = client.tune.retrieve("tune_id")
tune_content = client.tune.read(id="tune_id", type=TuneAssetType.LOGS) # or type="logs"
upload_tune = client.tune.create(name="my tuned model", model_id="google/flan-ul2", task_id="generation", tuning_type="prompt_tuning")
client.tune.delete("tune_id")
📝 Notes
task is now task_id.
method_id is now tuning_type; the list of allowable values has changed (see the documentation for allowed values).
init_method enums are changing from SCREAMING_SNAKE_CASE to snake_case (e.g. RANDOM -> random).
status enums are changing from SCREAMING_SNAKE_CASE to snake_case (e.g. COMPLETED -> completed); you can use the prepared TuneStatus enum (see the sketch below).
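For illustration, a sketch of checking a tune's status via the prepared TuneStatus enum, assuming the single-object response exposes a result field with a status attribute (per the renames described earlier):
from genai import Client, Credentials
from genai.schema import TuneStatus

client = Client(credentials=Credentials.from_env())

tune = client.tune.retrieve("tune_id").result  # field names assumed
if tune.status == TuneStatus.COMPLETED:  # equivalent to passing the plain string "completed"
    print("Tuning has finished")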
👉 See Tune a Custom Model Example.
Prompt Template (Prompt Pattern)¶
The PromptPattern class has been removed, as it was a local duplication of the API's Prompt Templates. Prompt Templates themselves have been replaced by the more general Prompts.
See the following example if you want to create a reusable prompt (prompt with a template).
from genai import Client, Credentials
client = Client(credentials=Credentials.from_env())
# Create prompt
prompt_response = client.prompt.create(
    model_id="google/flan-ul2",
    name="greet prompt",
    input="Hello {{name}}, enjoy your flight to {{destination}}!",
    data={"name": "Mr./Mrs.", "destination": "Unknown"}, # optional
)
prompt_id = prompt_response.result.id
# Render prompt via text generation endpoint
generate_response = client.text.generation.create(
    prompt_id=prompt_id,
    data={
        "name": "Alex",
        "destination": "London"
    }
)
# Response: Hello Alex, enjoy your flight to London!
print(f"Response: {next(generate_response).results[0].generated_text}")
History (Requests History)¶
❌ Old approach
from genai.credentials import Credentials
from genai.metadata import Metadata
from genai.schemas.history_params import HistoryParams
metadata = Metadata(Credentials.from_env())
params = HistoryParams(
    limit=8,
    offset=0,
    status="SUCCESS",
    origin="API",
)
history_response = metadata.get_history(params)
✅ New approach
from genai import Client, Credentials
from genai.request import RequestStatus, RequestRetrieveOriginParameter
client = Client(credentials=Credentials.from_env())
history_response = client.request.list(
    limit=8,
    offset=0,
    status=RequestStatus.SUCCESS,  # or status="success"
    origin=RequestRetrieveOriginParameter.API,  # or origin="api"
)
📝 Notes
The status, origin and endpoint enums are changing from SCREAMING_SNAKE_CASE to snake_case (e.g. SUCCESS -> success). Feel free to use the prepared Python enums.
By default, all origins are now returned (as opposed to generate only in v1).
The response object now includes a version field describing the major and minor version of the API used when the request was created (see the sketch below).
Requests made under v1 as well as v2 are returned (while the v1 /requests endpoint returns only v1 requests).
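For illustration, a sketch of reading the new version field from the listed requests (attribute names assumed):
# continuing from the listing above
for request in history_response.results:
    print(request.id, request.version)  # version describes the API version the request was created under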
👉 See Requests (History) Example.
Extensions¶
📝 Notes
The PandasExtension was removed, because its functionality was replaced by the API's prompt templates.
The params class attribute has been renamed to parameters (everywhere).
The model class attribute has been renamed to model_id (everywhere; see the sketch below).
Third-party extensions were updated to work with the latest versions of the libraries.
If you were using local models through a LocalLLMServer, you may need to adjust them to the new parameter and return types.
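As an illustration of the renames in practice, a sketch using the LangChain extension, assuming a LangChainInterface class that accepts client, model_id and parameters arguments:
from genai import Client, Credentials
from genai.extensions.langchain import LangChainInterface  # assumed import path
from genai.schema import TextGenerationParameters

client = Client(credentials=Credentials.from_env())

# "model" -> "model_id" and "params" -> "parameters", per the renames above
llm = LangChainInterface(
    client=client,
    model_id="google/flan-ul2",
    parameters=TextGenerationParameters(max_new_tokens=10),
)
print(llm.invoke("What is IBM?"))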
👉 See All Extensions Examples.