V2 Migration Guide
==================

.. contents::
    :local:
    :class: this-will-duplicate-information-and-it-is-still-useful-here

On November 22nd, 2023, the API (v2) was announced with the following changes.

- New interface design that can support more modalities (code, image, audio) and tasks (chat, classification, transcription). We have worked on the interfaces together with watsonx.ai and OpenShift AI.
- Closer alignment with `watsonx.ai `_, so incubating new features in BAM is technically feasible.
- Unified experience across the REST API, SDKs, CLI and docs.
- New SDK design that allows extensibility and reusability across teams and is more intuitive.
- Ability to introduce minor breaking changes without affecting users.

This new design approach lets us rewrite the SDK from scratch and align it more closely with the new API. In V2, we have introduced the following features.

- Unified method naming (no more ``generate``, ``generate_as_completed``, ``generate_async`` and so on).
- The SDK is always up to date with the latest available API version.
- Concurrency and rate limiting are handled automatically for all endpoints without any additional settings. However, one can explicitly set a custom concurrency limit (generate / embeddings) or batch size (tokenization).
- An implementation for every endpoint that exists on the API (generation limits, generation feedback, prompts, moderations, …).
- Improved overall speed by re-using HTTP clients, improving concurrency handling and utilising API processing power.
- Automatically generated request/output pydantic data models (data models are always up to date).
- The BAM API is the new default environment.

What has changed?
-----------------

- All responses that used to contain a ``results`` field of object type have had that field renamed to ``result``.
- The ``totalCount`` param in paginated responses has been renamed to ``total_count``.
- Most methods return the whole response instead of one of its subfields.
- When you see dedicated parameter classes such as ``TextGenerationParameters`` in the examples, you can always pass a dictionary instead, which is converted to the class under the hood; the same applies to the enums (see the sketch after the Tokenization example below).
- Errors are raised immediately instead of being swallowed (this can be changed via an appropriate parameter).
- The ``Credentials`` class throws when an invalid endpoint is passed.
- The majority of schemas were renamed (``GenerateParams`` -> ``TextGenerationParameters``, …); for instance, if you work with the text generation service (``client.text.generation.create``), all schemas can be found in ``genai.features.text.generation``; this analogy applies to every other service.
- The ``tqdm`` package has been removed, as we think it should not be part of the core layer. One can easily use it by wrapping the given SDK function (see :ref:`example `).
- The ``Model`` class has been replaced by a more general ``Client``, an entry point for all services.
- The ``Options`` class has been removed, as every parameter is unpacked at the method level.

Moderations
-----------

- Stigma (``ModerationStigma`` class) has been replaced by Social Bias (``ModerationSocialBias`` class).
- Implicit Hate (``ModerationImplicitHate`` class) has been replaced by Social Bias (``ModerationSocialBias`` class).

❌ Old approach

.. code:: python

    from genai.schema import ModerationHAP

    ModerationHAP(input=True, output=True, threshold=0.8)

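✅ New approach (a sketch)

The flat ``input``/``output`` booleans are replaced by nested per-direction settings. The ``ModerationHAPInput``/``ModerationHAPOutput`` classes shown here are an assumption based on the v2 schema naming described above; check ``genai.schema`` for the exact names.

.. code:: python

    # ModerationHAPInput / ModerationHAPOutput are assumed v2 class names
    from genai.schema import ModerationHAP, ModerationHAPInput, ModerationHAPOutput

    # enable HAP moderation separately for the input and for the generated output
    ModerationHAP(
        input=ModerationHAPInput(enabled=True, threshold=0.8),
        output=ModerationHAPOutput(enabled=True, threshold=0.8),
    )
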
Text Generation
---------------

Replacing ``generate``/``generate_as_completed``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

❌ Old approach

.. code:: python

    from genai import Credentials, Model
    from genai.schemas import GenerateParams

    credentials = Credentials.from_env()
    parameters = GenerateParams(max_new_tokens=10)
    model = Model("google/flan-ul2", params=parameters, credentials=credentials)

    results = model.generate(["What is IBM?"])  # or model.generate_as_completed(["What is IBM?"])
    print(f"Generated Text: {results[0].generated_text}")

✅ New approach

.. code:: python

    from genai import Credentials, Client
    from genai.schema import TextGenerationParameters, TextGenerationReturnOptions

    credentials = Credentials.from_env()
    parameters = TextGenerationParameters(
        max_new_tokens=10,
        # optionally request more details in the output:
        return_options=TextGenerationReturnOptions(generated_tokens=True, token_logprobs=True),
    )
    client = Client(credentials=credentials)

    responses = list(
        client.text.generation.create(
            model_id="google/flan-ul2",
            inputs=["What is IBM?"],
            parameters=parameters,
        )
    )
    print(f"Generated Text: {responses[0].results[0].generated_text}")

You can see that the new way requires a bit more typing, but you can retrieve top-level information such as ``id``, ``created_at``, and more.

📝 Notes

- The old ``generate`` method returns a list of generated responses, whereas the new ``create`` method returns a generator.

👉 See more :ref:`Text Generation Examples `.

Replacing the ``stream`` parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

❌ Old approach

.. code:: python

    from genai import Credentials, Model
    from genai.schemas import GenerateParams

    credentials = Credentials.from_env()
    parameters = GenerateParams(streaming=True, max_new_tokens=30)
    model = Model("google/flan-ul2", params=parameters, credentials=credentials)

    for response in model.generate(["What is IBM?"], raw_response=True):
        print(response)

✅ New approach

.. code:: python

    from genai import Credentials, Client
    from genai.schema import TextGenerationParameters

    credentials = Credentials.from_env()
    parameters = TextGenerationParameters(max_new_tokens=30)
    client = Client(credentials=credentials)

    for response in client.text.generation.create_stream(
        model_id="google/flan-ul2", input="What is IBM?", parameters=parameters
    ):
        print(response)

📝 Notes

- The ``stream`` parameter has been removed; use the ``create_stream`` method instead.

👉 See a more complex :ref:`Text Generation Streaming Example `.

Replacing ``generate_async``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The old ``generate_async`` method worked by sending multiple requests asynchronously (it spawned a new thread and ran an event loop). This is now the default behaviour of the ``create`` method in ``GenerationService`` (``client.text.generation.create``).

.. code:: python

    from tqdm.auto import tqdm

    from genai import Client, Credentials

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    prompts = ["Prompt A", "Prompt B", "..."]
    for response in tqdm(
        total=len(prompts),
        desc="Progress",
        unit=" inputs",
        iterable=client.text.generation.create(
            model_id="google/flan-ul2",
            inputs=prompts,
        ),
    ):
        print(f"Response ID: {response.id}")
        print(response.results)

📝 Notes

- The ``max_concurrency_limit``/``callback`` parameters are now located under the ``execution_options`` parameter (see the sketch after these notes).
- The ``options`` parameter has been removed; every possible request parameter is now a parameter of the function itself. For instance, in the previous version ``prompt_id`` had to be part of the ``options`` parameter; now ``prompt_id`` is a standalone function parameter.
- Results are now automatically in order (``ordered=True``); the old behaviour was ``ordered=False``.
- ``throw_on_error`` is now set to ``True`` by default (the old behaviour was ``False`` by default). When set to ``True``, you will never receive ``None`` as a response.
- The ``return_raw_response`` parameter has been removed; the raw response is now returned automatically. This is why you need to write ``response.results[0].generated_text`` instead of ``response.generated_text``; although it may seem more complex, it is more robust because you never lose any information contained at the top level.
- The ``tqdm`` progress bar, together with the ``hide_progressbar`` property, has been removed; you now have to use ``tqdm`` on your own (see the example above).

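For illustration, here is a minimal sketch of the relocated concurrency settings. It assumes a ``CreateExecutionOptions`` class importable from ``genai.text.generation`` (mirroring the tokenization import shown later in this guide) with ``concurrency_limit`` and ``callback`` fields; check the service module for the exact names.

.. code:: python

    from genai import Client, Credentials
    from genai.text.generation import CreateExecutionOptions  # assumed, mirroring genai.text.tokenization

    client = Client(credentials=Credentials.from_env())

    for response in client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=["Prompt A", "Prompt B"],
        execution_options=CreateExecutionOptions(
            concurrency_limit=5,  # replaces the old top-level max_concurrency_limit
            callback=lambda response: print(response.id),  # replaces the old top-level callback
        ),
    ):
        print(response.results)
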
👉 See a more complex :ref:`Text Generation Example `.

Tokenization
------------

Similarly to the ``generation`` unification, the ``tokenization`` service provides a single ``create`` method, which does the heavy lifting for you. With the new API, we have decided to remove constraints on the length of input items; however, HTTP payload size limits and rate limiting still apply, and the new SDK takes care of them by dynamically chunking input items based on their byte size and on a user-provided limit (if provided). So it is up to you whether you impose any limitations on the input size.

Replacing ``tokenize`` / ``tokenize_as_completed`` / ``tokenize_async``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

❌ Old approach

.. code:: python

    from genai import Credentials, Model
    from genai.schemas import GenerateParams

    credentials = Credentials.from_env()
    model = Model("google/flan-ul2", params=GenerateParams(max_new_tokens=20), credentials=credentials)

    prompts = ["What is IBM?"] * 100
    for response in model.tokenize_async(prompts, return_tokens=True, ordered=True):
        print(response.results)

✅ New approach

.. code:: python

    from genai import Client, Credentials
    from genai.schema import TextTokenizationParameters, TextTokenizationReturnOptions
    from genai.text.tokenization import CreateExecutionOptions

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    prompts = ["What is IBM?"] * 100
    for response in client.text.tokenization.create(
        model_id="google/flan-ul2",
        input=prompts,
        parameters=TextTokenizationParameters(
            return_options=TextTokenizationReturnOptions(
                tokens=True,  # return tokens
            )
        ),
        execution_options=CreateExecutionOptions(
            ordered=True,
            batch_size=5,  # (optional) every HTTP request will contain at most 5 inputs
            concurrency_limit=10,  # (optional) at most 10 requests will run at the same time
        ),
    ):
        print(response.results)

📝 Notes

- ``results`` are now ordered by default.
- ``throw_on_error`` is now set to ``True`` by default (the old behaviour was ``False`` by default). When set to ``True``, you will never receive ``None`` as a response.
- The ``return_tokens``/``callbacks`` parameters are now located under ``parameters``.
- ``client.text.tokenization.create`` returns a ``generator`` instead of a ``list``; to work with it as a list, just do ``responses = list(client.text.tokenization.create(...))``.
- ``stop_reason`` enums are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``MAX_TOKENS`` -> ``max_tokens``); you can use the prepared ``StopReason`` enum.

👉 See :ref:`Text Tokenization Example `.

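As noted under "What has changed?", the dedicated parameter classes (and enums) can be swapped for plain dictionaries and strings, which are converted to the proper classes under the hood. A minimal sketch of the tokenization call above using a dictionary instead of ``TextTokenizationParameters``; the nested keys simply mirror the fields of the classes shown above:

.. code:: python

    from genai import Client, Credentials

    client = Client(credentials=Credentials.from_env())

    # the plain dictionary is converted to TextTokenizationParameters / TextTokenizationReturnOptions
    for response in client.text.tokenization.create(
        model_id="google/flan-ul2",
        input=["What is IBM?"],
        parameters={"return_options": {"tokens": True}},
    ):
        print(response.results)
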
Models
------

❌ Old approach

.. code:: python

    from genai import Model, Credentials

    credentials = Credentials.from_env()

    all_models = Model.list(credentials=credentials)
    model = Model("google/flan-ul2", credentials=credentials)
    detail = model.info()  # get info about the current model
    is_available = model.available()  # check if the model exists

✅ New approach

.. code:: python

    from genai import Client, Credentials

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    all_models = client.model.list(offset=0, limit=100)  # parameters are optional
    detail = client.model.retrieve("google/flan-ul2")
    is_available = True  # the model exists, otherwise the previous line would have thrown an exception

📝 Notes

- The client throws an exception when a model does not exist instead of returning ``None``.
- The client always returns the whole response instead of the response results.
- Pagination has been added.

👉 See :ref:`Model Example `.

Files
-----

❌ Old approach

.. code:: python

    from genai import Credentials
    from genai.services import FileManager
    from genai.schemas import FileListParams

    credentials = Credentials.from_env()

    file_list = FileManager.list_files(credentials=credentials, params=FileListParams(offset=0, limit=5))
    file_metadata = FileManager.file_metadata(credentials=credentials, file_id="id")
    file_content = FileManager.read_file(credentials=credentials, file_id="id")
    uploaded_file = FileManager.upload_file(credentials=credentials, file_path="path_on_your_system", purpose="tune")
    FileManager.delete_file(credentials=credentials, file_id="id")

✅ New approach

.. code:: python

    from genai import Client, Credentials
    from genai.schema import FilePurpose

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    file_list = client.file.list(offset=0, limit=5)  # you can pass way more filters
    file_metadata = client.file.retrieve("id")
    file_content = client.file.read("id")
    uploaded_file = client.file.create(file_path="path_on_your_system", purpose=FilePurpose.TUNE)  # or just purpose="tune"
    client.file.delete("id")

👉 See :ref:`Files Example `.

Tunes
-----

❌ Old approach

.. code:: python

    from genai import Model, Credentials
    from genai.services import TuneManager
    from genai.schemas.tunes_params import (
        CreateTuneHyperParams,
        CreateTuneParams,
        DownloadAssetsParams,
        TunesListParams,
    )

    credentials = Credentials.from_env()

    tune_list = TuneManager.list_tunes(credentials=credentials, params=TunesListParams(offset=0, limit=5))
    tune_methods = TuneManager.get_tune_methods(credentials=credentials)
    tune_detail = TuneManager.get_tune(credentials=credentials, tune_id="id")
    tune_content = TuneManager.download_tune_assets(
        credentials=credentials, params=DownloadAssetsParams(id="tune_id", content="encoder")
    )
    upload_tune = TuneManager.create_tune(
        credentials=credentials,
        params=CreateTuneParams(
            model_id="google/flan-ul2",
            task_id="generation",
            name="my tuned model",
            method_id="pt",
            parameters=CreateTuneHyperParams(...),
        ),
    )
    TuneManager.delete_tune(credentials=credentials, tune_id="id")

    # or via the `Model` class
    model = Model("google/flan-ul2", params=None, credentials=credentials)
    tuned_model = model.tune(
        name="my tuned model",
        method="pt",
        task="generation",
        hyperparameters=CreateTuneHyperParams(...),
    )
    tuned_model.download(...)
    tuned_model.info(...)
    tuned_model.delete(...)

✅ New approach

.. code:: python

    from genai import Client, Credentials
    from genai.schema import TuneStatus, TuneAssetType

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    tune_list = client.tune.list(offset=0, limit=5, status=TuneStatus.COMPLETED)  # or just status="completed"
    tune_methods = client.tune.types()
    tune_detail = client.tune.retrieve("tune_id")
    tune_content = client.tune.read(id="tune_id", type=TuneAssetType.LOGS)  # or type="logs"
    upload_tune = client.tune.create(
        name="my tuned model",
        model_id="google/flan-ul2",
        task_id="generation",
        tuning_type="prompt_tuning",
    )
    client.tune.delete("tune_id")

📝 Notes

- ``task`` is now ``task_id``.
- ``method_id`` is now ``tuning_type``, and the list of allowable values has changed (see the documentation for allowed values).
- ``init_method`` enums are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``RANDOM`` -> ``random``).
- ``status`` enums are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``COMPLETED`` -> ``completed``); you can use the prepared ``TuneStatus`` enum.

👉 See :ref:`Tune a Custom Model Example `.

Prompt Template (Prompt Pattern)
--------------------------------

The ``PromptPattern`` class has been removed, as it was a local duplication of the API’s Prompt Templates (Prompts). Prompt Templates have been replaced by the more general ``Prompts``. See the following example if you want to create a reusable prompt (a prompt with a template).

.. code:: python

    from genai import Client, Credentials

    client = Client(credentials=Credentials.from_env())

    # Create the prompt
    prompt_response = client.prompt.create(
        model_id="google/flan-ul2",
        name="greet prompt",
        input="Hello {{name}}, enjoy your flight to {{destination}}!",
        data={"name": "Mr./Mrs.", "destination": "Unknown"},  # optional
    )
    prompt_id = prompt_response.result.id

    # Render the prompt via the text generation endpoint
    generate_response = client.text.generation.create(
        prompt_id=prompt_id,
        data={
            "name": "Alex",
            "destination": "London",
        },
    )

    # Response: Hello Alex, enjoy your flight to London!
    print(f"Response: {next(generate_response).results[0].generated_text}")

👉 See :ref:`Custom prompt with variables Example `.

History (Requests History)
--------------------------

❌ Old approach

.. code:: python

    from genai.credentials import Credentials
    from genai.metadata import Metadata
    from genai.schemas.history_params import HistoryParams

    metadata = Metadata(Credentials.from_env())

    params = HistoryParams(
        limit=8,
        offset=0,
        status="SUCCESS",
        origin="API",
    )
    history_response = metadata.get_history(params)

✅ New approach

.. code:: python

    from genai import Client, Credentials
    from genai.request import RequestStatus, RequestRetrieveOriginParameter

    client = Client(credentials=Credentials.from_env())

    history_response = client.request.list(
        limit=8,
        offset=0,
        status=RequestStatus.SUCCESS,  # or status="success"
        origin=RequestRetrieveOriginParameter.API,  # or origin="api"
    )

📝 Notes

- ``status``, ``origin`` and endpoint enums are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``SUCCESS`` -> ``success``). Feel free to use the prepared Python enums.
- By default, all origins are now returned (as opposed to generate-only in v1).
- The response object now includes a ``version`` field describing the major and minor version of the API used when the request was created.
- Requests made under v1 as well as v2 are returned (while the ``v1/requests`` endpoint returns only v1 requests).

👉 See :ref:`Requests (History) Example `.

Extensions
----------

📝 Notes

- ``PandasExtension`` was removed, because its functionality was replaced by the API's prompt templates.
- The ``params`` class attribute has been renamed to ``parameters`` (everywhere).
- The ``model`` class attribute has been renamed to ``model_id`` (everywhere).
- Third-party extensions were updated to work with the latest versions of the libraries (see the sketch below for the renamed attributes in practice).
- If you were using local models through a ``LocalLLMServer``, you may need to adjust them to the new parameter and return types.

👉 See :ref:`All Extensions Examples `.

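For example, a minimal sketch of the renamed attributes with the LangChain extension. It assumes the extension exposes a ``LangChainInterface`` class accepting ``client``, ``model_id`` and ``parameters``; consult the extension examples for the exact import path and signature.

.. code:: python

    from genai import Client, Credentials
    from genai.extensions.langchain import LangChainInterface  # assumed import path
    from genai.schema import TextGenerationParameters

    client = Client(credentials=Credentials.from_env())

    # v1 (roughly): LangChainInterface(model="google/flan-ul2", params=GenerateParams(...), credentials=credentials)
    llm = LangChainInterface(
        client=client,
        model_id="google/flan-ul2",  # was `model`
        parameters=TextGenerationParameters(max_new_tokens=10),  # was `params`
    )
    print(llm.invoke("What is IBM?"))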