V2 Migration Guide
==================

.. contents::
   :local:
   :class: this-will-duplicate-information-and-it-is-still-useful-here

On November 22nd, 2023, the API (v2) was announced with the following changes.

- New interface design that can support more modalities (code, image, audio) and tasks (chat, classification, transcription). We have worked on the interfaces together with watsonx.ai and OpenShift AI.
- Closer alignment with watsonx.ai, so incubating new features in BAM is technically feasible.
- Unified experience across REST API, SDKs, CLI and docs.
- New SDK design that allows extensibility and reusability across teams and is more intuitive.
- Ability to introduce minor breaking changes without affecting users.

This new design approach lets us rewrite the SDK from scratch and align it more closely with the new API. In V2, we have introduced the following features.

- Unified method naming (no more ``generate``, ``generate_as_completed``, ``generate_async`` and so on).
- The SDK is always up to date with the latest available API version.
- Concurrency and rate limiting are handled automatically for all endpoints without any additional settings. However, you can explicitly set a custom concurrency limit (generate / embeddings) or batch size (tokenization).
- Every endpoint that exists on the API is implemented (generation limits, generation feedback, prompts, moderations, …).
- Improved overall speed by re-using HTTP clients, improving concurrency handling and utilising API processing power.
- Request/output pydantic data models are generated automatically (data models are always up to date).
- The BAM API is the new default environment.

What has changed?
-----------------

- All responses that used to contain a ``results`` field of object type now have that field renamed to ``result``.
- The ``totalCount`` param in paginated responses has been renamed to ``total_count``.
- Most methods return the whole response instead of one of its subfields.
- When the examples use dedicated parameter classes like ``TextGenerationParameters``, you can always pass a dictionary instead; it is converted to the class under the hood. The same applies to enums (a short sketch follows this list).
- Errors are raised immediately instead of being swallowed (this can be changed via an appropriate parameter).
- The ``Credentials`` class throws when an invalid endpoint is passed.
- The majority of schemas were renamed (``GenerateParams`` -> ``TextGenerationParameters``, …). For instance, if you work with the text generation service (``client.text.generation.create``), all its schemas can be found in ``genai.text.generation``; the same analogy applies to every other service.
- The ``tqdm`` package has been removed, as we think it should not be part of the core layer. You can easily use it yourself by wrapping the given SDK function.
- The ``Model`` class has been replaced by the more general ``Client``, an entry point for all services.
- The ``Options`` class has been removed, as every parameter is now unpacked at the method level.
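
The following minimal sketch illustrates the dictionary shortcut mentioned above: the two calls are equivalent, one using the dedicated ``TextGenerationParameters`` class and the other a plain dictionary that is converted under the hood. The model id and prompt are placeholders only.

.. code:: python

    from genai import Client, Credentials
    from genai.text.generation import TextGenerationParameters

    client = Client(credentials=Credentials.from_env())

    # Using the dedicated parameter class ...
    typed = list(client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=["What is IBM?"],
        parameters=TextGenerationParameters(max_new_tokens=10),
    ))

    # ... or an equivalent plain dictionary, converted to the class under the hood.
    untyped = list(client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=["What is IBM?"],
        parameters={"max_new_tokens": 10},
    ))

    print(typed[0].results[0].generated_text)
    print(untyped[0].results[0].generated_text)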

Text Generation
---------------

How to replace ``generate``/``generate_as_completed``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Old way:

.. code:: python

    from genai import Credentials, Model
    from genai.schemas import GenerateParams

    credentials = Credentials.from_env()
    parameters = GenerateParams(max_new_tokens=10)
    model = Model("google/flan-ul2", params=parameters, credentials=credentials)

    results = model.generate(["What is IBM?"])  # or model.generate_as_completed(["What is IBM?"])
    print(f"Generated Text: {results[0].generated_text}")

New way:

.. code:: python

    from genai import Credentials, Client
    from genai.text.generation import TextGenerationParameters

    credentials = Credentials.from_env()
    parameters = TextGenerationParameters(max_new_tokens=10)
    client = Client(credentials=credentials)

    responses = list(client.text.generation.create(model_id="google/flan-ul2", inputs=["What is IBM?"], parameters=parameters))
    print(f"Generated Text: {responses[0].results[0].generated_text}")

You can see that the new way requires more typing, but you can retrieve top-level information like ``id``, ``created_at``, …

Streaming
^^^^^^^^^

Old way:

.. code:: python

    from genai import Credentials, Model
    from genai.schemas import GenerateParams

    credentials = Credentials.from_env()
    parameters = GenerateParams(streaming=True, max_new_tokens=30)
    model = Model("google/flan-ul2", params=parameters, credentials=credentials)

    for response in model.generate(["What is IBM?"], raw_response=True):
        print(response)

New way:

.. code:: python

    from genai import Credentials, Client
    from genai.text.generation import TextGenerationParameters

    credentials = Credentials.from_env()
    parameters = TextGenerationParameters(max_new_tokens=30)
    client = Client(credentials=credentials)

    for response in client.text.generation.create_stream(model_id="google/flan-ul2", input="What is IBM?", parameters=parameters):
        print(response)

Notes

- The ``streaming`` parameter is replaced by the dedicated ``create_stream`` method.

How to replace ``generate_async``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The old ``generate_async`` method worked by sending multiple requests asynchronously (it spawned a new thread and ran an event loop). This is now the default behaviour of the ``create`` method in ``GenerationService`` (``client.text.generation.create``).

.. code:: python

    from tqdm.auto import tqdm

    from genai import Client, Credentials

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    prompts = ["Prompt A", "Prompt B", "..."]
    for response in tqdm(
        total=len(prompts),
        desc="Progress",
        unit=" inputs",
        iterable=client.text.generation.create(
            model_id="google/flan-ul2",
            inputs=prompts
        )
    ):
        print(f"Response ID: {response.id}")
        print(response.results)

Notes

- The ``max_concurrency_limit``/``callback`` parameters are now located under the ``execution_options`` parameter (see the sketch after these notes).
- The ``options`` parameter has been removed; every possible request parameter is now a parameter of the function. For instance, ``prompt_id`` previously had to be part of the ``options`` parameter; now ``prompt_id`` is a standalone function parameter.
- Results are now automatically in order (``ordered=True``); the old behaviour was ``ordered=False``.
- ``throw_on_error`` is set to ``True`` by default (the old behaviour was ``False`` by default). When it is ``True``, you will never receive ``None`` as a response.
- The ``return_raw_response`` parameter was removed; the raw response is now returned automatically (this is why you need to write ``response.results[0].generated_text`` instead of ``response.generated_text``; although it may seem more complex, it is more robust because you never lose any information contained at the top level).
- The ``tqdm`` progress bar, together with the ``hide_progressbar`` property, has been removed; you now have to use ``tqdm`` on your own (see the example above).
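
A minimal sketch of the new parameter layout described in the notes above. It assumes the execution-related schemas for the generation service live in ``genai.text.generation`` (as stated in *What has changed?*), and the field names ``ordered``, ``concurrency_limit`` and ``callback`` mirror the ``CreateExecutionOptions`` used in the tokenization example later in this guide; double-check them against your SDK version.

.. code:: python

    from genai import Client, Credentials
    from genai.text.generation import CreateExecutionOptions, TextGenerationParameters

    client = Client(credentials=Credentials.from_env())


    def on_response(response):
        # invoked for every finished request (replaces the old `callback` argument)
        print("finished:", response.id)


    for response in client.text.generation.create(
        model_id="google/flan-ul2",
        inputs=["Prompt A", "Prompt B"],
        parameters=TextGenerationParameters(max_new_tokens=10),  # a plain dict works too
        execution_options=CreateExecutionOptions(
            ordered=True,          # default: responses come back in input order
            concurrency_limit=5,   # replaces the old `max_concurrency_limit`
            callback=on_response,
        ),
    ):
        print(response.results[0].generated_text)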

Tokenization
------------

Similarly to the unification of the ``generation`` methods, the ``tokenization`` service provides a single ``create`` method, which does the heavy lifting for you. With the new API, we have decided to remove constraints on the number of input items; however, HTTP payload size and rate limiting still apply, and the new SDK takes care of them by dynamically chunking input items based on their byte size and on a user-provided limit (if provided). It is therefore up to you whether you impose any limits on the input size.

How to replace ``tokenize`` / ``tokenize_as_completed`` / ``tokenize_async``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Old way:

.. code:: python

    from genai import Credentials, Model
    from genai.schemas import GenerateParams

    credentials = Credentials.from_env()
    model = Model("google/flan-ul2", params=GenerateParams(max_new_tokens=20), credentials=credentials)

    prompts = ["What is IBM?"] * 100
    for response in model.tokenize_async(prompts, return_tokens=True, ordered=True):
        print(response.results)

New way:

.. code:: python

    from genai import Client, Credentials
    from genai.text.tokenization import (
        CreateExecutionOptions,
        TextTokenizationParameters,
        TextTokenizationReturnOptions,
    )

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    prompts = ["What is IBM?"] * 100
    for response in client.text.tokenization.create(
        model_id="google/flan-ul2",
        input=prompts,
        parameters=TextTokenizationParameters(
            return_options=TextTokenizationReturnOptions(
                tokens=True,  # return tokens
            )
        ),
        execution_options=CreateExecutionOptions(
            ordered=True,
            batch_size=5,  # (optional) every HTTP request will contain at most 5 inputs
            concurrency_limit=10,  # (optional) at most 10 requests will run at the same time
        ),
    ):
        print(response.results)

Notes

- Results are now ordered by default.
- ``throw_on_error`` is set to ``True`` by default (the old behaviour was ``False`` by default). When it is ``True``, you will never receive ``None`` as a response.
- The ``return_tokens`` parameter is now located under ``parameters``; the ``callback`` parameter is now located under ``execution_options``.
- ``client.text.tokenization.create`` returns a ``generator`` instead of a ``list``; to work with it as a list, just do ``responses = list(client.text.tokenization.create(...))``.
- ``stop_reason`` enum values are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``MAX_TOKENS`` -> ``max_tokens``); you can use the prepared ``StopReason`` enum.

Models
------

Old way:

.. code:: python

    from genai import Model, Credentials

    credentials = Credentials.from_env()

    all_models = Model.list(credentials=credentials)

    model = Model("google/flan-ul2", credentials=credentials)
    detail = model.info()  # get info about current model
    is_available = model.available()  # check if model exists

New way:

.. code:: python

    from genai import Client, Credentials

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    all_models = client.model.list(offset=0, limit=100)  # parameters are optional
    detail = client.model.retrieve("google/flan-ul2")
    is_available = True  # model exists, otherwise the previous line would have thrown an exception

Notes

- The client throws an exception when a model does not exist instead of returning ``None`` (see the sketch below).
- The client always returns the whole response instead of the response results.
- Pagination has been added.
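
Since ``retrieve`` now raises instead of returning ``None``, an availability check like the old ``model.available()`` looks roughly like the sketch below; the exception is caught broadly here to stay agnostic of the SDK's concrete error class.

.. code:: python

    from genai import Client, Credentials

    client = Client(credentials=Credentials.from_env())

    try:
        client.model.retrieve("google/flan-ul2")
        is_available = True
    except Exception:  # the SDK raises its own API error type for unknown models
        is_available = False

    print(f"Model available: {is_available}")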

Files
-----

Old way:

.. code:: python

    from genai import Credentials
    from genai.services import FileManager
    from genai.schemas import FileListParams

    credentials = Credentials.from_env()

    file_list = FileManager.list_files(credentials=credentials, params=FileListParams(offset=0, limit=5))
    file_metadata = FileManager.file_metadata(credentials=credentials, file_id="id")
    file_content = FileManager.read_file(credentials=credentials, file_id="id")
    uploaded_file = FileManager.upload_file(credentials=credentials, file_path="path_on_your_system", purpose="tune")
    FileManager.delete_file(credentials=credentials, file_id="id")

New way:

.. code:: python

    from genai import Client, Credentials
    from genai.file import FilePurpose

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    file_list = client.file.list(offset=0, limit=5)  # you can pass many more filters
    file_metadata = client.file.retrieve("id")
    file_content = client.file.read("id")
    uploaded_file = client.file.create(file_path="path_on_your_system", purpose=FilePurpose.TUNE)  # or just purpose="tune"
    client.file.delete("id")

Tunes
-----

Old way:

.. code:: python

    from genai import Model, Credentials
    from genai.services import TuneManager
    from genai.schemas.tunes_params import (
        CreateTuneHyperParams,
        CreateTuneParams,
        DownloadAssetsParams,
        TunesListParams,
    )

    credentials = Credentials.from_env()

    tune_list = TuneManager.list_tunes(credentials=credentials, params=TunesListParams(offset=0, limit=5))
    tune_methods = TuneManager.get_tune_methods(credentials=credentials)
    tune_detail = TuneManager.get_tune(credentials=credentials, tune_id="id")
    tune_content = TuneManager.download_tune_assets(
        credentials=credentials, params=DownloadAssetsParams(id="tune_id", content="encoder")
    )
    upload_tune = TuneManager.create_tune(
        credentials=credentials,
        params=CreateTuneParams(
            model_id="google/flan-ul2",
            task_id="generation",
            name="my tuned model",
            method_id="pt",
            parameters=CreateTuneHyperParams(...),
        ),
    )
    TuneManager.delete_tune(credentials=credentials, tune_id="id")

    # or via the `Model` class
    model = Model("google/flan-ul2", params=None, credentials=credentials)
    tuned_model = model.tune(
        name="my tuned model",
        method="pt",
        task="generation",
        hyperparameters=CreateTuneHyperParams(...)
    )
    tuned_model.download(...)
    tuned_model.info(...)
    tuned_model.delete(...)

New way:

.. code:: python

    from genai import Client, Credentials
    from genai.tune import TuneStatus, TuningType, TuneAssetType

    credentials = Credentials.from_env()
    client = Client(credentials=credentials)

    tune_list = client.tune.list(offset=0, limit=5, status=TuneStatus.COMPLETED)  # or just status="completed"
    tune_methods = client.tune.types()
    tune_detail = client.tune.retrieve("tune_id")
    tune_content = client.tune.read(id="tune_id", type=TuneAssetType.LOGS)  # or type="logs"
    upload_tune = client.tune.create(
        name="my tuned model",
        model_id="google/flan-ul2",
        task_id="generation",
        tuning_type=TuningType.PROMPT_TUNING,  # or tuning_type="prompt_tuning"
    )
    client.tune.delete("tune_id")

Notes

- ``task`` is now ``task_id``.
- ``method_id`` is now ``tuning_type``, and the list of allowed values has changed (use the ``TuningType`` enum or values from the documentation; the accepted values change from ``pt`` and ``mpt`` to ``prompt_tuning`` and ``multitask_prompt_tuning``).
- ``init_method`` enum values are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``RANDOM`` -> ``random``).
- ``status`` enum values are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``COMPLETED`` -> ``completed``); you can use the prepared ``TuneStatus`` enum.

Prompt Template (Prompt Pattern)
--------------------------------

The ``PromptPattern`` class has been removed, as it was a local duplication of the API's Prompt Templates. Prompt Templates themselves have been replaced by the more general ``Prompts``. See the following example if you want to create a reusable prompt (a prompt with a template).

.. code:: python

    from genai import Client, Credentials

    client = Client(credentials=Credentials.from_env())

    # Create prompt
    prompt_response = client.prompt.create(
        model_id="google/flan-ul2",
        name="greet prompt",
        input="Hello {{name}}, enjoy your flight to {{destination}}!",
        data={"name": "Mr./Mrs.", "destination": "Unknown"},  # optional
    )
    prompt_id = prompt_response.result.id

    # Render prompt via text generation endpoint
    generate_response = client.text.generation.create(
        prompt_id=prompt_id,
        data={
            "name": "Alex",
            "destination": "London"
        }
    )
    # Response: Hello Alex, enjoy your flight to London!
    print(f"Response: {next(generate_response).results[0].generated_text}")

History (Requests History)
--------------------------

Old way:

.. code:: python

    from genai.credentials import Credentials
    from genai.metadata import Metadata
    from genai.schemas.history_params import HistoryParams

    metadata = Metadata(Credentials.from_env())

    params = HistoryParams(
        limit=8,
        offset=0,
        status="SUCCESS",
        origin="API",
    )

    history_response = metadata.get_history(params)

New way:

.. code:: python

    from genai import Client, Credentials
    from genai.request import RequestStatus, RequestRetrieveOriginParameter

    client = Client(credentials=Credentials.from_env())

    history_response = client.request.list(
        limit=8,
        offset=0,
        status=RequestStatus.SUCCESS,  # or status="success"
        origin=RequestRetrieveOriginParameter.API,  # or origin="api"
    )

Notes

- ``status``, ``origin`` and endpoint enums are changing from ``SCREAMING_SNAKE_CASE`` to ``snake_case`` (e.g. ``SUCCESS`` -> ``success``). Feel free to use the prepared Python enums.
- By default, all origins are now returned (as opposed to generate requests only in v1).
- The response object now includes a ``version`` field describing the major and minor version of the API used when the request was created (see the sketch below).
- Requests made under v1 as well as v2 are returned (whereas the v1 ``/requests`` endpoint returns only v1 requests).
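
A minimal sketch of consuming the listing above. It assumes the paginated response exposes ``total_count`` and ``results`` (as described in *What has changed?*) and that the ``version`` field from the notes is available on each returned request record; attribute names beyond those are illustrative only.

.. code:: python

    from genai import Client, Credentials
    from genai.request import RequestStatus

    client = Client(credentials=Credentials.from_env())

    history_response = client.request.list(limit=8, offset=0, status=RequestStatus.SUCCESS)

    # paginated responses use `total_count` (renamed from `totalCount`) and `results`
    print(f"Total matching requests: {history_response.total_count}")
    for request in history_response.results:
        # `version` describes the API version the request was created under
        print(request.id, request.version)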

Extensions
----------

Notes

- ``PandasExtension`` was removed because its functionality was replaced by the API's prompt templates.
- Third-party extensions were updated to work with the latest versions of their libraries.
- If you were using local models through a ``LocalLLMServer``, you may need to adjust them to the new parameter and return types.