Working with AutoAI class and optimizer
=======================================

The :ref:`AutoAI experiment class` is responsible for creating experiments and scheduling trainings.
All experiment results are stored automatically in the user-specified Cloud Object Storage (COS).
AutoAI can then fetch the results and return them directly to the user for further use.

Configure optimizer with one data source
----------------------------------------

For an AutoAI object initialization, you need watsonx.ai credentials (with your API key and URL) and either the ``project_id`` or the ``space_id``.

.. hint::
    You can copy the ``project_id`` from the Project's Manage tab (Project -> Manage -> General -> Details).

.. code-block:: python

    from ibm_watsonx_ai.experiment import AutoAI

    experiment = AutoAI(wx_credentials,
        space_id='76g53e0-0b32-4a0e-9152-3d50324855ddb'
    )

    pipeline_optimizer = experiment.optimizer(
        name='test name',
        desc='test description',
        prediction_type=AutoAI.PredictionType.BINARY,
        prediction_column='y',
        scoring=AutoAI.Metrics.ACCURACY_SCORE,
        test_size=0.1,
        max_num_daub_ensembles=1,
        train_sample_rows_test_size=1.,
        daub_include_only_estimators=[
            AutoAI.ClassificationAlgorithms.XGB,
            AutoAI.ClassificationAlgorithms.LGBM
        ],
        cognito_transform_names=[
            AutoAI.Transformers.SUM,
            AutoAI.Transformers.MAX
        ]
    )

Configure optimizer for time series forecasting
-----------------------------------------------

**Note:** Supported for IBM Cloud Pak® for Data 4.0 and later.

Time series forecasting is a special AutoAI prediction scenario with specific parameters used to configure forecasting. These parameters include: ``prediction_columns``, ``timestamp_column_name``, ``backtest_num``, ``lookback_window``, ``forecast_window``, and ``backtest_gap_length``.

.. code-block:: python

    from ibm_watsonx_ai.experiment import AutoAI
    from ibm_watsonx_ai.utils.autoai.enums import TShirtSize

    experiment = AutoAI(wx_credentials,
        space_id='76g53e0-0b32-4a0e-9152-3d50324855ddb'
    )

    pipeline_optimizer = experiment.optimizer(
        name='forecasting optimizer',
        desc='description',
        prediction_type=experiment.PredictionType.FORECASTING,
        prediction_columns=['value'],
        timestamp_column_name='timestamp',
        backtest_num=4,
        lookback_window=5,
        forecast_window=2,
        holdout_size=0.05,
        max_number_of_estimators=1,
        include_only_estimators=[AutoAI.ForecastingAlgorithms.ENSEMBLER],
        t_shirt_size=TShirtSize.L
    )

Optimizer and deployment fitting procedures are the same as in the example scenario above.

Configure optimizer for time series forecasting with supporting features
------------------------------------------------------------------------

**Note:** Supported for IBM Cloud and IBM Cloud Pak® for Data version 4.5 and later.

Additional parameters can be passed to run time series forecasting scenarios with supporting features: ``feature_columns``, ``pipeline_types``, and ``supporting_features_at_forecast``. For more information about supporting features, refer to the time series documentation.

.. code-block:: python

    from ibm_watsonx_ai.experiment import AutoAI
    from ibm_watsonx_ai.utils.autoai.enums import ForecastingPipelineTypes

    experiment = AutoAI(wx_credentials,
        space_id='76g53e0-0b32-4a0e-9152-3d50324855ddb'
    )

    pipeline_optimizer = experiment.optimizer(
        name='forecasting optimizer',
        desc='description',
        prediction_type=experiment.PredictionType.FORECASTING,
        prediction_columns=['value'],
        timestamp_column_name='week',
        feature_columns=['a', 'b', 'value'],
        pipeline_types=[ForecastingPipelineTypes.FlattenEnsembler] + ForecastingPipelineTypes.get_exogenous(),
        supporting_features_at_forecast=True
    )

Predicting for a time series forecasting scenario with supporting features:

.. code-block:: python

    # Example data:
    #   new_observations:
    #       week  a  b  value
    #       14.0  0  0  134
    #       15.0  1  4  96
    #       ...
    #
    #   supporting_features:
    #       week  a  b
    #       16.0  1  3
    #       ...

    # with DataFrame or np.array:
    pipeline_optimizer.predict(new_observations, supporting_features=supporting_features)

Online scoring for a time series forecasting scenario with supporting features:

.. code-block:: python

    # with DataFrame:
    web_service.score(payload={'observations': new_observations_df, 'supporting_features': supporting_features_df})

Batch scoring for a time series forecasting scenario with supporting features:

.. code-block:: python

    # with DataFrame:
    batch_service.run_job(payload={'observations': new_observations_df, 'supporting_features': supporting_features_df})

    # with DataConnection:
    batch_service.run_job(payload={'observations': new_observations_data_connection, 'supporting_features': supporting_features_data_connection})

Get configuration parameters
----------------------------

To see the current configuration parameters, call the ``get_params()`` method.

.. code-block:: python

    config_parameters = pipeline_optimizer.get_params()
    print(config_parameters)
    {
        'name': 'test name',
        'desc': 'test description',
        'prediction_type': 'classification',
        'prediction_column': 'y',
        'scoring': 'roc_auc',
        'test_size': 0.1,
        'max_num_daub_ensembles': 1
    }

Fit optimizer
-------------

To schedule an AutoAI experiment, call the ``fit()`` method. This triggers a training and an optimization process on watsonx.ai. The ``fit()`` method can be synchronous (``background_mode=False``) or asynchronous (``background_mode=True``). If you don't want to wait for the fit to finish, invoke the asynchronous version: it immediately returns only the fit/run details. If you invoke the synchronous version, a progress bar shows the status of the learning/optimization process.

.. code-block:: python

    fit_details = pipeline_optimizer.fit(
        training_data_references=[training_data_connection],
        training_results_reference=results_connection,
        background_mode=True)

    # OR

    fit_details = pipeline_optimizer.fit(
        training_data_references=[training_data_connection],
        training_results_reference=results_connection,
        background_mode=False)

To run an AutoAI experiment with separate holdout data, use the ``fit()`` method with the ``test_data_references`` parameter. See the example below:

.. code-block:: python

    fit_details = pipeline_optimizer.fit(
        training_data_references=[training_data_connection],
        test_data_references=[test_data_connection],
        training_results_reference=results_connection)

Get the run status and run details
----------------------------------

If you use the ``fit()`` method asynchronously, you can monitor the run/fit details and status using the following two methods:

.. code-block:: python

    status = pipeline_optimizer.get_run_status()
    print(status)
    'running'  # OR 'completed'

    run_details = pipeline_optimizer.get_run_details()
    print(run_details)
    {'entity': {'pipeline': {'href': '/v4/pipelines/5bfeb4c5-90df-48b8-9e03-ba232d8c0838'},
                'results_reference': {'connection': {'id': ...},
                                      'location': {'bucket': '...',
                                                   'logs': '53c8cb7b-c8b5-44aa-8b52-6fde3c588462',
                                                   'model': '53c8cb7b-c8b5-44aa-8b52-6fde3c588462/model',
                                                   'path': '.',
                                                   'pipeline': './33825fa2-5fca-471a-ab1a-c84820b3e34e/pipeline.json',
                                                   'training': './33825fa2-5fca-471a-ab1a-c84820b3e34e',
                                                   'training_status': './33825fa2-5fca-471a-ab1a-c84820b3e34e/training-status.json'},
                                      'type': 'connected_asset'},
                'space': {'href': '/v4/spaces/71ab11ea-bb77-4ae6-b98a-a77f30ade09d'},
                'status': {'completed_at': '2020-02-17T10:46:32.962Z',
                           'message': {'level': 'info',
                                       'text': 'Training job '
                                               '33825fa2-5fca-471a-ab1a-c84820b3e34e '
                                               'completed'},
                           'state': 'completed'},
                'training_data_references': [{'connection': {'id': '...'},
                                              'location': {'bucket': '...', 'path': '...'},
                                              'type': 'connected_asset'}]},
     'metadata': {'created_at': '2020-02-17T10:44:22.532Z',
                  'guid': '33825fa2-5fca-471a-ab1a-c84820b3e34e',
                  'href': '/v4/trainings/33825fa2-5fca-471a-ab1a-c84820b3e34e',
                  'id': '33825fa2-5fca-471a-ab1a-c84820b3e34e',
                  'modified_at': '2020-02-17T10:46:32.987Z'}}

Get data connections
--------------------

The ``data_connections`` list contains all the training connections that you referenced while calling the ``fit()`` method.

.. code-block:: python

    data_connections = pipeline_optimizer.get_data_connections()

Pipeline summary
----------------

You can get a ranking of all the computed pipeline models, sorted by the scoring metric supplied when configuring the optimizer (the ``scoring`` parameter). The output is a ``pandas.DataFrame`` with pipeline names, computation timestamps, machine learning metrics, and the number of enhancements implemented in each of the pipelines.

.. code-block:: python

    results = pipeline_optimizer.summary()
    print(results)

                   Number of enhancements  ...  training_f1
    Pipeline Name                          ...
    Pipeline_4                          3  ...     0.555556
    Pipeline_3                          2  ...     0.554978
    Pipeline_2                          1  ...     0.503175
    Pipeline_1                          0  ...     0.529928

Get pipeline details
--------------------

To see the pipeline composition steps and nodes, use the ``get_pipeline_details()`` method. If you leave ``pipeline_name`` empty, the method returns the details of the best computed pipeline.

.. code-block:: python

    pipeline_params = pipeline_optimizer.get_pipeline_details(pipeline_name='Pipeline_1')
    print(pipeline_params)
    {
        'composition_steps': [
            'TrainingDataset_full_199_16',
            'Split_TrainingHoldout',
            'TrainingDataset_full_179_16',
            'Preprocessor_default',
            'DAUB'
        ],
        'pipeline_nodes': [
            'PreprocessingTransformer',
            'LogisticRegressionEstimator'
        ]
    }

Get pipeline
------------

Use the ``get_pipeline()`` method to load a specific pipeline. By default, ``get_pipeline()`` returns a Lale pipeline. For information on Lale pipelines, refer to the Lale library documentation.

.. code-block:: python

    pipeline = pipeline_optimizer.get_pipeline(pipeline_name='Pipeline_4')
    print(type(pipeline))
    'lale.operators.TrainablePipeline'

You can also load a pipeline as a scikit-learn (sklearn) pipeline model type.

.. code-block:: python

    pipeline = pipeline_optimizer.get_pipeline(pipeline_name='Pipeline_4', astype=AutoAI.PipelineTypes.SKLEARN)
    print(type(pipeline))
    # <class 'sklearn.pipeline.Pipeline'>

Working with deployments
------------------------

This section describes classes that enable you to work with watsonx.ai deployments.

.. _working-with-web-service:

Web Service
-----------

Web Service is an online type of deployment. With Web Service, you can upload and deploy your model to score it through an online web service. You must pass the location where the training was performed using ``source_space_id`` or ``source_project_id``. You can deploy the model to any space or project by providing the ``target_space_id`` or the ``target_project_id``.

**Note:** WebService supports only the AutoAI deployment type.

.. code-block:: python

    from ibm_watsonx_ai.deployment import WebService

    service = WebService(wx_credentials,
        source_space_id='76g53e0-0b32-4a0e-9152-3d50324855ddb',
        target_space_id='1234abc1234abc1234abc1234abc1234abcd'
    )

    service.create(
        experiment_run_id="...",
        model=model,
        deployment_name='My new deployment'
    )

.. _working-with-batch:

Batch
-----

Batch manages the batch type of deployment. With Batch, you can upload and deploy a model and run a batch deployment job. As with Web Service, you must pass the location where the training was performed using the ``source_space_id`` or the ``source_project_id``. You can deploy the model to any space or project by providing the ``target_space_id`` or the ``target_project_id``. You can provide the input data as a ``pandas.DataFrame``, a data asset, or a Cloud Object Storage (COS) file.

**Note:** Batch supports only the AutoAI deployment type.

Example of a batch deployment creation:

.. code-block:: python

    from ibm_watsonx_ai.deployment import Batch

    service_batch = Batch(wx_credentials, source_space_id='76g53e0-0b32-4a0e-9152-3d50324855ddb')
    service_batch.create(
        experiment_run_id="6ce62a02-3e41-4d11-89d1-484c2deaed75",
        model="Pipeline_4",
        deployment_name='Batch deployment')

Example of a batch job creation with inline data as ``pandas.DataFrame`` type:

.. code-block:: python

    scoring_params = service_batch.run_job(
        payload=test_X_df,
        background_mode=False)

Example of a batch job creation with a COS object:

.. code-block:: python

    from ibm_watsonx_ai.helpers.connections import S3Location, DataConnection

    connection_details = client.connections.create({
        client.connections.ConfigurationMetaNames.NAME: "Connection to COS",
        client.connections.ConfigurationMetaNames.DATASOURCE_TYPE:
            client.connections.get_datasource_type_id_by_name('bluemixcloudobjectstorage'),
        client.connections.ConfigurationMetaNames.PROPERTIES: {
            'bucket': 'bucket_name',
            'access_key': 'COS access key id',
            'secret_key': 'COS secret access key',
            'iam_url': 'COS iam url',
            'url': 'COS endpoint url'
        }
    })

    connection_id = client.connections.get_uid(connection_details)

    payload_reference = DataConnection(
        connection_asset_id=connection_id,
        location=S3Location(
            bucket='bucket_name',  # note: COS bucket name where the deployment payload dataset is located
            path='my_path'  # note: path within the bucket where your deployment payload dataset is located
        )
    )

    results_reference = DataConnection(
        connection_asset_id=connection_id,
        location=S3Location(
            bucket='bucket_name',  # note: COS bucket name where the deployment output should be located
            path='my_path_where_output_will_be_saved'  # note: path within the bucket where your deployment output should be located
        )
    )

    payload_reference.write("local_path_to_the_batch_payload_csv_file", remote_name="batch_payload_location.csv")

    scoring_params = service_batch.run_job(
        payload=[payload_reference],
        output_data_reference=results_reference,
        background_mode=False)  # If background_mode is False, the call runs synchronously; otherwise, the job status needs to be monitored.

Example of a batch job creation with a data-asset object:

.. code-block:: python

    from ibm_watsonx_ai.helpers.connections import DataConnection, CloudAssetLocation, DeploymentOutputAssetLocation

    payload_reference = DataConnection(location=CloudAssetLocation(asset_id=asset_id))
    results_reference = DataConnection(
        location=DeploymentOutputAssetLocation(name="batch_output_file_name.csv"))

    scoring_params = service_batch.run_job(
        payload=[payload_reference],
        output_data_reference=results_reference,
        background_mode=False)
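As a quick local illustration of the inline-data path (no watsonx.ai connection involved), the ``payload`` passed to ``run_job`` is an ordinary ``pandas.DataFrame`` of feature values. The column names below are hypothetical placeholders; your model's training schema defines the real ones:

```python
import pandas as pd

# Hypothetical scoring data: two feature columns plus the prediction column 'y'.
test_df = pd.DataFrame({
    "age": [34, 51, 42],
    "balance": [1200.0, 340.5, 87.0],
    "y": ["no", "yes", "no"],
})

# Score on the feature columns only; drop the target column before scoring.
test_X_df = test_df.drop(columns=["y"])

# test_X_df is the kind of frame the inline-data example above passes as:
#   service_batch.run_job(payload=test_X_df, background_mode=False)
print(test_X_df.columns.tolist())  # ['age', 'balance']
```

The same feature-only shape applies whether the payload is inline, a data asset, or a CSV written to COS via ``DataConnection``.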