Environments#

An environment defines the execution context for running flows. It specifies the engine to be used, the libraries to be installed, and additional runtime parameters such as the number of CPU cores, memory allocation, and the maximum number of concurrent flow runs.

When you configure an environment, the platform generates a command that you can use to launch it. This command typically runs a Docker container with the selected engine version, all required libraries, and the specified parameters.

Creating an Environment#

In the UI, you can create a new Environment by navigating to Manage -> StreamSets -> New environment.

Screenshot of the Environment creation form in the UI

You will be prompted to provide a Name, select a Data Collector engine version, and set any additional configuration options.

Screenshot of the Environment configuration form in the UI

In the SDK, you can create an environment from a Project object using the Project.create_environment() method. At a minimum, you must provide the name and engine_version parameters. All other parameters are optional and are populated with default values if not specified. This method returns an Environment object.

>>> environment = project.create_environment(
...     name="Sample",
...     engine_version="6.3.0-SNAPSHOT",
...     # Optional parameters - see API Reference for more optional parameters
...     description="Basic env.",
...     stage_libs=[
...         'streamsets-datacollector-basic-lib',
...         'streamsets-datacollector-dataformats-lib',
...         'streamsets-datacollector-dev-lib'
...         ],
...     cpus_to_allocate=2,
...     )
>>> environment
Environment(name='Sample', description='Basic env.', environment_id='c383b27a-eab8-4214-a6f2-c9eb522d1efb', engine_version='6.3.0-SNAPSHOT')

Note

For information on available Engine versions, see Engines Versions.
For details on supported libraries, see Engine Details.
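
Scripts that run more than once may want to avoid creating the same environment twice. The following is a minimal sketch of a "get or create" helper, not part of the SDK. It assumes that iterating over Project.environments yields Environment objects with a name attribute (as the repr above suggests) and that you treat names as unique within a project; neither assumption is confirmed by this page. The helper name get_or_create_environment is hypothetical.

```python
def get_or_create_environment(project, name, engine_version, **optional):
    """Return the first environment called `name`, creating it if none exists.

    `project` is expected to behave like the SDK's Project object: it exposes
    an iterable `environments` property and a `create_environment()` method.
    This helper is a hypothetical convenience, not an SDK function.
    """
    for environment in project.environments:
        # Assumption: Environment objects expose their name as `.name`.
        if environment.name == name:
            return environment
    # No environment with that name was found, so create one.
    return project.create_environment(
        name=name, engine_version=engine_version, **optional
    )
```

A call such as get_or_create_environment(project, "Sample", "6.3.0-SNAPSHOT", cpus_to_allocate=2) would then reuse the existing environment on later runs instead of creating another one.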

Retrieving an Existing Environment#

Environments can be retrieved through a Project object. You can access all environments associated with a project using the Project.environments property. You can also retrieve a single environment using the Project.environments.get() method, which requires the environment_id parameter.

>>> # Get all environments associated with the project
>>> environments = project.environments
>>> environments
[
    Environment(name='Sample', description='Basic env.', environment_id='c383b27a-eab8-4214-a6f2-c9eb522d1efb', engine_version='6.3.0-SNAPSHOT'),
    Environment(name='Sample2', environment_id='5f5b9182-fe04-463c-9fa6-af9d24d96da7', engine_version='6.3.0-SNAPSHOT')
]

>>> # Get a single environment by its id
>>> environment = project.environments.get(environment_id='5f5b9182-fe04-463c-9fa6-af9d24d96da7')
>>> environment
Environment(name='Sample2', environment_id='5f5b9182-fe04-463c-9fa6-af9d24d96da7', engine_version='6.3.0-SNAPSHOT')
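
Because get() is keyed on environment_id, it can help to build a lookup from names to IDs first. The sketch below is illustrative only: the function environment_ids_by_name is hypothetical, and it assumes that iterating over Project.environments yields objects with name and environment_id attributes, as the repr output above suggests. Each name maps to a list of IDs because this page does not state whether names are unique.

```python
def environment_ids_by_name(project):
    """Map each environment name to the list of environment IDs using it.

    Hypothetical helper; `project` is expected to expose an iterable
    `environments` property like the SDK's Project object.
    """
    ids_by_name = {}
    for environment in project.environments:
        # Names may not be unique, so collect every matching ID.
        ids_by_name.setdefault(environment.name, []).append(
            environment.environment_id
        )
    return ids_by_name
```

With the two environments shown above, ids_by_name = environment_ids_by_name(project) could be followed by project.environments.get(environment_id=ids_by_name['Sample2'][0]).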

Modifying an Environment#

In the UI, you can update or delete an existing Environment by navigating to Manage -> StreamSets.

Screenshot of the Environment update form in the UI

Updating an Environment#

As with creation, an environment is updated through the Project object. First, modify the properties of the Environment instance, then push the changes to the platform using the Project.update_environment() method.

>>> # Modify environment settings
>>> environment.max_memory_used = 80
>>> environment.stage_libs.append('streamsets-datacollector-aws-lib')

>>> # Update the environment on the platform
>>> environment = project.update_environment(environment)
>>> environment.stage_libs
['streamsets-datacollector-basic-lib', 'streamsets-datacollector-dataformats-lib', 'streamsets-datacollector-dev-lib', 'streamsets-datacollector-aws-lib']
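
Appending a library that is already present would leave a duplicate entry in the list. A small guard like the following sketch avoids that and skips the API call when nothing changed. The helper add_stage_libs is hypothetical, and it assumes Environment.stage_libs behaves like a regular Python list, which the example above suggests but this page does not state outright.

```python
def add_stage_libs(project, environment, stage_libs):
    """Append missing stage libraries, then update the environment.

    Hypothetical helper. Returns the environment as-is, without calling
    `project.update_environment()`, when every library is already present.
    """
    changed = False
    for lib in stage_libs:
        # Assumption: `environment.stage_libs` supports `in` and `append`.
        if lib not in environment.stage_libs:
            environment.stage_libs.append(lib)
            changed = True
    if not changed:
        return environment
    return project.update_environment(environment)
```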

Deleting an Environment#

To remove an environment, use the Project.delete_environment() method. The method returns an API response, which you can inspect to verify the status code.

>>> response = project.delete_environment(environment)
>>> response
<Response [200]>
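
In a script, you may prefer to fail loudly when the deletion does not succeed. The sketch below wraps the call and checks the status code. It assumes the returned object exposes a status_code attribute in the style of a requests response, which the <Response [200]> repr hints at but this page does not confirm. The helper name delete_environment_checked is hypothetical.

```python
def delete_environment_checked(project, environment):
    """Delete `environment` and raise if the API did not answer with 2xx.

    Hypothetical helper around `project.delete_environment()`.
    """
    response = project.delete_environment(environment)
    # Assumption: the response exposes `.status_code` like requests.Response.
    status = response.status_code
    if not 200 <= status < 300:
        raise RuntimeError(
            f"Deleting the environment failed with HTTP status {status}"
        )
    return response
```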

Retrieving the Engine Installation Command#

To start and run the Engine defined by an Environment, you’ll need to retrieve the installation command for that Environment and execute it from the machine where you want the Engine to run.

In the UI, you can retrieve the run command by navigating to Manage -> StreamSets.

Screenshot of the Environment update form in the UI

In the SDK, you can retrieve the command from the Environment object using the Environment.get_installation_command() method.

>>> installation_command = environment.get_installation_command(
...     # Optional parameters
...     pretty=False,
...     foreground=False
...     )
>>> installation_command
'docker run -d --restart on-failure --cpus=4.0 --hostname "$(hostname)" -p 18630:18630 -e SSET_PROJECT_ID=b127c19e-951d-4db6-944c-9850747d0c02 -e SSET_ENVIRONMENT_ID=a0df9d94-625f-411d-88c1-736d38c67a8f -e SSET_BASE_URL=https://api.dai.dev.cloud.ibm.com -e SSET_API_KEY="${SSET_API_KEY:?Please provide your API key from IBM Cloud}" -e SSET_IAM_URL=https://iam.test.cloud.ibm.com icr.io/sx-ci/datacollector:6.3.0-SNAPSHOT'

Note

The installation command retrieved from the Environment requires the SSET_API_KEY environment variable to be set for the user executing the command. The variable should contain the API key you generated for authenticating with the watsonx.data integration platform.
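
If you launch the engine from Python on the target machine, one possible approach is to pass the API key through the child process environment and run the command in a shell. This is a sketch under stated assumptions, not an SDK feature: the function run_installation_command and its runner parameter are hypothetical, and only standard library calls (os.environ, subprocess.run) are used. A shell is needed because the command relies on shell expansions such as "$(hostname)" and "${SSET_API_KEY:?...}".

```python
import os
import subprocess


def run_installation_command(command, api_key, runner=subprocess.run):
    """Run the installation command in a shell with SSET_API_KEY set.

    Hypothetical helper. `runner` defaults to subprocess.run and exists
    only so the call can be replaced in tests.
    """
    if not api_key:
        raise ValueError("An API key is required to start the engine")
    # Copy the current environment and add the key for the child process only.
    env = dict(os.environ, SSET_API_KEY=api_key)
    # shell=True because the command string depends on shell expansion.
    # check=True raises CalledProcessError on a non-zero exit code.
    return runner(command, shell=True, env=env, check=True)
```

Since the string is executed by a shell, only pass a command that you retrieved from the platform yourself.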

Retrieving Available Engine Versions#

To view the list of available Engine versions, use the Platform.available_engine_versions property. This returns a list of Engine version names that can be used when configuring an Environment.

>>> engine_versions = platform.available_engine_versions
>>> engine_versions
["6.3.0-SNAPSHOT", "JDK17_6.3.0-latest", ...]
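
A simple membership check against this list can catch a mistyped version name before an environment is created. The helper below is a hypothetical sketch; it assumes only that the property returns a list of strings, as described above.

```python
def require_engine_version(available_versions, engine_version):
    """Return `engine_version` if it is available, otherwise raise ValueError.

    Hypothetical helper; `available_versions` is expected to be the list
    returned by Platform.available_engine_versions.
    """
    if engine_version not in available_versions:
        raise ValueError(
            f"Engine version {engine_version!r} is not available; "
            f"choose one of: {sorted(available_versions)}"
        )
    return engine_version
```

For example, require_engine_version(platform.available_engine_versions, "6.3.0-SNAPSHOT") could be passed directly as the engine_version argument of Project.create_environment().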

Retrieving Engine Version Details#

In the UI, you can view the list of libraries supported by a specific Engine version during Environment configuration:

Screenshot showing supported libraries for an Engine version in the UI

In the SDK, to retrieve the list of libraries supported by a given Engine version, use the Platform.get_engine_version_info() method. This method takes the engine version name as an argument and returns detailed information about that version, including its supported libraries.

>>> engine_info = platform.get_engine_version_info("6.3.0-SNAPSHOT")
>>> engine_info
{
    'engine_version_id': '6.3.0-SNAPSHOT',
    'image_tag': '6.3.0-SNAPSHOT',
    'engine_type': 'data_collector',
    'stage_libs': [
        {
            'stage_lib_id': 'streamsets-datacollector-apache-kafka-lib',
            'label': 'Apache Kafka',
            'image_location': ...,
            'stages': ...
        },
        ...
    ]
}

>>> # To see all stage library IDs that can be used in environment.stage_libs
>>> [lib['stage_lib_id'] for lib in engine_info.get('stage_libs')]
[
    'streamsets-datacollector-apache-kafka-lib',
    'streamsets-datacollector-postgres-aurora-lib',
    'streamsets-datacollector-google-secret-manager-credentialstore-lib',
    'streamsets-datacollector-redis-lib',
    ...
]
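
The same information can be used to check a planned stage_libs list against what the engine version supports. The following sketch relies only on the dictionary layout shown above (a 'stage_libs' list whose entries carry a 'stage_lib_id' key); the function name unsupported_stage_libs is hypothetical.

```python
def unsupported_stage_libs(engine_info, requested_libs):
    """Return the requested stage library IDs the engine version lacks.

    Hypothetical helper; `engine_info` is expected to be the dictionary
    returned by Platform.get_engine_version_info().
    """
    supported = {
        lib["stage_lib_id"] for lib in engine_info.get("stage_libs", [])
    }
    # Preserve the caller's ordering so the result is easy to report.
    return [lib for lib in requested_libs if lib not in supported]
```

An empty result means every requested library appears in the engine version's list.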