Environments#
An environment defines the execution context for running flows. It specifies the engine to be used, the libraries to be installed, and additional runtime parameters such as the number of CPU cores, memory allocation, and the maximum number of concurrent flow runs.
When you configure an environment, the platform generates a command that you can use to launch it. This command typically runs a Docker container with the selected engine version, all required libraries, and the specified parameters.
Creating an Environment#
In the UI, you can create a new Environment by navigating to Manage -> StreamSets -> New environment.

You will be prompted to provide a name, select a Data Collector engine version, and set any additional configuration options.

In the SDK, you can create an environment from a Project object using the Project.create_environment() method.
At a minimum, you must provide the name and engine_version parameters.
All other parameters are optional and are populated with default values if not specified.
This method returns an Environment object.
>>> environment = project.create_environment(
... name="Sample",
... engine_version="6.3.0-SNAPSHOT",
... # Optional parameters - see API Reference for more optional parameters
... description="Basic env.",
... stage_libs=[
... 'streamsets-datacollector-basic-lib',
... 'streamsets-datacollector-dataformats-lib',
... 'streamsets-datacollector-dev-lib'
... ],
... cpus_to_allocate=2,
... )
>>> environment
Environment(name='Sample', description='Basic env.', environment_id='c383b27a-eab8-4214-a6f2-c9eb522d1efb', engine_version='6.3.0-SNAPSHOT')
Retrieving an Existing Environment#
Environments can be retrieved through a Project object.
You can access all environments associated with a project using the Project.environments property.
You can also retrieve a single environment using the Project.environments.get() method, which requires the environment_id parameter.
>>> # Get all environments associated with the project
>>> environments = project.environments
>>> environments
[
Environment(name='Sample', description='Basic env.', environment_id='c383b27a-eab8-4214-a6f2-c9eb522d1efb', engine_version='6.3.0-SNAPSHOT'),
Environment(name='Sample2', environment_id='5f5b9182-fe04-463c-9fa6-af9d24d96da7', engine_version='6.3.0-SNAPSHOT')
]
>>> # Get a single environment by its id
>>> environment = project.environments.get(environment_id='5f5b9182-fe04-463c-9fa6-af9d24d96da7')
>>> environment
Environment(name='Sample2', environment_id='5f5b9182-fe04-463c-9fa6-af9d24d96da7', engine_version='6.3.0-SNAPSHOT')
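Creation and retrieval can be combined into a "get or create" helper, so that rerunning a script does not create a second environment with the same name. The following is a minimal sketch, not part of the SDK: the helper name get_or_create_environment is hypothetical, and it assumes that Project.environments is iterable and that each Environment exposes a name attribute, as the printed representation above suggests.

```python
def get_or_create_environment(project, name, engine_version, **options):
    """Return the environment called `name`, creating it if none exists.

    Hypothetical helper. Assumes `project.environments` is iterable and that
    every environment has a `name` attribute.
    """
    for environment in project.environments:
        if environment.name == name:
            return environment
    # No match found: fall back to the documented creation call.
    return project.create_environment(
        name=name, engine_version=engine_version, **options
    )
```

Any optional keyword arguments, such as description or stage_libs, are passed through to Project.create_environment() unchanged and are only used when a new environment is actually created.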
Modifying an Environment#
In the UI, you can update or delete an existing Environment by navigating to Manage -> StreamSets.

Updating an Environment#
As with creation, an environment is updated through the Project object.
First, modify properties of the environment instance, then update it using the Project.update_environment() method.
>>> # Modify environment settings
>>> environment.max_memory_used = 80
>>> environment.stage_libs.append('streamsets-datacollector-aws-lib')
>>> # Update the environment on the platform
>>> environment = project.update_environment(environment)
>>> environment.stage_libs
['streamsets-datacollector-basic-lib', 'streamsets-datacollector-dataformats-lib', 'streamsets-datacollector-dev-lib', 'streamsets-datacollector-aws-lib']
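Appending to stage_libs unconditionally can leave duplicate entries if the same script runs twice. A minimal sketch of an idempotent variant follows. The helper name add_stage_libs is hypothetical, and it assumes that Environment.stage_libs behaves like a regular Python list, as in the example above.

```python
def add_stage_libs(project, environment, new_libs):
    """Add only the stage libraries that are not already configured.

    Hypothetical helper. Calls `project.update_environment()` only when
    something actually changed, and returns the resulting environment.
    """
    to_add = []
    for lib in new_libs:
        if lib not in environment.stage_libs and lib not in to_add:
            to_add.append(lib)
    if not to_add:
        return environment  # nothing to change, skip the update call
    environment.stage_libs.extend(to_add)
    return project.update_environment(environment)
```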
Deleting an Environment#
To remove an environment, use the Project.delete_environment() method.
The delete method returns an API response, which you can inspect to verify the status code.
>>> response = project.delete_environment(environment)
>>> response
<Response [200]>
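In a script, it can be useful to fail loudly when the deletion does not succeed. The sketch below wraps the call and checks the status code. The helper name delete_environment_checked is hypothetical, and it assumes that the returned response exposes a status_code attribute and that 200 indicates success, as the example output suggests.

```python
def delete_environment_checked(project, environment, expected_status=200):
    """Delete an environment and raise if the status code is unexpected.

    Hypothetical helper. Assumes the response object has a `status_code`
    attribute, as the printed `<Response [200]>` suggests.
    """
    response = project.delete_environment(environment)
    if response.status_code != expected_status:
        raise RuntimeError(
            f"Deleting the environment failed with status {response.status_code}"
        )
    return response
```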
Retrieving the Engine Installation Command#
To start and run the Engine defined by an Environment, you’ll need to retrieve the installation command for that Environment and execute it from the machine where you want the Engine to run.
In the UI, you can retrieve the run command by navigating to Manage -> StreamSets.

In the SDK, you can retrieve the run command from the Environment object by using the Environment.get_installation_command() method.
>>> installation_command = environment.get_installation_command(
... # Optional parameters
... pretty=False,
... foreground=False
... )
>>> installation_command
'docker run -d --restart on-failure --cpus=4.0 --hostname "$(hostname)" -p 18630:18630 -e SSET_PROJECT_ID=b127c19e-951d-4db6-944c-9850747d0c02 -e SSET_ENVIRONMENT_ID=a0df9d94-625f-411d-88c1-736d38c67a8f -e SSET_BASE_URL=https://api.dai.dev.cloud.ibm.com -e SSET_API_KEY="${SSET_API_KEY:?Please provide your API key from IBM Cloud}" -e SSET_IAM_URL=https://iam.test.cloud.ibm.com icr.io/sx-ci/datacollector:6.3.0-SNAPSHOT'
Note
The installation command you retrieve from the Environment requires the SSET_API_KEY environment variable to be set for the user executing the command.
This variable should contain the API key you generated for authenticating with the watsonx.data integration platform.
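If you launch the engine from Python rather than pasting the command into a terminal, you can check for SSET_API_KEY before running it. The following is a minimal sketch using only the standard library. The helper name run_installation_command and its runner parameter are hypothetical and exist only to keep the example testable. The command is executed through a shell because the generated string relies on shell expansion such as $(hostname) and ${SSET_API_KEY:?...}, so only pass it a command that you retrieved from the platform yourself.

```python
import os
import subprocess


def run_installation_command(command, env=None, runner=subprocess.run):
    """Run an installation command after checking that SSET_API_KEY is set.

    Hypothetical helper. `env` defaults to the current process environment;
    `runner` defaults to `subprocess.run` and can be swapped out for testing.
    """
    env = dict(os.environ if env is None else env)
    if not env.get("SSET_API_KEY"):
        raise RuntimeError("SSET_API_KEY must be set before launching the engine")
    # shell=True lets the shell expand $(hostname) and ${SSET_API_KEY:?...}.
    # check=True raises CalledProcessError if the command exits with an error.
    return runner(command, shell=True, check=True, env=env)
```

A typical call would be run_installation_command(environment.get_installation_command()), executed on the machine where the Engine should run.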
Retrieving Available Engine Versions#
To view the list of available Engine versions, use the Platform.available_engine_versions property.
This returns a list of Engine version names that can be used when configuring an Environment.
>>> engine_versions = platform.available_engine_versions
>>> engine_versions
["6.3.0-SNAPSHOT", "JDK17_6.3.0-latest", ...]
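Because engine_version must match one of these names, a script can validate the value before calling Project.create_environment(). The sketch below is one way to do that. The helper name require_engine_version is hypothetical, and it assumes only that the property returns a collection of strings, as shown above.

```python
def require_engine_version(platform, engine_version):
    """Return `engine_version` if the platform offers it, else raise.

    Hypothetical helper. Assumes `platform.available_engine_versions` is a
    collection of version name strings.
    """
    available = list(platform.available_engine_versions)
    if engine_version not in available:
        raise ValueError(
            f"Engine version {engine_version!r} is not available; "
            f"choose one of: {', '.join(available)}"
        )
    return engine_version
```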
Retrieving Engine Version Details#
In the UI, you can view the list of libraries supported by a specific Engine version during Environment configuration:

In the SDK, to retrieve the list of libraries supported by a given Engine version, use the Platform.get_engine_version_info() method.
This method takes the engine version name as an argument and returns detailed information about that version, including its supported libraries.
>>> engine_info = platform.get_engine_version_info("6.3.0-SNAPSHOT")
>>> engine_info
{
'engine_version_id': '6.3.0-SNAPSHOT',
'image_tag': '6.3.0-SNAPSHOT',
'engine_type': 'data_collector',
'stage_libs': [
{
'stage_lib_id': 'streamsets-datacollector-apache-kafka-lib',
'label': 'Apache Kafka',
'image_location': ...,
'stages': ...
},
...
]
}
>>> # To see all stage library IDs that can be used in environment.stage_libs
>>> [lib['stage_lib_id'] for lib in engine_info.get('stage_libs')]
[
'streamsets-datacollector-apache-kafka-lib',
'streamsets-datacollector-postgres-aurora-lib',
'streamsets-datacollector-google-secret-manager-credentialstore-lib',
'streamsets-datacollector-redis-lib',
...
]
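The same information can be used to check a planned stage_libs list before creating or updating an environment. The sketch below works on the dictionary returned by Platform.get_engine_version_info(), relying only on the stage_libs and stage_lib_id keys shown above. The helper name unsupported_stage_libs is hypothetical.

```python
def unsupported_stage_libs(engine_info, requested_libs):
    """Return the requested stage library IDs the engine version lacks.

    Hypothetical helper. `engine_info` is expected to have the shape shown
    above: a dict with a 'stage_libs' list of dicts carrying 'stage_lib_id'.
    """
    supported = {lib["stage_lib_id"] for lib in engine_info.get("stage_libs", [])}
    # Preserve the caller's ordering so the result is easy to report.
    return [lib for lib in requested_libs if lib not in supported]
```

An empty result means every requested library is offered by that engine version.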