Extending `ado` with new Operators

Info

A complete example operator is provided here. This example operator is functional, and useful out of the box. It can be used as the basis to create new operators. It references this document to help tie details here to the implementation.

Developers can write their own operator plugins to add new operations that work on discoveryspaces to ado. Operator plugins are written in Python and can live in their own repository.

The main part of writing an operator plugin, from an integration standpoint, is writing a Python function that implements a specific interface. ado will call this function to execute an operation with your operator. From this function you then call your operator logic (or in many cases it can just live in this function).

This page gives an overview of how to get started creating your own operator. After reading it, the best next step is to study our example operator.

Knowledge required¶

Knowledge of Python
Knowledge of pydantic is useful, but not necessary

`ado` operator functions¶

An operator function is a decorated Python function with a specific signature. To execute your operator ado will call your function and expect it to return output in a given way. Below is an example of such a decorated function. The next sections describe the decorator, its parameters, and the structure of the operation function itself.

# Import the decorator that matches the type of operation your operator performs
from orchestrator.modules.operators.collections import characterize_operation


@characterize_operation(
    name="my_operator",  # The name of your operator.
    description="Example operator",  # What this operator does
    configuration_model=MyOperatorOptions,  # A pydantic model that describes your operator's input parameters
    configuration_model_default=MyOperatorOptions.default_parameters(),  # An example of your operator's input parameters
    version="1.0",  # Version of the operator
)
def detect_anomalous_series(
        discoverySpace: DiscoverySpace,
        operationInfo: typing.Optional[FunctionOperationInfo] = None,
        **parameters,
) -> OperationOutput:
    # Your operation logic - can also call other Python modules etc.
    ...
    return operationOutput

Operator Type¶

The first thing you need to do is decide what type of operator you are creating. The choices are explore, characterize, learn, modify, fuse, export, or compare. You then import the decorator for this operator type from orchestrator.modules.operators.collections and use it to decorate your operator function.

For example, if your operator compares discoveryspaces you would do

from orchestrator.modules.operators.collections import compare_operation

@compare_operation(...)
def my_comparison_operation():
    ...

The decorator parameters are the same for all operator/operation types.

Operator function parameters¶

All operator functions take one or more discoveryspaces along with a dictionary containing the inputs for the operation.

If your operation type is explore, characterize, learn or modify, your function should have a parameter discoverySpace i.e.

def detect_anomalous_series(
    discoverySpace: DiscoverySpace,
    operationInfo: typing.Optional[FunctionOperationInfo] = None,
    **parameters,
) -> OperationOutput:
   ...

If it is fuse or compare your function should have a parameter discoverySpaces which is a list of discoveryspaces i.e.

def detect_anomalous_series(
    discoverySpaces: list[DiscoverySpace],
    operationInfo: typing.Optional[FunctionOperationInfo] = None,
    **parameters,
) -> OperationOutput:
   ...

Operator functions also take an optional operationInfo parameter that holds information for ado. You do not have to interact with this parameter unless you are writing an explore operator.

Describing your operation input parameters¶

As the previous section showed, the parameters variable contains the parameter values to use for a specific operation. But how does ado know which input parameters are valid for your operator, so that the contents of this variable make sense?

The answer is that the input parameters to your operator are described by a pydantic model that you give to the function decorator. Here's the example from the previous section with the relevant fields called out:

@characterize_operation(
    name="my_operator",
    description="Example operator",
    configuration_model=MyOperatorOptions,  # <- A pydantic model that describes your operator's input parameters
    configuration_model_default=MyOperatorOptions(),  # <- An example of your operator's input parameters
    version="1.0",
)

Here MyOperatorOptions is a pydantic model that describes your operator's input parameters. The parameters dictionary passed to your operation function is a dump of this model, so the typical first step in the function is to recreate the model from your inputs:

inputs = MyOperatorOptions.model_validate(parameters)
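As a minimal sketch of that first step, the following uses a hypothetical `MyOperatorOptions` model (the fields `threshold` and `window` are invented for illustration) and a plain function standing in for a decorated operator function. It assumes pydantic v2 and does not import ado, so it only shows the dump-and-validate round trip, not how ado itself invokes your function.

```python
import pydantic


class MyOperatorOptions(pydantic.BaseModel):
    """Hypothetical input parameters; the field names are illustrative only."""

    threshold: float = pydantic.Field(default=3.0, description="Illustrative cut-off")
    window: int = pydantic.Field(default=10, description="Illustrative window size")


def my_operator(**parameters) -> MyOperatorOptions:
    # Stand-in for an operator function body: rebuild the typed model
    # from the plain dictionary of parameters.
    return MyOperatorOptions.model_validate(parameters)


# A dump of the model is the kind of dictionary passed in as `parameters`.
dumped = MyOperatorOptions(threshold=2.5).model_dump()
inputs = my_operator(**dumped)

# Invalid values are rejected before any operator logic runs.
try:
    my_operator(window="not-a-number")
    rejected = False
except pydantic.ValidationError:
    rejected = True
```

Validating at the top of the function gives the rest of your logic typed attribute access (`inputs.threshold`) instead of dictionary lookups.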

Providing an example operation configuration¶

The decorator's configuration_model_default parameter takes an example of your operator's parameters. If your operator's parameter model has defaults for all fields, the simplest approach is to use those as the value of configuration_model_default:

    configuration_model_default=MyOperatorOptions(), # <- This will use the defaults specified for all fields of your operators parameters

How your Operators input parameters model is stored and output¶

When outputting the default options via ado template operator, model_dump_json() is used with no options.

When an operation is created using your Operator, the parameters are stored in the metastore using model_dump_json() with no options.
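To illustrate what "no options" means in practice: in pydantic v2, calling `model_dump_json()` without arguments serializes every field, including fields left at their default value or set to `None`. The sketch below uses a hypothetical model (the fields `threshold` and `label` are assumptions for illustration) and only pydantic itself, not ado.

```python
import json
import typing

import pydantic


class MyOperatorOptions(pydantic.BaseModel):
    """Hypothetical parameters model; the fields are illustrative only."""

    threshold: float = 3.0
    label: typing.Optional[str] = None


# Only `threshold` is set explicitly; `label` keeps its default of None.
stored = MyOperatorOptions(threshold=2.5).model_dump_json()
as_dict = json.loads(stored)

# The stored JSON can be validated back into the model.
restored = MyOperatorOptions.model_validate_json(stored)
```

Because defaults are written out explicitly, a stored operation keeps the values it was created with even if you later change the defaults in your model.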

Operation function logic¶

We've covered how your operator will be called. However, where do you put your code?

If you are not creating an explore operator, you can implement your logic however you like, e.g. within the operator function, or in a class or function in a separate module that the operator function calls.

If your operator type involves sampling and measuring entities e.g. it is an optimizer, your code has some additional packaging requirements which are discussed in explore operators.

Returning data from your operation¶

Note

Any ado resources created will be stored in the context the operation was created in.

The operator function must return data using the orchestrator.core.operation.operation.OperationOutput pydantic model.

class OperationOutput(pydantic.BaseModel):
    metadata: typing.Dict = pydantic.Field(
        default={},
        description="Additional metadata about the operation. ",
    )
    resources: typing.List[orchestrator.core.resources.ADOResource] = pydantic.Field(
        default=[],
        description="Array of ADO resources generated by the operation",
    )
    exitStatus: OperationResourceStatus = pydantic.Field(
        description="Exit status of the operation. Default to success if not applied",
    )

The key fields to set are:

resources: A list of ado resources your operation created.
exitStatus: Indicates whether the operation succeeded or not

It's expected that certain operation types return certain outputs:

fuse, modify: Expected to return a new DiscoverySpaceResource and optionally a SampleStoreResource
compare: Expected to return a new DataContainerResource
characterize: Expected to return a new DataContainerResource

If you have non-ado resource data you want to return from your operation, for example pandas DataFrames, paths to files, text, lists etc., you can use ado's datacontainer resource.

The following code snippet shows returning a dataframe, a dictionary with some key:value pairs, and an URL:

tabular_data = TabularData.from_dataframe(df)
location = ResourceLocation.locationFromURL(someURL)
data_container = DataContainer(tabularData={"main_dataframe":tabular_data},
                               data={"important_dict":results_dict},
                               locationData={"important_location": location})

return OperationOutput(resources=[DataContainerResource(config=data_container)])

How to update your operator input parameters¶

During development, there will be times when you might need to update the input parameter model for your operator, adding, removing or modifying fields. In these cases, it's important not to break backwards compatibility (where possible) while making sure that users are aware of the changes to the model and do not rely indefinitely on the model being auto upgraded.

In ado, we recommend using Pydantic before validators coupled with the ado upgrade command. At a high level, you should:

Use a before validator to create a temporary upgrade path for your model.
Enable a warning in this validator using the provided support functions (described below). This warning will inform users that an upgrade is needed. The support function will automatically print the command to upgrade stored model versions and remove the warning. It will also display a message indicating that auto-upgrade functionality will be removed in a future release.
Remove the upgrade path in the specified future version.

Let's see a practical example. Consider this class as the input parameter class in my_operator v1:

import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: int

And consider two cases:

We want to deprecate a field.
We want to apply changes to a field without deprecating it.

Deprecating a field in your operator input parameters¶

Let's imagine we want to change the name of the my_parameter_name field to be my_improved_parameter_name. The model for our operator v2 would then be:

import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_improved_parameter_name: int

To enable upgrading of the previous model versions when fields are being deprecated, we recommend using a Pydantic Before Model Validator. This allows the dictionary content of the model to be changed as appropriate before validation is applied. To ensure that users are aware of the change, we will also use the warn_deprecated_operator_parameters_model_in_use method in the validator:

import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_improved_parameter_name: int

    @pydantic.model_validator(mode="before")
    @classmethod
    def rename_my_parameter_name(cls, values: dict):
        from orchestrator.modules.operators.base import (
            warn_deprecated_operator_parameters_model_in_use,
        )

        old_key = "my_parameter_name"
        new_key = "my_improved_parameter_name"
        if old_key in values:

            # Notify the user that the my_parameter_name
            # field is deprecated
            warn_deprecated_operator_parameters_model_in_use(
                affected_operator="my_operator",
                deprecated_from_operator_version="v2",
                removed_from_operator_version="v3",
                deprecated_fields=old_key,
                latest_format_documentation_url="https://example.com",
            )

            # The user has set both the old
            # and the new key - the new key
            # takes precedence.
            if new_key in values:
                values.pop(old_key)
            # Set the old value in the
            # new field
            else:
                values[new_key] = values.pop(old_key)

        return values

When a model with the old field is loaded, the user will see the following warning:

WARN:   The parameters for the my_operator operator have been updated as of my_operator v2.
        They are being temporarily auto-upgraded to the latest version.
        This behavior will be removed with my_operator v3.
HINT:   Run ado upgrade operations to upgrade the stored operations.
        Update your operation YAML files to use the latest format: https://example.com
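The renaming logic can be exercised in isolation, without ado installed, to check each branch of the upgrade path. The sketch below assumes pydantic v2 and swaps `warn_deprecated_operator_parameters_model_in_use` for Python's standard `warnings.warn` as a stand-in; the warning text here is illustrative and is not ado's actual output.

```python
import warnings

import pydantic


class MyOperatorOptions(pydantic.BaseModel):
    my_improved_parameter_name: int

    @pydantic.model_validator(mode="before")
    @classmethod
    def rename_my_parameter_name(cls, values):
        old_key = "my_parameter_name"
        new_key = "my_improved_parameter_name"
        if isinstance(values, dict) and old_key in values:
            # Stand-in for ado's warning helper (an assumption of this sketch).
            warnings.warn(f"{old_key} is deprecated, use {new_key}", DeprecationWarning)
            values = dict(values)  # avoid mutating the caller's dictionary
            old_value = values.pop(old_key)
            # The new key takes precedence if both are present.
            values.setdefault(new_key, old_value)
        return values


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Old format only: the value is moved to the new field.
    upgraded = MyOperatorOptions.model_validate({"my_parameter_name": 3})
    # Both keys present: the new key wins.
    both = MyOperatorOptions.model_validate(
        {"my_parameter_name": 3, "my_improved_parameter_name": 7}
    )
    # New format only: no warning is raised.
    current = MyOperatorOptions.model_validate({"my_improved_parameter_name": 5})
```

A small check like this is a convenient unit test to keep alongside the validator until the upgrade path is removed.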

Updating a field in your operator input parameters without deprecating it¶

Let's imagine we want to change the type of the my_parameter_name field to be str. The model for our operator v2 would then be:

import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: str

To enable upgrading of the previous model versions when fields are not being deprecated, we recommend using a Pydantic Before Field Validator. This allows the specific field to be changed as appropriate before validation is applied. To ensure that users are aware of the change, we will also use the warn_deprecated_operator_parameters_model_in_use method in the validator:

Note

The method being called is the same as the one for warning about deprecated fields, but we omit the deprecated_fields parameter.

import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: str

    @pydantic.field_validator("my_parameter_name", mode="before")
    @classmethod
    def convert_my_parameter_name_to_string(cls, value: int | str):
        from orchestrator.modules.operators.base import (
            warn_deprecated_operator_parameters_model_in_use,
        )

        if isinstance(value, int):
            # Notify the user that the parameters of my_operator
            # have been updated
            warn_deprecated_operator_parameters_model_in_use(
                affected_operator="my_operator",
                deprecated_from_operator_version="v2",
                removed_from_operator_version="v3",
                latest_format_documentation_url="https://example.com",
            )
            value = str(value)

        return value

When a model using ints is loaded, the user will see the following warning:

WARN:   The parameters for the my_operator operator have been updated as of my_operator v2.
        They are being temporarily auto-upgraded to the latest version.
        This behavior will be removed with my_operator v3.
HINT:   Run ado upgrade operations to upgrade the stored operations.
        Update your operation YAML files to use the latest format: https://example.com
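The before validator matters here because pydantic v2 does not, by default, coerce an `int` into a `str` field, so stored operations in the old format would otherwise fail validation. The sketch below shows both behaviours side by side. It assumes pydantic v2 with default settings, uses `warnings.warn` as a stand-in for ado's helper, and the class name `WithoutUpgradePath` is hypothetical.

```python
import warnings

import pydantic


class WithoutUpgradePath(pydantic.BaseModel):
    """The v2 model with no validator: old int values are rejected."""

    my_parameter_name: str


class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: str

    @pydantic.field_validator("my_parameter_name", mode="before")
    @classmethod
    def convert_my_parameter_name_to_string(cls, value):
        if isinstance(value, int):
            # Stand-in for ado's warning helper (an assumption of this sketch).
            warnings.warn("int values for my_parameter_name are deprecated", DeprecationWarning)
            value = str(value)
        return value


try:
    WithoutUpgradePath.model_validate({"my_parameter_name": 42})
    old_format_rejected = False
except pydantic.ValidationError:
    old_format_rejected = True

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    upgraded = MyOperatorOptions.model_validate({"my_parameter_name": 42})
```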

Nesting Operations¶

Operators can use other operators. For example, your operator can create operations using other operators and consume the results. You access other operators via the relevant collection in orchestrator.modules.operators.collections. For example, to use the RandomWalk operator:

from orchestrator.modules.operators.collections import explore, learn_operation

@learn_operation(...)
def my_learning_operation(...):
    ...
    #Note: The name of the function called (here random_walk() ) is the operator name
    random_walk_output = explore.random_walk(...Args...)
    ...

Important

The name used to call an operator function is the name of the operator. This is the name given to the decorator's name parameter and is the name shown by ado get operators.

You access the data of the operation from the OperationOutput instance it returns. Any ado resources the nested operation creates will have been automatically added to the correct project by ado.

Creating Explore Operators¶

Explore operators sample and measure entities. In ado, all explore operations run as distributed ray jobs with:

actuator ray actors for performing measurements
discovery space manager actor for storing and notifying about measurement results

This means explore operators need to be implemented differently from the others. In particular:

The logic of your explore operator must be implemented as a ray actor (a class)
The explore operator functions must call this class i.e. you won't have any operator logic in the function

Explore operation functions¶

All explore operation functions follow this pattern:

@explore_operation(
    name="ray_tune",
    description=RayTune.description(),
    configuration_model=RayTuneConfiguration,
    configuration_model_default=RayTuneConfiguration(),
)
def ray_tune(
        discoverySpace: DiscoverySpace,
        operationInfo: FunctionOperationInfo = FunctionOperationInfo(),
        **kwargs: typing.Dict,
) -> OperationOutput:
    """
    Performs an optimization on a given discoverySpace

    """

    import uuid

    import orchestrator.modules.module
    from orchestrator.core.operation.config import OperatorModuleConf
    from orchestrator.modules.operators.orchestrate import explore_operation_function_wrapper

    # This describes where the class that implements your explore operation is
    module = OperatorModuleConf(
        moduleName="ado_ray_tune.operator",  # The name of the package containing your explore actor
        moduleClass="RayTune",  # The name of your explore actor class
        moduleType=orchestrator.modules.module.ModuleTypeEnum.OPERATION,
    )

    # validate parameters
    RayTuneConfiguration.model_validate(kwargs)

    # Tell ado to execute your class
    output = explore_operation_function_wrapper(
        discovery_space=discoverySpace,
        module=module,
        parameters=kwargs,
        namespace=f"namespace-{str(uuid.uuid4())[:8]}",
        operation_info=operationInfo,  # Important: This is where you must pass the operationInfo parameter to ado
    )

    return output

Explore operator classes¶

TBA

Operator plugin packages¶

Operator plugin packages follow a standard Python structure:

$YOUR_REPO_NAME
├── $YOUR_PLUGIN_PACKAGE        # Your plugin
│   ├── __init__.py
│   └── ...
└── pyproject.toml

The key to making it an ado plugin is having a [project.entry-points."ado.operators"] section in the pyproject.toml e.g.

[project]
name = "ado-ray-tune" # Note: this is the distribution name of the Python package. Your ado operator(s) can have a different ado identifier
version = "0.1.0"
dependencies = [
  #Dependencies
]

[project.entry-points."ado.operators"]
ado-ray-tune = "ado_ray_tune.operator_function" # The key is the distribution name of your Python package and the value is the Python module in your package containing your decorated operator function

The entry point value references the Python module (file) that contains your operator function.

Note

You can define multiple operator functions in the referenced module.
