Extending `ado` with new Operators
Info
A complete example operator is provided here. This example operator is functional, and useful, out-of-the-box. It can be used as the basis to create new operators. It references this document to help tie details here to implementation.
Developers can write their own operator plugins to add new operations that work on `discoveryspaces` to `ado`. Operator plugins are written in python and can live in their own repository.
The main part of writing an operator plugin, from an integration standpoint, is writing a python function that implements a specific interface. `ado` will call this function to execute an operation with your operator. From this function you then call your operator logic (or in many cases it can just live in this function).
This page gives an overview of how to get started creating your own operator. After reading this page, the best next resource is our example operator.
Knowledge required¶
- Knowledge of python
- Knowledge of pydantic is useful, but not necessary
`ado` operator functions¶
An operator function is a decorated python function with a specific signature. To execute your operator, `ado` will call your function and expect it to return output in a given way. Below is an example of such a decorated function. The next sections describe the decorator, its parameters, and the structure of the operation function itself.
from orchestrator.modules.operators.collections import (
    characterize_operation,  # Import the decorator from this module depending on the type of operation your operator performs
)

@characterize_operation(
    name="my_operator",  # The name of your operator
    description="Example operator",  # What this operator does
    configuration_model=MyOperatorOptions,  # A pydantic model that describes your operator's input parameters
    configuration_model_default=MyOperatorOptions.default_parameters(),  # An example of your operator's input parameters
    version="1.0",  # Version of the operator
)
def detect_anomalous_series(
    discoverySpace: DiscoverySpace,
    operationInfo: typing.Optional[FunctionOperationInfo] = None,
    **parameters,
) -> OperationOutput:
    # Your operation logic - can also call other python modules etc.
    ...
    return operationOutput
Operator Type¶
The first thing you need to do is decide what type of operator you are creating. The choices are explore, characterize, learn, modify, fuse, export, or compare. You then import the decorator for this operator type from `orchestrator.modules.operators.collections` and use it to decorate your operator function.
For example, if your operator compares `discoveryspaces` you would do
from orchestrator.modules.operators.collections import compare_operation

@compare_operation(...)
def my_comparison_operation():
    ...
The decorator parameters are the same for all operator/operation types.
Operator function parameters¶
All operator functions take one or more `discoveryspaces` along with a dictionary containing the inputs for the operation.
If your operation type is `explore`, `characterize`, `learn` or `modify`, your function should have a parameter `discoverySpace`, i.e.
def detect_anomalous_series(
    discoverySpace: DiscoverySpace,
    operationInfo: typing.Optional[FunctionOperationInfo] = None,
    **parameters,
) -> OperationOutput:
    ...
If it is `fuse` or `compare`, your function should have a parameter `discoverySpaces`, which is a list of `discoveryspaces`, i.e.
def detect_anomalous_series(
    discoverySpaces: list[DiscoverySpace],
    operationInfo: typing.Optional[FunctionOperationInfo] = None,
    **parameters,
) -> OperationOutput:
    ...
Operator functions also take an optional third parameter, `operationInfo`, that holds information for `ado`. You do not have to interact with the parameter unless you are writing an explore operator.
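The signature rules above can be checked mechanically. The following is a minimal sketch using only the standard library. `expected_space_parameter` and `check_operator_signature` are hypothetical helpers written for this illustration and are not part of `ado`; the two placeholder functions stand in for real decorated operator functions.

```python
import inspect

# Mapping taken from the rules described above. Operation types the text
# does not cover (e.g. export) are deliberately left out.
SINGLE_SPACE_TYPES = {"explore", "characterize", "learn", "modify"}
MULTI_SPACE_TYPES = {"fuse", "compare"}


def expected_space_parameter(operation_type: str) -> str:
    """Return the parameter name an operator function of this type should declare."""
    if operation_type in SINGLE_SPACE_TYPES:
        return "discoverySpace"
    if operation_type in MULTI_SPACE_TYPES:
        return "discoverySpaces"
    raise ValueError(f"No signature rule known for operation type {operation_type!r}")


def check_operator_signature(func, operation_type: str) -> bool:
    """Check that func declares the space parameter, operationInfo and **kwargs."""
    params = inspect.signature(func).parameters
    has_space = expected_space_parameter(operation_type) in params
    has_info = "operationInfo" in params
    has_var_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return has_space and has_info and has_var_keyword


# Placeholder operator functions (undecorated, for illustration only)
def my_characterize(discoverySpace, operationInfo=None, **parameters):
    ...


def my_compare(discoverySpaces, operationInfo=None, **parameters):
    ...
```

A check like this could live in a plugin's unit tests, so that a mismatch between the chosen decorator and the function signature is caught before the operator is run.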
Describing your operation input parameters¶
From the previous section, the `parameters` variable will contain the parameter values that should be used for a specific operation. However, how does `ado` know what the valid input parameters are for your operator, so that the contents of this variable make sense?
The answer is that the input parameters to your operator are described by a pydantic model that you give to the function decorator. Here's the example from the previous section with the relevant fields called out:
@characterize_operation(
    name="my_operator",
    description="Example operator",
    configuration_model=MyOperatorOptions,  # <- A pydantic model that describes your operator's input parameters
    configuration_model_default=MyOperatorOptions(),  # <- An example of your operator's input parameters
    version="1.0",
)
Here `MyOperatorOptions` is a pydantic model that describes your operator's input parameters. The `parameters` dictionary that is passed to your operation function will be a dump of this model. So the typical first step in the function is to create the model for your inputs:
inputs = MyOperatorOptions.model_validate(parameters)
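As an illustration, here is a minimal sketch of what such a model and the validation step could look like. The field names `threshold` and `window_size` are hypothetical and exist only for this example. It assumes pydantic v2, which is where `model_validate` is available.

```python
import pydantic


class MyOperatorOptions(pydantic.BaseModel):
    # Hypothetical fields, chosen only for illustration
    threshold: float = pydantic.Field(
        default=0.5, description="Cut-off used by the operator"
    )
    window_size: int = pydantic.Field(
        default=10, description="Number of samples per window"
    )


# Inside the operator function, the keyword arguments collected by
# **parameters arrive as a plain dictionary like this one.
parameters = {"threshold": 0.9}

# Rebuild the typed model; fields that are not given fall back to their defaults
inputs = MyOperatorOptions.model_validate(parameters)

# Invalid input raises pydantic.ValidationError up front instead of failing later
try:
    MyOperatorOptions.model_validate({"window_size": "not-a-number"})
    validation_failed = False
except pydantic.ValidationError:
    validation_failed = True
```

Working with the validated `inputs` object rather than the raw dictionary gives typed attribute access (`inputs.threshold`) for the rest of the operator logic.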
Providing an example operation configuration¶
The decorator's `configuration_model_default` parameter takes an example of your operator's parameters. If your operator's parameter model has defaults for all fields, then the simplest approach is to use those as the value of `configuration_model_default`:
configuration_model_default=MyOperatorOptions(),  # <- This will use the defaults specified for all fields of your operator's parameters
How your Operator's input parameters model is stored and output¶
When outputting the default options via `ado template operator`, `model_dump_json()` is used with no options.
When an `operation` is created using your Operator, the parameters are stored in the metastore using `model_dump_json()` with no options.
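To make the consequence concrete: with no options, `model_dump_json()` writes every field, including those still at their default value or set to `None`. A minimal sketch, assuming pydantic v2 and a hypothetical two-field model:

```python
import json
import typing

import pydantic


class MyOperatorOptions(pydantic.BaseModel):
    # Hypothetical fields, for illustration only
    threshold: float = 0.5
    label: typing.Optional[str] = None


options = MyOperatorOptions(threshold=0.9)

# No options: explicitly set values, defaults and None values are all written
stored = options.model_dump_json()
stored_fields = json.loads(stored)

# The stored JSON can be loaded back into an equivalent model
restored = MyOperatorOptions.model_validate_json(stored)
```

Because every field is written, the stored parameters record the default values that were in effect when the model was dumped.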
Operation function logic¶
We've covered how your operator will be called. However, where do you put your code?
If you are not creating an explore operator, you can implement your logic as you like, e.g. within the operator function or in a class or function in a separate module called from the operator function.
If your operator type involves sampling and measuring entities, e.g. it is an optimizer, your code has some additional packaging requirements, which are discussed in Creating Explore Operators.
Returning data from your operation¶
Note
Any `ado` resources created will be stored in the context the operation was created in.
The operator function must return data using the `orchestrator.core.operation.operation.OperationOutput` pydantic model.
class OperationOutput(pydantic.BaseModel):
    metadata: typing.Dict = pydantic.Field(
        default={},
        description="Additional metadata about the operation. ",
    )
    resources: typing.List[orchestrator.core.resources.ADOResource] = pydantic.Field(
        default=[],
        description="Array of ADO resources generated by the operation",
    )
    exitStatus: OperationResourceStatus = pydantic.Field(
        description="Exit status of the operation. Default to success if not applied",
    )
The key fields to set are:
- resources: A list of `ado` resources your operation created.
- exitStatus: Indicates if the operation worked or not
It's expected that certain operation types return certain outputs:
- fuse, modify: Expected to return a new DiscoverySpaceResource and optionally a SampleStoreResource
- compare: Expected to return a new DataContainerResource
- characterize: Expected to return a new DataContainerResource
If you have non-ado resource data you want to return from your operation, for example pandas DataFrames, paths to files, text, lists etc., you can use `ado`'s `datacontainer` resource.
The following code snippet shows returning a dataframe, a dictionary with some key:value pairs, and a URL:
tabular_data = TabularData.from_dataframe(df)
location = ResourceLocation.locationFromURL(someURL)
data_container = DataContainer(
    tabularData={"main_dataframe": tabular_data},
    data={"important_dict": results_dict},
    locationData={"important_location": location},
)
return OperationOutput(resources=[DataContainerResource(config=data_container)])
How to update your operator input parameters¶
During development, there will be times when you might need to update the input parameter model for your operator, adding, removing or modifying fields. In these cases, it's important not to break backwards compatibility (where possible) while making sure that users are aware of the changes to the model and do not rely indefinitely on the model being auto upgraded.
In ado, we recommend using Pydantic before validators coupled with the `ado upgrade` command. At a high level, you should:
- Use a before validator to create a temporary upgrade path for your model.
- Enable a warning in this validator using the provided support functions (described below). This warning will inform users that an upgrade is needed. The support function will automatically print the command to upgrade stored model versions and remove the warning. It will also display a message indicating that auto-upgrade functionality will be removed in a future release.
- Remove the upgrade path in the specified future version.
Let's see a practical example. Consider this class as the input parameter class in `my_operator` v1:
import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: int
And consider two cases:
- We want to deprecate a field.
- We want to apply changes to a field without deprecating it.
Deprecating a field in your operator input parameters¶
Let's imagine we want to change the name of the `my_parameter_name` field to be `my_improved_parameter_name`. The model for our operator v2 would then be:
import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_improved_parameter_name: int
To enable upgrading of the previous model versions when fields are being deprecated, we recommend using a Pydantic Before Model Validator. This allows the dictionary content of the model to be changed as appropriate before validation is applied. To ensure that users are aware of the change, we will also use the `warn_deprecated_operator_parameters_model_in_use` method in the validator:
import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_improved_parameter_name: int

    @pydantic.model_validator(mode="before")
    @classmethod
    def rename_my_parameter_name(cls, values: dict):
        from orchestrator.modules.operators.base import (
            warn_deprecated_operator_parameters_model_in_use,
        )

        old_key = "my_parameter_name"
        new_key = "my_improved_parameter_name"
        if old_key in values:
            # Notify the user that the my_parameter_name
            # field is deprecated
            warn_deprecated_operator_parameters_model_in_use(
                affected_operator="my_operator",
                deprecated_from_operator_version="v2",
                removed_from_operator_version="v3",
                deprecated_fields=old_key,
                latest_format_documentation_url="https://example.com",
            )
            # The user has set both the old
            # and the new key - the new key
            # takes precedence.
            if new_key in values:
                values.pop(old_key)
            # Set the old value in the
            # new field
            else:
                values[new_key] = values.pop(old_key)
        return values
When a model with the old field is loaded, the user will see the following warning:
WARN: The parameters for the my_operator operator have been updated as of my_operator v2.
They are being temporarily auto-upgraded to the latest version.
This behavior will be removed with my_operator v3.
HINT: Run ado upgrade operations to upgrade the stored operations.
Update your operation YAML files to use the latest format: https://example.com
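The key-handling logic of the validator above can be exercised on plain dictionaries. The sketch below reproduces only that logic, without pydantic or the `ado` warning call. `rename_key` is a hypothetical helper written for this illustration.

```python
def rename_key(values: dict, old_key: str, new_key: str) -> dict:
    """Move old_key to new_key; if both are present, new_key wins."""
    values = dict(values)  # work on a copy so the caller's dict is untouched
    if old_key in values:
        if new_key in values:
            # Both keys set: drop the old one, the new key takes precedence
            values.pop(old_key)
        else:
            # Only the old key set: carry its value over to the new key
            values[new_key] = values.pop(old_key)
    return values


OLD, NEW = "my_parameter_name", "my_improved_parameter_name"

only_old = rename_key({OLD: 1}, OLD, NEW)        # old format
both = rename_key({OLD: 1, NEW: 2}, OLD, NEW)    # mixed format
only_new = rename_key({NEW: 2}, OLD, NEW)        # already upgraded
```

In all three cases the result contains only the new key, so the model validates the same way regardless of which format was stored.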
Updating a field in your operator input parameters without deprecating it¶
Let's imagine we want to change the type of the `my_parameter_name` field to be `str`. The model for our operator v2 would then be:
import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: str
To enable upgrading of the previous model versions when fields are not being deprecated, we recommend using a Pydantic Before Field Validator. This allows the specific field to be changed as appropriate before validation is applied. To ensure that users are aware of the change, we will also use the `warn_deprecated_operator_parameters_model_in_use` method in the validator:
Note
The method being called is the same as the one for warning about deprecated fields, but we omit the `deprecated_fields` parameter.
import pydantic

class MyOperatorOptions(pydantic.BaseModel):
    my_parameter_name: str

    @pydantic.field_validator("my_parameter_name", mode="before")
    @classmethod
    def convert_my_parameter_name_to_string(cls, value: int | str):
        from orchestrator.modules.operators.base import (
            warn_deprecated_operator_parameters_model_in_use,
        )

        if isinstance(value, int):
            # Notify the user that the parameters of my_operator
            # have been updated
            warn_deprecated_operator_parameters_model_in_use(
                affected_operator="my_operator",
                deprecated_from_operator_version="v2",
                removed_from_operator_version="v3",
                latest_format_documentation_url="https://example.com",
            )
            value = str(value)
        return value
When a model using `int`s is loaded, the user will see the following warning:
WARN: The parameters for the my_operator operator have been updated as of my_operator v2.
They are being temporarily auto-upgraded to the latest version.
This behavior will be removed with my_operator v3.
HINT: Run ado upgrade operations to upgrade the stored operations.
Update your operation YAML files to use the latest format: https://example.com
Nesting Operations¶
Operators can use other operators. For example, your operator can create operations using other operators and consume the results. You access other operators via the relevant collection in `orchestrator.modules.operators.collections`. For example, to use the RandomWalk operator:
from orchestrator.modules.operators.collections import explore, learn_operation

@learn_operation(...)
def my_learning_operation(...):
    ...
    # Note: The name of the function called (here random_walk()) is the operator name
    random_walk_output = explore.random_walk(...Args...)
    ...
Important
The name used to call an operator function is the name of the operator. This is the name given to the decorator's `name` parameter and is the name shown by `ado get operators`.
You access the data of the operation from the OperationOutput instance it returns. Any `ado` resources the nested operation creates will have been automatically added to the correct project by `ado`.
Creating Explore Operators¶
Explore operators sample and measure entities. In `ado`, all explore operations run as distributed ray jobs with:
- actuator ray actors for performing measurements
- discovery space manager actor for storing and notifying about measurement results
This means explore operators need to be implemented differently from the others. In particular:
- The logic of your explore operator must be implemented as a ray actor (a class)
- The explore operator functions must call this class i.e. you won't have any operator logic in the function
Explore operation functions¶
All explore operation functions follow this pattern:
@explore_operation(
    name="ray_tune",
    description=RayTune.description(),
    configuration_model=RayTuneConfiguration,
    configuration_model_default=RayTuneConfiguration(),
)
def ray_tune(
    discoverySpace: DiscoverySpace,
    operationInfo: FunctionOperationInfo = FunctionOperationInfo(),
    **kwargs: typing.Dict,
) -> OperationOutput:
    """
    Performs an optimization on a given discoverySpace
    """
    import uuid

    import orchestrator.modules.module
    from orchestrator.core.operation.config import OperatorModuleConf
    from orchestrator.modules.operators.orchestrate import explore_operation_function_wrapper

    # This describes where the class that implements your explore operation is
    module = OperatorModuleConf(
        moduleName="ado_ray_tune.operator",  # The name of the package containing your explore actor
        moduleClass="RayTune",  # The name of your explore actor class
        moduleType=orchestrator.modules.module.ModuleTypeEnum.OPERATION,
    )
    # Validate parameters
    RayTuneConfiguration.model_validate(kwargs)
    # Tell ado to execute your class
    output = explore_operation_function_wrapper(
        discovery_space=discoverySpace,
        module=module,
        parameters=kwargs,
        namespace=f"namespace-{str(uuid.uuid4())[:8]}",
        operation_info=operationInfo,  # Important: This is where you must pass the operationInfo parameter to ado
    )
    return output
Explore operator classes¶
TBA
Operator plugin packages¶
Operator plugin packages follow a standard python structure
$YOUR_REPO_NAME
├── $YOUR_PLUGIN_PACKAGE   # Your plugin
│   ├── __init__.py
│   └── ...
└── pyproject.toml
The key to making it an ado plugin is having a `[project.entry-points."ado.operators"]` section in the `pyproject.toml`, e.g.
[project]
name = "ado-ray-tune" # Note: this is the distribution name of the python package. Your ado operator(s) can have a different ado identifier
version = "0.1.0"
dependencies = [
#Dependencies
]
[project.entry-points."ado.operators"]
ado-ray-tune = "ado_ray_tune.operator_function" # The key is the distribution name of your python package and the value is the python module in your package containing your decorated operator function
Note
You can define multiple operator functions in the referenced module.