Search based on a custom objective function
Note
This example shows how to create and use a custom objective function with ado. A custom objective function is an experiment that requires the output of another experiment.
The scenario¶
Often experiments will not directly produce the value that you are interested in. For example, an experiment might measure the run time of an application, while the meaningful metric is the associated cost, which requires knowing information like the cost per hour of the GPUs used. Another common scenario involves aggregating data points from one or more experiments into a single value.
In this example we will install a custom objective function that calculates a cost for the application workload configurations used in the taking a random walk example. When the workload configuration space is explored using a random walk, both the wallClockRuntime and the cost, as defined by the custom function, will be measured.
Caution
The commands below assume you are in the directory examples/ml-multi-cloud in the ado source repository. See here for how to get the source repository.
Pre-requisites¶
Install the ray_tune ado operator¶
If you haven't already installed the ray_tune operator, run the following (this assumes you are in examples/ml-multi-cloud/):
pip install ../../plugins/operators/ray_tune
Then run
ado get operators
which should list ray_tune, like below:
Available operators by type:
OPERATOR TYPE
0 random_walk explore
1 ray_tune explore
Installing the custom experiment¶
The custom experiment is defined in a python package under custom_experiment/. To install it run:
pip install custom_experiment/
then
ado get actuators --details
will output something like
2 custom_experiments CustomExperiments ml-multicloud-cost-v1.0 True
3 molecule-embeddings Embeddings calculate-morgan-fingerprint True
4 molformer-toxicity molformer-toxicity predict-toxicity True
5 mordred Mordred Descriptor mordred-descriptor-calculator True
6 st4sd ST4SD toxicity-prediction-opera True
You can see the custom experiment provided by the package, ml-multicloud-cost-v1.0, on the first line. Executing ado describe experiment ml-multicloud-cost-v1.0 outputs:
Identifier: custom_experiments.ml-multicloud-cost-v1.0
Required Inputs:
Constitutive Properties:
nodes
Domain:
Type: DISCRETE_VARIABLE_TYPE Interval: 1.0 Range: [0, 1000]
cpu_family
Domain:
Type: DISCRETE_VARIABLE_TYPE Values: [0, 1] Range: [0, 2]
Observed Properties:
op-benchmark_performance-wallClockRuntime
Outputs: ml-multicloud-cost-v1.0-total_cost
From this you can see that ml-multicloud-cost-v1.0 requires an observed property, i.e. a property measured by another experiment, as input. The observed property identifier shows that the experiment is called benchmark_performance and the property is wallClockRuntime.
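As a rough illustration of how such an identifier breaks down, the sketch below splits an observed property identifier into its experiment and property parts. The handling of the `op-` prefix and the rule of splitting on the last hyphen are assumptions inferred from the identifiers shown in this example, not ado's actual parsing logic, and `split_observed_property` is a hypothetical helper.

```python
def split_observed_property(identifier: str) -> tuple[str, str]:
    """Split '<experiment>-<property>' into (experiment, property).

    Assumes an optional 'op-' prefix and that the property name itself
    contains no hyphen (true for the identifiers in this example).
    """
    if identifier.startswith("op-"):
        identifier = identifier[len("op-"):]
    # Split on the LAST hyphen, because experiment names may contain hyphens.
    experiment, _, prop = identifier.rpartition("-")
    return experiment, prop


print(split_observed_property("op-benchmark_performance-wallClockRuntime"))
print(split_observed_property("ml-multicloud-cost-v1.0-total_cost"))
```

Splitting on the last hyphen matters here because experiment names such as ml-multicloud-cost-v1.0 contain hyphens themselves.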
Create a discoveryspace that uses the custom experiment¶
First create a samplestore with the ml-multi-cloud example data following these instructions. If you have already completed the taking a random walk example, reuse the samplestore you created there (use ado get samplestores if you cannot recall the identifier).
To use the custom experiment you must add it to the experiments list of a discoveryspace. The actuatorIdentifier will be custom_experiments and the experimentIdentifier will be the name of your experiment. For this case the relevant section looks like:
experiments:
  - experimentIdentifier: 'benchmark_performance'
    actuatorIdentifier: 'replay'
  - experimentIdentifier: 'ml-multicloud-cost-v1.0'
    actuatorIdentifier: 'custom_experiments'
The complete discoveryspace for this example is given in ml_multicloud_space_with_custom.yaml. To create it execute:
ado create space -f ml_multicloud_space_with_custom.yaml --set "sampleStoreIdentifier=$SAMPLE_STORE_IDENTIFIER"
Important
If an experiment takes the output of another experiment as input, both experiments must be in the discoveryspace. In the above example, if the benchmark_performance entry was omitted, the ado create space command would fail with:
SpaceInconsistencyError: MeasurementSpace does not contain an experiment measuring an observed property required by another experiment in the space
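The rule behind this error can be sketched in a few lines of Python. This is only an illustration of the consistency check described above, not ado's implementation: the `missing_inputs` helper and the dictionary layout (`requires`, `outputs`) are hypothetical, and the property names are taken from the experiment descriptions in this example.

```python
def missing_inputs(experiments: dict[str, dict[str, set[str]]]) -> dict[str, set[str]]:
    """Return, per experiment, the required observed properties that no
    experiment in the space measures."""
    # Collect everything that some experiment in the space produces.
    provided: set[str] = set()
    for spec in experiments.values():
        provided |= spec["outputs"]

    problems = {}
    for name, spec in experiments.items():
        unmet = spec["requires"] - provided
        if unmet:
            problems[name] = unmet
    return problems


# Hypothetical in-memory description of the measurement space in this example.
space = {
    "benchmark_performance": {
        "requires": set(),
        "outputs": {"benchmark_performance-wallClockRuntime",
                    "benchmark_performance-status"},
    },
    "ml-multicloud-cost-v1.0": {
        "requires": {"benchmark_performance-wallClockRuntime"},
        "outputs": {"ml-multicloud-cost-v1.0-total_cost"},
    },
}

print(missing_inputs(space))  # nothing missing: the space is consistent

# Dropping benchmark_performance leaves the cost experiment without its input.
without_benchmark = {k: v for k, v in space.items() if k != "benchmark_performance"}
print(missing_inputs(without_benchmark))
```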
Note the discoveryspace resource identifier printed by ado create space, as you will need it in the next section. You can use this identifier in the following command to see a description of the space (replacing $DISCOVERY_SPACE_IDENTIFIER):
ado describe space $DISCOVERY_SPACE_IDENTIFIER
This will output:
Identifier: space-d5c150-0b762f
Entity Space:
Number entities: 48
Categorical properties:
name values
0 provider [A, B, C]
Discrete properties:
name range interval values
0 cpu_family [0, 2] None [0, 1]
1 vcpu_size [0, 2] None [0, 1]
2 nodes [2, 6] None [2, 3, 4, 5]
Measurement Space:
experiment supported
0 replay.benchmark_performance True
1 custom_experiments.ml-multicloud-cost-v1.0 True
'replay.benchmark_performance'
Inputs:
parameter type value parameterized
0 cpu_family required None na
1 vcpu_size required None na
2 nodes required None na
3 provider required None na
Outputs:
target property
0 wallClockRuntime
1 status
'custom_experiments.ml-multicloud-cost-v1.0'
Inputs:
parameter type value parameterized
0 nodes required None na
1 cpu_family required None na
2 benchmark_performance-wallClockRuntime required None na
Outputs:
target property
0 total_cost
Sample store identifier: '0b762f'
Exploring the discoveryspace¶
To run a randomwalk operation on the new space, execute the following (replacing $DISCOVERY_SPACE_IDENTIFIER with the identifier of the space you created):
ado create operation -f randomwalk_ml_multicloud_operation.yaml --set "spaces[0]=$DISCOVERY_SPACE_IDENTIFIER"
This produces output similar to that described in the taking a random walk example and exits after printing the operation identifier. However, in this case there is additional information related to the dependent experiment.
When it completes, execute the ado show entities operation command to see the results produced:
ado show entities operation $OPERATION_IDENTIFIER
You will see a table similar to the following - note the extra column for the new cost function:
result_index identifier benchmark_performance-wallClockRuntime benchmark_performance-status ml-multicloud-cost-v1.0-total_cost reason valid
request_index
0 0 A_f0.0-c0.0-n3 [221.5101969242096, 216.394127368927] [ok, ok] [1.8459183077017467, 1.8032843947410584] True
1 0 C_f0.0-c0.0-n5 [150.9471504688263, 138.0605161190033] [ok, ok] [2.0964882009559207, 1.9175071683194902] True
2 0 A_f0.0-c0.0-n5 [106.0709307193756, 130.30512285232544] [ok, ok] [1.473207371102439, 1.8097933729489646] True
3 0 C_f1.0-c1.0-n5 [100.97977471351624, 92.17141437530518] [ok, ok] [2.8049937420421176, 2.560317065980699] True
4 0 C_f1.0-c0.0-n5 [136.3071050643921, 135.47050046920776] [ok, ok] [3.786308474010892, 3.763069457477993] True
5 0 A_f0.0-c1.0-n2 272.997822 ok 1.516655 True
6 0 C_f1.0-c1.0-n4 114.014369 ok 2.533653 True
7 0 B_f1.0-c1.0-n2 298.819305 ok 3.320214 True
8 0 C_f0.0-c0.0-n4 188.090878 ok 2.089899 True
9 0 A_f1.0-c1.0-n2 291.904456 ok 3.243383 True
10 0 B_f0.0-c0.0-n3 [184.44801592826843, 153.51639366149902, 176.28814435005188] [ok, ok, ok] [1.537066799402237, 1.2793032805124918, 1.4690678695837658] True
11 0 B_f0.0-c1.0-n2 [166.74843192100525, 184.935049533844] [ok, ok] [0.9263801773389181, 1.0274169418546888] True
12 0 C_f0.0-c0.0-n2 415.829285 ok 2.310163 True
13 0 provider.B-cpu_family.1-vcpu_size.1-nodes.3 Externally defined experiments cannot be applied to entities: replay.benchmark_performance. False
14 0 A_f0.0-c1.0-n4 106.670121 ok 1.185224 True
15 0 B_f1.0-c0.0-n3 [273.7120273113251, 220.19828414916992] [ok, ok] [4.561867121855418, 3.669971402486165] True
16 0 B_f1.0-c0.0-n2 346.070996 ok 3.845233 True
17 0 B_f0.0-c0.0-n5 [113.88505148887634, 103.90595746040344, 112.7056987285614] [ok, ok, ok] [1.5817368262343936, 1.443138298061159, 1.5653569267855751] True
18 0 C_f1.0-c1.0-n2 363.285671 ok 4.036507 True
19 0 C_f0.0-c1.0-n5 [95.86326050758362, 85.67946743965149] [ok, ok] [1.331434173716439, 1.1899926033284929] True
20 0 C_f1.0-c1.0-n3 [154.9813470840454, 168.34859228134155] [ok, ok] [2.583022451400757, 2.8058098713556925] True
21 0 B_f0.0-c0.0-n2 [228.14362454414368, 225.1791422367096] [ok, ok] [1.267464580800798, 1.2509952346483866] True
22 0 provider.B-cpu_family.0-vcpu_size.1-nodes.4 Externally defined experiments cannot be applied to entities: replay.benchmark_performance. False
23 0 C_f0.0-c1.0-n4 121.424925 ok 1.349166 True
24 0 A_f0.0-c0.0-n2 335.208518 ok 1.86227 True
25 0 A_f1.0-c1.0-n4 116.314171 ok 2.584759 True
26 0 C_f0.0-c1.0-n2 309.842324 ok 1.721346 True
27 0 A_f1.0-c1.0-n3 [155.02856159210205, 151.58562421798706] [ok, ok] [2.5838093598683676, 2.526427070299784] True
28 0 C_f1.0-c0.0-n2 463.396539 ok 5.14885 True
29 0 A_f0.0-c0.0-n4 145.129484 ok 1.61255 True
30 0 A_f1.0-c0.0-n3 [206.74496150016785, 236.1715066432953] [ok, ok] [3.4457493583361307, 3.936191777388255] True
31 0 provider.B-cpu_family.0-vcpu_size.1-nodes.3 Externally defined experiments cannot be applied to entities: replay.benchmark_performance. False
32 0 C_f0.0-c0.0-n3 [269.0906641483307, 240.07358503341675] [ok, ok] [2.242422201236089, 2.000613208611806] True
33 0 B_f0.0-c0.0-n4 [113.87676978111269, 132.5415120124817] [ok, ok] [1.265297442012363, 1.4726834668053521] True
34 0 A_f1.0-c0.0-n4 158.706395 ok 3.526809 True
35 0 C_f0.0-c1.0-n3 [168.9163637161255, 174.0335624217987] [ok, ok] [1.4076363643010457, 1.4502796868483225] True
36 0 provider.B-cpu_family.1-vcpu_size.1-nodes.5 Externally defined experiments cannot be applied to entities: replay.benchmark_performance. False
37 0 B_f1.0-c0.0-n4 [202.48239731788635, 193.55997109413147] [ok, ok] [4.499608829286363, 4.301332690980699] True
38 0 C_f1.0-c0.0-n4 177.723598 ok 3.949413 True
39 0 B_f1.0-c0.0-n5 [168.79178500175476, 141.99024295806885] [ok, ok] [4.688660694493188, 3.944173415501912] True
40 0 provider.B-cpu_family.0-vcpu_size.1-nodes.5 Externally defined experiments cannot be applied to entities: replay.benchmark_performance. False
41 0 C_f1.0-c0.0-n3 [244.33887457847595, 598.8834657669067] [ok, Timed out.] [4.0723145763079325, 9.981391096115113] True
42 0 A_f1.0-c1.0-n5 [96.8471610546112, 105.63729166984558] [ok, ok] [2.6901989181836448, 2.934369213051266] True
43 0 A_f0.0-c1.0-n3 [170.15659737586975, 168.36590766906738] [ok, ok] [1.4179716447989146, 1.4030492305755615] True
44 0 provider.B-cpu_family.1-vcpu_size.1-nodes.4 Externally defined experiments cannot be applied to entities: replay.benchmark_performance. False
45 0 A_f1.0-c0.0-n2 378.31657 ok 4.203517 True
46 0 A_f0.0-c1.0-n5 [86.23016095161438, 84.45346999168396] [ok, ok] [1.1976411243279776, 1.1729648609956105] True
47 0 A_f1.0-c0.0-n5 [117.94136571884157, 135.91092538833618] [ok, ok] [3.276149047745599, 3.775303483009338] True
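The total_cost values in the table are consistent with a simple node-hour formula: nodes × hourly rate × runtime in hours, with a rate of 10 per node-hour for cpu_family 0 and 20 for cpu_family 1. The sketch below reproduces a few rows under that assumption. The rates and the function name are inferred from the output above rather than taken from the custom_experiment package, so treat this as an illustration and not as the package's implementation.

```python
# Hourly rate per node, keyed by cpu_family.
# ASSUMPTION: inferred from the table above, not read from the package.
ASSUMED_RATE_PER_NODE_HOUR = {0: 10.0, 1: 20.0}


def total_cost(nodes: int, cpu_family: int, wall_clock_runtime: float) -> float:
    """Cost of one run: nodes x hourly rate x runtime (runtime in seconds)."""
    return nodes * ASSUMED_RATE_PER_NODE_HOUR[cpu_family] * wall_clock_runtime / 3600.0


# (entity, nodes, cpu_family, wallClockRuntime, total_cost) copied from the table.
rows = [
    ("A_f0.0-c1.0-n2", 2, 0, 272.997822, 1.516655),
    ("C_f1.0-c1.0-n4", 4, 1, 114.014369, 2.533653),
    ("B_f1.0-c0.0-n2", 2, 1, 346.070996, 3.845233),
]

for entity, nodes, cpu_family, runtime, reported in rows:
    computed = total_cost(nodes, cpu_family, runtime)
    print(f"{entity}: computed={computed:.6f} reported={reported}")
```

Under this reading, the provider and vcpu_size do not enter the cost, which matches the experiment's required inputs (nodes, cpu_family and the measured wallClockRuntime).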
Explore Further¶
- perform an optimization instead of a random walk: See the search a space with an optimizer example.
- modify the objective function: Try modifying the cost function and creating a new space - be careful to change the name of the experiment!
- create a custom experiment: Explore the documentation for writing your own custom experiment
- break the discoveryspace: See what happens if you try to create the discoveryspace without the experiment that provides input to the cost function.
- examine the requests: Run ado show requests operation to see what is replayed (benchmark_performance) and what is calculated (ml-multicloud-cost-v1.0).
Takeaways¶
- dependent experiments: ado allows you to define experiments which consume the output of other experiments.
  - There is no limit to the depth of the chain of dependent experiments.
  - Dependent experiments are executed when the required inputs are available.
- custom experiments: You can add your own python functions as experiments using ado's custom experiments feature.
- uniform usage pattern: How you use ado to define spaces or perform operations does not change if you use custom or dependent experiments.
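One way to picture "executed when the required inputs are available" is as a topological ordering of the experiments. The sketch below uses Python's standard `graphlib` module on the two experiments from this example plus a made-up third one (`budget-check`, hypothetical) to show a deeper chain. It illustrates the idea only and makes no claim about how ado schedules experiments internally.

```python
from graphlib import TopologicalSorter

# Map each experiment to the experiments whose outputs it consumes.
# "budget-check" is a hypothetical experiment added to show a longer chain.
dependencies = {
    "benchmark_performance": set(),
    "ml-multicloud-cost-v1.0": {"benchmark_performance"},
    "budget-check": {"ml-multicloud-cost-v1.0"},
}

# static_order() yields each experiment only after everything it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```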
What's next¶
- Search using an optimizer: Try the Search a space with an optimizer example to see how you can use RayTune in combination with custom experiments, via ado.
- Discovering important entity space dimensions: Try the Identify the important dimensions of a space example to see how you can use ado to discover which entity space dimensions most influence a target metric.