
The Ray Tune Operator

Overview

What does the ray_tune operator do?

The ray_tune operator enables running optimization algorithms on a discoveryspace. It uses the RayTune framework, and most of the capabilities of RayTune can be accessed via the operator without the need to write python code.

ray_tune is an explore operator.

When should you use the ray_tune operator?

Use the ray_tune operator when you want to:

  • find the maximum or minimum value of an observed property/target property in a discoveryspace
  • efficiently sample a discoveryspace with respect to an observed property/target property i.e. sample to understand the distribution of that metric in the space

The ray_tune operator supports memoization: if it samples the same entity twice, and that entity has already had the measurement space applied, it will replay the already measured values (by default).
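
A minimal conceptual sketch of this behaviour (not ado's implementation) is:

# Conceptual sketch of memoization: results for entities that have already
# been measured are replayed instead of being re-measured.
measured_values = {}  # entity identifier -> previously measured values

def measure(entity_id, run_experiment):
    if entity_id in measured_values:
        return measured_values[entity_id]   # replay the stored measurement
    result = run_experiment(entity_id)      # otherwise measure the entity
    measured_values[entity_id] = result
    return result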

Differences between using the ray_tune operator and RayTune directly

Using RayTune via the ado ray_tune operator brings the following advantages:

  • Distributed storage and sharing of optimization runs and their results
  • Automatic recording of provenance
  • Transparent and distributed memoization
  • Fully declarative interface, no need for programming

However, there are a few drawbacks:

  • Some customizations that require programming are not available
  • The current ado generic actuator model is not compatible with some RayTune features which assume interaction with RayTrain.

What happens if I apply multiple ray_tune operations to a space?

If you apply multiple ray_tune operations you simply get multiple optimization runs of the different lengths and types you have requested. This is the same behaviour as applying RandomWalk multiple times to a space and is explained in more detail in the RandomWalk documentation.

Available Optimizers

The optimizers available depend on the RayTune version used. At the time of writing they are:

  • ax
  • hyperopt
  • bayesopt
  • bohb
  • nevergrad
  • optuna
  • zoopt
  • hebo

In addition, ado provides its own optimizer called lhu_sampler.

Important

The above names are used to specify the optimizer to use in an operation.

The names are defined by RayTune: check the RayTune docs for the current list of optimizer names. The list is also defined in the variable ray.tune.search.SEARCH_ALG_IMPORT.
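
For reference, a quick way to list the names available in your installed RayTune version (a sketch, assuming SEARCH_ALG_IMPORT is a dictionary keyed by optimizer name):

import ray.tune.search

# The keys are the optimizer names accepted by ado, e.g. "ax", "hyperopt", "optuna"
print(sorted(ray.tune.search.SEARCH_ALG_IMPORT.keys()))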

Warning

RayTune also defines python classes for each optimizer. These class names are NOT the same as the "optimizer names" it defines and cannot be used with ado

Installing an optimizer

Each optimizer is provided by its own python package, which RayTune does not install by default. ado installs ax, but if you want to use any of the others you must install the corresponding python package. For example, to use nevergrad run:

pip install nevergrad

Setting the parameters of a ray_tune operation

When configuring a ray_tune operation there are three groups of parameters to consider: orchestratorConfig, runtimeConfig and tuneConfig.

For example, the default parameters and values for a ray_tune operation are:

orchestratorConfig:
  failed_metric_value: None # This will be used for the value of "metric" for any entities where it could not be measured (for any reason)
  result_dump: none # If specified the best result found will be written to this file
  single_measurement_per_property: true # If true memoization is used. If false already measured entities will be re-measured. 
runtimeConfig:
  stop: None # A list of Stoppers or None. See below for stoppers
tuneConfig:
  metric: wallclock_time # The target property identifier to optimize w.r.t
  mode: min # Whether to search for min or max of the target property
  num_samples: 1 # The number of samples to draw
  search_alg:
    name: ax # The name of the optimization algorithm to use
    params: {} # The parameters for the optimizer

The following sections describe each of these parameter sets in more detail. As you go through them it is worth referring to the comprehensive example, which demonstrates how all the pieces fit together and interact.

Info

You can get a default ray_tune operation template and the schema of its parameters by running:

ado template operation --operator-name ray_tune --include-schema

The information output by this command should always be preferred over the information presented here if there is an inconsistency.

Trials versus Samples

In RayTune, samples are the points to measure and trials are the measurements of those points. They correspond to the ado concepts of entities (samples) and measurements (trials).

Orchestrator Config

The orchestratorConfig section currently supports the following parameters, which are all optional:

  • failed_metric_value (default None)
    • This value will be used for the metric of any entity that could not be measured (for any reason)
  • result_dump (default None)
    • If specified, the best result found will be written to this file
  • single_measurement_per_property (default true)
    • If true, memoization is used. If false, already measured entities will be re-measured.

Tune Config

The tuneConfig section supports many of the parameters of the ray.tune.TuneConfig class.

Supported parameters:

  • metric (required)
    • The target property identifier to optimize
  • mode (required)
    • min or max: Whether to search for the min or max of the target property
  • search_alg (required)
    • Note: This must be an optimizer name; when using RayTune directly it would be an optimizer instance
  • num_samples (defaults to 1)
    • Note: The exact interpretation of num_samples is optimizer dependent e.g. some do not count "warm-up" samples as part of this.
  • max_concurrent_trials
    • Note: this can also be controlled via most optimizers' parameters. If not set, the default value depends on the optimizer
  • time_budget_s: How many seconds to run the optimizer for

Unsupported parameters:

  • scheduler - Coming soon
  • reuse_actors - not relevant
  • trial_name_creator - not relevant
  • trial_dirname_creator - not relevant

Optimizer Parameters (search_alg.params)

The optimizer parameters are given as a dictionary to the tuneConfig.search_alg.params field.

The parameters available for a given optimizer are detailed in the ray tune documentation for that optimizer. Almost all parameters that are listed in the RayTune docs for creating an instance of an optimizer can be specified in tuneConfig.search_alg.params. However, there are a few that should not be set. This is discussed in parameters to omit.

Info

The dictionary value you set for tuneConfig.search_alg.params will be used to initialise the optimizer in the standard python manner:

optimizerInstance = optimizerClass(**params)
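
For example, for the ax optimizer this corresponds roughly to the following (a sketch: AxSearch is RayTune's Ax searcher class, the property names shown are hypothetical, and remember the class name itself is never used in ado configurations):

from ray.tune.search.ax import AxSearch  # the class RayTune resolves from the name "ax"

# tuneConfig.search_alg.params, as a python dictionary
params = {"points_to_evaluate": [{"x0": 0.5, "x1": 1.0}]}
optimizerInstance = AxSearch(**params)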

Info

Currently you cannot obtain a given optimizer's parameters via ado. To understand the configuration possibilities for an optimizer you must check the RayTune documentation.

Parameters to omit

All RayTune optimizers have the parameters space, metric and mode, which should be omitted from the tuneConfig.search_alg.params dictionary.

The space parameter will be filled by ado based on the discoveryspace the ray_tune operation is being applied to.

The metric and mode parameters are provided via the tuneConfig fields (see above).

Common parameters

All RayTune optimizers support a parameter points_to_evaluate, which is a list of the initial entities to test. Each entity is described by a dictionary of constitutive property identifier/value pairs.

Warning

The constitutive property identifiers and values must be compatible with the discoveryspace the operation is being applied to. If they are not, an exception will be raised when the operation starts.

For example:

- model_name: granite-3b
  number_gpus: 4
  tokens_per_sample: 2048
  gpu_model: A100-SXM4-80GB

Optimizer specific parameters

In addition to points_to_evaluate each optimizer has its own parameters. For example, some optimizers allow evaluating multiple points at the same time, while others have a warm-up period of random sampling that can be configured. Check the RayTune documentation for each optimizer to understand the available options.

Info

ado does not perform any validation of the optimizer parameters. Validation will be performed by RayTune when creating the optimizer instance.

nevergrad parameters

The nevergrad search algorithm has a required parameter, optimizer, which when used programmatically is set to a nevergrad optimizer class or instance. In ado, set this value to the string name of the optimizer. The valid strings are the keys of the nevergrad registry, which you can see with:

import nevergrad
print(list(nevergrad.optimizers.registry.keys()))

Run Config

The runtimeConfig section supports many of the parameters of the ray.tune.RunConfig class. All are optional, although the above template shows stop as it is the most relevant.

Many of the run config parameters are related to storing the RayTune runs on disk. Since ado automatically stores the results and operation details in samplestore and metastore you likely do not need to set any of these values.

Key Supported Parameters:

  • stop - see Stoppers
  • storage_path:
    • ado defaults this to "/tmp/ray_results" as this directory is writable in the default ado image used in ray clusters.
    • If you change this path ensure it is writable

Other supported parameters:

  • name
  • storage_filesystem
  • verbose

Unsupported parameters:

These parameters are mostly for use with RayTrain (i.e. using RayTune to drive hyperparameter search for model training). As such they are specific to that use case.

  • failure_config
  • checkpoint_config - for RayTrain, not relevant here
  • sync_config

Stoppers

Stoppers define conditions for stopping an optimization, for example stopping when no new min/max has been found after N samples.

An optimization run can specify any number of stoppers and the optimization will stop whenever the condition of any stopper in the list is met.

You specify the stoppers to use as a list in the runtimeConfig.stop field. Each stopper in the list is defined using the following YAML:

name:  #The name of the stopper class
positionalParams: #The positional params of the stopper: a list of strings/numbers
keywordParams: #The keyword params of the stopper: a dictionary of key/value pairs

Info

It is recommended to use keywordParams even for positional parameters.

The following sections describe the available stoppers.

RayTune stoppers

RayTune provides a number of Stoppers. Some of these, MaximumIterationStopper, ExperimentPlateauStopper and TrialPlateauStopper, are for early measurement stopping when using RayTrain and have no effect when used via ado.

Of the remaining stoppers, TimeoutStopper is automatically used if you specify tuneConfig.time_budget_s, so it does not have to be specified independently. Similarly, CombinedStopper is automatically used if you specify more than one stopper in the list given to runtimeConfig.stop.

Finally, FunctionStopper, which allows passing a custom python function as a stopper, cannot currently be used with ado.

ado stoppers

ado provides four inbuilt stoppers:

  • SimpleStopper: Stops if there is no improvement in the target metric after N steps
  • GrowthStopper: Stops when the improvement in the target metric is less than a threshold for N steps
  • MaxSamplesStopper: Stops when a certain number of samples have been drawn. This is less ambiguous than tuneConfig.num_samples
  • InformationGainStopper: Stops when samples are no longer providing significant additional information on how the constitutive properties of the entity space are related to the target property.

Each of these is described in more detail, along with its parameters, here.

Example

This example shows using SimpleStopper to stop an optimization when the metric has not improved after 10 trials. It allows 5 trials to be performed before checking if it should stop the optimization, and counts trials returning nan for the metric towards the 10-trial budget. It combines this with MaxSamplesStopper to stop when 50 samples have been drawn.

stop:
  - name: SimpleStopper 
    keywordParams: 
      metric: wallclock_time
      min_trials: 5
      buffer_states: 10
      count_nan: True
  - name: MaxSamplesStopper 
    keywordParams: 
      max_samples: 50

Early Measurement Stopping

Some RayTune stoppers directly support the case where each trial (measurement) is a RayTrain job. These stoppers can inspect the progress of an individual RayTrain job, i.e. intermediate metric values, to determine if a trial should be stopped. This ability assumes particular behaviour of such a job, e.g. that the trial generates a timeseries of the metric being optimized.

Currently, ado's actuator model does not assume there are intermediate values of a metric being measured by an experiment, or provide a way to expose them. Instead, we leave these domain-specific details to the actuator. For example, in ado's current model the actuator can implement and/or expose early stopping if it is possible.

Example operation YAML

Here is an example ray_tune operation YAML for finding the workload configuration with the fastest fine-tuning throughput using the sfttrainer actuator. It uses:

  • the Ax optimizer with its optimizer-specific parameter_constraints parameter
  • the GrowthStopper to stop if no improvement is found after 10 steps, where improvement means a configuration that is faster by more than 20 tokens per second
  • the MaxSamplesStopper to stop once 50 configurations have been searched
  • a time budget of 2 hours
  • an initial point to sample
orchestratorConfig:
  failed_metric_value: None # This will be used for the value of "metric" for any entities where it could not be measured (for any reason)
runtimeConfig:
  stop: 
  - name: GrowthStopper 
    keywordParams: 
      mode:  max 
      metric: dataset_tokens_per_second
      growth_threshold: 20 #if the change is less than 20 tokens per second consider the optimization not improving
      grace_trials: 10 
  - name: MaxSamplesStopper 
    keywordParams: 
      max_samples: 50 
tuneConfig:
  metric: dataset_tokens_per_second 
  mode: max
  num_samples: 50 # The number of samples to draw. We also use MaxSamplesStopper in case Ax has a different interpretation of num_samples
  time_budget_s: 7200
  search_alg:
    name: ax # The name of the optimization algorithm to use
    params: 
      points_to_evaluate:
      - model_name: granite-3b
        number_gpus: 4
        model_max_length: 2048
        gpu_model: A100-SXM4-80GB
      parameter_constraints:
      - "batch_size >= number_gpus" # Don't sample points where batch_size < number_gpus as these are invalid

ray_tune operation output

Seeing the optimal configuration found

A successful ray_tune operation will create a datacontainer resource, containing information from RayTune on the best configuration found.

To get the id of the datacontainer related to a ray_tune operation resource with id $OPERATION_ID use:

ado show related operation $OPERATION_ID
This will output something like:
datacontainer
  - datacontainer-d6a6501b
discoveryspace
  - space-047b6a-f60613

To see the best point found (and in general the contents of the datacontainer) use the describe CLI command:

ado describe datacontainer $DATACONTAINER_ID
For a datacontainer created by a ray_tune operation, an example output is:
Identifier: datacontainer-d6a6501b
Basic Data:
  
  Label: best_result
  
  {'config': {'x2': -1.1192905253425014,
    'x1': 2.081208150586974,
    'x0': 0.5621591414422049},
   'metrics': {'function_value': 20.788056393697595,
    'timestamp': 1756804287,
    'checkpoint_dir_name': None,
    'done': True,
    'training_iteration': 1,
    'trial_id': '7a7153ed',
    'date': '2025-09-02_10-11-27',
    'time_this_iter_s': 1.0576610565185547,
    'time_total_s': 1.0576610565185547,
    'pid': 52036,
    'hostname': 'Michaels-MacBook-Pro-2.local',
    'node_ip': '127.0.0.1',
    'config': {'x2': -1.1192905253425014,
     'x1': 2.081208150586974,
     'x0': 0.5621591414422049},
    'time_since_restore': 1.0576610565185547,
    'iterations_since_restore': 1,
    'experiment_tag': '40_x0=0.5622,x1=2.0812,x2=-1.1193'},
   'error': None}
We can see here that the point found is {'x2': -1.1192905253425014, 'x1': 2.081208150586974, 'x0': 0.5621591414422049} where function_value was ~20.8.

Optimization path

To see all the configurations (entities) visited during an optimization operation $OPERATION_ID run

ado show entities operation $OPERATION_ID

Note

This command also works while an operation is running. It shows entities up to the most recently measured one.

ado additions to RayTune

ado adds one optimizer and a selection of stoppers to those offered by RayTune

Latin Hypercube Sampler

The lhu_sampler samples a discoveryspace using latin hypercube sampling. This is a method for "almost" randomly sampling a space while ensuring the samples are evenly spread across it. Using the lhu_sampler you can potentially learn more about the variance of a metric across a space with fewer samples than fully random sampling. It also reduces the chance of leaving dimensions unexplored in high-dimensional spaces when the sampling budget is limited.
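
As an illustration of the idea (not of ado's implementation), the following sketch uses scipy's Latin hypercube sampler to draw evenly spread points from a hypothetical 3-dimensional space:

from scipy.stats import qmc

# Draw 10 points from a 3-dimensional latin hypercube: each dimension is split
# into 10 slices and every slice of every dimension receives exactly one point,
# so the space is covered evenly even with few samples.
sampler = qmc.LatinHypercube(d=3)
unit_points = sampler.random(n=10)  # points in the unit cube [0, 1)^3
# Scale to hypothetical property ranges, e.g. gpus in [1, 8], tokens in [128, 4096]
points = qmc.scale(unit_points, l_bounds=[1, 128, 0.0], u_bounds=[8, 4096, 1.0])
print(points)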

The lhu_sampler pairs well with the InformationGainStopper

Configuring the lhu_sampler follows the same pattern as the other optimizers. For the lhu_sampler, there is only one optional parameter, points_to_evaluate.

name: 'lhu_sampler'
params:
  points_to_evaluate:
  - model_name: granite-3b
    number_gpus: 4
    tokens_per_sample: 2048
    gpu_model: A100-SXM4-80GB
  - model_name: granite-3b
    number_gpus: 2
    tokens_per_sample: 2048
    gpu_model: A100-SXM4-80GB

MaxSamplesStopper

Stops an optimization after N samples/trials.

The following YAML describes the stopper's parameters. Parameters without values are required.

name: MaxSamplesStopper 
keywordParams: 
  max_samples: 10 # Will stop the optimization when this number of samples have been measured. Required

SimpleStopper

Stops an optimization when the metric has not improved after N steps. Note: mode and metric are expected to match the corresponding tuneConfig parameters, but this is not checked.

The following YAML describes the stopper's parameters. Parameters without values are required.

name: SimpleStopper 
keywordParams: 
  mode:  # `min` or `max`: Whether to search for min or max of the target property/metric. Required
  metric:  # The target property being optimized. Required.
  min_trials: 5 # The number of trials to perform (samples to take) before applying any stopping criteria
  buffer_states: 2 # The number of samples/optimization steps to wait before declaring no improvement. 
  stop_on_repeat: True # If True, the stopper will stop the optimization if it sees the same sample twice. 
  count_nan: True # If True, samples measuring 'nan' count towards the steps to wait before declaring no improvement.

Important

buffer_states does not reset if the metric is observed to improve in a step. That is, it is the total number of samples allowed that do not improve on the best found sample.
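
A minimal sketch of this counting behaviour (not ado's implementation), assuming a maximisation run with buffer_states set to 3:

best = float("-inf")
non_improving = 0  # counts every trial that does not improve on the best value; never reset
for value in [1.0, 3.0, 2.5, 4.0, 3.9, 3.8]:
    if value > best:
        best = value        # an improvement updates the best value ...
    else:
        non_improving += 1  # ... but does not reset the counter
    if non_improving >= 3:  # buffer_states reached: stop, despite the improvement at 4.0
        print(f"stopping after {non_improving} non-improving trials (best={best})")
        break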

GrowthStopper

Stops an optimization once it sees the metric improvement rate is below a threshold. This differs from SimpleStopper, which will not stop the optimization as long as there is any improvement.

  • If the metric value gets worse on a step, i.e. negative improvement, it is considered to be below the threshold.
  • Samples whose metric value is nan are always included

The following YAML describes the stopper's parameters. Parameters without values are required.

name: GrowthStopper 
keywordParams: 
  mode:  # `min` or `max`: Whether to search for min or max of the target property/metric. Required
  metric:  # The target property being optimized. Required.
  growth_threshold: 1.0 # If the difference between two samples is less than this threshold, the optimization is considered to be not improving
  grace_trials: 2 # The number of samples/optimization steps to wait before declaring the metric is not improving. Same as buffer_states for SimpleStopper. 

Important

grace_trials does not reset if the metric is observed to "grow" in a step after it was observed to not "grow". That is, it is the total number of samples allowed where the improvement in the metric is less than the threshold.

InformationGainStopper

This stopper's criterion is based on mutual information, which here is used to measure how related the constitutive properties (dimensions of the entity space) are to the metric being optimized. At a high level, this stopper stops when it observes that the mutual information is converging.

This stopper considers two ways that the mutual information can change:

  1. mutual information value: If the value is changing by less than a threshold it is considered "converging"
  2. the properties' contribution to the mutual information: This can be measured in two ways:
    1. Change in the ranking of which constitutive properties contribute the most to the mutual information with the metric. If the ranking is not changing the mutual information is considered to be converging.
    2. Change in the set of constitutive properties which contribute most to the mutual information with the metric. If the set of properties is not changing the mutual information is considered to be converging.

The stopper will only stop when it sees both the mutual information value and the properties that contribute to it converging.

This stopper will perform at least 2x(number of constitutive properties in entity space) samples before applying its stopping criteria.
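
To illustrate the underlying quantity (not ado's stopping logic), mutual information between constitutive properties and the target metric can be estimated with, for example, scikit-learn:

import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Hypothetical samples: two constitutive properties, with the target metric
# depending strongly on the first and not at all on the second.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))                    # constitutive property values
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)    # target metric values

mi = mutual_info_regression(X, y, random_state=0)
print(dict(zip(["property_0", "property_1"], mi)))
# Ranking or selecting properties by their estimated mutual information with the
# metric gives the kind of "contribution" the stopper tracks for convergence.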

The following YAML describes the stopper's parameters. Parameters without values are required.

name: InformationGainStopper 
keywordParams: 
  mi_diff_limit: # If the mutual information increase on addition of the latest sample is less than this value, it counts as "converging".
  samples_below_limit: # The number of samples/optimization steps to wait before declaring the mutual information is not increasing. Similar to buffer_states for SimpleStopper.
  consider_pareto_front_convergence: # If True the stopper considers convergence of the set of important properties (2.2 above). If False it considers the ranking (2.1 above)

Important

Both the mutual information value and the property ranking/set must stay unchanged for samples_below_limit steps for the stopping criterion to be reached.

What's next