Measure throughput of finetuning locally¶
Note
This example illustrates:
-  Setting up a local environment for running finetuning performance benchmarks with SFTTrainer
-  Benchmarking a set of finetuning configurations for a small model using a local context and only the CPU
The scenario¶
When you run a finetuning workload, you can choose values for parameters like the model name, batch size, and number of GPUs. To understand how these choices affect performance, a common strategy is to measure changes in system behavior by exploring the workload parameter space.
This approach applies to many machine learning workloads where performance depends on configuration.
In this example, ado is used to explore the parameter space for finetuning a small language model on your laptop without using GPUs.
To explore this space, you will:
- define the parameters to test - such as the batch size and the model max length
- define what to test them with - in this case, SFTTrainer's finetune_full_benchmark-v1.0.0 experiment
- explore the parameter space - the sampling method
Here, you'll use the finetune_full_benchmark-v1.0.0 experiment that the SFTTrainer actuator provides to run four measurements on your laptop without using GPUs. Each measurement records metrics like dataset tokens per second and stores the results in ado's database.
Pre-requisites¶
Set Active Context¶
You should use the local context for the example.
ado context local
Install the SFTTrainer actuator¶
pip install ado-sfttrainer
Info
This step assumes you are in the root directory of the ado source repository.
If you haven't already installed the SFTTrainer actuator, you can install it from the source tree by running:
pip install plugins/actuators/sfttrainer
then executing
ado get actuators
should show an entry for SFTTrainer like below
           ACTUATOR ID
0   custom_experiments
1                 mock
2               replay
3           SFTTrainer
Configure the parameters of the SFTTrainer actuator¶
SFTTrainer includes parameters that control its behavior. For example, it pushes any training metrics it collects, like system profiling metadata, to an AIM server by default. It also features parameters that define important paths, such as the location of the Hugging Face cache and the directory where the actuator expects to find files like the test dataset.
In this section you will configure the actuator for running experiments locally and storing data under the path /tmp/ado-sft-trainer-hello-world/.
Create a file called actuator_configuration.yaml with the following contents:
actuatorIdentifier: SFTTrainer
parameters:
  match_exact_dependencies: False
  data_directory: /tmp/ado-sft-trainer-hello-world/data-files
  cache: /tmp/ado-sft-trainer-hello-world/cache
  hf_home: ~/.cache/huggingface
To create the actuatorconfiguration resource run:
ado create actuatorconfiguration -f actuator_configuration.yaml
The command will print the ID of the resource. Make a note of it; you will need it in a later step.
See the full list of parameters you can set in an actuatorconfiguration resource for the SFTTrainer actuator in its reference docs.
Environment setup¶
Create the Dataset¶
The finetuning measurements require a synthetic dataset: a file named news-tokens-16384plus-entries-4096.jsonl in the directory /tmp/ado-sft-trainer-hello-world/data-files, which is the path specified by the data_directory actuator parameter.
You can reuse this dataset for any future measurements you run on this device.
To generate the dataset run the following command:
sfttrainer_generate_dataset_text \
  -o /tmp/ado-sft-trainer-hello-world/data-files/news-tokens-16384plus-entries-4096.jsonl
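Once the command finishes, a quick sanity check on the generated file can catch a truncated or misplaced dataset before you launch any measurements. The sketch below is not part of ado or SFTTrainer; it assumes the file is standard JSON Lines (one JSON document per line, as the .jsonl extension suggests), and the helper name count_jsonl_entries is hypothetical.

```python
import json
from pathlib import Path


def count_jsonl_entries(path):
    """Count non-empty lines in a JSON Lines file, checking that each parses as JSON."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {line_number} is not valid JSON") from error
            count += 1
    return count


if __name__ == "__main__":
    # Path taken from the dataset generation command in this example.
    dataset = Path(
        "/tmp/ado-sft-trainer-hello-world/data-files/"
        "news-tokens-16384plus-entries-4096.jsonl"
    )
    if dataset.exists():
        print(f"{dataset.name}: {count_jsonl_entries(dataset)} entries")
    else:
        print(f"{dataset} not found; run sfttrainer_generate_dataset_text first")
```

The filename suggests 4096 entries, but treat that as an inference from the name rather than a documented guarantee.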
Download model weights¶
Next, download the weights of the model used in this example (smollm2-135m) to the appropriate path under the directory specified by the hf_home parameter of the SFTTrainer actuator.
First, save the following YAML to the file models.yaml in your working directory:
smollm2-135m:
  Vanilla: HuggingFaceTB/SmolLM2-135M
Then, run the command:
sfttrainer_download_hf_weights -i models.yaml -o ~/.cache/huggingface
Run the example¶
This section explains the process of using ado to define and launch a set of finetuning measurements which store their results in the local context.
Define the finetuning workload configurations to test and how to test them¶
A discoveryspace defines what you want to measure (Entity Space) and how you want to measure it (Measurement Space). It also links to the samplestore, which is where Entities and their measured properties are stored.
In this example, we create a discoveryspace that runs the finetune_full_benchmark-v1.0.0 experiment to finetune the smollm2-135m model without using any GPUs.
The entitySpace defined below includes four dimensions:
- model_name and number_gpus each contain a single value.
- model_max_length and batch_size each contain two values.
The total number of entities in the entitySpace is the number of unique combinations of values across all dimensions. In this case, that is 1 × 1 × 2 × 2 = 4 entities.
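To make the combination count concrete, the following sketch enumerates the Cartesian product of the four dimensions in plain Python. It does not use ado; the dictionary simply mirrors the entitySpace values used in this example, and the helper name enumerate_entities is hypothetical.

```python
from itertools import product

# Dimension values mirroring the entitySpace of this example.
entity_space = {
    "model_name": ["smollm2-135m"],
    "number_gpus": [0],
    "model_max_length": [512, 1024],
    "batch_size": [1, 2],
}


def enumerate_entities(space):
    """Return one dict per unique combination of dimension values."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]


entities = enumerate_entities(entity_space)
for entity in entities:
    print(entity)
print(len(entities), "entities")
```

Adding a value to any dimension multiplies the number of entities, so a third batch_size would give 1 × 1 × 2 × 3 = 6.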
You can find the complete list of the entity space properties in the documentation of the finetune_full_benchmark-v1.0.0 experiment.
To create the Discovery Space:
- Create the file space.yaml with the following content:
# if you do not have a Sample Store we provide a command-line that will create one for you
sampleStoreIdentifier: Replace this with the identifier of your sample store
experiments:
  - experimentIdentifier: finetune_full_benchmark-v1.0.0
    actuatorIdentifier: SFTTrainer
    parameterization:
      - property:
          identifier: fms_hf_tuning_version
        value: "2.8.2"
      - property:
          identifier: stop_after_seconds
        value: 30
      - property:
          identifier: flash_attn
        value: False
entitySpace:
  - identifier: "model_name"
    propertyDomain:
      values: ["smollm2-135m"]
  - identifier: "number_gpus"
    propertyDomain:
      values: [0]
  - identifier: "model_max_length"
    propertyDomain:
      values: [512, 1024]
  - identifier: "batch_size"
    propertyDomain:
      values: [1, 2]
- Create the space:
-  If you have a samplestore ID, run:
ado create space -f space.yaml --set "sampleStoreIdentifier=$SAMPLE_STORE_IDENTIFIER"
-  If you do not have a samplestore, run:
ado create space -f space.yaml --new-sample-store
This will print a discoveryspace ID (e.g., space-ea937f-831dba). Make a note of this ID; you'll need it in the next step.
Create a random walk operation to explore the space¶
 - Create the file operation.yaml with the following content:
spaces:
  - The identifier of the DiscoverySpace resource
actuatorConfigurationIdentifiers:
  - The identifier of the Actuator Configuration resource
operation:
  module:
    operatorName: "random_walk"
    operationType: "search"
  parameters:
    numberEntities: all
    singleMeasurement: True
    mode: sequential
    samplerType: generator
- Replace the placeholders with your discoveryspace ID and actuatorconfiguration ID, and save the result in a file named operation.yaml
- Create the operation
ado create operation -f operation.yaml
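If you rerun this example often, the placeholder substitution done before creating the operation can be scripted. The sketch below is a convenience outside of ado: it renders the operation definition from this example with Python's string.Template. The function name render_operation is hypothetical, and the actuatorconfiguration ID shown is a made-up placeholder; use the IDs printed by your own ado create commands.

```python
from string import Template

# The operation definition from this example, with the two placeholders
# turned into template variables.
OPERATION_TEMPLATE = Template("""\
spaces:
  - $space_id
actuatorConfigurationIdentifiers:
  - $actuator_configuration_id
operation:
  module:
    operatorName: "random_walk"
    operationType: "search"
  parameters:
    numberEntities: all
    singleMeasurement: True
    mode: sequential
    samplerType: generator
""")


def render_operation(space_id, actuator_configuration_id):
    """Fill in the two resource identifiers and return the YAML text."""
    return OPERATION_TEMPLATE.substitute(
        space_id=space_id,
        actuator_configuration_id=actuator_configuration_id,
    )


if __name__ == "__main__":
    # "space-ea937f-831dba" is the example ID quoted in this guide; the
    # actuatorconfiguration ID below is a hypothetical placeholder.
    print(render_operation("space-ea937f-831dba", "actuatorconfiguration-EXAMPLE"))
```

Redirecting the printed output to operation.yaml yields a file you can pass to ado create operation.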
The operation executes the measurements (i.e. it applies the experiment finetune_full_benchmark-v1.0.0 to the 4 entities) based on the definition of your discoveryspace. After the first measurement, the remaining three reuse both the cached model weights and the cached data, so they complete faster.
Info
Each of the four measurements takes about two minutes to complete. Ray also takes a couple of minutes to build the Ray runtime environment on participating Ray workers, so expect the operation to take around 10 minutes in total.
Examine the results of the exploration¶
After the operation completes, you can download the results of your measurements:
ado show entities --output-format csv --property-format=target space $yourDiscoverySpaceID
The command will generate a CSV file. Open it to explore the data that your operation has collected!
It should look similar to this:
,identifier,generatorid,experiment_id,model_name,number_gpus,model_max_length,batch_size,gpu_compute_utilization_min,gpu_compute_utilization_avg,gpu_compute_utilization_max,gpu_memory_utilization_min,gpu_memory_utilization_avg,gpu_memory_utilization_max,gpu_power_watts_min,gpu_power_watts_avg,gpu_power_watts_max,gpu_power_percent_min,gpu_power_percent_avg,gpu_power_percent_max,cpu_compute_utilization,cpu_memory_utilization,train_runtime,train_samples_per_second,train_steps_per_second,dataset_tokens_per_second,dataset_tokens_per_second_per_gpu,is_valid
0,model_name.smollm2-135m-number_gpus.0-model_max_length.512-batch_size.1,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,512,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,79.55,49.11366699999999,30.4385,134.566,33.642,2624.0452059069926,2624.0452059069926,1
1,model_name.smollm2-135m-number_gpus.0-model_max_length.512-batch_size.2,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,512,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,76.3,49.163925750000004,30.095,136.103,17.013,4355.274962618375,4355.274962618375,1
2,model_name.smollm2-135m-number_gpus.0-model_max_length.1024-batch_size.1,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,1024,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,68.775,49.0008355,30.3635,134.899,33.725,3912.0654733479346,3912.0654733479346,1
3,model_name.smollm2-135m-number_gpus.0-model_max_length.1024-batch_size.2,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,1024,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.85,49.00481125,30.1101,136.034,17.004,4353.090823344991,4353.090823344991,1
In the above CSV file you will find one column per:
- entity space property (input to the experiment) such as batch_size and model_max_length
- measured property (output of the experiment) such as dataset_tokens_per_second_per_gpu and gpu_memory_utilization_max
For a complete list of the entity space properties check out the documentation for the finetune_full_benchmark-v1.0.0 experiment in the SFTTrainer docs. The complete list of measured properties is available there too.
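To compare configurations, you can rank the rows by throughput. The sketch below uses only the Python standard library and, to stay self-contained, embeds a trimmed copy of the sample rows shown above (four columns only). With your own results you would read the CSV file produced by ado show entities instead. The function name rank_by_throughput is hypothetical.

```python
import csv
import io

# Trimmed copy of the sample output above: two inputs and two measured properties.
SAMPLE_CSV = """\
model_max_length,batch_size,train_runtime,dataset_tokens_per_second
512,1,30.4385,2624.0452059069926
512,2,30.095,4355.274962618375
1024,1,30.3635,3912.0654733479346
1024,2,30.1101,4353.090823344991
"""


def rank_by_throughput(csv_text):
    """Return the rows sorted from highest to lowest dataset_tokens_per_second."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(
        rows,
        key=lambda row: float(row["dataset_tokens_per_second"]),
        reverse=True,
    )


ranked = rank_by_throughput(SAMPLE_CSV)
for row in ranked:
    print(
        f'model_max_length={row["model_max_length"]:>4} '
        f'batch_size={row["batch_size"]} '
        f'tokens/s={float(row["dataset_tokens_per_second"]):.1f}'
    )
```

On the sample rows, both batch_size 2 configurations rank ahead of both batch_size 1 configurations, with model_max_length 512 narrowly ahead of 1024. Your own numbers will differ with hardware.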
Next steps¶
-  🔬️ Find out more about the SFTTrainer actuator 
 The actuator supports several experiments, each with a set of configurable parameters. 
-  ⚙️ Configure your RayCluster for SFTTrainer measurements 
 Learn how to configure your RayCluster for SFTTrainer measurements. 
-  Scale it up! 
 Take it to the next level by running an experiment on your remote RayCluster.