
Measure throughput of finetuning locally

Note

This example illustrates:

  1. Setting up a local environment for running finetuning performance benchmarks with SFTTrainer

  2. Benchmarking a set of finetuning configurations for a small model using a local context and only the CPU

The scenario

When you run a finetuning workload, you can choose values for parameters like the model name, batch size, and number of GPUs. To understand how these choices affect performance, a common strategy is to measure changes in system behavior by exploring the workload parameter space.

This approach applies to many machine learning workloads where performance depends on configuration.

In this example, ado is used to explore the parameter space for finetuning a small language model on your laptop without using GPUs.

To explore this space, you will:

  • define the parameters to test - such as the batch size and the model max length
  • define what to test them with - in this case, SFTTrainer's finetune_full_benchmark-v1.0.0 experiment
  • decide how to explore the parameter space - the sampling method

Here, you'll use the finetune_full_benchmark-v1.0.0 experiment that the SFTTrainer actuator provides to run four measurements on your laptop without using GPUs. Each measurement records metrics like dataset tokens per second and stores the results in ado's database.

Prerequisites

Set Active Context

Use the local context for this example:

ado context local

Install the SFTTrainer actuator

Run pip install ado-sfttrainer to install the SFTTrainer actuator plugin using the wheel that we publish to PyPI.

Info

We are currently in the process of open-sourcing ado, so the above wheel may not exist on PyPI yet. If that is the case when you try out this example, please follow the instructions under the Build the python wheel yourself tab instead.

Info

This step assumes you are in the root directory of the ado source repository.

If you haven't already installed the SFTTrainer actuator, run:

pip install plugins/actuators/sfttrainer

Then, executing

ado get actuators

should show an entry for SFTTrainer like the one below:

           ACTUATOR ID
0   custom_experiments
1                 mock
2               replay
3           SFTTrainer

Configure the parameters of the SFTTrainer actuator

SFTTrainer includes parameters that control its behavior. For example, it pushes any training metrics it collects, like system profiling metadata, to an AIM server by default. It also features parameters that define important paths, such as the location of the Hugging Face cache and the directory where the actuator expects to find files like the test dataset.

In this section you will configure the actuator for running experiments locally and storing data under the path /tmp/ado-sft-trainer-hello-world/.

Create a file called actuator_configuration.yaml with the following contents:

actuatorIdentifier: SFTTrainer
parameters:
  match_exact_dependencies: False
  data_directory: /tmp/ado-sft-trainer-hello-world/data-files
  cache: /tmp/ado-sft-trainer-hello-world/cache
  hf_home: ~/.cache/huggingface

To create the actuatorconfiguration resource, run:

ado create actuatorconfiguration -f actuator_configuration.yaml

The command will print the ID of the resource. Make a note of it; you will need it in a later step.

See the full list of parameters you can set in an actuatorconfiguration resource for the SFTTrainer actuator in its reference docs.

Environment setup

Create the Dataset

The finetuning measurements require a synthetic dataset: a file named news-tokens-16384plus-entries-4096.jsonl in the directory /tmp/ado-sft-trainer-hello-world/data-files, which is the path specified by the data_directory actuator parameter.

You can reuse this dataset for any future measurements you run on this device.

To generate the dataset run the following command:

sfttrainer_generate_dataset_text \
  -o /tmp/ado-sft-trainer-hello-world/data-files/news-tokens-16384plus-entries-4096.jsonl
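
As an optional sanity check, you can confirm the file was written and count its entries with a short Python snippet. The exact field names of each entry depend on the generator's output format, so the printed keys are only informational:

import json

path = "/tmp/ado-sft-trainer-hello-world/data-files/news-tokens-16384plus-entries-4096.jsonl"
with open(path) as f:
    entries = [json.loads(line) for line in f]

# Print the number of entries and the keys of the first record
print(f"{len(entries)} entries; first entry keys: {sorted(entries[0])}")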

Download model weights

Next, download the weights of the model used in this example (smollm2-135m) into the appropriate path under the directory specified by the hf_home parameter of the SFTTrainer actuator.

First, save the following YAML to a file named models.yaml in your working directory:

smollm2-135m:
  Vanilla: HuggingFaceTB/SmolLM2-135M

Then, run the command:

sfttrainer_download_hf_weights -i models.yaml -o ~/.cache/huggingface
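
If you want to verify that the weights landed in the local Hugging Face cache, a minimal sketch (assuming the transformers library is installed in your environment) is to load the model with local_files_only=True:

import os

# Point HF_HOME at the same cache directory used by the hf_home actuator parameter
os.environ["HF_HOME"] = os.path.expanduser("~/.cache/huggingface")

from transformers import AutoModelForCausalLM

# local_files_only=True fails if the weights are not already cached locally
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M", local_files_only=True
)
print(f"Loaded {model.num_parameters():,} parameters from the local cache")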

Run the example

This section explains the process of using ado to define and launch a set of finetuning measurements which store their results in the local context.

Define the finetuning workload configurations to test and how to test them

A discoveryspace defines what you want to measure (Entity Space) and how you want to measure it (Measurement Space). It also links to the samplestore, which is where Entities and their measured properties are stored.

In this example, we create a discoveryspace that runs the finetune_full_benchmark-v1.0.0 experiment to finetune the smollm2-135m model without using any GPUs.

The entitySpace defined below includes four dimensions:

  • model_name and number_gpus each contain a single value.
  • model_max_length and batch_size each contain two values.

The total number of entities in the entitySpace is the number of unique combinations of values across all dimensions. In this case, the configuration contains 1 × 1 × 2 × 2 = 4 entities.
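
This count is simply the size of the Cartesian product over the dimension values. The illustrative Python snippet below (not part of ado, which computes this internally) shows the same calculation for the entity space defined in space.yaml further down:

from itertools import product

# The entity space dimensions and their values, as defined in space.yaml
entity_space = {
    "model_name": ["smollm2-135m"],
    "number_gpus": [0],
    "model_max_length": [512, 1024],
    "batch_size": [1, 2],
}

# Each unique combination of values is one entity: 1 * 1 * 2 * 2 = 4
entities = list(product(*entity_space.values()))
print(len(entities))  # 4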

You can find the complete list of the entity space properties in the documentation of the finetune_full_benchmark-v1.0.0 experiment.

To create the Discovery Space:

  1. Create the file space.yaml with the following content

    # if you do not have a Sample Store, the --new-sample-store flag will create one for you
    sampleStoreIdentifier: Replace this with the identifier of your sample store
    
    experiments:
    - experimentIdentifier: finetune_full_benchmark-v1.0.0
      actuatorIdentifier: SFTTrainer
      parameterization:
        - property:
            identifier: fms_hf_tuning_version
          value: "2.8.2"
        - property:
            identifier: stop_after_seconds
          value: 30
        - property:
            identifier: flash_attn
          value: False
    
    entitySpace:
    - identifier: "model_name"
      propertyDomain:
        values: [ 'smollm2-135m' ]
    - identifier: "number_gpus"
      propertyDomain:
        values: [ 0 ]
    - identifier: "model_max_length"
      propertyDomain:
        values: [ 512, 1024 ]
    - identifier: "batch_size"
      propertyDomain:
        values: [ 1, 2 ] 
    
  2. Create the space:

    • If you have a samplestore ID, run:

      ado create space -f space.yaml --set "sampleStoreIdentifier=$SAMPLE_STORE_IDENTIFIER"

    • If you do not have a samplestore, run:

      ado create space -f space.yaml --new-sample-store


    This will print a discoveryspace ID (e.g., space-ea937f-831dba). Make a note of this ID; you'll need it in the next step.

Create a random walk operation to explore the space

  1. Create the file operation.yaml with the following content:

    spaces:
    - The identifier of the DiscoverySpace resource
    actuatorConfigurationIdentifiers:
    - The identifier of the Actuator Configuration resource
    
    operation:
      module:
        moduleClass: "RandomWalk"
      parameters:
        numberEntities: all
        singleMeasurement: True
        mode: sequential
        samplerType: generator
    
  2. Replace the placeholders with your discoveryspace ID and actuatorconfiguration ID, then save the file.

  3. Create the operation

    ado create operation -f operation.yaml
    

The operation will execute the measurements (i.e. apply the experiment finetune_full_benchmark-v1.0.0 to the 4 entities) based on the definition of your discoveryspace. After the first measurement, the remaining three will reuse both the cached model weights and the cached dataset, making them faster to complete.

Info

Each of the four measurements takes about two minutes to complete. Ray will also take a couple of minutes to build the Ray runtime environment on the participating Ray workers, so expect the operation to take O(10) minutes overall.

Examine the results of the exploration

After the operation completes, you can download the results of your measurements:

ado show entities --output-format csv --property-format=target space $yourDiscoverySpaceID

The command will generate a CSV file. Open it to explore the data that your operation has collected!

It should look similar to this:

,identifier,generatorid,experiment_id,model_name,number_gpus,model_max_length,batch_size,gpu_compute_utilization_min,gpu_compute_utilization_avg,gpu_compute_utilization_max,gpu_memory_utilization_min,gpu_memory_utilization_avg,gpu_memory_utilization_max,gpu_power_watts_min,gpu_power_watts_avg,gpu_power_watts_max,gpu_power_percent_min,gpu_power_percent_avg,gpu_power_percent_max,cpu_compute_utilization,cpu_memory_utilization,train_runtime,train_samples_per_second,train_steps_per_second,dataset_tokens_per_second,dataset_tokens_per_second_per_gpu,is_valid
0,model_name.smollm2-135m-number_gpus.0-model_max_length.512-batch_size.1,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,512,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,79.55,49.11366699999999,30.4385,134.566,33.642,2624.0452059069926,2624.0452059069926,1
1,model_name.smollm2-135m-number_gpus.0-model_max_length.512-batch_size.2,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,512,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,76.3,49.163925750000004,30.095,136.103,17.013,4355.274962618375,4355.274962618375,1
2,model_name.smollm2-135m-number_gpus.0-model_max_length.1024-batch_size.1,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,1024,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,68.775,49.0008355,30.3635,134.899,33.725,3912.0654733479346,3912.0654733479346,1
3,model_name.smollm2-135m-number_gpus.0-model_max_length.1024-batch_size.2,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,1024,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.85,49.00481125,30.1101,136.034,17.004,4353.090823344991,4353.090823344991,1

In the above CSV file you will find one column for each:

  • entity space property (input to the experiment), such as batch_size and model_max_length
  • measured property (output of the experiment), such as dataset_tokens_per_second_per_gpu and gpu_memory_utilization_max

For a complete list of the entity space properties check out the documentation for the finetune_full_benchmark-v1.0.0 experiment in the SFTTrainer docs. The complete list of measured properties is available there too.
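
If you prefer to explore the results programmatically, a minimal sketch using pandas is shown below. It assumes the generated CSV has been saved as entities.csv; substitute the filename that the ado show entities command actually produced:

import pandas as pd

# Load the CSV produced by `ado show entities` (the filename here is an assumption)
df = pd.read_csv("entities.csv", index_col=0)

# Average dataset tokens per second for each (model_max_length, batch_size) pair
throughput = (
    df.groupby(["model_max_length", "batch_size"])["dataset_tokens_per_second"]
    .mean()
    .sort_values(ascending=False)
)
print(throughput)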

Next steps: