Measure throughput of finetuning locally¶
Note

This example illustrates:

- setting up a local environment for running finetuning performance benchmarks with SFTTrainer
- benchmarking a set of finetuning configurations for a small model using a local context and only the CPU
The scenario¶
When you run a finetuning workload, you can choose values for parameters like the model name, batch size, and number of GPUs. To understand how these choices affect performance, a common strategy is to measure changes in system behavior by exploring the workload parameter space.
This approach applies to many machine learning workloads where performance depends on configuration.
In this example, ado is used to explore the parameter space for finetuning a small language model on your laptop without using GPUs.
To explore this space, you will:

- define the parameters to test - such as the batch size and the model max length
- define what to test them with - in this case, SFTTrainer's finetune_full_benchmark-v1.0.0 experiment
- define how to explore the parameter space - the sampling method
Here, you'll use the finetune_full_benchmark-v1.0.0 experiment that the SFTTrainer actuator provides to run four measurements on your laptop without using GPUs. Each measurement records metrics like dataset tokens per second and stores the results in ado's database.
Prerequisites¶
Set Active Context¶
You should use the local context for this example:

```shell
ado context local
```
Install the SFTTrainer actuator¶
Run

```shell
pip install ado-sfttrainer
```

to install the SFTTrainer actuator plugin using the wheel that we push to PyPI.
Info

We are currently in the process of open-sourcing ado, so the above wheel may not yet exist on PyPI. If that is the case when you try out this example, please follow the instructions under the Build the python wheel yourself tab instead.
Info

This step assumes you are in the root directory of the ado source repository.

If you haven't already installed the SFTTrainer actuator, run:

```shell
pip install plugins/actuators/sfttrainer
```

Then executing

```shell
ado get actuators
```

should show an entry for SFTTrainer like the one below:
```
             ACTUATOR ID
0     custom_experiments
1                   mock
2                 replay
3             SFTTrainer
```
Configure the parameters of the SFTTrainer actuator¶
SFTTrainer includes parameters that control its behavior. For example, it pushes any training metrics it collects, like system profiling metadata, to an AIM server by default. It also features parameters that define important paths, such as the location of the Hugging Face cache and the directory where the actuator expects to find files like the test dataset.
In this section you will configure the actuator for running experiments locally and storing data under the path /tmp/ado-sft-trainer-hello-world/.
Create a file called actuator_configuration.yaml with the following contents:

```yaml
actuatorIdentifier: SFTTrainer
parameters:
  match_exact_dependencies: False
  data_directory: /tmp/ado-sft-trainer-hello-world/data-files
  cache: /tmp/ado-sft-trainer-hello-world/cache
  hf_home: ~/.cache/huggingface
```
To create the actuatorconfiguration resource, run:

```shell
ado create actuatorconfiguration -f actuator_configuration.yaml
```

The command will print the ID of the resource. Make a note of it; you will need it in a later step.
See the full list of parameters you can set in an actuatorconfiguration resource for the SFTTrainer actuator in its reference docs.
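If you want to pre-create the directories that the configuration above refers to, a few lines of Python are enough. This is a convenience sketch, not a required step - the actuator may also create these paths on demand:

```python
from pathlib import Path

# Base path used throughout this example
base = Path("/tmp/ado-sft-trainer-hello-world")

# Create the directories referenced by the data_directory and cache parameters
for sub in ("data-files", "cache"):
    (base / sub).mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in base.iterdir()))
```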
Environment setup¶
Create the Dataset¶
The finetuning measurements require a synthetic dataset: a file named news-tokens-16384plus-entries-4096.jsonl in the directory /tmp/ado-sft-trainer-hello-world/data-files, which is under the path specified by the data_directory actuator parameter.
You can reuse this Dataset for any future measurements you run on this device.
To generate the dataset, run the following command:

```shell
sfttrainer_generate_dataset_text \
  -o /tmp/ado-sft-trainer-hello-world/data-files/news-tokens-16384plus-entries-4096.jsonl
```
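A quick way to sanity-check the generated file is to confirm that every line parses as JSON. The exact schema of each entry is internal to sfttrainer_generate_dataset_text, so this sketch deliberately checks only the JSON Lines framing:

```python
import json
from pathlib import Path

# Path used in this example; adjust if you changed the data_directory parameter
dataset = Path("/tmp/ado-sft-trainer-hello-world/data-files/news-tokens-16384plus-entries-4096.jsonl")

def count_jsonl_entries(path: Path) -> int:
    """Count entries in a JSON Lines file, raising on any malformed line."""
    count = 0
    with path.open() as fh:
        for line in fh:
            json.loads(line)  # raises json.JSONDecodeError on a malformed line
            count += 1
    return count

if dataset.exists():
    print(f"dataset contains {count_jsonl_entries(dataset)} entries")
```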
Download model weights¶
Next, download the weights of the model used in this example (smollm2-135m) to the appropriate path under the directory specified by the hf_home parameter of the SFTTrainer actuator.
First, store the YAML below in the file models.yaml inside your working directory:

```yaml
smollm2-135m:
  Vanilla: HuggingFaceTB/SmolLM2-135M
```
Then, run the command:

```shell
sfttrainer_download_hf_weights -i models.yaml -o ~/.cache/huggingface
```
Run the example¶
This section explains the process of using ado to define and launch a set of finetuning measurements which store their results in the local context.
Define the finetuning workload configurations to test and how to test them¶
A discoveryspace defines what you want to measure (Entity Space) and how you want to measure it (Measurement Space). It also links to the samplestore where Entities and their measured properties are stored.

In this example, we create a discoveryspace that runs the finetune_full_benchmark-v1.0.0 experiment to finetune the smollm2-135m model without using any GPUs.
The entitySpace defined below includes four dimensions:

- model_name and number_gpus each contain a single value.
- model_max_length and batch_size each contain two values.

The total number of entities in the entitySpace is the number of unique combinations of values across all dimensions. In this case, the configuration contains 4 entities (1 x 1 x 2 x 2).
You can find the complete list of the entity space properties in the documentation of the finetune_full_benchmark-v1.0.0 experiment.
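The combination count can be sketched in a few lines of Python. The dimension values below mirror the entity space used in this example:

```python
from itertools import product

# Dimensions of the entity space used in this example
entity_space = {
    "model_name": ["smollm2-135m"],
    "number_gpus": [0],
    "model_max_length": [512, 1024],
    "batch_size": [1, 2],
}

# Each entity is one unique combination of values across all dimensions
entities = [dict(zip(entity_space, combo)) for combo in product(*entity_space.values())]
print(len(entities))  # 1 * 1 * 2 * 2 = 4
```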
To create the Discovery Space:

- Create the file space.yaml with the following content:

  ```yaml
  # if you do not have a Sample Store we provide a command-line that will create one for you
  sampleStoreIdentifier: Replace this with the identifier of your sample store
  experiments:
    - experimentIdentifier: finetune_full_benchmark-v1.0.0
      actuatorIdentifier: SFTTrainer
      parameterization:
        - property:
            identifier: fms_hf_tuning_version
          value: "2.8.2"
        - property:
            identifier: stop_after_seconds
          value: 30
        - property:
            identifier: flash_attn
          value: False
  entitySpace:
    - identifier: "model_name"
      propertyDomain:
        values: [ 'smollm2-135m' ]
    - identifier: "number_gpus"
      propertyDomain:
        values: [ 0 ]
    - identifier: "model_max_length"
      propertyDomain:
        values: [ 512, 1024 ]
    - identifier: "batch_size"
      propertyDomain:
        values: [ 1, 2 ]
  ```
- Create the space:

  - If you have a samplestore ID, run:

    ```shell
    ado create space -f space.yaml --set "sampleStoreIdentifier=$SAMPLE_STORE_IDENTIFIER"
    ```

  - If you do not have a samplestore, run:

    ```shell
    ado create space -f space.yaml --new-sample-store
    ```

  This will print a discoveryspace ID (e.g., space-ea937f-831dba). Make a note of this ID; you'll need it in the next step.
Create a random walk operation to explore the space¶
- Create the file operation.yaml with the following content:

  ```yaml
  spaces:
    - The identifier of the DiscoverySpace resource
  actuatorConfigurationIdentifiers:
    - The identifier of the Actuator Configuration resource
  operation:
    module:
      moduleClass: "RandomWalk"
      parameters:
        numberEntities: all
        singleMeasurement: True
        mode: sequential
        samplerType: generator
  ```
- Replace the placeholders with your discoveryspace ID and your actuatorconfiguration ID.
- Create the operation:

  ```shell
  ado create operation -f operation.yaml
  ```
The operation will execute the measurements (i.e., apply the finetune_full_benchmark-v1.0.0 experiment to the 4 entities) based on the definition of your discoveryspace. After the first measurement populates the caches, the remaining three measurements will reuse both the cached model weights and the cached data, making them faster to complete.
Info

Each measurement takes about two minutes to complete, and there are four measurements in total. Ray will also take a couple of minutes to build the Ray runtime environment on the participating Ray workers, so expect the operation to take on the order of 10 minutes to complete.
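Conceptually, the operation above behaves like the following sketch. This is an illustration of the configured behavior, not ado's actual implementation:

```python
import random
from itertools import product

def random_walk(entity_space: dict, measure, seed: int = 42):
    """Visit every point of the grid once, in random order, measuring one at a time.

    Mirrors numberEntities: all (exhaust the space), singleMeasurement: True
    (one measurement per entity), and mode: sequential (no parallelism).
    """
    grid = [dict(zip(entity_space, combo)) for combo in product(*entity_space.values())]
    random.Random(seed).shuffle(grid)
    for entity in grid:  # a generator-style sampler yields entities one by one
        yield entity, measure(entity)

# The two multi-valued dimensions of this example's entity space
space = {"model_max_length": [512, 1024], "batch_size": [1, 2]}
results = list(random_walk(space, measure=lambda entity: "ok"))
print(f"measured {len(results)} entities")
```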
Examine the results of the exploration¶
After the operation completes, you can download the results of your measurements:

```shell
ado show entities --output-format csv --property-format=target space $yourDiscoverySpaceID
```
The command will generate a CSV file. Open it to explore the data that your operation has collected!
It should look similar to this:

```csv
,identifier,generatorid,experiment_id,model_name,number_gpus,model_max_length,batch_size,gpu_compute_utilization_min,gpu_compute_utilization_avg,gpu_compute_utilization_max,gpu_memory_utilization_min,gpu_memory_utilization_avg,gpu_memory_utilization_max,gpu_power_watts_min,gpu_power_watts_avg,gpu_power_watts_max,gpu_power_percent_min,gpu_power_percent_avg,gpu_power_percent_max,cpu_compute_utilization,cpu_memory_utilization,train_runtime,train_samples_per_second,train_steps_per_second,dataset_tokens_per_second,dataset_tokens_per_second_per_gpu,is_valid
0,model_name.smollm2-135m-number_gpus.0-model_max_length.512-batch_size.1,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,512,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,79.55,49.11366699999999,30.4385,134.566,33.642,2624.0452059069926,2624.0452059069926,1
1,model_name.smollm2-135m-number_gpus.0-model_max_length.512-batch_size.2,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,512,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,76.3,49.163925750000004,30.095,136.103,17.013,4355.274962618375,4355.274962618375,1
2,model_name.smollm2-135m-number_gpus.0-model_max_length.1024-batch_size.1,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,1024,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,68.775,49.0008355,30.3635,134.899,33.725,3912.0654733479346,3912.0654733479346,1
3,model_name.smollm2-135m-number_gpus.0-model_max_length.1024-batch_size.2,explicit_grid_sample_generator,SFTTrainer.finetune_full_benchmark-v1.0.0-fms_hf_tuning_version.2.8.2-stop_after_seconds.30-flash_attn.0,smollm2-135m,0,1024,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46.85,49.00481125,30.1101,136.034,17.004,4353.090823344991,4353.090823344991,1
```
In the above CSV file you will find one column per:

- entity space property (input to the experiment), such as batch_size and model_max_length
- measured property (output of the experiment), such as dataset_tokens_per_second_per_gpu and gpu_memory_utilization_max
For a complete list of the entity space properties check out the documentation for the finetune_full_benchmark-v1.0.0 experiment in the SFTTrainer docs. The complete list of measured properties is available there too.
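Once you have the CSV, a few lines of stdlib Python are enough to find the best-performing configuration. The inline sample below is trimmed from the output above to three columns; with the real file, replace io.StringIO(sample) with open(...) on the downloaded CSV:

```python
import csv
import io

# Trimmed sample of the CSV shown above (three of the columns)
sample = """model_max_length,batch_size,dataset_tokens_per_second
512,1,2624.05
512,2,4355.27
1024,1,3912.07
1024,2,4353.09
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Pick the configuration with the highest dataset throughput
best = max(rows, key=lambda row: float(row["dataset_tokens_per_second"]))
print(best["model_max_length"], best["batch_size"])  # -> 512 2
```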
Next steps¶

- 🔬️ Find out more about the SFTTrainer actuator
  The actuator supports several experiments, each with a set of configurable parameters.
- ⚙️ Configure your RayCluster for SFTTrainer measurements
  Learn how to prepare a RayCluster for running SFTTrainer measurements.
- Scale it up!
  Take it to the next level by running an experiment on your remote RayCluster.