# samplestore
A samplestore resource is a database containing entities along with results of experiments that have been applied to them.
## samplestores and discoveryspaces
When you create a discovery space you associate a samplestore with it. This is where the discoveryspace will read and write data, i.e., entities and the results of experiments on them. You primarily access the entities in a samplestore via a discoveryspace that is attached to it.
You can think of a discoveryspace as a view or filter on a sample store - when you access data in a samplestore through a discovery space you only see data that matches the discoveryspace.
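The view relationship can be sketched conceptually as follows. This is a hypothetical Python illustration, not the ado API: the class names `SampleStore` and `DiscoverySpace` and the predicate-based matching are assumptions made purely to show the idea.

```python
# Conceptual sketch only: SampleStore and DiscoverySpace here are
# hypothetical illustrations, not part of the ado API.

class SampleStore:
    """Holds every entity and its measurement results."""
    def __init__(self, entities):
        self.entities = list(entities)

class DiscoverySpace:
    """A view on a samplestore: only entities matching the predicate are visible."""
    def __init__(self, sample_store, matches):
        self.sample_store = sample_store
        self.matches = matches

    def entities(self):
        return [e for e in self.sample_store.entities if self.matches(e)]

store = SampleStore([
    {"cpu_type": "x86", "wallClockRuntime": 12.0},
    {"cpu_type": "arm", "wallClockRuntime": 9.5},
])

# Two different spaces sharing the same store see different subsets of it.
x86_space = DiscoverySpace(store, lambda e: e["cpu_type"] == "x86")
arm_space = DiscoverySpace(store, lambda e: e["cpu_type"] == "arm")
```

The point of the sketch is that the store is the single source of truth; each space only filters what is visible, which is why multiple spaces can share one store.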
Tip

- Multiple `discoveryspaces` can use the same `samplestore`.
- There is no restriction or condition on the `discoveryspaces` sharing a `samplestore`, i.e., they can be very similar or completely different.
## samplestores and data-sharing
When multiple discoveryspaces use the same samplestore, data can be shared transparently between them. When and how data is shared is covered in detail in shared sample stores.
To see the discoveryspaces using a given samplestore run:

```shell
ado show related samplestore $SAMPLE_STORE_IDENTIFIER
```
Tip
The greater the similarity between two discoveryspaces, the greater the chance they can share data. So it is usually beneficial to ensure that such discoveryspaces use the same samplestore.
Warning
If you use two different samplestores for similar discoveryspaces there is no way to share results between them.
## active and passive Sample Stores
ado distinguishes two types of Sample Stores: active Sample Stores, which allow read and write, and passive Sample Stores, which only have read capabilities (for example, a CSV file containing measurement data).
All samplestore resources created with ado will be active. However, they can copy data in from passive Sample Stores.
## The primary Sample Store type: SQLSampleStore
The primary Sample Store used in ado, and represented by samplestore resources, is SQLSampleStore. SQLSampleStore represents storage in SQL tables. When you create a samplestore resource that uses SQLSampleStore, the storage is allocated automatically in the SQL database associated with the active context.
## Creating a samplestore
Running `ado create samplestore --new-sample-store` will create an empty SQLSampleStore in the current context.
## The default samplestore
ado provides a default samplestore (whose identifier is default) per project, removing the need to create one explicitly unless necessary. This samplestore is created automatically when it is first required.
There are three ways to use the default samplestore; each will create it if it doesn't already exist.
- Referencing it in the space configuration by setting the `sampleStoreIdentifier` to `default` in the space YAML:

  ```yaml
  sampleStoreIdentifier: default
  ```

- Using the `--use-default-sample-store` flag with the `ado create space` command:

  ```shell
  ado create space --use-default-sample-store
  ```

- Using the `--set` flag to explicitly override the sample store identifier:

  ```shell
  ado create space --set sampleStoreIdentifier=default
  ```
These options are interchangeable and can be used depending on your workflow or preference.
## Copying data into a samplestore
You can specify data to be copied into a new samplestore resource on creation. The data comes from other Sample Stores. The general structure of the YAML when copying from other sample stores is:
```yaml
specification:
  module:
    moduleClass: SQLSampleStore
    moduleName: orchestrator.core.samplestore.sql
  copyFrom: # An array of Sample Stores data will be copied from
    - identifier: # Optional, the id of the Sample Store if not given in the storageLocation
      module: # The type of this Sample Store
        moduleClass: ... # The module class for this sample store
        moduleName: ... # The name of the module containing the class
      parameters: # Sample Store parameters
      storageLocation: # The location of this Sample Store
```
The Sample Store types section details how to fill the above fields for the different available Sample Stores. Here is an example of copying data from a CSV file using CSVSampleStore:
```yaml
specification:
  module:
    moduleName: orchestrator.core.samplestore.sql
    moduleClass: SQLSampleStore
  copyFrom:
    - module:
        moduleClass: CSVSampleStore
      storageLocation:
        path: "examples/ml-multi-cloud/ml_export.csv"
      parameters:
        generatorIdentifier: "multi-cloud-ml"
        identifierColumn: "config"
        experiments:
          - experimentIdentifier: "benchmark_performance"
            constitutivePropertyMap:
              - cpu_type
            observedPropertyMap:
              - wallClockRuntime
```
## Accessing the entities in a sample store
You access the entities in a samplestore via a discovery space attached to it.
To retrieve all entities that match an existing discoveryspace run:

```shell
ado show entities space $SPACEID --include matching
```
You can also define a discoveryspace in a YAML file and run:

```shell
ado show entities space --file $FILE
```
This allows you to see what entities match a space without having to create it.
## Sample Store types
### SQLSampleStore
This is an active Sample Store that stores entity data in SQL tables. In ado a SQLSampleStore is always associated with a particular project.
When you want to copy from another SQLSampleStore you need its identifier and the metastore URL of the project it is in:
```yaml
copyFrom:
  - identifier: source_abc123
    module:
      moduleClass: SQLSampleStore
      moduleName: orchestrator.core.samplestore.sql
    storageLocation:
      host: localhost
      port: 30002
      database: my_project # The database field is the name of the project containing the samplestore
      user: my_project # The user field is the name of the project containing the samplestore
      password: XXXXXXX
```
### CSVSampleStore
This is a passive Sample Store that can be used to extract entities from a CSV file. It is assumed that each row is an entity and the columns are constitutive properties or observed properties.
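For illustration, the row-to-entity mapping might look like the following sketch, written with Python's `csv` module. The column and property names (`config`, `cpu_value`, `wall-clock runtime`, `wallClockRuntime`) are taken from the examples in this document; the dictionary shape of the resulting entities is an assumption for illustration, not ado's internal representation.

```python
import csv
import io

# A minimal CSV in the assumed layout: one entity per row, with an
# identifier column, a constitutive-property column, and an
# observed-property column.
raw = """config,cpu_value,wall-clock runtime
small,4,12.0
large,16,3.5
"""

entities = []
for row in csv.DictReader(io.StringIO(raw)):
    entities.append({
        "identifier": row["config"],  # the identifierColumn
        "constitutive": {"cpu_value": int(row["cpu_value"])},
        "observed": {"wallClockRuntime": float(row["wall-clock runtime"])},
    })
```

Each row becomes one entity; the property maps in the YAML decide which columns land in the constitutive versus observed buckets.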
#### Importing data from external experiments
The YAML below illustrates importing data from a CSV for an experiment that is not provided by an installed ado actuator.
```yaml
copyFrom:
  - module:
      moduleClass: CSVSampleStore
      moduleName: orchestrator.core.samplestore.csv
    storageLocation:
      path: 'examples/ml-multi-cloud/ml_export.csv' # The path to the CSV file
    parameters:
      generatorIdentifier: 'multi-cloud-ml' # A string that will be stored with the extracted entities as their generatorIdentifier
      identifierColumn: 'config' # The column in the CSV file that contains the entity id
      experiments: # A list of dictionaries that map CSV columns to experiments and target properties. Each dictionary is an experiment
        - experimentIdentifier: 'benchmark_performance' # The experiment name you want the following properties to be associated with
          constitutivePropertyMap: # A list of columns which contain constitutive properties. Can also be a dict of property name to column name pairs
            - cpu_value
          observedPropertyMap: # Dict of target property name:column id pairs, or list of column ids
            wallClockRuntime: 'wall-clock runtime' # The key is the target property name, the value is the column containing the values for that property
```
You must specify which CSV columns contain observed properties (measurements/results) and which contain constitutive properties (input parameters/configurations). You can do this in one of two ways.
- Use CSV column names as-is: pass a list:

  ```yaml
  constitutivePropertyMap:
    - cpu_value
    - memory_gb
  ```

  This uses `cpu_value` and `memory_gb` as both the column names and the property names.

- Rename columns: pass a dictionary:

  ```yaml
  observedPropertyMap:
    wallClockRuntime: 'wall-clock runtime'
    throughput: 'requests_per_sec'
  ```

  This reads from CSV columns `wall-clock runtime` and `requests_per_sec`, but names them `wallClockRuntime` and `throughput` in the experiment.
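Both accepted forms reduce to the same column-to-property mapping. A minimal sketch of that rule, using a hypothetical helper function (not ado code):

```python
def normalize_property_map(property_map):
    """Return {property_name: column_name} for either accepted form.

    A list means each entry is both the column name and the property
    name; a dict already maps property names to column names.
    (Hypothetical helper for illustration, not part of ado.)
    """
    if isinstance(property_map, list):
        return {name: name for name in property_map}
    return dict(property_map)

# List form: column names double as property names.
assert normalize_property_map(["cpu_value", "memory_gb"]) == {
    "cpu_value": "cpu_value",
    "memory_gb": "memory_gb",
}

# Dict form: columns are renamed to the given property names.
assert normalize_property_map({"wallClockRuntime": "wall-clock runtime"}) == {
    "wallClockRuntime": "wall-clock runtime",
}
```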
#### Importing data from existing actuators
When importing CSV data that was exported from ado or contains results from an actuator available in your current instance, you can reference the actual actuator and experiment identifiers. The property mappings (observedPropertyMap and constitutivePropertyMap) become optional because ado can automatically infer the correct column mappings from the actuator's experiment definition.
```yaml
copyFrom:
  - module:
      moduleClass: CSVSampleStore
      moduleName: orchestrator.core.samplestore.csv
    storageLocation:
      path: 'results_export.csv'
    parameters:
      generatorIdentifier: 'vllm-benchmark-run'
      identifierColumn: 'config'
      experiments:
        - experimentIdentifier: 'test-deployment-v1'
          actuatorIdentifier: 'vllm_performance' # Specify the actual actuator
          propertyFormat: 'target' # Whether the columns for observed properties use target property names or observed property names
```
ado will verify that:
- The specified actuator exists in the current instance
- The experiment exists in that actuator's catalog
- The CSV contains all required constitutive properties for the experiment (i.e. has columns with correct names)
If the columns in the CSV don't match the experiment's constitutive/observed properties, you can use the observedPropertyMap and/or constitutivePropertyMap fields to provide a mapping. If these are provided, the keys will be validated against the experiment definition.
If any validation fails, a detailed error message will indicate what's wrong.
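The validation steps above can be sketched as follows. The function name, catalog structure, and error messages here are hypothetical; ado's actual checks and messages may differ.

```python
def validate_import(actuator_catalogs, actuator_id, experiment_id, csv_columns):
    """Raise ValueError with a descriptive message if any check fails.

    Hypothetical sketch of the three checks described above, not ado code.
    """
    # 1. The specified actuator exists in the current instance.
    if actuator_id not in actuator_catalogs:
        raise ValueError(f"Unknown actuator: {actuator_id}")
    catalog = actuator_catalogs[actuator_id]
    # 2. The experiment exists in that actuator's catalog.
    if experiment_id not in catalog:
        raise ValueError(f"Actuator {actuator_id} has no experiment {experiment_id}")
    # 3. The CSV has columns for all required constitutive properties.
    required = set(catalog[experiment_id]["requiredConstitutiveProperties"])
    missing = required - set(csv_columns)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

# Example catalog (made up for illustration).
catalogs = {
    "vllm_performance": {
        "test-deployment-v1": {
            "requiredConstitutiveProperties": ["config", "cpu_value"],
        }
    }
}

# All three checks pass for this column set.
validate_import(catalogs, "vllm_performance", "test-deployment-v1",
                ["config", "cpu_value", "wall-clock runtime"])
```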
Parameterized Experiments

Importing parameterized experiments is not supported yet. If you use a parameterized experiment identifier, the data will fail to import with an UnknownExperimentError. If you use the base experiment identifier, the data will be mapped to the wrong experiment.
## Deleting sample stores
Info
Please note that standard deletion constraints apply alongside the considerations discussed in this section.
Deleting a sample store is a high-impact operation and should be performed with caution. When a sample store is deleted:
- All stored entities will be deleted.
- All leftover measurement results stored in it will be permanently deleted. These are measurements that were copied into the `samplestore`, i.e., not generated through `ado` operations. All results from `ado` operations would have already been subject to standard deletion constraints.
- The corresponding database tables will be dropped.
This is especially critical when the sample store was populated externally, such as via a CSVSampleStore. In such cases, deletion may result in the loss of externally sourced data that cannot be recovered unless it has been backed up beforehand.
To prevent this from happening unintentionally, ado will check whether the sample store contains stored results, and exit if it does. A warning such as the following will be output:
```text
ERROR: Cannot delete sample store 995ff6 because there are 68 measurement results present in the sample store.
HINT: You can force the deletion by adding the --force flag.
```
If this is expected, re-run the same command with the `--force` flag, as the hint suggests, to perform the deletion.