samplestore
A samplestore resource is a database containing entities along with result of experiments that have been applied to them.
samplestores and discoveryspaces¶
When you create a discovery space you associate a samplestore with it. This is where the discoveryspace will read and write data i.e. entities and the results of experiments on them. You primarily access the entities in a samplestore via a discoveryspace that is attached to it.
You can think of a discoveryspace as a view or filter on a sample store - when you access data in a samplestore through a discovery space you only see data that matches the discoveryspace.
Tip
- Multiple
discoveryspaces can use the samesamplestore - There is no restriction or condition on the
discoveryspaces sharing asamplestorei.e. they can be very similar or completely different.
samplestores and data-sharing¶
When multiple discoveryspaces use the same samplestore this enables transparent data-sharing between the different discoveryspaces. When and how data is shared is covered in detail in shared sample stores.
To see the discoveryspaces using a given samplestore run
ado show related samplestore $SAMPLE_STORE_IDENTIFIER
Tip
The greater the similarity between two discoveryspaces, the greater the chance they can share data. So it is usually beneficial to ensure that such discoveryspaces use the same samplestore.
Warning
If you use two different samplestores for similar discoveryspaces there is no way to share results between them.
active and passive Sample Stores¶
ado distinguishes two types of Sample Stores: active Sample Stores which allow read and write; and passive Sample Stores that only have read capabilities (for example a CSV file containing measurement data).
All samplestore resources created with ado will be `active. However, they can copy data in from passive Sample Stores.
The primary Sample Store type: SQLSampleStore¶
The primary Sample Store used in ado, and represented by samplestore resources, is SQLSampleStore. SQLSampleStore represents storage in SQL tables. When you create a samplestore resource that uses SQLSampleStore the storage is allocated automatically in the SQL db associated with the active context
Creating a samplestore¶
Running ado create samplestore --new-sample-store will create an empty SQLSampleStore in the current context.
The default samplestore¶
ado provides a default samplestore (whose identifier is default) per project, removing the need to create one explicitly unless necessary. This samplestore is created automatically when it is first required.
There are three ways to use the default samplestore - each will create it, if it doesn't already exist.
- Referencing it in the space configuration by setting the
sampleStoreIdentifiertodefaultin the space YAML:
sampleStoreIdentifier: default
- Using the
--use-default-sample-storeflag with theado create spacecommand:
ado create space --use-default-sample-store
- Using the
--setflag to explicitly override the sample store identifier:
ado create space --set sampleStoreIdentifier=default
These options are interchangeable and can be used depending on your workflow or preference.
Copying data into a samplestore¶
You can specify data to be copied into a new samplestore resource on creation. The data comes from other Sample Stores. The general structure of the YAML when copying from other sample stores is:
specification:
module:
moduleClass: SQLSampleStore
moduleName: orchestrator.core.samplestore.sql
copyFrom: # An array of Sample Stores data will be copied from
- identifier: # Optional, the id of the Sample Store if not given in the storageLocation
module: # The type of this Sample Store
moduleClass: ... # The module class for this sample store
moduleName: ... # The name of the module containing the class
parameters: # Sample Store parameters
storageLocation: # The location of this Sample Store
The Sample Store types section details how to fill the above fields for the different available Sample Store. Here is an example of copying data from a CSV file using CSVSampleStore:
specification:
module:
moduleName: orchestrator.core.samplestore.sql
moduleClass: SQLSampleStore
copyFrom:
- module:
moduleClass: CSVSampleStore
storageLocation:
path: "examples/ml-multi-cloud/ml_export.csv"
parameters:
generatorIdentifier: "multi-cloud-ml"
identifierColumn: "config"
experiments:
- experimentIdentifier: "benchmark_performance"
propertyMap:
wallClockRuntime: "wallClockRuntime"
Accessing the entities in a sample store¶
You access the entities in a samplestore via a discovery space attached to it.
For an existing discoveryspace to retrieve all entities that match it run
ado show entities space $SPACEID --include matching
You can also define a discoveryspace in a YAML and run:
ado show entities space --file $FILE
This allows you to see what entities match a space without having to create it.
Sample Store types¶
SQLSampleStore¶
This is an active Sample Store that stores entity data in SQL tables. In ado a SQLSampleStore is always associated with a particular project.
When you want to copy from another SQLSampleStore you need the identifier and the metastore URL to the the project it is in
copyFrom:
- identifier: source_abc123
module:
moduleClass: SQLSampleStore
moduleName: orchestrator.core.samplestore.sql
storageLocation:
host: localhost
port: 30002
database: my_project. # The database field is the name of the project containing the samplestore
user: my_project # The user field is the name of the project containing the samplestore
password: XXXXXXX
CSVSampleStore¶
This is a passive Sample Store that can be used to extract entities from a CSV file. Its assumed each row is an entity and the columns are constitutive properties or observed properties
copyFrom:
- module:
moduleClass: CSVSampleStore
moduleName: orchestrator.core.samplestore.csv
storageLocation:
path: 'examples/ml-multi-cloud/ml_export.csv'. # The path to the CSV file
parameters:
generatorIdentifier: 'multi-cloud-ml' # A string that will be stored with the extracted entities as their generatorIdentifier
identifierColumn: 'config'. # The column in the CSV file that contains the entity id
constitutivePropertyColumns:
- # A list of columns which contain constitutive properties
experiments: # A list of dictionaries that map CSV columns to experiments and target properties. Each dictionary is an experiment
- experimentIdentifier: 'benchmark_performance' # The experiment name you want the following properties to be associated with
propertyMap: # List of target property name:column id pairs
wallClockRuntime: 'wallClockRuntime' # The key is the target property name, the value is the column containing the values for that property
Note, since CSV have arbitrary data in general there is no way ado can know how a particular value was generated or how to generate new such values. However, the measurements in a CSV can be mapped to the "experiment+property" model that ado uses, if you want to copy them.
You do not have to copy all the columns in a CSV or have any experiments.
Deleting sample stores¶
Info
Please note that standard deletion constraints apply alongside the considerations discussed in this section.
Deleting a sample store is a high-impact operation and should be performed with caution. When a sample store is deleted:
- All stored entities will be deleted.
- All leftover measurement results stored in it will be permanently deleted. These will be measurements that were copied into the
samplestorei.e., not generated throughadooperations. All results fromadooperations would have already been subject to standard deletion constraints. - The corresponding database tables will be dropped.
This is especially critical when the sample store was populated externally, such as via a CSVSampleStore. In such cases, deletion may result in the loss of externally sourced data that cannot be recovered unless it has been backed up beforehand.
To prevent this from unintentionally happening, ado will check if the sample store contains stored results, and exit if this is the case. A warning such as the following will be output:
ERROR: Cannot delete sample store 995ff6 because there are 68 measurement results present in the sample store.
HINT: You can force the deletion by adding the --force flag.
If this is expected, re-running the same command while adding the --force flag according to the hint will perform the deletion.