Entities and Entity Spaces
Entities
Entities represent the things you want to measure — for example, a molecule, a fine-tuning deployment configuration, or a robotic experiment setup.
Every Entity is described by a set of constitutive properties and their values, which together uniquely identify it. For a fine-tuning deployment configuration, these might be the GPU model, number of GPUs, and batch size. For a molecule, a single SMILES string might be enough.
Once an Experiment has been run on an Entity, it also gains observed properties — the measured outputs produced by that Experiment.
Example
Here is an Entity representing a fine-tuning deployment configuration:
Identifier: dataset_id.news-tokens-16384plus-entries-4096-model_name.llama3-8b-number_gpus.4.0-model_max_length.2048.0-torch_dtype.bfloat16-batch_size.16.0-gpu_model.NVIDIA-A100-80GB-PCIe
Constitutive properties:
   name              value
0  dataset_id        news-tokens-16384plus-entries-4096
1  model_name        llama3-8b
2  number_gpus       4.0
3  model_max_length  2048.0
4  torch_dtype       bfloat16
5  batch_size        16.0
6  gpu_model         NVIDIA-A100-80GB-PCIe
The identifier is derived from the constitutive property values — two Entities with the same values are the same Entity. Once Experiments have been run on this Entity, observed properties (measured values such as train_tokens_per_second) will also appear. See Target and Observed Properties for more.
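The way the identifier follows from the constitutive property values can be sketched in a few lines of Python. This is only an illustration of the pattern visible in the example identifier (`name.value` segments joined by `-`); the helper `entity_identifier` is hypothetical and is not the library's own implementation.

```python
# Illustrative sketch only: entity_identifier is a hypothetical helper,
# not a function provided by the library.
def entity_identifier(properties):
    """Join (name, value) pairs as 'name.value' segments separated by '-'."""
    return "-".join(f"{name}.{value}" for name, value in properties)


# Constitutive properties copied from the example Entity above.
constitutive_properties = [
    ("dataset_id", "news-tokens-16384plus-entries-4096"),
    ("model_name", "llama3-8b"),
    ("number_gpus", 4.0),
    ("model_max_length", 2048.0),
    ("torch_dtype", "bfloat16"),
    ("batch_size", 16.0),
    ("gpu_model", "NVIDIA-A100-80GB-PCIe"),
]

identifier = entity_identifier(constitutive_properties)

# The same values always give the same identifier (the same Entity).
same_values = entity_identifier(list(constitutive_properties))

# Changing any one value gives a different identifier (a different Entity).
changed_properties = [
    (name, 32.0 if name == "batch_size" else value)
    for name, value in constitutive_properties
]
changed = entity_identifier(changed_properties)

print(identifier)
print(identifier == same_values, identifier == changed)
```

Because the identifier is a pure function of the property values, two Entities can be compared for identity without looking at any measurements.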
Measuring Entities with Experiments
For an Experiment to measure an Entity, every one of the Entity's constitutive property values must fall within the corresponding input domain declared by the Experiment.
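The check described above can be sketched as follows. The dictionary-based domain representation, the function names `in_domain` and `can_measure`, the exclusive upper bound on ranges, and the rule that an Entity must supply every input the Experiment declares are all assumptions made for illustration; none of them are taken from the library's API.

```python
# Illustrative sketch only: the domain representation and function names
# below are hypothetical, not the library's API.
def in_domain(value, domain):
    """Return True if value is allowed by domain.

    A domain is either {"values": [...]} (explicit allowed values) or
    {"range": (low, high)} with an exclusive upper bound (an assumption).
    """
    if "values" in domain:
        return value in domain["values"]
    low, high = domain["range"]
    return low <= value < high


def can_measure(entity, input_domains):
    """Return True if every declared input is present and within its domain."""
    return all(
        name in entity and in_domain(entity[name], domain)
        for name, domain in input_domains.items()
    )


# Hypothetical input domains declared by an Experiment.
experiment_inputs = {
    "number_gpus": {"values": [2, 4]},
    "batch_size": {"range": (1, 129)},
    "torch_dtype": {"values": ["bfloat16"]},
}

entity = {"number_gpus": 4.0, "batch_size": 16.0, "torch_dtype": "bfloat16"}
too_many_gpus = {"number_gpus": 8.0, "batch_size": 16.0, "torch_dtype": "bfloat16"}

measurable = can_measure(entity, experiment_inputs)
not_measurable = can_measure(too_many_gpus, experiment_inputs)
print(measurable, not_measurable)
```

In this sketch a single out-of-domain value is enough to make the Entity unmeasurable by that Experiment.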
Entity Spaces
An individual Entity is a single point. An Entity Space defines the full set of Entities you want to explore — all the points you could potentially measure.
An Entity Space is a set of constitutive properties, each with a Property Domain that constrains the values it can take. Each property is a dimension of the space, and every combination of values across all dimensions is an Entity in the space. That is, the Entity Space is the Cartesian product of its dimensions.
Example: Fine-tuning Deployment Configuration
Number entities: 80
Categorical properties:
   name         values
0  dataset_id   [news-tokens-16384plus-entries-4096]
1  model_name   [granite-8b-code-base]
2  torch_dtype  [bfloat16]
3  gpu_model    [NVIDIA-A100-80GB-PCIe]
Discrete properties:
   name              range        interval  values
0  number_gpus       [2, 5]       None      [2, 4]
1  model_max_length  [512, 8193]  None      [512, 1024, 2048, 4096, 8192]
2  batch_size        [1, 129]     None      [1, 2, 4, 8, 16, 32, 64, 128]
This space has 7 dimensions: 4 categorical (each fixed to a single value) and 3 discrete. The total number of Entities is the product of the number of values in each dimension:
1 × 1 × 1 × 1 × 2 × 5 × 8 = 80 Entities
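Each Property Domain constrains one dimension. The categorical properties list their allowed values explicitly; the discrete properties specify a range and a set of values within it. In the output above, each range's upper bound is one greater than the largest listed value (for example, [1, 129] for batch_size values up to 128), which suggests that the upper bound is exclusive. For the full list of domain types see Properties and Domains.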
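The count of 80 can be reproduced by enumerating the Cartesian product directly. The sketch below uses only the Python standard library; the plain dictionary `space` is an assumed stand-in for an Entity Space and is not how the library represents one.

```python
from itertools import product
from math import prod

# Allowed values per dimension, copied from the example space above.
# The dict-of-lists layout is an assumption made for illustration.
space = {
    "dataset_id": ["news-tokens-16384plus-entries-4096"],
    "model_name": ["granite-8b-code-base"],
    "torch_dtype": ["bfloat16"],
    "gpu_model": ["NVIDIA-A100-80GB-PCIe"],
    "number_gpus": [2, 4],
    "model_max_length": [512, 1024, 2048, 4096, 8192],
    "batch_size": [1, 2, 4, 8, 16, 32, 64, 128],
}

# Size of the space: the product of the number of values in each dimension.
expected_size = prod(len(values) for values in space.values())

# Every combination of one value per dimension is one Entity.
entities = [dict(zip(space, combo)) for combo in product(*space.values())]

print(expected_size, len(entities))
print(entities[0])
```

Enumerating the space this way is only practical when every dimension has a finite set of values, as in this example.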