DatasetWrapper#
- class DatasetWrapper(name, path, classes)#
Bases:
object
This class serves as an interface for all the mltoolbox datasets. Each mltoolbox dataset must implement the declared functionality.
The steps to extend the mltoolbox with an additional dataset:
1. Extend the DatasetWrapper class, implementing the data retrieval methods. The data retrieval methods must return a torch.utils.data.Dataset object. You may use one of the torch built-in datasets, or create a custom dataset. See the list of built-in datasets, Creating a Custom Dataset for your files, and the vision base classes for custom vision datasets.
2. Register the new class to DSFactory using the @DSFactory.register decorator:
from pyhelayers.mltoolbox.data_loader.dataset_wrapper import DatasetWrapper
from pyhelayers.mltoolbox.data_loader.ds_factory import DSFactory
from torch.utils.data import Dataset

@DSFactory.register('new_dataset')
class newDataset(DatasetWrapper):
    def get_train_data(self):
        return _newDatasetLoader(mode='train')

    def get_test_data(self):
        return _newDatasetLoader(mode='test')

    def get_val_data(self):
        return _newDatasetLoader(mode='val')

class _newDatasetLoader(Dataset):
    def __getitem__(self, index):
        ...
        return image_tensor, label_tensor
Add an import of the new class in your main (so that the new class gets registered on start):
import newDataset
- __init__(name, path, classes)#
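A subclass that needs its own construction logic would typically forward these arguments to the base class. A minimal sketch, assuming no additional setup is required (the internal attribute handling of DatasetWrapper.__init__ is not shown here):

from pyhelayers.mltoolbox.data_loader.dataset_wrapper import DatasetWrapper
from pyhelayers.mltoolbox.data_loader.ds_factory import DSFactory

@DSFactory.register('new_dataset')
class newDataset(DatasetWrapper):
    def __init__(self, name, path, classes):
        # Forward the dataset name, data path and classes to the base class
        super().__init__(name, path, classes)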
Methods

__init__(name, path, classes)

get_approximation_set()
    Returns the approximation data set, to be used for range approximation.

get_class_labels_dict()
    Returns the class_name to index mapping.

get_samples_per_class(dataset)
    Returns the number of samples in each class, or None if the dataset is balanced (None is equivalent to a list with equal numbers).

get_test_data()
    Returns the test dataset.

get_train_data()
    Returns the train dataset.

get_train_pipe_ffcv(args)
    Returns the dictionary defining, for each field, the sequence of ffcv decoders and transforms to apply.

get_val_data()
    Returns the validation dataset.

is_imbalanced()
    Returns True if the data is imbalanced.
- get_approximation_set() → Dataset#
Returns the approximation data set, to be used for range approximation.
- Returns:
approximation dataset
- Return type:
Dataset
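Continuing the newDataset example above, a minimal sketch of a possible override, assuming the approximation set is simply a small random subset of the training split (the subset size of 512 is an arbitrary, hypothetical choice):

import torch
from torch.utils.data import Subset

class newDataset(DatasetWrapper):
    # ... other methods from the example above ...

    def get_approximation_set(self):
        # Hypothetical sketch: reuse a small random subset of the
        # training split for range approximation
        train_ds = self.get_train_data()
        num_samples = min(512, len(train_ds))
        indices = torch.randperm(len(train_ds))[:num_samples].tolist()
        return Subset(train_ds, indices)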
- get_class_labels_dict()#
Returns the class_name to index mapping
- Returns:
A dictionary with items (class_name, class_index).
- Return type:
Dictionary
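A minimal sketch of a possible override, assuming a fixed set of class names (the names below are placeholders):

class newDataset(DatasetWrapper):
    # ... other methods from the example above ...

    def get_class_labels_dict(self):
        # Hypothetical sketch: map each class name to its index
        return {'class_a': 0, 'class_b': 1}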
- get_samples_per_class(dataset: Dataset)#
Returns the number of samples in each class, or None if the dataset is balanced (None is equivalent to a list with equal numbers).
- Params:
dataset (Dataset): the dataset split
- Returns:
The number of samples in each class, or None if the dataset is balanced (None is equivalent to a list with equal numbers)
- Return type:
list<int>
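A minimal sketch of a possible override, assuming each dataset item is an (image, label) pair as in the _newDatasetLoader example above:

from collections import Counter

class newDataset(DatasetWrapper):
    # ... other methods from the example above ...

    def get_samples_per_class(self, dataset):
        # Hypothetical sketch: count how many samples each label has.
        # A balanced dataset could instead simply return None.
        counts = Counter(int(dataset[i][1]) for i in range(len(dataset)))
        return [counts[c] for c in sorted(counts)]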
- get_train_pipe_ffcv(args) → dict#
Returns the dictionary defining, for each field, the sequence of ffcv decoders and transforms to apply. Fields with missing entries will use the default pipeline, which consists of the default decoder and ToTensor(), but a field can also be disabled explicitly by passing None as its pipeline. See https://docs.ffcv.io/api/loader.html
- Params:
args (Arguments): user arguments
- Returns:
A dictionary of the form {'image': <image_pipeline>, 'label': <label_pipeline>}, as explained in https://docs.ffcv.io/making_dataloaders.html#pipelines; another example: https://docs.ffcv.io/ffcv_examples/cifar10.html
- Return type:
Dictionary
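A minimal sketch of a possible override, modeled on the ffcv CIFAR-10 example linked above (the specific decoders and transforms chosen here are illustrative, not part of the mltoolbox API):

from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.transforms import RandomHorizontalFlip, Squeeze, ToTensor, ToTorchImage

class newDataset(DatasetWrapper):
    # ... other methods from the example above ...

    def get_train_pipe_ffcv(self, args):
        # Each field gets a decoder followed by a sequence of transforms
        image_pipeline = [SimpleRGBImageDecoder(), RandomHorizontalFlip(),
                          ToTensor(), ToTorchImage()]
        label_pipeline = [IntDecoder(), ToTensor(), Squeeze()]
        return {'image': image_pipeline, 'label': label_pipeline}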
- is_imbalanced() → bool#
Returns True if the data is imbalanced
- Returns:
True if the data is imbalanced and False otherwise
- Return type:
bool
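A minimal sketch of a possible override, assuming imbalance is defined by unequal per-class sample counts in the training split:

class newDataset(DatasetWrapper):
    # ... other methods from the example above ...

    def is_imbalanced(self):
        # Hypothetical sketch: the data is imbalanced when the per-class
        # sample counts are not all equal (None means balanced)
        counts = self.get_samples_per_class(self.get_train_data())
        return counts is not None and len(set(counts)) > 1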