DatasetWrapper#

class DatasetWrapper(name, path, classes)#

Bases: object

This class serves as the interface for all mltoolbox datasets. Each mltoolbox dataset must implement the declared functionality.

The steps to extend the mltoolbox with an additional dataset:

1. Extend the DatasetWrapper class, implementing the data retrieval methods. Each data retrieval method must return a torch.utils.data.Dataset object. You may use one of the torch built-in datasets or create a custom dataset; see the list of built-in datasets, Creating a Custom Dataset for your files, and the base classes for custom vision datasets.

2. Register the new class to the DSFactory using the @DSFactory.register decorator:

from pyhelayers.mltoolbox.data_loader.dataset_wrapper import DatasetWrapper
from pyhelayers.mltoolbox.data_loader.ds_factory import DSFactory
from torch.utils.data import Dataset

@DSFactory.register('new_dataset')
class newDataset(DatasetWrapper):
    def get_train_data(self):
        return _newDatasetLoader(mode='train')

    def get_test_data(self):
        return _newDatasetLoader(mode='test')

    def get_val_data(self):
        return _newDatasetLoader(mode='val')


class _newDatasetLoader(Dataset):
    def __init__(self, mode):
        self.mode = mode  # which split to serve: 'train', 'test' or 'val'

    def __len__(self):
        ...

    def __getitem__(self, index):
        ...
        return image_tensor, label_tensor
3. Add an import of the new class in your main (so that the new class gets registered on start):

import newDataset
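
The import matters because the @DSFactory.register decorator only runs when the module containing the class is imported. For intuition, a registry of the kind this decorator implies could look like the sketch below; this is illustrative only, not pyhelayers' actual implementation:

class _ToyFactory:
    """Illustrative stand-in for DSFactory; not the real pyhelayers class."""
    _registry = {}

    @classmethod
    def register(cls, name):
        def wrap(ds_cls):
            cls._registry[name] = ds_cls  # importing the module triggers this
            return ds_cls
        return wrap
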
__init__(name, path, classes)#

Methods

__init__(name, path, classes)

get_approximation_set()

Returns the approximation data set, to be used for range approximation

get_class_labels_dict()

Returns the class_name to index mapping

get_samples_per_class(dataset)

Returns the number of samples in each class, or None if the dataset is balanced (None is equivalent to a list of equal counts)

get_test_data()

Returns the test dataset

get_train_data()

Returns the train dataset

get_train_pipe_ffcv(args)

Returns the dictionary defining, for each field, the sequence of ffcv decoders and transforms to apply.

get_val_data()

Returns the validation dataset

is_imbalanced()

Returns True if the data is imbalanced

get_approximation_set() → Dataset#

Returns the approximation data set, to be used for range approximation.

Returns:

approximation dataset

Return type:

Dataset
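
Since the returned object is a regular torch.utils.data.Dataset, it can be batched with a standard DataLoader. A minimal usage sketch, assuming ds is an instantiated DatasetWrapper subclass:

from torch.utils.data import DataLoader

approx_loader = DataLoader(ds.get_approximation_set(), batch_size=32)
for images, labels in approx_loader:
    ...  # feed batches to the range-approximation step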

get_class_labels_dict()#

Returns the class_name to index mapping

Returns:

A dictionary with items (class_name, class_index).

Return type:

Dictionary
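
For illustration, a three-class dataset might return the mapping sketched below; the class names are hypothetical:

labels = ds.get_class_labels_dict()                # e.g. {'cat': 0, 'dog': 1, 'bird': 2}
index_to_name = {v: k for k, v in labels.items()}  # invert for index-to-name lookups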

get_samples_per_class(dataset: Dataset)#

Returns the number of samples in each class, or None if the dataset is balanced (None is equivalent to a list of equal counts).

Params:
  • dataset (Dataset): the dataset split

Returns:

The number of samples in each class, or None if the dataset is balanced (None is equivalent to a list of equal counts)

Return type:

list<int>
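
For intuition, per-class counts of the kind this method returns can be computed from any (image, label) dataset as in the sketch below; this is an illustrative implementation, not the mltoolbox's:

from collections import Counter

def samples_per_class(dataset, num_classes):
    # count occurrences of each integer label across the split
    counts = Counter(int(label) for _, label in dataset)
    return [counts[c] for c in range(num_classes)]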

get_test_data() → Dataset#

Returns the test dataset

Returns:

test dataset

Return type:

Dataset

get_train_data() → Dataset#

Returns the train dataset

Returns:

train dataset

Return type:

Dataset
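
The split getters all return torch Datasets, so they plug directly into standard DataLoaders. A minimal sketch, assuming ds is an instantiated DatasetWrapper subclass:

from torch.utils.data import DataLoader

train_loader = DataLoader(ds.get_train_data(), batch_size=64, shuffle=True)
test_loader = DataLoader(ds.get_test_data(), batch_size=64)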

get_train_pipe_ffcv(args) → dict#

Returns the dictionary defining, for each field, the sequence of ffcv decoders and transforms to apply. Fields with missing entries use the default pipeline, which consists of the default decoder and ToTensor(); a field can also be disabled explicitly by passing None as its pipeline. See https://docs.ffcv.io/api/loader.html

Params:
  • args (Arguments): user arguments

Returns:

A dictionary of the shape {'image': <image_pipeline>, 'label': <label_pipeline>}, as explained at https://docs.ffcv.io/making_dataloaders.html#pipelines; another example: https://docs.ffcv.io/ffcv_examples/cifar10.html

Return type:

Dictionary
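
A sketch of a pipelines dictionary of this shape, using standard ffcv decoders and transforms from the linked examples; the exact transforms a concrete dataset needs may differ:

from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.transforms import RandomHorizontalFlip, ToTensor, ToTorchImage

def get_train_pipe_ffcv(self, args):
    return {
        # decode raw/JPEG images, augment, and convert to CHW torch tensors
        'image': [SimpleRGBImageDecoder(), RandomHorizontalFlip(), ToTensor(), ToTorchImage()],
        # decode integer labels and convert to tensors
        'label': [IntDecoder(), ToTensor()],
    }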

get_val_data() → Dataset#

Returns the validation dataset

Returns:

validation dataset

Return type:

Dataset

is_imbalanced() → bool#

Returns True if the data is imbalanced

Returns:

True if the data is imbalanced and False otherwise

Return type:

bool
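
One possible imbalance criterion, sketched from per-class counts like those returned by get_samples_per_class; the actual mltoolbox check may differ:

def looks_imbalanced(counts, tolerance=0.1):
    if counts is None:  # None means the dataset is balanced
        return False
    # flag the data when the smallest class trails the largest by more
    # than the given relative tolerance
    return (max(counts) - min(counts)) / max(counts) > tolerance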