DatasetWrapper#

class DatasetWrapper(name, path, classes)#

Bases: object

This class serves as the interface for all mltoolbox datasets. Each mltoolbox dataset must implement the declared functionality.

The steps to extend the mltoolbox with an additional dataset:

1. Extend the DatasetWrapper class, implementing the data retrieval methods. Each data retrieval method must return a torch.utils.data.Dataset object. You may use one of the torch built-in datasets or create a custom dataset; see the list of built-in datasets, Creating a Custom Dataset for your files, and the base classes for custom vision datasets.

2. Register the new class to the DSFactory using the @DSFactory.register decorator:

from pyhelayers.mltoolbox.data_loader.dataset_wrapper import DatasetWrapper
from pyhelayers.mltoolbox.data_loader.ds_factory import DSFactory
from torch.utils.data import Dataset

@DSFactory.register('new_dataset')
class newDataset(DatasetWrapper):
    def get_train_data(self):
        return _newDatasetLoader(mode='train')

    def get_test_data(self):
        return _newDatasetLoader(mode='test')

    def get_val_data(self):
        return _newDatasetLoader(mode='val')


class _newDatasetLoader(Dataset):
    def __init__(self, mode):
        self.mode = mode  # which split to serve: 'train', 'test' or 'val'

    def __len__(self):
        ...

    def __getitem__(self, index):
        ...
        return image_tensor, label_tensor
3. Add an import of the new class in your main (so that the new class gets registered on start):

import newDataset
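
The import matters because the @DSFactory.register decorator only runs when the module containing the class is imported. For intuition, a registry of the kind this decorator implies could look like the sketch below; this is illustrative only, not pyhelayers' actual implementation:

class _ToyFactory:
    """Illustrative stand-in for DSFactory; not the real pyhelayers class."""
    _registry = {}

    @classmethod
    def register(cls, name):
        def wrap(ds_cls):
            cls._registry[name] = ds_cls  # importing the module triggers this
            return ds_cls
        return wrap
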
__init__(name, path, classes)#

Methods

__init__(name, path, classes)

get_approximation_set()

Returns the approximation data set, to be used for range approximation

get_class_labels_dict()

Returns the class_name to index mapping

get_samples_per_class(dataset)

Returns the number of samples in each class, or None if the dataset is balanced (None is equivalent to a list of equal counts)

get_test_data()

Returns the test dataset

get_train_data()

Returns the train dataset

get_train_pipe_ffcv(args)

Returns the dictionary defining, for each field, the sequence of ffcv decoders and transforms to apply.

get_val_data()

Returns the validation dataset

is_imbalanced()

Returns True if the data is imbalanced

get_approximation_set() → Dataset#

Returns the approximation data set, to be used for range approximation.

Returns:

approximation dataset

Return type:

Dataset
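
Since the returned object is a regular torch.utils.data.Dataset, it can be batched with a standard DataLoader. A minimal usage sketch, assuming ds is an instantiated DatasetWrapper subclass:

from torch.utils.data import DataLoader

approx_loader = DataLoader(ds.get_approximation_set(), batch_size=32)
for images, labels in approx_loader:
    ...  # feed batches to the range-approximation step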

get_class_labels_dict()#

Returns the class_name to index mapping

Returns:

A dictionary with items (class_name, class_index).

Return type:

Dictionary
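
For illustration, a three-class dataset might return the mapping sketched below; the class names are hypothetical:

labels = ds.get_class_labels_dict()                # e.g. {'cat': 0, 'dog': 1, 'bird': 2}
index_to_name = {v: k for k, v in labels.items()}  # invert for index-to-name lookups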

get_samples_per_class(dataset: Dataset)#

Returns the number of samples in each class, or None if the dataset is balanced (None is equivalent to a list of equal counts).

Params:
  • dataset (Dataset): the dataset split

Returns:

The number of samples in each class, or None if the dataset is balanced (None is equivalent to a list of equal counts)

Return type:

list<int>
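
For intuition, per-class counts of the kind this method returns can be computed from any (image, label) dataset as in the sketch below; this is an illustrative implementation, not the mltoolbox's:

from collections import Counter

def samples_per_class(dataset, num_classes):
    # count occurrences of each integer label across the split
    counts = Counter(int(label) for _, label in dataset)
    return [counts[c] for c in range(num_classes)]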

get_test_data() → Dataset#

Returns the test dataset

Returns:

test dataset

Return type:

Dataset

get_train_data() → Dataset#

Returns the train dataset

Returns:

train dataset

Return type:

Dataset
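
The split getters all return torch Datasets, so they plug directly into standard DataLoaders. A minimal sketch, assuming ds is an instantiated DatasetWrapper subclass:

from torch.utils.data import DataLoader

train_loader = DataLoader(ds.get_train_data(), batch_size=64, shuffle=True)
test_loader = DataLoader(ds.get_test_data(), batch_size=64)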

get_train_pipe_ffcv(args) → dict#

Returns the dictionary defining, for each field, the sequence of ffcv decoders and transforms to apply. Fields with missing entries use the default pipeline, which consists of the default decoder and ToTensor(); a field can also be disabled explicitly by passing None as its pipeline. See https://docs.ffcv.io/api/loader.html

Params:
  • args (Arguments): user arguments

Returns:

A dictionary of the shape {'image': <image_pipeline>, 'label': <label_pipeline>}, as explained at https://docs.ffcv.io/making_dataloaders.html#pipelines; another example: https://docs.ffcv.io/ffcv_examples/cifar10.html

Return type:

Dictionary
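
A sketch of a pipelines dictionary of this shape, using standard ffcv decoders and transforms from the linked examples; the exact transforms a concrete dataset needs may differ:

from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.transforms import RandomHorizontalFlip, ToTensor, ToTorchImage

def get_train_pipe_ffcv(self, args):
    return {
        # decode raw/JPEG images, augment, and convert to CHW torch tensors
        'image': [SimpleRGBImageDecoder(), RandomHorizontalFlip(), ToTensor(), ToTorchImage()],
        # decode integer labels and convert to tensors
        'label': [IntDecoder(), ToTensor()],
    }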

get_val_data() → Dataset#

Returns the validation dataset

Returns:

validation dataset

Return type:

Dataset

is_imbalanced() → bool#

Returns True if the data is imbalanced

Returns:

True if the data is imbalanced and False otherwise

Return type:

bool
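
One possible imbalance criterion, sketched from per-class counts like those returned by get_samples_per_class; the actual mltoolbox check may differ:

def looks_imbalanced(counts, tolerance=0.1):
    if counts is None:  # None means the dataset is balanced
        return False
    # flag the data when the smallest class trails the largest by more
    # than the given relative tolerance
    return (max(counts) - min(counts)) / max(counts) > tolerance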