DatasetPlain
Currently, this API is unsupported in Python.
-
class DatasetPlain
A class for holding data for inference.
It may hold just the inputs, or both the inputs and the expected outputs (usually referred to as labels, or ground truth). If it contains both, it can be used for training or for estimating the accuracy of a given model; otherwise, it can only be used for inference. It loads the data from an h5 or csv file and divides it into batches according to a specified batch size.
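The division into batches can be pictured with a small stand-alone sketch. It does not use the library: splitIntoBatches is a hypothetical helper, and keeping a shorter trailing batch is an assumption made for illustration, not documented behavior of DatasetPlain.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the library: returns the half-open
// [begin, end) sample ranges obtained by cutting numSamples samples into
// consecutive batches of at most batchSize samples each.
// Assumption: a trailing partial batch is kept and is simply shorter.
std::vector<std::pair<int, int>> splitIntoBatches(int numSamples, int batchSize)
{
  assert(numSamples >= 0 && batchSize > 0);
  std::vector<std::pair<int, int>> ranges;
  for (int begin = 0; begin < numSamples; begin += batchSize) {
    int end = std::min(begin + batchSize, numSamples);
    ranges.emplace_back(begin, end);
  }
  return ranges;
}
```

Under these assumptions, splitIntoBatches(10, 4) yields three ranges, the last of which covers only two samples.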
Public Functions
-
inline DatasetPlain(int batchSize)
A constructor.
- Parameters:
batchSize – The size of each batch of the dataset.
-
inline ~DatasetPlain()
-
void loadFromH5(const std::string &sampleFile, const std::string &sampleWeights, const std::string &labelFile, const std::string &labelWeights)
Loads the samples of this dataset, along with their corresponding labels, from the given h5 files.
- Parameters:
sampleFile – An h5 file containing the samples of the dataset.
sampleWeights – The path of the samples within “sampleFile”.
labelFile – An h5 file containing the expected labels of the samples in “sampleFile”.
labelWeights – The path of the labels within “labelFile”.
- Throws:
runtime_error – If the samples in “sampleFile” have more than four dimensions.
runtime_error – If the labels in “labelFile” have more than two dimensions.
runtime_error – If the number of labels in “labelFile” is not equal to the number of samples in “sampleFile”.
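The three error conditions above can be sketched as a stand-alone check. This is an illustration only, not the library's implementation: checkShapes is a hypothetical function, shapes are passed as plain vectors of dimension sizes, and treating the first dimension as the number of samples or labels is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the checks listed under "Throws".
// sampleShape and labelShape hold the size of each dimension.
// Assumption: the first dimension counts the samples (or labels).
void checkShapes(const std::vector<int>& sampleShape,
                 const std::vector<int>& labelShape)
{
  if (sampleShape.size() > 4)
    throw std::runtime_error("samples have more than four dimensions");
  if (labelShape.size() > 2)
    throw std::runtime_error("labels have more than two dimensions");

  // Precondition of this sketch: both shapes have at least one dimension.
  assert(!sampleShape.empty() && !labelShape.empty());
  if (sampleShape[0] != labelShape[0])
    throw std::runtime_error("number of labels differs from number of samples");
}
```

For example, checkShapes({100, 28, 28, 1}, {100, 10}) returns normally, while a five-dimensional sample shape or mismatched first dimensions raise std::runtime_error.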
-
void loadFromH5(const std::string &sampleFile, const std::string &sampleWeights)
Loads the samples of this dataset from the given h5 file.
- Parameters:
sampleFile – An h5 file containing the samples of the dataset.
sampleWeights – The path of the samples within “sampleFile”.
- Throws:
runtime_error – If the samples in “sampleFile” have more than four dimensions.
-
void loadFromCsv(const std::string &file, bool ignoreFirstRow = false, char delimiter = ',', int maxBatches = -1, const std::set<int> &featuresToLoad = std::set<int>())
Loads the samples of this dataset, along with their corresponding labels, from the given csv file.
This function assumes that each sample is a one dimensional vector of features.
- Parameters:
file – A csv file containing both the samples and the labels of this dataset. The j-th column of the i-th row should contain the j-th feature of the i-th sample. The last column in the csv file should contain the expected classes of the dataset’s samples. Fields should be separated by the character given in “delimiter” (a comma by default).
ignoreFirstRow – If true, the first row in the given csv file will be ignored.
delimiter – The delimiter character to be used when reading the csv. Default is a comma (‘,’).
maxBatches – A limit on the number of batches to be loaded from the given csv file. Only the first getBatchSize()*maxBatches rows of the given file will be processed. Default is -1.
featuresToLoad – A set containing the indexes of the features to be loaded from the given csv file. Can be used to ignore some of the columns in the csv file. The default is an empty set.
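The parameters above can be illustrated with a stand-alone parser that follows the same rules. It is a sketch, not the library's implementation: parseCsv and ParsedCsv are hypothetical names, the input is a string rather than a file, and two behaviors are assumed rather than documented: a negative maxBatches means no limit, and an empty featuresToLoad means every feature column is loaded.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: one feature vector and one label per row.
struct ParsedCsv
{
  std::vector<std::vector<double>> samples;
  std::vector<double> labels;
};

// Hypothetical stand-in for the parsing rules described above.
// Assumptions: maxBatches < 0 means "no limit"; an empty featuresToLoad
// means "load every feature column".
ParsedCsv parseCsv(const std::string& text,
                   int batchSize,
                   bool ignoreFirstRow = false,
                   char delimiter = ',',
                   int maxBatches = -1,
                   const std::set<int>& featuresToLoad = std::set<int>())
{
  assert(batchSize > 0);
  ParsedCsv out;
  std::istringstream rows(text);
  std::string row;
  if (ignoreFirstRow)
    std::getline(rows, row); // skip the header row

  const long maxRows =
      maxBatches < 0 ? -1 : static_cast<long>(batchSize) * maxBatches;
  while (std::getline(rows, row)) {
    if (row.empty())
      continue;
    if (maxRows >= 0 && static_cast<long>(out.labels.size()) >= maxRows)
      break; // only the first batchSize * maxBatches rows are processed

    // Split the row into numeric fields.
    std::vector<double> fields;
    std::istringstream cells(row);
    std::string cell;
    while (std::getline(cells, cell, delimiter))
      fields.push_back(std::stod(cell));

    // The last field is the label; the others are candidate features.
    out.labels.push_back(fields.back());
    std::vector<double> features;
    for (int j = 0; j + 1 < static_cast<int>(fields.size()); ++j)
      if (featuresToLoad.empty() || featuresToLoad.count(j) > 0)
        features.push_back(fields[j]);
    out.samples.push_back(features);
  }
  return out;
}
```

With a batch size of 1, a header row skipped, maxBatches set to 2, and featuresToLoad set to {0, 2}, a four-column file yields two samples holding columns 0 and 2, with labels taken from the last column.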
-
inline const DoubleTensor &getSamples(int batch) const
Returns the set of samples of the batch specified by the given index.
- Parameters:
batch – The index of the batch whose samples are to be returned.
-
inline const DoubleTensor &getLabels(int batch) const
Returns the set of labels of the batch specified by the given index.
- Parameters:
batch – The index of the batch whose labels are to be returned.
-
DoubleTensor getAllSamples() const
Returns all of the samples in this dataset.
-
DoubleTensor getAllLabels() const
Returns all of the labels in this dataset.
-
inline int getNumBatches() const
Returns the number of batches in the dataset.
-
inline int getBatchSize() const
Returns the size of the dataset’s batches.
-
inline BatchPlain getBatch(int batch) const
Returns the batch specified by the given index.
- Parameters:
batch – The index of the required batch.
-
inline DoubleTensor getSample(int batch, int i) const
Returns the i-th sample of the batch specified by the given index.
- Parameters:
batch – The index of the batch containing the required sample.
i – The index of the required sample in the batch it belongs to.
-
inline DoubleTensor getLabel(int batch, int i) const
Returns the i-th label of the batch specified by the given index.
- Parameters:
batch – The index of the batch containing the required label.
i – The index of the required label in the batch it belongs to.
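The (batch, i) addressing used by getSample and getLabel can be related to a position in the whole dataset with a short sketch. flatIndex is a hypothetical helper, and it assumes that batches are consecutive, equally sized slices of the dataset taken in order, which this page does not state explicitly.

```cpp
#include <cassert>

// Hypothetical helper, not part of the library: position within the whole
// dataset of the i-th item of the given batch.
// Assumption: batches are consecutive slices of batchSize items each.
int flatIndex(int batch, int i, int batchSize)
{
  assert(batch >= 0 && batchSize > 0);
  assert(i >= 0 && i < batchSize);
  return batch * batchSize + i;
}
```

Under that assumption, item 1 of batch 2 with a batch size of 4 sits at position 9 of the dataset.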
-
inline int getNumSamples() const
Returns the total number of samples in the dataset.
-
inline int getNumFeatures() const
Returns the total number of features in the dataset.
-
inline int getNumLabels() const
Returns the total number of labels held in the dataset.
If this DatasetPlain object only holds input, the return value of this function is undefined. Otherwise, the return value will indicate the total number of labels in the dataset, which is also equal to the number of samples.
-
inline int getNumClasses() const
Returns the number of possible classes which samples in this dataset can belong to.
-