Model data

Module that contains the ModelData class, which stores and operates on a training dataset for use in machine learning applications.

class topsearch.data.model_data.ModelData(training_file: str, response_file: str)

Class to store the data associated with a machine learning model and to provide the methods that modify that data.

training

All training data points of the dataset

Type:

NDArray

response

The corresponding response values for each data point

Type:

NDArray

n_points

The number of data points in the dataset

Type:

int

resp_props

Dictionary containing the statistics of the response array

Type:

dict

train_props

Dictionary containing the statistics of the training array

Type:

dict

hull

Object storing the convex hull of the training data

Type:

NDArray

append_data(new_training: NDArray[Any, Any], new_response: NDArray[Any, Any]) → None

Append additional training points and their response values to the training and response attributes
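As a rough illustration of what appending involves, the sketch below uses plain NumPy arrays rather than the ModelData class itself. It assumes training data is stored with one row per data point and the response as a one-dimensional array; the variable names mirror the documented attributes but the code is not the library's implementation.

```python
import numpy as np

# Existing data: two points with two features each (assumed layout).
training = np.array([[0.0, 0.0], [1.0, 1.0]])
response = np.array([0.0, 2.0])

# New data to append.
new_training = np.array([[0.5, 0.5]])
new_response = np.array([1.0])

# Stack the new rows below the existing ones and extend the response.
training = np.vstack((training, new_training))
response = np.concatenate((response, new_response))

# The point count follows from the number of rows.
n_points = training.shape[0]
```

Any stored statistics or convex hull would presumably need recomputing after an append, since both depend on the full dataset.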

convex_hull() → None

Compute the convex hull for the training data

feature_subset(features: list) → None

Get a subset of the features of the training data
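A minimal sketch of feature selection, assuming that features is a list of integer column indices and that each column of the training array is one feature. Both are assumptions; the documentation above does not state the expected contents of the list.

```python
import numpy as np

# Two data points with three features each (assumed layout).
training = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

# Hypothetical choice: keep the first and third features.
features = [0, 2]

# Indexing with a list of column indices selects those columns in order.
subset = training[:, features]
```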

limit_response_maximum(upper_limit: float) → None

Limits the maximum allowed response value
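One plausible reading of this method is that values above upper_limit are clipped to it; the description does not say whether such points are clipped or discarded, so the sketch below is an assumption and not the library's confirmed behaviour.

```python
import numpy as np

response = np.array([0.5, 3.0, 12.0, 7.5])
upper_limit = 5.0

# Element-wise minimum caps every value at upper_limit
# and leaves smaller values unchanged.
capped = np.minimum(response, upper_limit)
```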

normalise_response() → None

Normalises the stored response values, scaling them to lie in the range (0, 1)

normalise_training() → None

Normalises each feature of the training data to lie within the range (0, 1)
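The normalisation methods can be pictured as min-max scaling. The sketch below is a standalone NumPy illustration under that assumption; the helper names and the keys of the statistics dictionary are hypothetical and are not taken from the library. With this scheme the smallest and largest values map exactly to 0 and 1.

```python
import numpy as np


def min_max_normalise(data):
    """Scale each column of data to the unit interval.

    Returns the scaled array and the statistics needed to undo the scaling.
    """
    props = {"min": data.min(axis=0), "max": data.max(axis=0)}
    span = props["max"] - props["min"]
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(span == 0.0, 1.0, span)
    return (data - props["min"]) / span, props


def min_max_unnormalise(scaled, props):
    """Map scaled values back to their original range."""
    span = props["max"] - props["min"]
    span = np.where(span == 0.0, 1.0, span)
    return scaled * span + props["min"]


training = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
scaled, train_props = min_max_normalise(training)
restored = min_max_unnormalise(scaled, train_props)
```

Keeping the minimum and maximum alongside the data is what makes the scaling reversible, which is presumably the role of the train_props and resp_props dictionaries.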

point_in_hull(point: NDArray[Any, Any]) → bool

Determine whether a point lies within the convex hull of the training data
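A hedged sketch of one way to test hull membership, using scipy.spatial.ConvexHull and its facet equations. Whether the library uses this approach is an assumption; the standalone function below and its tol parameter are illustrative only.

```python
import numpy as np
from scipy.spatial import ConvexHull


def point_in_hull(point, hull, tol=1e-12):
    """Return True if point lies inside or on the boundary of hull."""
    # Each row of hull.equations is [normal, offset], and points inside
    # the hull satisfy normal . x + offset <= 0 for every facet.
    normals = hull.equations[:, :-1]
    offsets = hull.equations[:, -1]
    return bool(np.all(normals @ point + offsets <= tol))


# Corners of the unit square plus one interior point.
training = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0],
                     [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(training)

inside = point_in_hull(np.array([0.25, 0.75]), hull)
outside = point_in_hull(np.array([1.5, 0.5]), hull)
```

A check like this is useful for deciding whether a model is being asked to interpolate within the training data or extrapolate beyond it.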

read_data(training_file: str, response_file: str) → None

Read in the training and response values needed for an ML model and store them in class attributes

remove_duplicates(dist_cutoff: float = 1e-07) → None

Remove any data points within dist_cutoff of each other, retaining only the first
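The following sketch shows one way such filtering could work, assuming Euclidean distance and comparing each point only against the points already kept. Both choices are assumptions; the function is a standalone illustration and not the library's code.

```python
import numpy as np


def remove_duplicates(training, response, dist_cutoff=1e-7):
    """Drop points closer than dist_cutoff to an earlier kept point."""
    keep = []
    for i, point in enumerate(training):
        # Keep the point only if it is farther than dist_cutoff
        # from every point that has already been kept.
        if all(np.linalg.norm(point - training[j]) > dist_cutoff
               for j in keep):
            keep.append(i)
    return training[keep], response[keep]


training = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1e-9], [1.0, 1.0]])
response = np.array([1.0, 2.0, 3.0, 4.0])
unique_training, unique_response = remove_duplicates(training, response)
```

The pairwise loop scales quadratically with the number of points, which is acceptable for small datasets; a spatial index would be a natural replacement for larger ones.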

standardise_response() → None

Standardises the response values to have a mean of 0 and a standard deviation of 1

standardise_training() → None

Standardises each feature of the training data to have mean 0, standard deviation 1
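Standardisation can be sketched as subtracting the mean and dividing by the standard deviation, keeping both so the transformation can be reversed. The helper names and dictionary keys below are hypothetical, and the use of the population standard deviation is an assumption.

```python
import numpy as np


def standardise(data):
    """Shift and scale data to zero mean and unit standard deviation.

    Works column-wise for 2D arrays and on the whole array for 1D input.
    """
    props = {"mean": data.mean(axis=0), "std": data.std(axis=0)}
    # Avoid dividing by zero for constant columns.
    std = np.where(props["std"] == 0.0, 1.0, props["std"])
    return (data - props["mean"]) / std, props


def unstandardise(scaled, props):
    """Recover the original values from standardised data."""
    std = np.where(props["std"] == 0.0, 1.0, props["std"])
    return scaled * std + props["mean"]


response = np.array([2.0, 4.0, 6.0, 8.0])
std_response, resp_props = standardise(response)
restored = unstandardise(std_response, resp_props)
```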

unnormalise_response()

Undo the normalisation of the response array

unnormalise_training()

Undo the normalisation of the training array

unstandardise_response()

Undo the standardisation of the response array

unstandardise_training()

Undo the standardisation of the training array

write_data(training_file: str, response_file: str) → None

Writes the training and response attributes into the specified files
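The file format is not specified above. Assuming plain-text, whitespace-delimited files of the kind handled by np.savetxt and np.loadtxt, a write and read round trip could look like the sketch below. The file names are placeholders.

```python
import os
import tempfile

import numpy as np

training = np.array([[0.1, 0.2], [0.3, 0.4]])
response = np.array([1.0, 2.0])

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file names inside a temporary directory.
    training_file = os.path.join(tmp, "training.txt")
    response_file = os.path.join(tmp, "response.txt")

    # Write one data point per row and one response value per row.
    np.savetxt(training_file, training)
    np.savetxt(response_file, response)

    # ndmin=2 keeps the training data two-dimensional
    # even when the file holds a single row.
    loaded_training = np.loadtxt(training_file, ndmin=2)
    loaded_response = np.loadtxt(response_file)
```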