Gaussian process

Module that contains classes to implement Gaussian process regression and evaluate the resulting model

class topsearch.potentials.gaussian_process.GaussianProcess(model_data: ModelData, kernel_choice: str, kernel_bounds: list, standardise_training: bool = False, standardise_response: bool = True, limit_highest_data: bool = False, matern_nu: float = None)

Description

Fit a Gaussian process regression model to a given dataset and evaluate it. The function we evaluate is the value of the regression model at any point in feature space

model_data

The object containing the training and response data we will fit

Type:: ModelData instance

kernel_choice

The choice of kernel, can be ‘RBF’ or ‘Matern’

Type:: str

kernel_bounds

Bounds on the kernel hyperparameters: the lengthscales, followed by the noise as the final element

Type:: list

standardise_training

Choose whether to standardise the training data before GP fit

Type:: bool

standardise_response

Choose whether to standardise the response data before GP fit

Type:: bool

limit_highest_data

Specifies whether to limit the largest response value. Useful in molecular applications where the steep repulsive wall gives huge values

Type:: bool

matern_nu

The nu parameter of the Matern kernel

Type:: float

gpr

The sklearn Gaussian process object

Type:: class
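
The attributes above map fairly directly onto scikit-learn objects. The following is a minimal sketch of that mapping, not the module's actual implementation: the helper build_kernel, the toy dataset, the initial hyperparameter values and the use of normalize_y as a stand-in for standardise_response are all assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel


def build_kernel(kernel_choice, kernel_bounds, matern_nu=2.5):
    """Hypothetical helper: one lengthscale per feature plus a noise term.

    kernel_bounds holds one (low, high) pair per lengthscale, followed by a
    final (low, high) pair for the noise level.
    """
    *length_bounds, noise_bounds = kernel_bounds
    # Start each hyperparameter at the geometric mean of its bounds
    length_init = [float(np.sqrt(lo * hi)) for lo, hi in length_bounds]
    noise_init = float(np.sqrt(noise_bounds[0] * noise_bounds[1]))
    if kernel_choice == "RBF":
        base = RBF(length_scale=length_init, length_scale_bounds=length_bounds)
    elif kernel_choice == "Matern":
        base = Matern(length_scale=length_init,
                      length_scale_bounds=length_bounds, nu=matern_nu)
    else:
        raise ValueError(f"Unknown kernel choice: {kernel_choice}")
    return base + WhiteKernel(noise_level=noise_init,
                              noise_level_bounds=noise_bounds)


# Toy two-dimensional dataset (assumed, for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

kernel = build_kernel("Matern", [(0.1, 10.0), (0.1, 10.0), (1e-8, 1e-2)])
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               n_restarts_optimizer=5, random_state=0)
gpr.fit(X, y)

# Mean and standard deviation of the fit at a single point in feature space
position = np.array([0.3, -0.7])
mean, std = gpr.predict(position.reshape(1, -1), return_std=True)
print(mean.item(), std.item())

# R^2 score on the training data and the optimised hyperparameters
print(gpr.score(X, y))
print(gpr.kernel_)
```

In this sketch the number of restarts plays the role of n_restarts, and the printed kernel_ shows the kind of fitted hyperparameters that write_fit would report.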

add_data(new_training: NDArray[Any, Any], new_response: NDArray[Any, Any]) → None: Add data to the model data, accounting for standardisation

function(position: NDArray[Any, Any]) → float: Return the mean of the GP fit at position

function_and_std(position: NDArray[Any, Any]) → float: Return the mean and standard deviation of the GP fit at position

get_score() → float: Get the R^2 score of the GP fit

initialise_gaussian_process(n_restarts: int = 50) → None: Initialise the Gaussian process from sklearn. Stores the resulting gpr object as an attribute of this class

initialise_kernel() → None: Create a specified kernel for use in a Gaussian process

lowest_point() → float: Find the lowest point in the current dataset

prepare_training_data(): Modify the data provided to the Gaussian process: standardise the training and response data and limit the largest response values, as configured

refit_model(n_restarts: int = 50) → None: Refit the GP model based on the current model_data

update_bounds(scaling: float) → None: Change the lengthscale bounds for the kernel

write_fit() → None: Write the hyperparameters of the best GP fit
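
The data preparation described for prepare_training_data and limit_highest_data can be pictured with a small standard-library sketch. The capping rule, the cap value and the helper names below are assumptions chosen for illustration; the module may limit and standardise the data differently.

```python
from statistics import mean, pstdev


def standardise(values):
    """Shift values to zero mean and scale them to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma


def cap_response(response, cap):
    """Replace any response above cap with cap (hypothetical capping rule)."""
    return [min(r, cap) for r in response]


# One huge value, as produced by a steep repulsive wall
response = [0.2, 1.5, -0.4, 250.0, 0.9]
capped = cap_response(response, cap=5.0)
scaled, mu, sigma = standardise(capped)

# A prediction made in standardised units maps back to the original scale
prediction_scaled = 0.5
prediction = prediction_scaled * sigma + mu

# The lowest point in the (capped) dataset
lowest = min(capped)
print(capped, lowest, prediction)
```

Capping before standardising keeps a single extreme value from dominating the mean and standard deviation used for scaling, which is the motivation given for limit_highest_data.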