Gaussian process

Module containing classes that implement Gaussian process regression and evaluate the resulting model

class topsearch.potentials.gaussian_process.GaussianProcess(model_data: ModelData, kernel_choice: str, kernel_bounds: list, standardise_training: bool = False, standardise_response: bool = True, limit_highest_data: bool = False, matern_nu: float = None)

Description

Fit a Gaussian process regression model to a given dataset and evaluate it. The function we evaluate is the mean prediction of the regression model at any point in feature space

model_data

The object containing the training and response data we will fit

Type:

ModelData instance

kernel_choice

The choice of kernel: either ‘RBF’ or ‘Matern’

Type:

str

kernel_bounds

Bounds on the kernel hyperparameters: the lengthscales, followed by the noise level as the final element
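The exact layout of each list element is not spelled out here. The sketch below assumes one (lower, upper) pair per feature dimension followed by a single (lower, upper) pair for the noise, and shows how such a list splits into its two parts. split_kernel_bounds is a hypothetical helper, not part of topsearch.

```python
from typing import List, Tuple

Bound = Tuple[float, float]

def split_kernel_bounds(kernel_bounds: List[Bound], n_features: int):
    """Split a bounds list into per-dimension lengthscale bounds and a noise bound.

    Assumes one (lower, upper) pair per feature dimension, followed by a
    single (lower, upper) pair for the noise hyperparameter (final element).
    """
    if len(kernel_bounds) != n_features + 1:
        raise ValueError(
            f"expected {n_features + 1} bounds, got {len(kernel_bounds)}")
    for lower, upper in kernel_bounds:
        # Kernel hyperparameters must be strictly positive
        if not 0 < lower <= upper:
            raise ValueError(f"invalid bound ({lower}, {upper})")
    lengthscale_bounds = kernel_bounds[:-1]
    noise_bound = kernel_bounds[-1]
    return lengthscale_bounds, noise_bound

# Two-dimensional feature space: two lengthscale bounds, then the noise bound
bounds = [(1e-2, 1e2), (1e-2, 1e2), (1e-6, 1e-1)]
lengthscales, noise = split_kernel_bounds(bounds, n_features=2)
print(lengthscales)  # [(0.01, 100.0), (0.01, 100.0)]
print(noise)         # (1e-06, 0.1)
```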

Type:

list

standardise_training

Whether to standardise the training data before the GP fit

Type:

bool

standardise_response

Whether to standardise the response data before the GP fit

Type:

bool
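The transform behind standardise_training and standardise_response is not specified here. This sketch assumes per-column z-scoring (zero mean, unit variance), the usual choice, and shows why the statistics have to be kept: anything predicted in standardised units must be mapped back to original units. scikit-learn's GaussianProcessRegressor offers a comparable option for the response through its normalize_y parameter. The helper names are illustrative.

```python
import numpy as np

def standardise(data: np.ndarray):
    """Return z-scored data plus the mean and std needed to undo it.

    Each column is shifted to zero mean and scaled to unit variance.
    """
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std = np.where(std > 0.0, std, 1.0)  # avoid dividing by zero for constant columns
    return (data - mean) / std, mean, std

def unstandardise(scaled: np.ndarray, mean, std) -> np.ndarray:
    """Map standardised values back to the original units."""
    return scaled * std + mean

response = np.array([1.0, 2.0, 3.0, 4.0])
scaled, mu, sigma = standardise(response)
print(scaled.mean(), scaled.std())  # approximately 0.0 and 1.0
restored = unstandardise(scaled, mu, sigma)
assert np.allclose(restored, response)
```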

limit_highest_data

Whether to limit the largest response values. Useful in molecular applications, where the steep repulsive wall gives very large values

Type:

bool
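How the largest values are limited is not documented here. One simple option, assumed in this sketch and not necessarily what topsearch does, is to clip responses at a ceiling so that a few enormous repulsive-wall energies cannot dominate the fit. limit_response and its ceiling parameter are hypothetical.

```python
import numpy as np

def limit_response(response: np.ndarray, ceiling: float) -> np.ndarray:
    """Clip response values above `ceiling` (hypothetical parameter).

    Points keep their location in feature space, but their magnitude is
    capped, so the fit is not driven by a handful of huge values.
    """
    return np.minimum(response, ceiling)

energies = np.array([-1.2, -0.5, 3.0, 250.0, 1e6])
print(limit_response(energies, ceiling=10.0))
```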

matern_nu

The nu parameter of the Matern kernel

Type:

float
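For reference, the standard Matérn covariance with lengthscale ℓ, where K_ν is the modified Bessel function of the second kind and Γ is the gamma function, is given below. Smaller ν gives rougher functions: ν = 1/2 is the exponential kernel, and ν → ∞ recovers the RBF kernel. In scikit-learn's Matern kernel, values of ν other than 0.5, 1.5, 2.5 and ∞ are considerably more expensive because they require evaluating the Bessel function; that the same holds for matern_nu assumes it is passed straight through to scikit-learn.

```latex
k_{\nu}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)}
  \left(\frac{\sqrt{2\nu}\,r}{\ell}\right)^{\nu}
  K_{\nu}\!\left(\frac{\sqrt{2\nu}\,r}{\ell}\right),
\qquad r = \lVert \mathbf{x} - \mathbf{x}' \rVert

k_{1/2}(r) = \exp\!\left(-\frac{r}{\ell}\right),
\qquad
\lim_{\nu \to \infty} k_{\nu}(r) = \exp\!\left(-\frac{r^{2}}{2\ell^{2}}\right)
```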

gpr

The scikit-learn Gaussian process regressor object

Type:

sklearn.gaussian_process.GaussianProcessRegressor instance

add_data(new_training: NDArray[Any, Any], new_response: NDArray[Any, Any]) → None

Add data to the model data, accounting for standardisation
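Accounting for standardisation matters because new points arrive in original units, while the stored data may already be rescaled. One way to keep the two consistent, sketched here with a hypothetical RawDataStore class (not topsearch's ModelData), is to store unscaled data and recompute the statistics after every addition.

```python
import numpy as np

class RawDataStore:
    """Hypothetical store (not topsearch's ModelData) that keeps data unscaled."""

    def __init__(self, training: np.ndarray, response: np.ndarray):
        # training has shape (n_points, n_features); response has shape (n_points,)
        self.training = np.asarray(training, dtype=float)
        self.response = np.asarray(response, dtype=float).ravel()

    def add_data(self, new_training: np.ndarray, new_response: np.ndarray) -> None:
        # Append in original units so the statistics below stay consistent
        self.training = np.vstack([self.training, np.asarray(new_training, dtype=float)])
        self.response = np.concatenate([self.response, np.ravel(new_response)])

    def standardised_response(self):
        # Recompute mean and std over all points, including the new ones
        mean = self.response.mean()
        std = self.response.std()
        scale = std if std > 0 else 1.0
        return (self.response - mean) / scale, mean, std

store = RawDataStore(np.array([[0.0], [1.0]]), np.array([1.0, 3.0]))
store.add_data(np.array([[2.0]]), np.array([5.0]))
scaled, mean, std = store.standardised_response()
print(store.training.shape)  # (3, 1)
print(mean)                  # 3.0
```

Appending raw values directly to arrays that were already standardised would silently mix units, which is the failure this pattern avoids.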

function(position: NDArray[Any, Any]) → float

Return the mean of the GP fit at position

function_and_std(position: NDArray[Any, Any]) → float

Return the mean and variance of the GP fit at position
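function returns the GP posterior mean, and function_and_std adds an uncertainty estimate. The from-scratch NumPy sketch below shows what these quantities are for a zero-mean GP with an RBF kernel and fixed hyperparameters, using the standard Cholesky-based formulas. In topsearch the fitted scikit-learn model supplies these values (scikit-learn exposes them through GaussianProcessRegressor.predict(X, return_std=True)), and with standardise_response enabled they would presumably need mapping back to original units. gp_mean_and_std is an illustrative name; it returns a standard deviation, so square it to obtain the variance.

```python
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float) -> np.ndarray:
    """Squared-exponential (RBF) kernel matrix between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_mean_and_std(x_train, y_train, position, lengthscale=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at one position."""
    x_star = np.atleast_2d(position)              # shape (1, n_features)
    k_train = rbf(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    k_star = rbf(x_star, x_train, lengthscale)    # shape (1, n_train)
    chol = np.linalg.cholesky(k_train)            # k_train = L @ L.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = (k_star @ alpha).item()
    v = np.linalg.solve(chol, k_star.T)
    variance = (rbf(x_star, x_star, lengthscale) - v.T @ v).item()
    return mean, np.sqrt(max(variance, 0.0))      # clamp tiny negative round-off

x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 0.5])
print(gp_mean_and_std(x_train, y_train, [1.0]))    # near a data point: small std
print(gp_mean_and_std(x_train, y_train, [100.0]))  # far away: prior mean and std
```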

get_score() → float

Get the R² score of the GP fit
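R² is the coefficient of determination, 1 − SS_res / SS_tot: 1 for a perfect fit, 0 for a model no better than predicting the mean. scikit-learn regressors, including GaussianProcessRegressor, report this value from their score method. Whether get_score evaluates it on the training data or on other points is not stated here; on the training data a low-noise GP will usually score close to 1. r2_score below is a standalone sketch of the formula.

```python
import numpy as np

def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```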

initialise_gaussian_process(n_restarts: int = 50) → None

Initialise the Gaussian process from scikit-learn. The resulting regressor is stored as the gpr attribute of this class
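scikit-learn's GaussianProcessRegressor fits hyperparameters by maximising the log marginal likelihood with L-BFGS-B, and its n_restarts_optimizer parameter reruns the optimiser from starting points drawn log-uniformly within the bounds; that n_restarts is forwarded there is an assumption. The sketch below shows the same multi-start idea from scratch for a single RBF lengthscale with a fixed noise level, substituting a simple pattern search for L-BFGS-B. log_marginal_likelihood, local_search and fit_lengthscale are illustrative names.

```python
import numpy as np

def log_marginal_likelihood(log_ls, x, y, noise=1e-4):
    """Log marginal likelihood of a zero-mean GP with an RBF kernel."""
    ls = np.exp(log_ls)
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    k = np.exp(-0.5 * sq / ls**2) + noise * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)
    return (-0.5 * y @ alpha - np.log(np.diag(chol)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

def local_search(f, start, low, high, step=0.5, tol=1e-4):
    """1-D pattern search that maximises f within [low, high]."""
    best_x, best_f = start, f(start)
    while step > tol:
        improved = False
        for cand in (best_x - step, best_x + step):
            if low <= cand <= high:
                fc = f(cand)
                if fc > best_f:
                    best_x, best_f = cand, fc
                    improved = True
        if not improved:
            step /= 2  # refine the step once no neighbour improves
    return best_x, best_f

def fit_lengthscale(x, y, bounds=(1e-2, 1e2), n_restarts=5, seed=0):
    """Multi-start optimisation of the lengthscale; keeps the best local optimum."""
    rng = np.random.default_rng(seed)
    low, high = np.log(bounds[0]), np.log(bounds[1])
    f = lambda t: log_marginal_likelihood(t, x, y)
    # One start at the centre of the log range, then n_restarts log-uniform starts
    starts = [0.5 * (low + high)] + list(rng.uniform(low, high, n_restarts))
    results = [local_search(f, s, low, high) for s in starts]
    best_log_ls, best_lml = max(results, key=lambda r: r[1])
    return float(np.exp(best_log_ls)), float(best_lml)

x = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(x).ravel()
print(fit_lengthscale(x, y))
```

More restarts make it more likely that the best of the local optima is the global one, at the cost of one extra optimisation per restart; they never guarantee it.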

initialise_kernel() → None

Create a specified kernel for use in a Gaussian process
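Given kernel_choice, matern_nu and a noise bound as the final element of kernel_bounds, a plausible scikit-learn construction would be RBF(length_scale, length_scale_bounds) or Matern(length_scale, length_scale_bounds, nu) plus WhiteKernel(noise_level, noise_level_bounds), though the exact composition used by initialise_kernel is not documented here. The NumPy sketch below builds an equivalent kernel function by hand so the pieces are visible; make_kernel is a hypothetical name, and only ν = 1.5 and 2.5 are implemented for the Matérn branch.

```python
import numpy as np

def make_kernel(kernel_choice: str, lengthscale: float, noise: float,
                matern_nu: float = 2.5):
    """Return k(a, b) for 'RBF' or 'Matern', plus a white-noise term on k(x, x)."""
    def distances(a, b):
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

    if kernel_choice == "RBF":
        base = lambda r: np.exp(-0.5 * (r / lengthscale) ** 2)
    elif kernel_choice == "Matern" and matern_nu == 1.5:
        def base(r):
            s = np.sqrt(3) * r / lengthscale
            return (1 + s) * np.exp(-s)
    elif kernel_choice == "Matern" and matern_nu == 2.5:
        def base(r):
            s = np.sqrt(5) * r / lengthscale
            return (1 + s + s**2 / 3) * np.exp(-s)
    else:
        raise ValueError(f"unsupported kernel {kernel_choice!r} with nu={matern_nu}")

    def kernel(a, b):
        k = base(distances(a, b))
        if a is b:  # white noise only on the training covariance k(x, x)
            k = k + noise * np.eye(len(a))
        return k
    return kernel

x = np.array([[0.0], [1.0]])
kernel = make_kernel("Matern", lengthscale=1.0, noise=1e-3, matern_nu=2.5)
print(kernel(x, x))
```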

lowest_point() → float

Find the lowest point in the current dataset

prepare_training_data()

Modify the data provided to the Gaussian process, normalising the training and response data and limiting their values

refit_model(n_restarts: int = 50) → None

Refit the GP model based on the current model_data

update_bounds(scaling: float) → None

Change the lengthscale bounds of the kernel
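What scaling does is not documented here. A natural reading, assumed in this hypothetical helper, is that every lengthscale bound is multiplied by scaling (for example after the training data have been rescaled by the same factor) while the noise bound, the final element of kernel_bounds, is left unchanged.

```python
def scale_lengthscale_bounds(kernel_bounds, scaling):
    """Multiply every lengthscale bound by `scaling`; keep the noise bound (last element)."""
    *lengthscales, noise = kernel_bounds
    scaled = [(low * scaling, high * scaling) for low, high in lengthscales]
    return scaled + [noise]

bounds = [(0.1, 10.0), (0.1, 10.0), (1e-6, 1e-2)]
print(scale_lengthscale_bounds(bounds, 2.0))
# [(0.2, 20.0), (0.2, 20.0), (1e-06, 0.01)]
```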

write_fit() → None

Write the hyperparameters of the best GP fit