qbiocode.learning.compute_nb module

Summary

Functions:

`compute_nb`	This function generates a model using the Gaussian Naive Bayes (GaussianNB) classifier as implemented in scikit-learn (https://scikit-learn.org/1.5/modules/generated/sklearn.naive_bayes.GaussianNB.html).
`compute_nb_opt`	This function generates a model using the Gaussian Naive Bayes (GaussianNB) classifier as implemented in scikit-learn (https://scikit-learn.org/1.5/modules/generated/sklearn.naive_bayes.GaussianNB.html), tuning var_smoothing with a cross-validated grid search.

Reference

compute_nb(X_train, X_test, y_train, y_test, args, verbose=False, model='Naive Bayes', data_key='', var_smoothing=1e-09)

Parameters:

X_train (numpy.ndarray) – Training features.
X_test (numpy.ndarray) – Test features.
y_train (numpy.ndarray) – Training labels.
y_test (numpy.ndarray) – Test labels.
args (dict) – Additional arguments, such as config parameters.
verbose (bool) – If True, prints additional information during execution.
model (str) – Name of the model being used.
data_key (str) – Key for the dataset, if applicable.
var_smoothing (float) – Portion of the largest variance of all features added to variances for calculation stability.
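
The var_smoothing description above can be made concrete with a little arithmetic. scikit-learn's GaussianNB multiplies var_smoothing by the largest per-feature variance in the training data and adds the result to every variance estimate. The sketch below reproduces that calculation in plain Python; the helper name `smoothing_epsilon` and the toy data are hypothetical and are not part of qbiocode.

```python
from statistics import pvariance


def smoothing_epsilon(rows, var_smoothing=1e-9):
    """Return the absolute value added to each variance estimate.

    Hypothetical helper: var_smoothing times the largest population
    variance found among the feature columns.
    """
    columns = list(zip(*rows))  # transpose rows into feature columns
    largest_variance = max(pvariance(column) for column in columns)
    return var_smoothing * largest_variance


# Toy training matrix: three samples, two features (illustrative values only).
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
epsilon = smoothing_epsilon(rows, var_smoothing=1e-9)
print(epsilon)
```

Because the added value scales with the data, a larger var_smoothing flattens the per-class Gaussians more strongly on any feature scale.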

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the model parameters.

Return type:

modeleval (dict)
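
As a rough illustration of the workflow described for compute_nb, the following sketch fits scikit-learn's GaussianNB and collects the metrics named above. It is not the qbiocode implementation: the function name `evaluate_gaussian_nb`, the dictionary keys, and the synthetic binary dataset are all assumptions made for this example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate_gaussian_nb(X_train, X_test, y_train, y_test, var_smoothing=1e-9):
    """Hypothetical stand-in: train GaussianNB and score it on the test split."""
    start = time.time()
    clf = GaussianNB(var_smoothing=var_smoothing)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Probability of the positive class; this sketch assumes binary labels.
    y_score = clf.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
        "f1_score": f1_score(y_test, y_pred),
        "time": time.time() - start,
        "params": clf.get_params(),
    }


# Synthetic binary classification data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
results = evaluate_gaussian_nb(X_train, X_test, y_train, y_test)
print(results)
```

The real function also accepts args, verbose, model, and data_key; they are left out here because their handling is not described in this reference.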

compute_nb_opt(X_train, X_test, y_train, y_test, args, verbose=False, model='Naive Bayes', cv=5, var_smoothing=[1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01])

This function generates a model using the Gaussian Naive Bayes (GaussianNB) classifier as implemented in scikit-learn (https://scikit-learn.org/1.5/modules/generated/sklearn.naive_bayes.GaussianNB.html). It runs a grid search over the candidate var_smoothing values using cv-fold cross-validation. Parameter values are taken from the config.yaml file when specified there; otherwise the defaults shown in the signature above are used. The combination of parameters that gives the best performance is saved and returned as best_params, so it can be reused on similar datasets without rerunning the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. It is intended for supervised classification tasks.

Parameters:

X_train (numpy.ndarray) – Training features.
X_test (numpy.ndarray) – Test features.
y_train (numpy.ndarray) – Training labels.
y_test (numpy.ndarray) – Test labels.
args (dict) – Additional arguments, such as config parameters.
verbose (bool) – If True, prints additional information during execution.
model (str) – Name of the model being used.
cv (int) – Number of cross-validation folds for grid search.
var_smoothing (list) – List of values for the var_smoothing parameter to be tested in grid search.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the best parameters found during grid search.

Return type:

modeleval (dict)
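
The grid search described for compute_nb_opt can be approximated with scikit-learn's GridSearchCV. The sketch below is a minimal stand-in under stated assumptions, not the qbiocode code: it searches the default var_smoothing values from the signature with cv=5 on a synthetic dataset, and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

# Candidate values copied from the default var_smoothing list in the signature.
param_grid = {
    "var_smoothing": [1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01]
}

# Synthetic binary classification data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 5-fold cross-validated search on the training split only.
search = GridSearchCV(GaussianNB(), param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_  # e.g. reusable on similar datasets
test_accuracy = search.score(X_test, y_test)  # refit best model, scored on test data
print(best_params, test_accuracy)
```

GridSearchCV refits the best estimator on the full training split by default, so the test split is used only for the final evaluation.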