qbiocode.learning package#

Submodules#

qbiocode.learning.compute_dt module#

compute_dt(X_train, X_test, y_train, y_test, args, verbose=False, model='Decision Tree', data_key='', criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, monotonic_cst=None)[source]#

This function generates a model using a Decision Tree (DT) Classifier as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but falls back to the default parameters listed above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from config.yaml.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘Decision Tree’.

  • data_key (str) – Key for the dataset, if applicable.

  • criterion (str) – The function to measure the quality of a split. Default is ‘gini’.

  • splitter (str) – The strategy used to choose the split at each node. Default is ‘best’.

  • max_depth (int or None) – The maximum depth of the tree. Default is None.

  • min_samples_split (int) – The minimum number of samples required to split an internal node. Default is 2.

  • min_samples_leaf (int) – The minimum number of samples required to be at a leaf node. Default is 1.

  • min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights required to be at a leaf node. Default is 0.0.

  • max_features (int, float, str or None) – The number of features to consider when looking for the best split. Default is None.

  • random_state (int or None) – Controls the randomness of the estimator. Default is None.

  • max_leaf_nodes (int or None) – Grow a tree with max_leaf_nodes in best-first fashion. Default is None.

  • min_impurity_decrease (float) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value. Default is 0.0.

  • class_weight (dict or 'balanced' or None) – Weights associated with classes in the form {class_label: weight}. Default is None.

  • ccp_alpha (float) – Complexity parameter used for Minimal Cost-Complexity Pruning. Default is 0.0.

  • monotonic_cst – Monotonic constraints for tree nodes, if applicable. Default is None.

Returns:

A dictionary containing the evaluation metrics, model parameters, and time taken for training and validation.

Return type:

modeleval (dict)
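The training-and-evaluation workflow that compute_dt wraps can be sketched directly with scikit-learn. This is a minimal sketch on synthetic data; the exact keys of the real modeleval dictionary are an assumption for illustration:

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

start = time.time()
clf = DecisionTreeClassifier(criterion='gini', splitter='best', random_state=0)
clf.fit(X_train, y_train)                 # train on the training split
y_pred = clf.predict(X_test)              # validate on the held-out split
elapsed = time.time() - start

# Evaluation mirroring the metrics named in the docstring.
modeleval = {
    'accuracy': accuracy_score(y_test, y_pred),
    'f1': f1_score(y_test, y_pred),
    'auc': roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
    'time': elapsed,
    'params': clf.get_params(),
}
```

The same fit/predict/score pattern underlies all of the classical compute_* functions in this package.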

compute_dt_opt(X_train, X_test, y_train, y_test, args, verbose=False, model='Decision Tree', cv=5, criterion=[], max_depth=[], min_samples_split=[], min_samples_leaf=[], max_features=[])[source]#

This function also generates a model using a Decision Tree (DT) Classifier as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets without having to rerun the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from config.yaml.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘Decision Tree’.

  • cv (int) – Number of cross-validation folds. Default is 5.

  • criterion (list) – List of criteria to consider for splitting. Default is an empty list.

  • max_depth (list) – List of maximum depths to consider. Default is an empty list.

  • min_samples_split (list) – List of minimum samples required to split an internal node. Default is an empty list.

  • min_samples_leaf (list) – List of minimum samples required to be at a leaf node. Default is an empty list.

  • max_features (list) – List of maximum features to consider when looking for the best split. Default is an empty list.

Returns:

A dictionary containing the evaluation metrics, best parameters, and time taken for training and validation.

Return type:

modeleval (dict)
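The grid search performed by the *_opt variants corresponds to scikit-learn's GridSearchCV. The sketch below uses hypothetical grid values (in qbiocode these would come from config.yaml):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical ranges; the real ranges are read from config.yaml.
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [3, 5, None],
    'min_samples_split': [2, 5],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)            # cross-validated search on the training split
best_params = search.best_params_       # reusable on similar datasets
test_score = search.score(X_test, y_test)  # refit best model, scored on held-out data
```

Saving best_params and passing it straight to the non-opt variant on a similar dataset avoids repeating the search.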

qbiocode.learning.compute_lr module#

compute_lr(X_train, X_test, y_train, y_test, args, model='Logistic Regression', data_key='', penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='saga', max_iter=10000, multi_class='deprecated', verbose=False, warm_start=False, n_jobs=None, l1_ratio=None)[source]#

This function generates a model using a Logistic Regression (LR) method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training data features.

  • X_test (numpy.ndarray) – Test data features.

  • y_train (numpy.ndarray) – Training data labels.

  • y_test (numpy.ndarray) – Test data labels.

  • args (dict) – Additional arguments, such as dataset name and other configurations.

  • model (str) – Name of the model being used, default is ‘Logistic Regression’.

  • data_key (str) – Key for the dataset, default is an empty string.

  • penalty (str) – Regularization penalty, default is ‘l2’.

  • dual (bool) – Dual formulation, default is False.

  • tol (float) – Tolerance for stopping criteria, default is 0.0001.

  • C (float) – Inverse of regularization strength, default is 1.0.

  • fit_intercept (bool) – Whether to fit the intercept, default is True.

  • intercept_scaling (float) – Scaling factor for the intercept, default is 1.

  • class_weight (dict or None) – Weights associated with classes, default is None.

  • random_state (int or None) – Random seed for reproducibility, default is None.

  • solver (str) – Algorithm to use in the optimization problem, default is ‘saga’.

  • max_iter (int) – Maximum number of iterations for convergence, default is 10000.

  • multi_class (str) – Multi-class option, deprecated in this context.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • warm_start (bool) – Whether to reuse the solution of the previous call to fit as initialization, default is False.

  • n_jobs (int or None) – Number of jobs to run in parallel for both fit and predict, default is None which means 1 unless in a joblib.parallel_backend context.

  • l1_ratio (float or None) – The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’, default is None.

Returns:

A dictionary containing the evaluation metrics, model parameters, and time taken for training and validation.

Return type:

modeleval (dict)
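A minimal sketch of the underlying scikit-learn call, using the same defaults as the signature above (saga solver, l2 penalty, large max_iter):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Defaults mirror the compute_lr signature: saga solver, l2 penalty, max_iter=10000.
clf = LogisticRegression(penalty='l2', solver='saga', max_iter=10000, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```

Note that saga converges fastest on standardized features, which is one reason the default max_iter here is set so high.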

compute_lr_opt(X_train, X_test, y_train, y_test, args, model='Logistic Regression', cv=5, penalty=[], C=[], solver=[], verbose=False, max_iter=[])[source]#

This function also generates a model using a Logistic Regression (LR) method as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training data features.

  • X_test (numpy.ndarray) – Test data features.

  • y_train (numpy.ndarray) – Training data labels.

  • y_test (numpy.ndarray) – Test data labels.

  • args (dict) – Additional arguments, such as dataset name and other configurations.

  • model (str) – Name of the model being used, default is ‘Logistic Regression’.

  • cv (int) – Number of cross-validation folds, default is 5.

  • penalty (list) – List of penalties to try, default is an empty list.

  • C (list) – List of inverse regularization strengths to try, default is an empty list.

  • solver (list) – List of solvers to try, default is an empty list.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • max_iter (list) – List of maximum iterations to try, default is an empty list.

Returns:

A dictionary containing the evaluation metrics, best parameters, and time taken for training and validation.

Return type:

modeleval (dict)
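The grid-search ranges consumed by the *_opt functions are read from config.yaml. A hypothetical fragment for compute_lr_opt might look like the following; the key names and nesting are illustrative assumptions, not taken from the real file:

```yaml
# Hypothetical config.yaml fragment -- key names are illustrative only.
logistic_regression:
  cv: 5
  penalty: [l1, l2]
  C: [0.01, 0.1, 1.0, 10.0]
  solver: [saga]
  max_iter: [1000, 10000]
```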

qbiocode.learning.compute_mlp module#

compute_mlp(X_train, X_test, y_train, y_test, args, verbose=False, model='Multi-layer Perceptron', data_key='', hidden_layer_sizes=(100,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=10000, shuffle=True, random_state=None, tol=0.0001, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)[source]#

This function generates a model using a Multi-layer Perceptron (MLP) neural network as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but falls back to the default parameters listed above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used.

  • data_key (str) – Key for the dataset, if applicable.

  • hidden_layer_sizes (tuple) – The ith element represents the number of neurons in the ith hidden layer.

  • activation (str) – Activation function for the hidden layer.

  • solver (str) – The solver for weight optimization.

  • alpha (float) – L2 penalty (regularization term) parameter.

  • batch_size (int or str) – Size of minibatches for stochastic optimizers.

  • learning_rate (str) – Learning rate schedule for weight updates.

  • learning_rate_init (float) – Initial learning rate used.

  • power_t (float) – The exponent for inverse scaling learning rate.

  • max_iter (int) – Maximum number of iterations.

  • shuffle (bool) – Whether to shuffle samples in each iteration.

  • random_state (int or None) – Random seed for reproducibility.

  • tol (float) – Tolerance for stopping criteria.

  • warm_start (bool) – If True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

  • momentum (float) – Momentum for gradient descent update.

  • nesterovs_momentum (bool) – Whether to use Nesterov’s momentum or not.

  • early_stopping (bool) – Whether to use early stopping to terminate training when validation score is not improving.

  • validation_fraction (float) – Proportion of training data to set aside as validation set for early stopping.

  • beta_1 (float) – Exponential decay rate for estimates of the first moment vector in the Adam optimizer.

  • beta_2 (float) – Exponential decay rate for estimates of the second moment vector in the Adam optimizer.

  • epsilon (float) – Value for numerical stability in the Adam optimizer.

  • n_iter_no_change (int) – Number of iterations with no improvement after which training will be stopped.

  • max_fun (int) – Maximum number of function evaluations.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the model parameters.

Return type:

modeleval (dict)
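A minimal sketch of the underlying MLPClassifier call with the same defaults as the signature above (one hidden layer of 100 neurons, relu activation, adam solver):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Defaults mirror the compute_mlp signature above.
clf = MLPClassifier(hidden_layer_sizes=(100,), activation='relu', solver='adam',
                    max_iter=10000, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```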

compute_mlp_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='Multi-layer Perceptron', hidden_layer_sizes=[], activation=[], max_iter=[], solver=[], alpha=[], learning_rate=[])[source]#

This function also generates a model using a Multi-layer Perceptron (MLP) neural network as implemented in scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html). The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets without having to rerun the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • cv (int) – Number of cross-validation folds.

  • model (str) – Name of the model being used.

  • hidden_layer_sizes (tuple or list) – The ith element represents the number of neurons in the ith hidden layer.

  • activation (str or list) – Activation function for the hidden layer.

  • max_iter (int or list) – Maximum number of iterations.

  • solver (str or list) – The solver for weight optimization.

  • alpha (float or list) – L2 penalty (regularization term) parameter.

  • learning_rate (str or list) – Learning rate schedule for weight updates.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the best parameters found during grid search.

Return type:

modeleval (dict)

qbiocode.learning.compute_nb module#

compute_nb(X_train, X_test, y_train, y_test, args, verbose=False, model='Naive Bayes', data_key='', var_smoothing=1e-09)[source]#

This function generates a model using a Gaussian Naive Bayes (NB) Classifier method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used.

  • data_key (str) – Key for the dataset, if applicable.

  • var_smoothing (float) – Portion of the largest variance of all features added to variances for calculation stability.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the model parameters.

Return type:

modeleval (dict)

compute_nb_opt(X_train, X_test, y_train, y_test, args, verbose=False, model='Naive Bayes', cv=5, var_smoothing=[1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01])[source]#

This function also generates a model using a Gaussian Naive Bayes (NB) Classifier as implemented in scikit-learn. The difference here is that this function runs a grid search over var_smoothing, with the range specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets without having to rerun the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used.

  • cv (int) – Number of cross-validation folds for grid search.

  • var_smoothing (list) – List of values for the var_smoothing parameter to be tested in grid search.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the best parameters found during grid search.

Return type:

modeleval (dict)
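A minimal sketch of the var_smoothing grid search using scikit-learn, with the same grid as the default in the signature above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Grid matching the default var_smoothing range in the compute_nb_opt signature.
grid = {'var_smoothing': [1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 1e-03, 1e-02]}
search = GridSearchCV(GaussianNB(), grid, cv=5)
search.fit(X_train, y_train)
best_params = search.best_params_
test_score = search.score(X_test, y_test)
```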

qbiocode.learning.compute_pqk module#

compute_pqk(X_train, X_test, y_train, y_test, args, model='PQK', data_key='', verbose=False, encoding='Z', primitive='estimator', entanglement='linear', reps=2, classical_models=None)[source]#

This function generates quantum circuits, computes projections of the data onto these circuits, and evaluates the performance of classical machine learning models on the projected data. It uses a feature map to encode the data into quantum states and then measures the expectation values of Pauli operators to obtain the features. The classical models are trained on the projected training data and evaluated on the projected test data. This function requires a quantum backend (simulator or real quantum hardware) for execution. It supports various configurations such as encoding methods, entanglement strategies, and repetitions of the feature map. The training and test projections are saved to files and reused if they already exist, to avoid redundant computation. This function is part of the main quantum machine learning pipeline (QProfiler.py) and is intended for use in supervised learning tasks. It leverages quantum computing to enhance feature extraction and classification performance on complex datasets. The function returns the performance results, including accuracy, F1-score, AUC, runtime, as well as model parameters and other relevant metrics.

Parameters:
  • X_train (np.ndarray) – Training data features.

  • X_test (np.ndarray) – Test data features.

  • y_train (np.ndarray) – Training data labels.

  • y_test (np.ndarray) – Test data labels.

  • args (dict) – Arguments containing backend and other configurations.

  • model (str) – Model type, default is ‘PQK’.

  • data_key (str) – Key for the dataset, default is ‘’.

  • verbose (bool) – If True, print additional information, default is False.

  • encoding (str) – Encoding method for the quantum circuit, default is ‘Z’.

  • primitive (str) – Primitive type to use, default is ‘estimator’.

  • entanglement (str) – Entanglement strategy, default is ‘linear’.

  • reps (int) – Number of repetitions for the feature map, default is 2.

  • classical_models (list or None) – List of classical models to train on quantum projections. Options: ‘rf’, ‘mlp’, ‘svc’, ‘lr’, ‘xgb’. Default is None, which is treated as the full list [‘rf’, ‘mlp’, ‘svc’, ‘lr’, ‘xgb’].

Returns:

A DataFrame containing evaluation metrics and model parameters for all models.

Return type:

modeleval (pd.DataFrame)
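The core idea of projected quantum kernel features, encoding each sample into a quantum state and recording Pauli expectation values as classical features, can be illustrated with a single-qubit toy model in plain NumPy. This is an illustration of the principle only, not the multi-qubit circuit compute_pqk actually builds:

```python
import numpy as np

def project_features(X):
    """Toy 'projection': encode each feature x as RY(x)|0> on one qubit and
    record the Pauli-Z and Pauli-X expectation values as classical features.

    For |psi> = [cos(x/2), sin(x/2)]: <Z> = cos(x) and <X> = sin(x).
    """
    return np.concatenate([np.cos(X), np.sin(X)], axis=1)

# Two samples with two 'features' (rotation angles) each.
X = np.array([[0.0, np.pi / 2],
              [np.pi, 0.0]])
Z_proj = project_features(X)  # shape (2, 4): [<Z> per feature, <X> per feature]
```

The classical models (rf, mlp, svc, lr, xgb) would then be fit on Z_proj rather than on the raw features.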

create_lr_model(seed)[source]#
create_mlp_model(seed)[source]#
create_rf_model(seed)[source]#
create_svc_model(seed)[source]#
create_xgb_model(seed)[source]#

qbiocode.learning.compute_qnn module#

compute_qnn(X_train, X_test, y_train, y_test, args, model='QNN', data_key='', primitive='sampler', verbose=False, local_optimizer='COBYLA', maxiter=100, encoding='Z', entanglement='linear', reps=2, ansatz_type='amp')[source]#

This function computes a Quantum Neural Network (QNN) model on the provided training data and evaluates it on the test data. It constructs a QNN circuit with a specified feature map and ansatz, optimizes it using a chosen optimizer, and fits the model to the training data. It then predicts the labels for the test data and evaluates the model’s performance. The function returns the performance results, including accuracy, F1-score, AUC, runtime, as well as model parameters, and other relevant metrics.

Parameters:
  • X_train (array-like) – Training feature set.

  • X_test (array-like) – Test feature set.

  • y_train (array-like) – Training labels.

  • y_test (array-like) – Test labels.

  • args (dict) – Dictionary containing configuration parameters for the QNN.

  • model (str, optional) – Model type. Defaults to ‘QNN’.

  • data_key (str, optional) – Key for the dataset. Defaults to ‘’.

  • primitive (Literal['estimator', 'sampler'], optional) – Type of primitive to use. Defaults to ‘sampler’.

  • verbose (bool, optional) – If True, prints additional information. Defaults to False.

  • local_optimizer (Literal['COBYLA', 'L_BFGS_B', 'GradientDescent'], optional) – Optimizer to use. Defaults to ‘COBYLA’.

  • maxiter (int, optional) – Maximum number of iterations for the optimizer. Defaults to 100.

  • encoding (str, optional) – Feature encoding method. Defaults to ‘Z’.

  • entanglement (str, optional) – Entanglement strategy for the circuit. Defaults to ‘linear’.

  • reps (int, optional) – Number of repetitions for the feature map and ansatz. Defaults to 2.

  • ansatz_type (str, optional) – Type of ansatz to use. Defaults to ‘amp’.

Returns:

A dictionary containing the evaluation results, including accuracy, runtime, model parameters, and other relevant metrics.

Return type:

modeleval (dict)

qbiocode.learning.compute_qsvc module#

compute_qsvc(X_train, X_test, y_train, y_test, args, model='QSVC', data_key='', C=1, gamma='scale', pegasos=False, encoding='ZZ', entanglement='linear', primitive='sampler', reps=2, verbose=False, local_optimizer='')[source]#

This function computes a quantum support vector classifier (QSVC) using the Qiskit Machine Learning library. It takes training and testing datasets, along with various parameters to configure the QSVC model. It initializes the quantum feature map, sets up the backend and session, and fits the QSVC model to the training data. It then predicts the labels for the test data and evaluates the model’s performance. The function returns the performance results, including accuracy, F1-score, AUC, runtime, as well as model parameters, and other relevant metrics.

Parameters:
  • X_train (np.ndarray) – Training feature set.

  • X_test (np.ndarray) – Testing feature set.

  • y_train (np.ndarray) – Training labels.

  • y_test (np.ndarray) – Testing labels.

  • args (dict) – Dictionary containing arguments for the quantum backend and other settings.

  • model (str) – Model type, default is ‘QSVC’.

  • data_key (str) – Key for the dataset, default is an empty string.

  • C (float) – Regularization parameter for the SVM, default is 1.

  • gamma (str or float) – Kernel coefficient, default is ‘scale’.

  • pegasos (bool) – Whether to use Pegasos QSVC, default is False.

  • encoding (str) – Feature map encoding type, options are ‘ZZ’, ‘Z’, or ‘P’, default is ‘ZZ’.

  • entanglement (str) – Entanglement strategy for the feature map, default is ‘linear’.

  • primitive (str) – Primitive type to use, default is ‘sampler’.

  • reps (int) – Number of repetitions for the feature map, default is 2.

  • verbose (bool) – Whether to print additional information, default is False.

Returns:

A dictionary containing the evaluation results, including accuracy, runtime, model parameters, and other relevant metrics.

Return type:

modeleval (dict)
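A quantum kernel ultimately enters the classifier as a Gram matrix, which scikit-learn's SVC consumes via kernel='precomputed'. The sketch below uses a classical RBF stand-in for the fidelity-based quantum kernel entries to show that consumption pattern; the actual kernel evaluation via Qiskit primitives is not reproduced here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

def kernel_matrix(A, B, gamma=0.5):
    # Classical RBF stand-in for fidelity-based quantum kernel entries.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

K_train = kernel_matrix(X_train, X_train)   # train-vs-train Gram matrix
K_test = kernel_matrix(X_test, X_train)     # test-vs-train Gram matrix

clf = SVC(kernel='precomputed', C=1.0)
clf.fit(K_train, y_train)
acc = clf.score(K_test, y_test)
```

Swapping kernel_matrix for a quantum kernel evaluation leaves the SVC side of the pipeline unchanged.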

qbiocode.learning.compute_rf module#

compute_rf(X_train, X_test, y_train, y_test, args, verbose=False, model='Random Forest', data_key='', n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None, monotonic_cst=None)[source]#

This function generates a model using a Random Forest (RF) Classifier method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘Random Forest’.

  • data_key (str) – Key for identifying the dataset, default is an empty string.

  • n_estimators (int) – Number of trees in the forest, default is 100.

  • criterion (str) – The function to measure the quality of a split, default is ‘gini’.

  • max_depth (int or None) – Maximum depth of the tree, default is None.

  • min_samples_split (int) – Minimum number of samples required to split an internal node, default is 2.

  • min_samples_leaf (int) – Minimum number of samples required to be at a leaf node, default is 1.

  • min_weight_fraction_leaf (float) – Minimum weighted fraction of the sum total of weights required to be at a leaf node, default is 0.0.

  • max_features (str or int or float) – The number of features to consider when looking for the best split, default is ‘sqrt’.

  • max_leaf_nodes (int or None) – Grow trees with max_leaf_nodes in best-first fashion, default is None.

  • min_impurity_decrease (float) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value, default is 0.0.

  • bootstrap (bool) – Whether bootstrap samples are used when building trees, default is True.

  • oob_score (bool) – Whether to use out-of-bag samples to estimate the generalization accuracy, default is False.

  • n_jobs (int or None) – Number of jobs to run in parallel for both fit and predict, default is None.

  • random_state (int or None) – Controls the randomness of the estimator, default is None.

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, default is False.

  • class_weight (dict or str or None) – Weights associated with classes in the form {class_label: weight}, default is None.

  • ccp_alpha (float) – Complexity parameter used for Minimal Cost-Complexity Pruning, default is 0.0.

  • max_samples (int, float or None) – If bootstrap is True, the number of samples to draw from X to train each base estimator, default is None.

  • monotonic_cst – Monotonic constraints for tree nodes, if applicable, default is None.

Returns:

A dictionary containing the evaluation metrics, model parameters, and time taken for training and validation.

Return type:

modeleval (dict)

compute_rf_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='Random Forest', bootstrap=[], max_depth=[], max_features=[], min_samples_leaf=[], min_samples_split=[], n_estimators=[])[source]#

This function also generates a model using a Random Forest (RF) Classifier method as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • cv (int) – Number of cross-validation folds, default is 5.

  • model (str) – Name of the model being used, default is ‘Random Forest’.

  • bootstrap (list) – List of bootstrap options for grid search.

  • max_depth (list) – List of maximum depth options for grid search.

  • max_features (list) – List of maximum features options for grid search.

  • min_samples_leaf (list) – List of minimum samples leaf options for grid search.

  • min_samples_split (list) – List of minimum samples split options for grid search.

  • n_estimators (list) – List of number of estimators options for grid search.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken for training and validation.

Return type:

modeleval (dict)
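The grid-search behavior described above can be approximated with scikit-learn directly. The sketch below is illustrative only: the parameter values stand in for the ranges normally read from config.yaml, and the toy dataset replaces real features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative grid; the real ranges come from config.yaml.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X_train, y_train)

# best_params_ can be reused on similar datasets without re-searching.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
```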

qbiocode.learning.compute_svc module#

compute_svc(X_train, X_test, y_train, y_test, args, model='SVC', data_key='', C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)[source]#

This function generates a model using a Support Vector Classifier (SVC) method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • model (str) – The type of model to use, default is ‘SVC’.

  • data_key (str) – Key for the dataset, default is an empty string.

  • C (float) – Regularization parameter, default is 1.0.

  • kernel (str) – Specifies the kernel type to be used in the algorithm, default is ‘rbf’.

  • degree (int) – Degree of the polynomial kernel function (‘poly’), default is 3.

  • gamma (str or float) – Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’, default is ‘scale’.

  • coef0 (float) – Independent term in kernel function, default is 0.0.

  • shrinking (bool) – Whether to use the shrinking heuristic, default is True.

  • probability (bool) – Whether to enable probability estimates, default is False.

  • tol (float) – Tolerance for stopping criteria, default is 0.001.

  • cache_size (int) – Size of the kernel cache in MB, default is 200.

  • class_weight (dict or None) – Weights associated with classes, default is None.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • max_iter (int) – Hard limit on iterations within solver, -1 means no limit, default is -1.

  • decision_function_shape (str) – Determines the shape of the decision function, default is ‘ovr’.

  • break_ties (bool) – Whether to break ties in multiclass classification, default is False.

  • random_state (int or None) – Controls the randomness of the estimator, default is None.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken to train and validate the model.

Return type:

modeleval (dict)
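An evaluation dictionary like the one returned above can be sketched with scikit-learn's SVC directly. One detail worth noting: with probability=False (the default documented above), SVC still exposes decision_function, which suffices for binary AUC. The metric names below are illustrative, not necessarily the exact keys the package uses.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

start = time.time()
clf = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X_train, y_train)
y_pred = clf.predict(X_test)
modeleval = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    # decision_function scores are sufficient for binary AUC,
    # so probability=True is not required here.
    "auc": roc_auc_score(y_test, clf.decision_function(X_test)),
    "time": time.time() - start,
}
```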

compute_svc_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='SVC', C=[], gamma=[], kernel=[])[source]#

This function also generates a model using a Support Vector Classifier (SVC) method as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • cv (int) – Number of cross-validation folds, default is 5.

  • model (str) – The type of model to use, default is ‘SVC’.

  • C (list or float) – Regularization parameter(s), default is an empty list.

  • gamma (list or str) – Kernel coefficient(s) for ‘rbf’, ‘poly’, and ‘sigmoid’, default is an empty list.

  • kernel (list or str) – Specifies the kernel type(s) to be used in the algorithm, default is an empty list.

qbiocode.learning.compute_vqc module#

compute_vqc(X_train, X_test, y_train, y_test, args, verbose=False, model='VQC', data_key='', local_optimizer='COBYLA', maxiter=100, encoding='Z', entanglement='linear', reps=2, primitive='sampler', ansatz_type='amp')[source]#

This function computes a Variational Quantum Classifier (VQC) using the Qiskit Machine Learning library. It takes training and testing datasets, along with various parameters to configure the VQC model. It initializes the quantum feature map, sets up the backend and session, and fits the VQC model to the training data. It then predicts the labels for the test data and evaluates the model’s performance. The function returns the performance results, including accuracy, F1 score, AUC, and runtime, as well as model parameters and other relevant metrics.

Parameters:
  • X_train (array-like) – Training feature set.

  • X_test (array-like) – Testing feature set.

  • y_train (array-like) – Training labels.

  • y_test (array-like) – Testing labels.

  • args (dict) – Dictionary containing configuration parameters for the VQC.

  • verbose (bool, optional) – If True, prints additional information. Defaults to False.

  • model (str, optional) – Model type. Defaults to ‘VQC’.

  • data_key (str, optional) – Key for the dataset. Defaults to ‘’.

  • local_optimizer (str, optional) – Local optimizer to use. Defaults to ‘COBYLA’.

  • maxiter (int, optional) – Maximum number of iterations for the optimizer. Defaults to 100.

  • encoding (str, optional) – Feature map encoding type. Defaults to ‘Z’.

  • entanglement (str, optional) – Entanglement strategy. Defaults to ‘linear’.

  • reps (int, optional) – Number of repetitions for the feature map and ansatz. Defaults to 2.

  • primitive (str, optional) – Primitive type (‘sampler’ or ‘estimator’). Defaults to ‘sampler’.

  • ansatz_type (str, optional) – Type of ansatz to use. Defaults to ‘amp’.

Returns:

Evaluation results including accuracy, time taken, and model parameters.

Return type:

dict

qbiocode.learning.compute_xgb module#

compute_xgb(X_train, X_test, y_train, y_test, args, verbose=False, model='xgb', data_key='', n_estimators=100, *, criterion='gini', max_depth=None, subsample=0.5, learning_rate=0.5, colsample_bytree=1, min_child_weight=1)[source]#

This function generates a model using an Extreme Gradient Boosting (xgb) Classifier method as implemented in xgboost. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘xgb’.

  • data_key (str) – Key for identifying the dataset, default is an empty string.

  • n_estimators (int) – Number of trees in the forest, default is 100.

  • max_depth (int or None) – Maximum depth of the tree, default is None.

  • subsample (float) – Subsample ratio of the training instances. Default 0.5

  • learning_rate (float) – Step size shrinkage used in update to prevent overfitting. Default is 0.5

  • colsample_bytree (float) – subsample ratio of columns when constructing each tree. Default is 1

  • min_child_weight (int) – Minimum sum of instance weight (hessian) needed in a child. Default is 1

Raises:

ImportError – If XGBoost is not properly installed or configured.

compute_xgb_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='xgb', bootstrap=[], max_depth=[], max_features=[], learning_rate=[], subsample=[], colsample_bytree=[], n_estimators=[], min_child_weight=[])[source]#

This function also generates a model using an Extreme Gradient Boosting (xgb) Classifier method as implemented in xgboost. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • cv (int) – Number of cross-validation folds, default is 5.

  • model (str) – Name of the model being used, default is ‘xgb’.

  • bootstrap (list) – List of bootstrap options for grid search.

  • max_depth (list) – List of maximum depth options for grid search.

  • subsample (list) – List of subsample ratio of the training instances options for grid search.

  • learning_rate (list) – List of step size shrinkage used in update to prevent overfitting options for grid search.

  • colsample_bytree (list) – List of subsample ratio of columns when constructing each tree options for grid search.

  • n_estimators (list) – List of number of estimators options for grid search.

  • min_child_weight (list) – List of minimum sum of instance weight (hessian) needed in a child options for grid search.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken for training and validation.

Return type:

modeleval (dict)

Raises:

ImportError – If XGBoost is not properly installed or configured.

Module contents#

Machine Learning Module for QBioCode#

This module provides implementations of classical and quantum machine learning algorithms for classification tasks. Each algorithm includes both standard and optimized versions (where applicable) with hyperparameter tuning.

Classical Algorithms#

  • Decision Tree (DT)

  • Logistic Regression (LR)

  • Multi-Layer Perceptron (MLP)

  • Naive Bayes (NB)

  • Random Forest (RF)

  • Support Vector Classifier (SVC)

  • XGBoost (XGB)

Quantum Algorithms#

  • Quantum Neural Network (QNN)

  • Quantum Support Vector Classifier (QSVC)

  • Variational Quantum Classifier (VQC)

  • Projected Quantum Kernel (PQK)

Usage#

>>> from qbiocode.learning import compute_rf, compute_qsvc
>>> # Train a classical model
>>> results = compute_rf(X_train, X_test, y_train, y_test, args)
>>> # Train a quantum model
>>> qresults = compute_qsvc(X_train, X_test, y_train, y_test, args)
compute_dt(X_train, X_test, y_train, y_test, args, verbose=False, model='Decision Tree', data_key='', criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, monotonic_cst=None)[source]#

This function generates a model using a Decision Tree (DT) Classifier method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from config.yaml.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘Decision Tree’.

  • data_key (str) – Key for the dataset, if applicable.

  • criterion (str) – The function to measure the quality of a split. Default is ‘gini’.

  • splitter (str) – The strategy used to choose the split at each node. Default is ‘best’.

  • max_depth (int or None) – The maximum depth of the tree. Default is None.

  • min_samples_split (int) – The minimum number of samples required to split an internal node. Default is 2.

  • min_samples_leaf (int) – The minimum number of samples required to be at a leaf node. Default is 1.

  • min_weight_fraction_leaf (float) – The minimum weighted fraction of the sum total of weights required to be at a leaf node. Default is 0.0.

  • max_features (int, float, str or None) – The number of features to consider when looking for the best split. Default is None.

  • random_state (int or None) – Controls the randomness of the estimator. Default is None.

  • max_leaf_nodes (int or None) – Grow a tree with max_leaf_nodes in best-first fashion. Default is None.

  • min_impurity_decrease (float) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value. Default is 0.0.

  • class_weight (dict or 'balanced' or None) – Weights associated with classes in the form {class_label: weight}. Default is None.

  • ccp_alpha (float) – Complexity parameter used for Minimal Cost-Complexity Pruning. Default is 0.0.

  • monotonic_cst (array-like of int or None) – Monotonic constraints for tree nodes, if applicable. Default is None.

Returns:

A dictionary containing the evaluation metrics, model parameters, and time taken for training and validation.

Return type:

modeleval (dict)
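The effect of the ccp_alpha parameter documented above can be seen directly with scikit-learn's DecisionTreeClassifier; this is a minimal sketch on a toy dataset, with an illustrative ccp_alpha value rather than anything tuned.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# An unconstrained tree versus one pruned via ccp_alpha
# (Minimal Cost-Complexity Pruning, see the parameter above).
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# Pruning can only reduce (or keep) the number of leaves.
n_full, n_pruned = full.get_n_leaves(), pruned.get_n_leaves()
```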

compute_dt_opt(X_train, X_test, y_train, y_test, args, verbose=False, model='Decision Tree', cv=5, criterion=[], max_depth=[], min_samples_split=[], min_samples_leaf=[], max_features=[])[source]#

This function also generates a model using a Decision Tree (DT) Classifier method as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from config.yaml.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘Decision Tree’.

  • cv (int) – Number of cross-validation folds. Default is 5.

  • criterion (list) – List of criteria to consider for splitting. Default is empty list.

  • max_depth (list) – List of maximum depths to consider. Default is empty list.

  • min_samples_split (list) – List of minimum samples required to split an internal node. Default is empty list.

  • min_samples_leaf (list) – List of minimum samples required to be at a leaf node. Default is empty list.

  • max_features (list) – List of maximum features to consider when looking for the best split. Default is empty list.

Returns:

A dictionary containing the evaluation metrics, best parameters, and time taken for training and validation.

Return type:

modeleval (dict)

compute_lr(X_train, X_test, y_train, y_test, args, model='Logistic Regression', data_key='', penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='saga', max_iter=10000, multi_class='deprecated', verbose=False, warm_start=False, n_jobs=None, l1_ratio=None)[source]#

This function generates a model using a Logistic Regression (LR) method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training data features.

  • X_test (numpy.ndarray) – Test data features.

  • y_train (numpy.ndarray) – Training data labels.

  • y_test (numpy.ndarray) – Test data labels.

  • args (dict) – Additional arguments, such as dataset name and other configurations.

  • model (str) – Name of the model being used, default is ‘Logistic Regression’.

  • data_key (str) – Key for the dataset, default is an empty string.

  • penalty (str) – Regularization penalty, default is ‘l2’.

  • dual (bool) – Dual formulation, default is False.

  • tol (float) – Tolerance for stopping criteria, default is 0.0001.

  • C (float) – Inverse of regularization strength, default is 1.0.

  • fit_intercept (bool) – Whether to fit the intercept, default is True.

  • intercept_scaling (float) – Scaling factor for the intercept, default is 1.

  • class_weight (dict or None) – Weights associated with classes, default is None.

  • random_state (int or None) – Random seed for reproducibility, default is None.

  • solver (str) – Algorithm to use in the optimization problem, default is ‘saga’.

  • max_iter (int) – Maximum number of iterations for convergence, default is 10000.

  • multi_class (str) – Multi-class option, deprecated in this context.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • warm_start (bool) – Whether to reuse the solution of the previous call to fit as initialization, default is False.

  • n_jobs (int or None) – Number of jobs to run in parallel for both fit and predict, default is None which means 1 unless in a joblib.parallel_backend context.

  • l1_ratio (float or None) – The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty=’elasticnet’, default is None.

Returns:

A dictionary containing the evaluation metrics, model parameters, and time taken for training and validation.

Return type:

modeleval (dict)
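The interaction between penalty, solver, and l1_ratio noted above can be checked with scikit-learn's LogisticRegression directly; this sketch uses a toy dataset and an illustrative l1_ratio.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 'saga' (the default solver above) supports all penalties;
# l1_ratio is only honoured together with penalty='elasticnet'.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, max_iter=10000).fit(X, y)
train_score = clf.score(X, y)
```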

compute_lr_opt(X_train, X_test, y_train, y_test, args, model='Logistic Regression', cv=5, penalty=[], C=[], solver=[], verbose=False, max_iter=[])[source]#

This function also generates a model using a Logistic Regression (LR) method as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training data features.

  • X_test (numpy.ndarray) – Test data features.

  • y_train (numpy.ndarray) – Training data labels.

  • y_test (numpy.ndarray) – Test data labels.

  • args (dict) – Additional arguments, such as dataset name and other configurations.

  • model (str) – Name of the model being used, default is ‘Logistic Regression’.

  • cv (int) – Number of cross-validation folds, default is 5.

  • penalty (list) – List of penalties to try, default is an empty list.

  • C (list) – List of inverse regularization strengths to try, default is an empty list.

  • solver (list) – List of solvers to try, default is an empty list.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • max_iter (list) – List of maximum iterations to try, default is an empty list.

Returns:

A dictionary containing the evaluation metrics, best parameters, and time taken for training and validation.

Return type:

modeleval (dict)

compute_mlp(X_train, X_test, y_train, y_test, args, verbose=False, model='Multi-layer Perceptron', data_key='', hidden_layer_sizes=(100,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=10000, shuffle=True, random_state=None, tol=0.0001, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)[source]#

This function generates a model using a Multi-layer Perceptron (mlp), a neural network, method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used.

  • data_key (str) – Key for the dataset, if applicable.

  • hidden_layer_sizes (tuple) – The ith element represents the number of neurons in the ith hidden layer.

  • activation (str) – Activation function for the hidden layer.

  • solver (str) – The solver for weight optimization.

  • alpha (float) – L2 penalty (regularization term) parameter.

  • batch_size (int or str) – Size of minibatches for stochastic optimizers.

  • learning_rate (str) – Learning rate schedule for weight updates.

  • learning_rate_init (float) – Initial learning rate used.

  • power_t (float) – The exponent for inverse scaling learning rate.

  • max_iter (int) – Maximum number of iterations.

  • shuffle (bool) – Whether to shuffle samples in each iteration.

  • random_state (int or None) – Random seed for reproducibility.

  • tol (float) – Tolerance for stopping criteria.

  • warm_start (bool) – If True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

  • momentum (float) – Momentum for gradient descent update.

  • nesterovs_momentum (bool) – Whether to use Nesterov’s momentum or not.

  • early_stopping (bool) – Whether to use early stopping to terminate training when validation score is not improving.

  • validation_fraction (float) – Proportion of training data to set aside as validation set for early stopping.

  • beta_1 (float) – Exponential decay rate for estimates of the first moment vector in the Adam optimizer.

  • beta_2 (float) – Exponential decay rate for estimates of the second moment vector in the Adam optimizer.

  • epsilon (float) – Value for numerical stability in the Adam optimizer.

  • n_iter_no_change (int) – Number of iterations with no improvement after which training will be stopped.

  • max_fun (int) – Maximum number of function evaluations.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the model parameters.

Return type:

modeleval (dict)

compute_mlp_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='Multi-layer Perceptron', hidden_layer_sizes=[], activation=[], max_iter=[], solver=[], alpha=[], learning_rate=[])[source]#

This function also generates a model using a Multi-layer Perceptron (mlp), a neural network, as implemented in scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html). The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • cv (int) – Number of cross-validation folds.

  • model (str) – Name of the model being used.

  • hidden_layer_sizes (tuple or list) – The ith element represents the number of neurons in the ith hidden layer.

  • activation (str or list) – Activation function for the hidden layer.

  • max_iter (int or list) – Maximum number of iterations.

  • solver (str or list) – The solver for weight optimization.

  • alpha (float or list) – L2 penalty (regularization term) parameter.

  • learning_rate (str or list) – Learning rate schedule for weight updates.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the best parameters found during grid search.

Return type:

modeleval (dict)

compute_nb(X_train, X_test, y_train, y_test, args, verbose=False, model='Naive Bayes', data_key='', var_smoothing=1e-09)[source]#

This function generates a model using a Gaussian Naive Bayes (NB) Classifier method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used.

  • data_key (str) – Key for the dataset, if applicable.

  • var_smoothing (float) – Portion of the largest variance of all features added to variances for calculation stability.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the model parameters.

Return type:

modeleval (dict)
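The var_smoothing parameter documented above can be inspected directly with scikit-learn's GaussianNB: the fitted estimator exposes epsilon_, the absolute variance actually added, which equals var_smoothing times the largest per-feature variance. A minimal sketch on a toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# epsilon_ = var_smoothing * max(per-feature variance); it is added
# to every per-class variance for numerical stability.
nb = GaussianNB(var_smoothing=1e-9).fit(X, y)
```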

compute_nb_opt(X_train, X_test, y_train, y_test, args, verbose=False, model='Naive Bayes', cv=5, var_smoothing=[1e-09, 1e-08, 1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01])[source]#

This function generates a model using a Gaussian Naive Bayes (NB) Classifier method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (numpy.ndarray) – Training features.

  • X_test (numpy.ndarray) – Test features.

  • y_train (numpy.ndarray) – Training labels.

  • y_test (numpy.ndarray) – Test labels.

  • args (dict) – Additional arguments, such as config parameters.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used.

  • cv (int) – Number of cross-validation folds for grid search.

  • var_smoothing (list) – List of values for the var_smoothing parameter to be tested in grid search.

Returns:

A dictionary containing the evaluation metrics of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model, along with the best parameters found during grid search.

Return type:

modeleval (dict)

compute_pqk(X_train, X_test, y_train, y_test, args, model='PQK', data_key='', verbose=False, encoding='Z', primitive='estimator', entanglement='linear', reps=2, classical_models=None)[source]#

This function generates quantum circuits, computes projections of the data onto these circuits, and evaluates the performance of classical machine learning models on the projected data. It uses a feature map to encode the data into quantum states and then measures the expectation values of Pauli operators to obtain the features. The classical models are trained on the projected training data and evaluated on the projected test data. This function requires a quantum backend (simulator or real quantum hardware) for execution. It supports various configurations such as encoding methods, entanglement strategies, and repetitions of the feature map. The results are saved to files for training and test projections, which are reused if they already exist to avoid redundant computations. This function is part of the main quantum machine learning pipeline (QProfiler.py) and is intended for use in supervised learning tasks. It leverages quantum computing to enhance feature extraction and classification performance on complex datasets. The function returns the performance results, including accuracy, F1-score, AUC, and runtime, as well as model parameters and other relevant metrics.

Parameters:
  • X_train (np.ndarray) – Training data features.

  • X_test (np.ndarray) – Test data features.

  • y_train (np.ndarray) – Training data labels.

  • y_test (np.ndarray) – Test data labels.

  • args (dict) – Arguments containing backend and other configurations.

  • model (str) – Model type, default is ‘PQK’.

  • data_key (str) – Key for the dataset, default is ‘’.

  • verbose (bool) – If True, print additional information, default is False.

  • encoding (str) – Encoding method for the quantum circuit, default is ‘Z’.

  • primitive (str) – Primitive type to use, default is ‘estimator’.

  • entanglement (str) – Entanglement strategy, default is ‘linear’.

  • reps (int) – Number of repetitions for the feature map, default is 2.

  • classical_models (list) – List of classical models to train on quantum projections. Options: ‘rf’, ‘mlp’, ‘svc’, ‘lr’, ‘xgb’. Default is [‘rf’, ‘mlp’, ‘svc’, ‘lr’, ‘xgb’].

Returns:

A DataFrame containing evaluation metrics and model parameters for all models.

Return type:

modeleval (pd.DataFrame)

compute_qnn(X_train, X_test, y_train, y_test, args, model='QNN', data_key='', primitive='sampler', verbose=False, local_optimizer='COBYLA', maxiter=100, encoding='Z', entanglement='linear', reps=2, ansatz_type='amp')[source]#

This function computes a Quantum Neural Network (QNN) model on the provided training data and evaluates it on the test data. It constructs a QNN circuit with a specified feature map and ansatz, optimizes it using a chosen optimizer, and fits the model to the training data. It then predicts the labels for the test data and evaluates the model’s performance. The function returns the performance results, including accuracy, F1-score, AUC, and runtime, as well as model parameters and other relevant metrics.

Parameters:
  • X_train (array-like) – Training feature set.

  • X_test (array-like) – Test feature set.

  • y_train (array-like) – Training labels.

  • y_test (array-like) – Test labels.

  • args (dict) – Dictionary containing configuration parameters for the QNN.

  • model (str, optional) – Model type. Defaults to ‘QNN’.

  • data_key (str, optional) – Key for the dataset. Defaults to ‘’.

  • primitive (Literal['estimator', 'sampler'], optional) – Type of primitive to use. Defaults to ‘sampler’.

  • verbose (bool, optional) – If True, prints additional information. Defaults to False.

  • local_optimizer (Literal['COBYLA', 'L_BFGS_B', 'GradientDescent'], optional) – Optimizer to use. Defaults to ‘COBYLA’.

  • maxiter (int, optional) – Maximum number of iterations for the optimizer. Defaults to 100.

  • encoding (str, optional) – Feature encoding method. Defaults to ‘Z’.

  • entanglement (str, optional) – Entanglement strategy for the circuit. Defaults to ‘linear’.

  • reps (int, optional) – Number of repetitions for the feature map and ansatz. Defaults to 2.

  • ansatz_type (str, optional) – Type of ansatz to use. Defaults to ‘amp’.

Returns:

A dictionary containing the evaluation results, including accuracy, runtime, model parameters, and other relevant metrics.

Return type:

modeleval (dict)

compute_qsvc(X_train, X_test, y_train, y_test, args, model='QSVC', data_key='', C=1, gamma='scale', pegasos=False, encoding='ZZ', entanglement='linear', primitive='sampler', reps=2, verbose=False, local_optimizer='')[source]#

This function computes a quantum support vector classifier (QSVC) using the Qiskit Machine Learning library. It takes training and testing datasets, along with various parameters to configure the QSVC model. It initializes the quantum feature map, sets up the backend and session, and fits the QSVC model to the training data. It then predicts the labels for the test data and evaluates the model’s performance. The function returns the performance results, including accuracy, F1-score, AUC, and runtime, as well as model parameters and other relevant metrics.

Parameters:
  • X_train (np.ndarray) – Training feature set.

  • X_test (np.ndarray) – Testing feature set.

  • y_train (np.ndarray) – Training labels.

  • y_test (np.ndarray) – Testing labels.

  • args (dict) – Dictionary containing arguments for the quantum backend and other settings.

  • model (str) – Model type, default is ‘QSVC’.

  • data_key (str) – Key for the dataset, default is an empty string.

  • C (float) – Regularization parameter for the SVM, default is 1.

  • gamma (str or float) – Kernel coefficient, default is ‘scale’.

  • pegasos (bool) – Whether to use Pegasos QSVC, default is False.

  • encoding (str) – Feature map encoding type, options are ‘ZZ’, ‘Z’, or ‘P’, default is ‘ZZ’.

  • entanglement (str) – Entanglement strategy for the feature map, default is ‘linear’.

  • primitive (str) – Primitive type to use, default is ‘sampler’.

  • reps (int) – Number of repetitions for the feature map, default is 2.

  • verbose (bool) – Whether to print additional information, default is False.

Returns:

A dictionary containing the evaluation results, including accuracy, runtime, model parameters, and other relevant metrics.

Return type:

modeleval (dict)

compute_rf(X_train, X_test, y_train, y_test, args, verbose=False, model='Random Forest', data_key='', n_estimators=100, *, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, warm_start=False, class_weight=None, ccp_alpha=0.0, max_samples=None, monotonic_cst=None)[source]#

This function generates a model using a Random Forest (RF) Classifier method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘Random Forest’.

  • data_key (str) – Key for identifying the dataset, default is an empty string.

  • n_estimators (int) – Number of trees in the forest, default is 100.

  • criterion (str) – The function to measure the quality of a split, default is ‘gini’.

  • max_depth (int or None) – Maximum depth of the tree, default is None.

  • min_samples_split (int) – Minimum number of samples required to split an internal node, default is 2.

  • min_samples_leaf (int) – Minimum number of samples required to be at a leaf node, default is 1.

  • min_weight_fraction_leaf (float) – Minimum weighted fraction of the sum total of weights required to be at a leaf node, default is 0.0.

  • max_features (str or int or float) – The number of features to consider when looking for the best split, default is ‘sqrt’.

  • max_leaf_nodes (int or None) – Grow trees with max_leaf_nodes in best-first fashion, default is None.

  • min_impurity_decrease (float) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value, default is 0.0.

  • bootstrap (bool) – Whether bootstrap samples are used when building trees, default is True.

  • oob_score (bool) – Whether to use out-of-bag samples to estimate the generalization accuracy, default is False.

  • n_jobs (int or None) – Number of jobs to run in parallel for both fit and predict, default is None.

  • random_state (int or None) – Controls the randomness of the estimator, default is None.

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, default is False.

  • class_weight (dict or str or None) – Weights associated with classes in the form {class_label: weight}, default is None.

  • ccp_alpha (float) – Complexity parameter used for Minimal Cost-Complexity Pruning, default is 0.0.

  • max_samples (int or float or None) – If bootstrap is True, the number of samples to draw to train each base estimator, default is None.

  • monotonic_cst (array-like or None) – Monotonicity constraints to enforce on each feature, default is None.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken to train and validate the model.

Return type:

modeleval (dict)
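The fit-and-evaluate flow this function wraps can be sketched with scikit-learn; the dataset and the `modeleval` keys shown are illustrative assumptions, not the exact output schema:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Illustrative data standing in for X_train/X_test/y_train/y_test
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

start = time.time()
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
elapsed = time.time() - start

# Illustrative evaluation dict mirroring the documented metrics
modeleval = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
    "time": elapsed,
}
```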

compute_rf_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='Random Forest', bootstrap=[], max_depth=[], max_features=[], min_samples_leaf=[], min_samples_split=[], n_estimators=[])[source]#

This function also generates a model using a Random Forest (RF) Classifier method as implemented in scikit-learn. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • cv (int) – Number of cross-validation folds, default is 5.

  • model (str) – Name of the model being used, default is ‘Random Forest’.

  • bootstrap (list) – List of bootstrap options for grid search.

  • max_depth (list) – List of maximum depth options for grid search.

  • max_features (list) – List of maximum features options for grid search.

  • min_samples_leaf (list) – List of minimum samples leaf options for grid search.

  • min_samples_split (list) – List of minimum samples split options for grid search.

  • n_estimators (list) – List of number of estimators options for grid search.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken for training and validation.

Return type:

modeleval (dict)
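A minimal sketch of this grid search with scikit-learn; the parameter ranges, which would normally come from config.yaml, are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative data standing in for X_train/X_test/y_train/y_test
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {                     # ranges normally come from config.yaml
    "bootstrap": [True],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 2],
    "n_estimators": [50, 100],
}
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid.fit(X_train, y_train)
best_params = grid.best_params_    # reusable without re-running the search
test_accuracy = grid.score(X_test, y_test)
```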

compute_svc(X_train, X_test, y_train, y_test, args, model='SVC', data_key='', C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)[source]#

This function generates a model using a Support Vector Classifier (SVC) method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • model (str) – The type of model to use, default is ‘SVC’.

  • data_key (str) – Key for the dataset, default is an empty string.

  • C (float) – Regularization parameter, default is 1.0.

  • kernel (str) – Specifies the kernel type to be used in the algorithm, default is ‘rbf’.

  • degree (int) – Degree of the polynomial kernel function (‘poly’), default is 3.

  • gamma (str or float) – Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’, default is ‘scale’.

  • coef0 (float) – Independent term in kernel function, default is 0.0.

  • shrinking (bool) – Whether to use the shrinking heuristic, default is True.

  • probability (bool) – Whether to enable probability estimates, default is False.

  • tol (float) – Tolerance for stopping criteria, default is 0.001.

  • cache_size (int) – Size of the kernel cache in MB, default is 200.

  • class_weight (dict or None) – Weights associated with classes, default is None.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • max_iter (int) – Hard limit on iterations within solver, -1 means no limit, default is -1.

  • decision_function_shape (str) – Determines the shape of the decision function, default is ‘ovr’.

  • break_ties (bool) – Whether to break ties in multiclass classification, default is False.

  • random_state (int or None) – Controls the randomness of the estimator, default is None.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken to train and validate the model.

Return type:

modeleval (dict)
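A minimal scikit-learn sketch of the SVC training and evaluation described above; note that probability=True (off by default) is needed if AUC is computed from class probabilities. Dataset and split are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative data standing in for X_train/X_test/y_train/y_test
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# probability=True enables predict_proba, required for probability-based AUC
clf = SVC(C=1.0, kernel="rbf", gamma="scale", probability=True, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```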

compute_svc_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='SVC', C=[], gamma=[], kernel=[])[source]#

This function generates a model using a Support Vector Classifier (SVC) method as implemented in scikit-learn. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – Whether to print detailed logs, default is False.

  • cv (int) – Number of cross-validation folds, default is 5.

  • model (str) – The type of model to use, default is ‘SVC’.

  • C (list or float) – Regularization parameter(s), default is an empty list.

  • gamma (list or str) – Kernel coefficient(s) for ‘rbf’, ‘poly’, and ‘sigmoid’, default is an empty list.

  • kernel (list or str) – Specifies the kernel type(s) to be used in the algorithm, default is an empty list.
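A minimal sketch of the grid search over C, gamma, and kernel; the ranges, which would normally come from config.yaml, are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Illustrative data standing in for X_train/X_test/y_train/y_test
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {                     # ranges normally come from config.yaml
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.1],
    "kernel": ["rbf", "linear"],
}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)
best_params = grid.best_params_    # reusable without re-running the search
test_accuracy = grid.score(X_test, y_test)
```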

compute_vqc(X_train, X_test, y_train, y_test, args, verbose=False, model='VQC', data_key='', local_optimizer='COBYLA', maxiter=100, encoding='Z', entanglement='linear', reps=2, primitive='sampler', ansatz_type='amp')[source]#

This function computes a Variational Quantum Classifier (VQC) using the Qiskit Machine Learning library. It takes training and testing datasets, along with various parameters to configure the VQC model. It initializes the quantum feature map, sets up the backend and session, and fits the VQC model to the training data. It then predicts the labels for the test data and evaluates the model’s performance. The function returns the performance results, including accuracy, F1-score, AUC, and runtime, as well as model parameters and other relevant metrics.

Parameters:
  • X_train (array-like) – Training feature set.

  • X_test (array-like) – Testing feature set.

  • y_train (array-like) – Training labels.

  • y_test (array-like) – Testing labels.

  • args (dict) – Dictionary containing configuration parameters for the VQC.

  • verbose (bool, optional) – If True, prints additional information. Defaults to False.

  • model (str, optional) – Model type. Defaults to ‘VQC’.

  • data_key (str, optional) – Key for the dataset. Defaults to ‘’.

  • local_optimizer (str, optional) – Local optimizer to use. Defaults to ‘COBYLA’.

  • maxiter (int, optional) – Maximum number of iterations for the optimizer. Defaults to 100.

  • encoding (str, optional) – Feature map encoding type. Defaults to ‘Z’.

  • entanglement (str, optional) – Entanglement strategy. Defaults to ‘linear’.

  • reps (int, optional) – Number of repetitions for the feature map and ansatz. Defaults to 2.

  • primitive (str, optional) – Primitive type (‘sampler’ or ‘estimator’). Defaults to ‘sampler’.

  • ansatz_type (str, optional) – Type of ansatz to use. Defaults to ‘amp’.

Returns:

Evaluation results including accuracy, time taken, and model parameters.

Return type:

dict

compute_xgb(X_train, X_test, y_train, y_test, args, verbose=False, model='xgb', data_key='', n_estimators=100, *, criterion='gini', max_depth=None, subsample=0.5, learning_rate=0.5, colsample_bytree=1, min_child_weight=1)[source]#

This function generates a model using an Extreme Gradient Boosting (xgb) Classifier method as implemented in xgboost. It takes in parameter arguments specified in the config.yaml file, but will use the default parameters specified above if none are passed. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • model (str) – Name of the model being used, default is ‘xgb’.

  • data_key (str) – Key for identifying the dataset, default is an empty string.

  • n_estimators (int) – Number of trees in the forest, default is 100.

  • max_depth (int or None) – Maximum depth of the tree, default is None.

  • subsample (float) – Subsample ratio of the training instances. Default is 0.5.

  • learning_rate (float) – Step size shrinkage used in update to prevent overfitting. Default is 0.5.

  • colsample_bytree (float) – Subsample ratio of columns when constructing each tree. Default is 1.

  • min_child_weight (int) – Minimum sum of instance weight (hessian) needed in a child. Default is 1.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken to train and validate the model.

Return type:

modeleval (dict)

Raises:

ImportError – If XGBoost is not properly installed or configured.

compute_xgb_opt(X_train, X_test, y_train, y_test, args, verbose=False, cv=5, model='xgb', bootstrap=[], max_depth=[], max_features=[], learning_rate=[], subsample=[], colsample_bytree=[], n_estimators=[], min_child_weight=[])[source]#

This function generates a model using an Extreme Gradient Boosting (xgb) Classifier method as implemented in xgboost. The difference here is that this function runs a grid search. The range of the grid search for each parameter is specified in the config.yaml file. The combination of parameters that led to the best performance is saved and returned as best_params, which can then be used on similar datasets, without having to run the grid search. The model is trained on the training dataset and validated on the test dataset. The function returns the evaluation of the model on the test dataset, including accuracy, AUC, F1 score, and the time taken to train and validate the model across the grid search. This function is designed to be used in a supervised learning context, where the goal is to classify data points.

Parameters:
  • X_train (array-like) – Training data features.

  • X_test (array-like) – Test data features.

  • y_train (array-like) – Training data labels.

  • y_test (array-like) – Test data labels.

  • args (dict) – Additional arguments, typically from a configuration file.

  • verbose (bool) – If True, prints additional information during execution.

  • cv (int) – Number of cross-validation folds, default is 5.

  • model (str) – Name of the model being used, default is ‘xgb’.

  • bootstrap (list) – List of bootstrap options for grid search.

  • max_depth (list) – List of maximum depth options for grid search.

  • subsample (list) – List of subsample ratios of the training instances for grid search.

  • learning_rate (list) – List of learning rate (step size shrinkage) options for grid search.

  • colsample_bytree (list) – List of per-tree column subsample ratio options for grid search.

  • n_estimators (list) – List of number of estimators options for grid search.

  • min_child_weight (list) – List of minimum child weight (hessian sum) options for grid search.

Returns:

A dictionary containing the evaluation metrics of the model, including accuracy, AUC, F1 score, and the time taken for training and validation.

Return type:

modeleval (dict)

Raises:

ImportError – If XGBoost is not properly installed or configured.