QProfiler Configuration Guide#

This guide explains how to configure QProfiler using YAML configuration files for reproducible experiments and batch processing.

Overview#

QProfiler uses YAML configuration files to define:

  • Input datasets and output directories

  • Machine learning models to evaluate

  • Quantum backend settings

  • Embedding methods and parameters

  • Train/test split configuration

  • Model hyperparameters

An example configuration file can be found at apps/qprofiler/configs/config.yaml.

Quick Start#

Here’s a minimal configuration to get started:

# Basic configuration
config_file_name: 'my_experiment'
folder_path: 'data/'
file_dataset: 'my_dataset.csv'
seed: 42

# Models to evaluate
model: ['rf', 'svc', 'qsvc']

# Embedding
embeddings: ['none']
n_components: 3

# Train/test split
test_size: 0.2
stratify: ['y']
scaling: ['True']

# Quantum backend (for QML models)
backend: 'simulator'
shots: 1024

Configuration Sections#

Input Data#

Specify the location and selection of input datasets.

Single Dataset:

config_file_name: 'experiment_name'
folder_path: 'data/'
file_dataset: 'dataset.csv'

All Datasets in Folder:

folder_path: 'data/'
file_dataset: 'ALL'  # Process all CSV files in folder

Multiple Specific Datasets:

file_dataset: ['dataset1.csv', 'dataset2.csv', 'dataset3.csv']

Output Directory:

output_dir: 'results/'  # Where to save results

Random Seeds#

Set random seeds for reproducibility:

seed: 42      # Seed for classical ML algorithms
q_seed: 42    # Seed for quantum algorithms

Tip

Always set seeds for reproducible experiments. Use the same seed across runs to compare results.

Quantum Backend Configuration#

Configure quantum computing backend for QML models (QSVC, VQC, QNN, PQK).

Simulator (Default):

backend: 'simulator'
shots: 1024

IBM Quantum Hardware:

backend: 'ibm_least'  # Use least busy device
# OR
backend: 'ibm_kyoto'  # Specific device name
shots: 4096
resil_level: 1  # Error mitigation level (1-3)

IBM Quantum Credentials:

qiskit_json_path: '~/.qiskit/qiskit-ibm.json'
name: 'account_qbc'  # Account alias in JSON
ibm_instance: 'ibm-q/open/main'  # Optional: specific instance

Note

Backend Options:

  • 'simulator': Local Qiskit Aer simulator

  • 'ibm_least': Automatically select least busy IBM Quantum device

  • 'ibm_<device_name>': Specific IBM Quantum device (e.g., ‘ibm_kyoto’)

Shots: Number of circuit executions. Higher = more accurate but slower.

Resilience Level: Error mitigation strength (1=light, 2=medium, 3=heavy). Higher = more accurate but slower.

Embedding Methods#

Dimensionality reduction techniques to apply before model training.

No Embedding:

embeddings: ['none']

Single Embedding Method:

embeddings: ['pca']
n_components: 3  # Reduce to 3 dimensions

Multiple Embedding Methods:

embeddings: ['pca', 'nmf', 'umap', 'autoencoder']
n_components: 5

Available Embedding Methods:

  • 'none': No dimensionality reduction

  • 'pca': Principal Component Analysis

  • 'nmf': Non-negative Matrix Factorization

  • 'umap': Uniform Manifold Approximation and Projection

  • 'autoencoder': Neural network autoencoder

Tip

Start with 'none' to establish baseline performance, then try 'pca' for faster quantum model training.

Train/Test Split#

Configure data splitting and preprocessing.

test_size: 0.2      # 80% train, 20% test
stratify: ['y']     # Maintain class distribution
scaling: ['True']   # Standardize features

Parameters:

  • test_size: Proportion of data for testing (0.0-1.0)

  • stratify: ['y'] to maintain class balance, ['n'] for random split

  • scaling: ['True'] to standardize features (recommended), ['False'] for raw data

Warning

Always use stratify: ['y'] for imbalanced datasets to ensure both train and test sets have representative class distributions.

Model Selection#

Specify which machine learning models to evaluate.

All Models:

model: ['svc', 'dt', 'lr', 'nb', 'rf', 'mlp', 'xgb', 'qsvc', 'vqc', 'qnn', 'pqk']

Classical Models Only:

model: ['rf', 'svc', 'lr', 'mlp', 'xgb']

Quantum Models Only:

model: ['qsvc', 'vqc', 'qnn', 'pqk']

Available Models:

Model

Type

Description

svc

Classical

Support Vector Classifier

dt

Classical

Decision Tree

lr

Classical

Logistic Regression

nb

Classical

Naive Bayes

rf

Classical

Random Forest

mlp

Classical

Multi-Layer Perceptron

xgb

Classical

XGBoost

qsvc

Quantum

Quantum Support Vector Classifier

vqc

Quantum

Variational Quantum Classifier

qnn

Quantum

Quantum Neural Network

pqk

Quantum

Projected Quantum Kernel

Model Hyperparameters#

Configure hyperparameters for each model. Each model has:

  • Standard arguments: Single values for quick runs

  • Grid search arguments: Lists of values for hyperparameter tuning

Example: Support Vector Classifier (SVC)

# Standard run with fixed parameters
svc_args:
  C: 1.0
  gamma: 0.1
  kernel: 'rbf'

# Grid search over parameter combinations
gridsearch_svc_args:
  C: [0.1, 1, 10, 100]
  gamma: [0.001, 0.01, 0.1, 1]
  kernel: ['linear', 'rbf', 'poly', 'sigmoid']

Example: Random Forest (RF)

rf_args:
  n_estimators: 100
  max_depth: 10
  min_samples_split: 2

gridsearch_rf_args:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15, 20]
  min_samples_split: [2, 5, 10]

Example: XGBoost (XGB)

xgb_args:
  n_estimators: 100
  learning_rate: 0.1
  max_depth: 6

gridsearch_xgb_args:
  n_estimators: [50, 100, 200]
  learning_rate: [0.01, 0.1, 0.3]
  max_depth: [3, 6, 9]

See also

For detailed parameter descriptions, see the scikit-learn documentation:

Quantum Model Hyperparameters#

For quantum models, hyperparameter tuning requires generating separate config files for each combination.

Important

QML Grid Search:

Quantum model grid search is handled differently than classical models. Use the generate_experiments.ipynb notebook in archive/tutorial_notebooks/qml_experiment_generators/ to generate individual config files for each parameter combination.

This approach is necessary because:

  1. Quantum jobs are submitted to IBM Quantum queue

  2. Each configuration may take hours to complete

  3. Separate configs allow parallel job submission

Example: Quantum SVC (QSVC)

qsvc_args:
  feature_map: 'ZZFeatureMap'
  reps: 2
  entanglement: 'linear'

Example: Variational Quantum Classifier (VQC)

vqc_args:
  feature_map: 'ZZFeatureMap'
  ansatz: 'RealAmplitudes'
  reps: 3
  optimizer: 'COBYLA'

Complete Example Configuration#

Here’s a comprehensive example combining all sections:

# Experiment identification
config_file_name: 'comprehensive_experiment'

# Input data
folder_path: 'datasets/'
file_dataset: ['cancer_data.csv', 'diabetes_data.csv']
output_dir: 'results/experiment_001/'

# Reproducibility
seed: 42
q_seed: 42

# Quantum backend
backend: 'simulator'
shots: 1024
resil_level: 1
qiskit_json_path: '~/.qiskit/qiskit-ibm.json'
name: 'my_ibm_account'

# Dimensionality reduction
embeddings: ['none', 'pca']
n_components: 5

# Data splitting
test_size: 0.2
stratify: ['y']
scaling: ['True']

# Models to evaluate
model: ['rf', 'svc', 'mlp', 'xgb', 'qsvc', 'pqk']

# Classical model parameters
rf_args:
  n_estimators: 100
  max_depth: 10

gridsearch_rf_args:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15]

svc_args:
  C: 1.0
  kernel: 'rbf'

gridsearch_svc_args:
  C: [0.1, 1, 10]
  kernel: ['linear', 'rbf']

# Quantum model parameters
qsvc_args:
  feature_map: 'ZZFeatureMap'
  reps: 2

Best Practices#

Tip

Configuration Tips:

  1. Start Simple: Begin with a minimal config and add complexity gradually

  2. Use Descriptive Names: Name configs by experiment purpose (e.g., cancer_baseline.yaml)

  3. Version Control: Keep configs in git to track experiment history

  4. Document Changes: Add comments in YAML to explain non-obvious choices

  5. Test Locally First: Use backend: 'simulator' before submitting to quantum hardware

Warning

Common Pitfalls:

  • Missing Seeds: Always set seed and q_seed for reproducibility

  • Too Many Grid Search Combinations: Start with small grids to estimate runtime


Troubleshooting#

Problem: “Config file not found”

  • Ensure config file is in configs/ directory

  • Check file name matches --config-name argument

  • Use relative path from project root

Problem: “Invalid backend”

  • Verify IBM Quantum credentials are configured

  • Check device name spelling (use ibm_<device> format)

  • Ensure you have access to the specified instance

Problem: “Grid search taking too long”

  • Reduce number of parameter combinations

  • Use fewer cross-validation folds

  • Consider using RandomizedSearchCV for large grids

Problem: “Out of memory”

  • Reduce n_components for embeddings

  • Use smaller test_size to reduce data size

  • Process datasets one at a time instead of batch


See Also#

  • :doc:QProfiler Usage Guide <profiler> - How to run QProfiler

  • :doc:QSage Configuration <sage> - Meta-learning model selection

  • :doc:Tutorial Notebooks <../tutorials> - Step-by-step examples