QProfiler Configuration Guide#
This guide explains how to configure QProfiler using YAML configuration files for reproducible experiments and batch processing.
Overview#
QProfiler uses YAML configuration files to define:
Input datasets and output directories
Machine learning models to evaluate
Quantum backend settings
Embedding methods and parameters
Train/test split configuration
Model hyperparameters
An example configuration file can be found at apps/qprofiler/configs/config.yaml.
Quick Start#
Here’s a minimal configuration to get started:
# Basic configuration
config_file_name: 'my_experiment'
folder_path: 'data/'
file_dataset: 'my_dataset.csv'
seed: 42
# Models to evaluate
model: ['rf', 'svc', 'qsvc']
# Embedding
embeddings: ['none']
n_components: 3
# Train/test split
test_size: 0.2
stratify: ['y']
scaling: ['True']
# Quantum backend (for QML models)
backend: 'simulator'
shots: 1024
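Before launching a run, it can help to confirm that a parsed configuration contains the keys used in the minimal example above. The sketch below assumes the YAML file has already been loaded into a plain Python dict. `REQUIRED_KEYS` and `missing_keys` are hypothetical names used for illustration, and which keys QProfiler actually treats as mandatory is an assumption, not something this guide states.

```python
# Hypothetical pre-flight check; not part of QProfiler.
# Assumes the YAML file was already parsed into a plain dict.
REQUIRED_KEYS = ("config_file_name", "folder_path", "file_dataset", "model")  # assumed set


def missing_keys(config):
    """Return the assumed-required keys that are absent from config."""
    return [key for key in REQUIRED_KEYS if key not in config]


minimal = {
    "config_file_name": "my_experiment",
    "folder_path": "data/",
    "file_dataset": "my_dataset.csv",
    "seed": 42,
    "model": ["rf", "svc", "qsvc"],
}

print(missing_keys(minimal))               # []
print(missing_keys({"folder_path": "x"}))  # ['config_file_name', 'file_dataset', 'model']
```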
Configuration Sections#
Input Data#
Specify the location and selection of input datasets.
Single Dataset:
config_file_name: 'experiment_name'
folder_path: 'data/'
file_dataset: 'dataset.csv'
All Datasets in Folder:
folder_path: 'data/'
file_dataset: 'ALL' # Process all CSV files in folder
Multiple Specific Datasets:
file_dataset: ['dataset1.csv', 'dataset2.csv', 'dataset3.csv']
Output Directory:
output_dir: 'results/' # Where to save results
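The three `file_dataset` forms above (a single name, the string 'ALL', or a list of names) can be normalized into one list of paths. The following is a minimal sketch of that idea, not QProfiler's own loader: `resolve_datasets` is a hypothetical helper, and the exact matching rules QProfiler applies for 'ALL' (for example case sensitivity or subfolders) are assumptions.

```python
from pathlib import Path


def resolve_datasets(folder_path, file_dataset):
    """Expand the three file_dataset forms into a list of CSV paths."""
    folder = Path(folder_path)
    if file_dataset == "ALL":
        # Assumed meaning: every CSV file directly inside the folder.
        return sorted(folder.glob("*.csv"))
    if isinstance(file_dataset, str):
        # A single file name.
        return [folder / file_dataset]
    # A list of specific file names, kept in the order given.
    return [folder / name for name in file_dataset]


paths = resolve_datasets("data/", ["dataset1.csv", "dataset2.csv"])
print([p.name for p in paths])  # ['dataset1.csv', 'dataset2.csv']
```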
Random Seeds#
Set random seeds for reproducibility:
seed: 42 # Seed for classical ML algorithms
q_seed: 42 # Seed for quantum algorithms
Tip
Always set seeds for reproducible experiments. Use the same seed across runs to compare results.
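To see why a fixed seed matters, the sketch below draws random indices with Python's standard `random` module. It only illustrates the general principle; how QProfiler passes `seed` and `q_seed` to its classical and quantum components is not shown in this guide, and `sample_indices` is a hypothetical helper.

```python
import random


def sample_indices(seed, n=5, population=100):
    """Draw n distinct indices using a dedicated generator seeded by `seed`."""
    rng = random.Random(seed)  # isolated generator; global random state is untouched
    return rng.sample(range(population), n)


run_a = sample_indices(42)
run_b = sample_indices(42)

print(run_a == run_b)  # True: the same seed reproduces the same draws
```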
Quantum Backend Configuration#
Configure the quantum computing backend used by the QML models (QSVC, VQC, QNN, PQK).
Simulator (Default):
backend: 'simulator'
shots: 1024
IBM Quantum Hardware:
backend: 'ibm_least' # Use least busy device
# OR
# backend: 'ibm_kyoto' # Specific device name
shots: 4096
resil_level: 1 # Error mitigation level (1-3)
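The backend values shown above follow three patterns: the literal 'simulator', the literal 'ibm_least', and a device name starting with `ibm_`. A small sketch of how such a string could be classified follows. `classify_backend` and its return labels are hypothetical and do not reflect QProfiler's internal dispatch.

```python
def classify_backend(backend):
    """Return (kind, device) for a backend string taken from the config."""
    if backend == "simulator":
        return ("simulator", None)
    if backend == "ibm_least":
        return ("ibm_least_busy", None)
    if backend.startswith("ibm_") and len(backend) > len("ibm_"):
        return ("ibm_device", backend)
    raise ValueError(f"Unrecognized backend: {backend!r}")


print(classify_backend("simulator"))  # ('simulator', None)
print(classify_backend("ibm_kyoto"))  # ('ibm_device', 'ibm_kyoto')
```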
IBM Quantum Credentials:
qiskit_json_path: '~/.qiskit/qiskit-ibm.json'
name: 'account_qbc' # Account alias in JSON
ibm_instance: 'ibm-q/open/main' # Optional: specific instance
Note
Backend Options:
'simulator': Local Qiskit Aer simulator
'ibm_least': Automatically select least busy IBM Quantum device
'ibm_<device_name>': Specific IBM Quantum device (e.g., 'ibm_kyoto')
Shots: Number of circuit executions. Higher = more accurate but slower.
Resilience Level: Error mitigation strength (1=light, 2=medium, 3=heavy). Higher = more accurate but slower.
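The accuracy side of the shots trade-off can be made concrete with standard binomial statistics: a probability p estimated from N shots has standard error sqrt(p(1-p)/N), so quadrupling the shots halves the sampling error. This is a general statistical fact rather than anything specific to QProfiler, and it ignores hardware noise. `shot_standard_error` is a hypothetical helper.

```python
import math


def shot_standard_error(p, shots):
    """Standard error of a probability estimated from `shots` binomial samples."""
    return math.sqrt(p * (1.0 - p) / shots)


for shots in (1024, 4096):
    print(shots, round(shot_standard_error(0.5, shots), 4))
# 1024 0.0156
# 4096 0.0078
```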
Embedding Methods#
Dimensionality reduction techniques to apply before model training.
No Embedding:
embeddings: ['none']
Single Embedding Method:
embeddings: ['pca']
n_components: 3 # Reduce to 3 dimensions
Multiple Embedding Methods:
embeddings: ['pca', 'nmf', 'umap', 'autoencoder']
n_components: 5
Available Embedding Methods:
'none': No dimensionality reduction
'pca': Principal Component Analysis
'nmf': Non-negative Matrix Factorization
'umap': Uniform Manifold Approximation and Projection
'autoencoder': Neural network autoencoder
Tip
Start with 'none' to establish baseline performance, then try 'pca' for faster quantum model training.
Train/Test Split#
Configure data splitting and preprocessing.
test_size: 0.2 # 80% train, 20% test
stratify: ['y'] # Maintain class distribution
scaling: ['True'] # Standardize features
Parameters:
test_size: Proportion of data for testing (0.0-1.0)
stratify: ['y'] to maintain class balance, ['n'] for random split
scaling: ['True'] to standardize features (recommended), ['False'] for raw data
Warning
Always use stratify: ['y'] for imbalanced datasets to ensure both train and test sets have representative class distributions.
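To illustrate what stratification does, here is a minimal standard-library sketch that splits each class separately so that both sets keep the original class ratio. It is an illustration of the technique only; `stratified_split` is a hypothetical function and is not the routine QProfiler uses internally.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                      # shuffle within the class
        n_test = round(len(indices) * test_size)  # same fraction from every class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 90 + [1] * 10  # imbalanced: 90% class 0, 10% class 1
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=42)

print(len(train_idx), len(test_idx))     # 80 20
print(sum(labels[i] for i in test_idx))  # 2 minority samples in the test set
```

With a purely random split of the same data, the test set could end up with zero minority samples, which is the situation the warning above is meant to prevent.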
Model Selection#
Specify which machine learning models to evaluate.
All Models:
model: ['svc', 'dt', 'lr', 'nb', 'rf', 'mlp', 'xgb', 'qsvc', 'vqc', 'qnn', 'pqk']
Classical Models Only:
model: ['rf', 'svc', 'lr', 'mlp', 'xgb']
Quantum Models Only:
model: ['qsvc', 'vqc', 'qnn', 'pqk']
Available Models:
| Model | Type | Description |
|---|---|---|
| 'svc' | Classical | Support Vector Classifier |
| 'dt' | Classical | Decision Tree |
| 'lr' | Classical | Logistic Regression |
| 'nb' | Classical | Naive Bayes |
| 'rf' | Classical | Random Forest |
| 'mlp' | Classical | Multi-Layer Perceptron |
| 'xgb' | Classical | XGBoost |
| 'qsvc' | Quantum | Quantum Support Vector Classifier |
| 'vqc' | Quantum | Variational Quantum Classifier |
| 'qnn' | Quantum | Quantum Neural Network |
| 'pqk' | Quantum | Projected Quantum Kernel |
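Since only the quantum models depend on the backend settings, it can be useful to split a `model` list into its classical and quantum parts and to catch misspelled identifiers early. The sketch below uses the identifiers from the model lists above; `partition_models` is a hypothetical helper, not a QProfiler function.

```python
CLASSICAL = {"svc", "dt", "lr", "nb", "rf", "mlp", "xgb"}
QUANTUM = {"qsvc", "vqc", "qnn", "pqk"}


def partition_models(models):
    """Split a `model` list into (classical, quantum), rejecting unknown names."""
    unknown = [m for m in models if m not in CLASSICAL | QUANTUM]
    if unknown:
        raise ValueError(f"Unknown model identifiers: {unknown}")
    classical = [m for m in models if m in CLASSICAL]
    quantum = [m for m in models if m in QUANTUM]
    return classical, quantum


print(partition_models(["rf", "svc", "qsvc"]))  # (['rf', 'svc'], ['qsvc'])
```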
Model Hyperparameters#
Configure hyperparameters for each model. Each model has:
Standard arguments: Single values for quick runs
Grid search arguments: Lists of values for hyperparameter tuning
Example: Support Vector Classifier (SVC)
# Standard run with fixed parameters
svc_args:
  C: 1.0
  gamma: 0.1
  kernel: 'rbf'
# Grid search over parameter combinations
gridsearch_svc_args:
  C: [0.1, 1, 10, 100]
  gamma: [0.001, 0.01, 0.1, 1]
  kernel: ['linear', 'rbf', 'poly', 'sigmoid']
Example: Random Forest (RF)
rf_args:
  n_estimators: 100
  max_depth: 10
  min_samples_split: 2
gridsearch_rf_args:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15, 20]
  min_samples_split: [2, 5, 10]
Example: XGBoost (XGB)
xgb_args:
  n_estimators: 100
  learning_rate: 0.1
  max_depth: 6
gridsearch_xgb_args:
  n_estimators: [50, 100, 200]
  learning_rate: [0.01, 0.1, 0.3]
  max_depth: [3, 6, 9]
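Grid sizes multiply quickly, so it is worth counting combinations before running a search. The sketch below expands the SVC grid from the example above into individual parameter sets. `expand_grid` is a hypothetical helper written for illustration and is not how QProfiler performs its search.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the value lists in `grid`."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


gridsearch_svc_args = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
}

combos = list(expand_grid(gridsearch_svc_args))
print(len(combos))  # 64
print(combos[0])    # {'C': 0.1, 'gamma': 0.001, 'kernel': 'linear'}
```

By the same arithmetic, the Random Forest grid above has 3 x 4 x 3 = 36 combinations and the XGBoost grid has 3 x 3 x 3 = 27. If cross-validation is used, each combination is fitted once per fold; the number of folds is not stated in this guide.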
See also
For detailed parameter descriptions, see the scikit-learn documentation.
Quantum Model Hyperparameters#
For quantum models, hyperparameter tuning requires generating separate config files for each combination.
Important
QML Grid Search:
Quantum model grid search is handled differently than classical models. Use the generate_experiments.ipynb notebook in archive/tutorial_notebooks/qml_experiment_generators/ to generate individual config files for each parameter combination.
This approach is necessary because:
Quantum jobs are submitted to IBM Quantum queue
Each configuration may take hours to complete
Separate configs allow parallel job submission
Example: Quantum SVC (QSVC)
qsvc_args:
  feature_map: 'ZZFeatureMap'
  reps: 2
  entanglement: 'linear'
Example: Variational Quantum Classifier (VQC)
vqc_args:
  feature_map: 'ZZFeatureMap'
  ansatz: 'RealAmplitudes'
  reps: 3
  optimizer: 'COBYLA'
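The idea of one config per parameter combination can be sketched as follows. This is not the logic of the generate_experiments.ipynb notebook, which is not reproduced in this guide: `generate_configs`, the numbered naming scheme, and the grid values are all assumptions made for illustration.

```python
import copy
from itertools import product


def generate_configs(base_config, model_key, grid):
    """Return one full config dict per combination of values in `grid`."""
    keys = list(grid)
    configs = []
    for i, values in enumerate(product(*(grid[k] for k in keys))):
        cfg = copy.deepcopy(base_config)  # never mutate the shared base config
        cfg[model_key] = dict(zip(keys, values))
        # Hypothetical naming scheme: base name plus a zero-padded index.
        cfg["config_file_name"] = f"{base_config['config_file_name']}_{i:03d}"
        configs.append(cfg)
    return configs


base = {"config_file_name": "qsvc_sweep", "model": ["qsvc"], "backend": "simulator"}
grid = {"reps": [1, 2, 3], "entanglement": ["linear", "full"]}  # illustrative values

configs = generate_configs(base, "qsvc_args", grid)
print(len(configs))                    # 6
print(configs[0]["config_file_name"])  # qsvc_sweep_000
print(configs[0]["qsvc_args"])         # {'reps': 1, 'entanglement': 'linear'}
```

Each resulting dict could then be written to its own YAML file and submitted as an independent job.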
Complete Example Configuration#
Here’s a comprehensive example combining all sections:
# Experiment identification
config_file_name: 'comprehensive_experiment'
# Input data
folder_path: 'datasets/'
file_dataset: ['cancer_data.csv', 'diabetes_data.csv']
output_dir: 'results/experiment_001/'
# Reproducibility
seed: 42
q_seed: 42
# Quantum backend
backend: 'simulator'
shots: 1024
resil_level: 1
qiskit_json_path: '~/.qiskit/qiskit-ibm.json'
name: 'my_ibm_account'
# Dimensionality reduction
embeddings: ['none', 'pca']
n_components: 5
# Data splitting
test_size: 0.2
stratify: ['y']
scaling: ['True']
# Models to evaluate
model: ['rf', 'svc', 'mlp', 'xgb', 'qsvc', 'pqk']
# Classical model parameters
rf_args:
  n_estimators: 100
  max_depth: 10
gridsearch_rf_args:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15]
svc_args:
  C: 1.0
  kernel: 'rbf'
gridsearch_svc_args:
  C: [0.1, 1, 10]
  kernel: ['linear', 'rbf']
# Quantum model parameters
qsvc_args:
  feature_map: 'ZZFeatureMap'
  reps: 2
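To estimate how much work the configuration above implies, the sketch below enumerates dataset, embedding, and model combinations. It assumes that every model is evaluated on every dataset under every embedding; that cross-product behavior is an assumption about QProfiler, not something this guide confirms.

```python
from itertools import product

config = {
    "file_dataset": ["cancer_data.csv", "diabetes_data.csv"],
    "embeddings": ["none", "pca"],
    "model": ["rf", "svc", "mlp", "xgb", "qsvc", "pqk"],
}

# Assumed: one run per (dataset, embedding, model) triple.
runs = list(product(config["file_dataset"], config["embeddings"], config["model"]))
print(len(runs))  # 24
print(runs[0])    # ('cancer_data.csv', 'none', 'rf')
```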
Best Practices#
Tip
Configuration Tips:
Start Simple: Begin with a minimal config and add complexity gradually
Use Descriptive Names: Name configs by experiment purpose (e.g., cancer_baseline.yaml)
Version Control: Keep configs in git to track experiment history
Document Changes: Add comments in YAML to explain non-obvious choices
Test Locally First: Use backend: 'simulator' before submitting to quantum hardware
Warning
Common Pitfalls:
Missing Seeds: Always set seed and q_seed for reproducibility
Too Many Grid Search Combinations: Start with small grids to estimate runtime
Troubleshooting#
Problem: “Config file not found”
Ensure config file is in configs/ directory
Check file name matches --config-name argument
Use relative path from project root
Problem: “Invalid backend”
Verify IBM Quantum credentials are configured
Check device name spelling (use ibm_<device> format)
Ensure you have access to the specified instance
Problem: “Grid search taking too long”
Reduce number of parameter combinations
Use fewer cross-validation folds
Consider using RandomizedSearchCV for large grids
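The sampling idea behind a randomized search can be shown with the standard library alone: instead of fitting every combination, draw a fixed number of them at random. This sketch covers only the sampling step and does no model fitting or cross-validation, which scikit-learn's RandomizedSearchCV handles. `sample_grid` is a hypothetical helper, and whether QProfiler exposes a randomized search option is not stated in this guide.

```python
import random
from itertools import product


def sample_grid(grid, n_iter, seed=42):
    """Pick up to n_iter distinct parameter combinations uniformly at random."""
    keys = list(grid)
    all_combos = [dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys))]
    rng = random.Random(seed)
    return rng.sample(all_combos, min(n_iter, len(all_combos)))


gridsearch_svc_args = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
}

picked = sample_grid(gridsearch_svc_args, n_iter=10)
print(len(picked))  # 10 of the 64 possible combinations
```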
Problem: “Out of memory”
Reduce n_components for embeddings
Use smaller test_size to reduce data size
Process datasets one at a time instead of batch
See Also#
:doc:`QProfiler Usage Guide <profiler>` - How to run QProfiler
:doc:`QSage Configuration <sage>` - Meta-learning model selection
:doc:`Tutorial Notebooks <../tutorials>` - Step-by-step examples