QProfiler Configuration Guide
This guide explains how to configure QProfiler using YAML configuration files for reproducible experiments and batch processing.
Overview
QProfiler uses YAML configuration files to define:
Input datasets and output directories
Machine learning models to evaluate
Quantum backend settings
Embedding methods and parameters
Train/test split configuration
Model hyperparameters
An example configuration file can be found at apps/qprofiler/configs/config.yaml.
Quick Start
Here’s a minimal configuration to get started:
# Basic configuration
config_file_name: 'my_experiment'
folder_path: 'data/'
file_dataset: 'my_dataset.csv'
seed: 42
# Models to evaluate
model: ['rf', 'svc', 'qsvc']
# Embedding
embeddings: ['none']
n_components: 3
# Train/test split
test_size: 0.2
stratify: ['y']
scaling: ['True']
# Quantum backend (for QML models)
backend: 'simulator'
shots: 1024
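Before launching a run, it can help to sanity-check the values in a config. The sketch below mirrors the Quick Start settings as a Python dict and applies a few plausible range checks. The function `check_config` and its rules are assumptions made for illustration; they are not QProfiler's own validation logic.

```python
# Hypothetical sanity check; these rules are assumptions, not QProfiler's validation.
quick_start = {
    "config_file_name": "my_experiment",
    "folder_path": "data/",
    "file_dataset": "my_dataset.csv",
    "seed": 42,
    "model": ["rf", "svc", "qsvc"],
    "embeddings": ["none"],
    "n_components": 3,
    "test_size": 0.2,
    "stratify": ["y"],
    "scaling": ["True"],
    "backend": "simulator",
    "shots": 1024,
}


def check_config(cfg):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not 0.0 < cfg.get("test_size", 0.0) < 1.0:
        problems.append("test_size must be strictly between 0 and 1")
    if not isinstance(cfg.get("n_components"), int) or cfg["n_components"] <= 0:
        problems.append("n_components must be a positive integer")
    if "shots" in cfg and (not isinstance(cfg["shots"], int) or cfg["shots"] <= 0):
        problems.append("shots must be a positive integer")
    if not cfg.get("model"):
        problems.append("model must list at least one model")
    return problems


print(check_config(quick_start))
```

An empty list means none of these illustrative checks failed; it does not guarantee that QProfiler will accept the file.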
Configuration Sections
Input Data
Specify the location and selection of input datasets.
Single Dataset:
config_file_name: 'experiment_name'
folder_path: 'data/'
file_dataset: 'dataset.csv'
All Datasets in Folder:
folder_path: 'data/'
file_dataset: 'ALL' # Process all CSV files in folder
Multiple Specific Datasets:
file_dataset: ['dataset1.csv', 'dataset2.csv', 'dataset3.csv']
Output Directory:
output_dir: 'results/' # Where to save results
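The three forms of file_dataset (a single name, the keyword 'ALL', or a list) can all be reduced to a list of file paths. The following is a minimal sketch of that idea. The helper `resolve_datasets` is hypothetical, and treating 'ALL' as case-sensitive is an assumption.

```python
from pathlib import Path


def resolve_datasets(folder_path, file_dataset):
    """Hypothetical helper: expand file_dataset into a list of CSV paths."""
    folder = Path(folder_path)
    if isinstance(file_dataset, str):
        if file_dataset == "ALL":
            # Assumption: 'ALL' means every *.csv file directly inside the folder.
            return sorted(folder.glob("*.csv"))
        return [folder / file_dataset]
    # Otherwise treat the value as a list of file names.
    return [folder / name for name in file_dataset]


print(resolve_datasets("data/", "dataset.csv"))
print(resolve_datasets("data/", ["dataset1.csv", "dataset2.csv"]))
```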
Random Seeds
Set random seeds for reproducibility:
seed: 42 # Seed for classical ML algorithms
q_seed: 42 # Seed for quantum algorithms
Tip
Always set seeds for reproducible experiments. Use the same seed across runs to compare results.
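To illustrate why a fixed seed matters, the sketch below shuffles row indices with Python's standard `random` module. It is a generic demonstration of seeded randomness and does not show how QProfiler consumes seed or q_seed internally.

```python
import random


def shuffled_indices(n_rows, seed):
    """Shuffle row indices with a dedicated, seeded random generator."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices


same_a = shuffled_indices(10, seed=42)
same_b = shuffled_indices(10, seed=42)
print(same_a == same_b)  # identical seed, identical order
```

A different seed will generally produce a different order, which is why results from runs with different seeds are not directly comparable.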
Quantum Backend Configuration
Configure the quantum computing backend used by the QML models (QSVC, VQC, QNN, PQK).
Simulator (Default):
backend: 'simulator'
shots: 1024
Uses the Qiskit statevector simulator for exact, noiseless quantum simulation.
AerSimulator with Custom Simulation Method:
backend: 'simulator_aer'
sim_method: 'statevector' # Options: statevector, matrix_product_state, tensor_network, etc.
shots: 1024
Provides access to AerSimulator’s various simulation methods. Useful for:
GPU acceleration: Use sim_method: 'tensor_network' with GPU support
Memory efficiency: Use sim_method: 'matrix_product_state' for low-entanglement circuits
Clifford circuits: Use sim_method: 'stabilizer' for fast simulation
Noisy Simulation Based on IBM Device:
backend: 'noisy_ibm_cleveland' # Noisy simulation modeled on IBM device
sim_method: 'matrix_product_state' # Simulation method for AerSimulator
shots: 1024
Simulates quantum circuits with realistic noise models from actual IBM Quantum devices. This feature:
Extracts the noise model from a specified IBM device (e.g., ibm_cleveland, ibm_kyoto)
Runs simulation locally using AerSimulator with the device’s noise characteristics
Allows testing quantum algorithms under realistic noise conditions without queue time
Supports any simulation method available in AerSimulator
Format: 'noisy_<device_name>' where <device_name> is any IBM Quantum device name.
Examples:
'noisy_ibm_cleveland' - Noise model from IBM Cleveland
'noisy_ibm_kyoto' - Noise model from IBM Kyoto
'noisy_ibm_sherbrooke' - Noise model from IBM Sherbrooke
Tip
Choosing a Simulation Method for Noisy Simulations:
matrix_product_state: Recommended for most quantum machine learning circuits. Efficient for circuits with moderate entanglement.
tensor_network: Best for GPU-accelerated simulations. Requires qiskit-aer-gpu installation.
statevector: Most accurate but memory-intensive. Limited to ~20-25 qubits depending on available RAM.
automatic: Let AerSimulator choose the best method based on circuit properties.
IBM Quantum Hardware:
backend: 'ibm_least' # Use least busy device
# OR (set only one backend key)
# backend: 'ibm_kyoto' # Specific device name
shots: 4096
resil_level: 1 # Error mitigation level (1-3)
Runs circuits on actual IBM Quantum hardware.
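The backend strings shown so far follow a simple naming scheme: fixed keywords for simulators, a 'noisy_' prefix for device-based noise models, and an 'ibm_' prefix for hardware. The sketch below shows one way such a string could be classified. The function `classify_backend` and its return values are hypothetical and do not reflect QProfiler's internal parsing.

```python
def classify_backend(backend):
    """Hypothetical classifier: return (kind, device_name) for a backend string."""
    if backend in ("simulator", "simulator_aer"):
        return (backend, None)
    if backend.startswith("noisy_"):
        # Everything after the prefix is taken as the IBM device name.
        return ("noisy", backend[len("noisy_"):])
    if backend == "ibm_least":
        return ("hardware", None)  # device is chosen at run time
    if backend.startswith("ibm_"):
        return ("hardware", backend)
    raise ValueError(f"Unrecognized backend: {backend!r}")


print(classify_backend("noisy_ibm_cleveland"))
print(classify_backend("ibm_kyoto"))
```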
IBM Quantum Credentials:
qiskit_json_path: '~/.qiskit/qiskit-ibm.json'
name: 'account_qbc' # Account alias in JSON
ibm_instance: 'ibm-q/open/main' # Optional: specific instance
Important
IBM Credentials Required for Noisy Simulations:
When using noisy_<device_name> backends, you must provide valid IBM Quantum credentials even though the simulation runs locally. This is because QBioCode needs to:
Connect to IBM Quantum services to retrieve the device’s noise model
Download the latest calibration data for accurate noise simulation
The actual circuit execution happens locally on your machine using AerSimulator, so you won’t incur queue wait times or consume IBM Quantum compute credits.
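Because both hardware and noisy backends need credentials, a quick pre-flight check can catch a missing entry before a long run starts. This is a minimal sketch that assumes qiskit_json_path and name are the two required keys and that ibm_instance is optional, as the comments above suggest. The helper names are hypothetical.

```python
def requires_ibm_account(backend):
    """Noisy and hardware backends both contact IBM Quantum services."""
    return backend.startswith("noisy_") or backend.startswith("ibm_")


def missing_credential_keys(cfg):
    """Hypothetical check: list credential keys that are absent or empty."""
    if not requires_ibm_account(cfg.get("backend", "simulator")):
        return []
    required = ("qiskit_json_path", "name")  # assumption: ibm_instance is optional
    return [key for key in required if not cfg.get(key)]


cfg = {"backend": "noisy_ibm_cleveland", "qiskit_json_path": "~/.qiskit/qiskit-ibm.json"}
print(missing_credential_keys(cfg))
```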
Note
Backend Options:
'simulator': Local Qiskit statevector simulator (exact, noiseless)
'simulator_aer': AerSimulator with configurable simulation method
'noisy_<device_name>': Noisy simulation based on IBM device noise model (e.g., 'noisy_ibm_cleveland')
'ibm_least': Automatically select least busy IBM Quantum device
'ibm_<device_name>': Specific IBM Quantum device (e.g., 'ibm_kyoto')
Simulation Methods (for AerSimulator):
When using 'simulator_aer' or 'noisy_<device_name>' backends, specify the simulation method via sim_method:
'statevector': Exact statevector simulation (default, memory intensive)
'matrix_product_state': Efficient for low-entanglement circuits
'tensor_network': GPU-accelerated tensor network simulation
'stabilizer': Fast simulation for Clifford circuits
'extended_stabilizer': Approximate simulation for near-Clifford circuits
'automatic': Automatically select best method
Shots: Number of circuit executions. Higher = more accurate but slower.
Resilience Level: Error mitigation strength (1=light, 2=medium, 3=heavy). Higher = more accurate but slower.
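The accuracy gain from more shots can be made concrete. When a probability p is estimated from independent shots, the standard error of the estimate is sqrt(p(1-p)/shots), so quadrupling the shot count halves the statistical error. The sketch below evaluates this for the two shot counts used in this guide. It covers sampling noise only, not device noise or error mitigation.

```python
import math


def shot_noise_std(p, shots):
    """Standard error of a probability p estimated from `shots` measurements."""
    return math.sqrt(p * (1.0 - p) / shots)


for shots in (1024, 4096):
    # p = 0.5 is the worst case for sampling noise.
    print(shots, shot_noise_std(0.5, shots))
```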
Embedding Methods
Dimensionality reduction techniques to apply before model training.
No Embedding:
embeddings: ['none']
Single Embedding Method:
embeddings: ['pca']
n_components: 3 # Reduce to 3 dimensions
Multiple Embedding Methods:
embeddings: ['pca', 'nmf', 'umap', 'autoencoder']
n_components: 5
Available Embedding Methods:
'none': No dimensionality reduction
'pca': Principal Component Analysis
'nmf': Non-negative Matrix Factorization
'umap': Uniform Manifold Approximation and Projection
'autoencoder': Neural network autoencoder
Tip
Start with 'none' to establish baseline performance, then try 'pca' for faster quantum model training.
Train/Test Split
Configure data splitting and preprocessing.
test_size: 0.2 # 80% train, 20% test
stratify: ['y'] # Maintain class distribution
scaling: ['True'] # Standardize features
Parameters:
test_size: Proportion of data for testing (0.0-1.0)
stratify: ['y'] to maintain class balance, ['n'] for random split
scaling: ['True'] to standardize features (recommended), ['False'] for raw data
Warning
Always use stratify: ['y'] for imbalanced datasets to ensure both train and test sets have representative class distributions.
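To show what stratification achieves, here is a minimal standard-library sketch of a stratified split on an imbalanced label vector (80 samples of class 0, 20 of class 1). It splits each class separately so that the test set keeps the same 80/20 ratio. The function `stratified_split` is illustrative only and is not the routine QProfiler or scikit-learn uses.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed):
    """Split indices so each class contributes ~test_size of its samples to test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(test_size * len(indices))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=42)
minority_in_test = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), minority_in_test)  # 80 20 4
```

With a purely random split of the same data, the number of minority samples in the test set would vary from run to run and could be far from 4.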
Model Selection
Specify which machine learning models to evaluate.
All Models:
model: ['svc', 'dt', 'lr', 'nb', 'rf', 'mlp', 'xgb', 'qsvc', 'vqc', 'qnn', 'pqk']
Classical Models Only:
model: ['rf', 'svc', 'lr', 'mlp', 'xgb']
Quantum Models Only:
model: ['qsvc', 'vqc', 'qnn', 'pqk']
Available Models:
| Model | Type | Description |
|---|---|---|
| svc | Classical | Support Vector Classifier |
| dt | Classical | Decision Tree |
| lr | Classical | Logistic Regression |
| nb | Classical | Naive Bayes |
| rf | Classical | Random Forest |
| mlp | Classical | Multi-Layer Perceptron |
| xgb | Classical | XGBoost |
| qsvc | Quantum | Quantum Support Vector Classifier |
| vqc | Quantum | Variational Quantum Classifier |
| qnn | Quantum | Quantum Neural Network |
| pqk | Quantum | Projected Quantum Kernel |
Model Hyperparameters
Configure hyperparameters for each model. Each model has:
Standard arguments: Single values for quick runs
Grid search arguments: Lists of values for hyperparameter tuning
Example: Support Vector Classifier (SVC)
# Standard run with fixed parameters
svc_args:
  C: 1.0
  gamma: 0.1
  kernel: 'rbf'
# Grid search over parameter combinations
gridsearch_svc_args:
  C: [0.1, 1, 10, 100]
  gamma: [0.001, 0.01, 0.1, 1]
  kernel: ['linear', 'rbf', 'poly', 'sigmoid']
Example: Random Forest (RF)
rf_args:
  n_estimators: 100
  max_depth: 10
  min_samples_split: 2
gridsearch_rf_args:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15, 20]
  min_samples_split: [2, 5, 10]
Example: XGBoost (XGB)
xgb_args:
  n_estimators: 100
  learning_rate: 0.1
  max_depth: 6
gridsearch_xgb_args:
  n_estimators: [50, 100, 200]
  learning_rate: [0.01, 0.1, 0.3]
  max_depth: [3, 6, 9]
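Grid search cost grows multiplicatively with each parameter list, so it is worth counting combinations before running. The sketch below counts the grids from the three examples above and enumerates the SVC grid with `itertools.product`. The fold count of 5 is an assumed value for illustration; the guide does not state how many cross-validation folds QProfiler uses.

```python
from itertools import product
from math import prod

svc_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
}
rf_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 15, 20],
    "min_samples_split": [2, 5, 10],
}
xgb_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
}


def grid_size(grid):
    """Number of parameter combinations in a grid."""
    return prod(len(values) for values in grid.values())


def grid_points(grid):
    """Enumerate every combination as a dict of parameter values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


assumed_folds = 5  # assumption for illustration only
for name, grid in (("svc", svc_grid), ("rf", rf_grid), ("xgb", xgb_grid)):
    print(name, grid_size(grid), "combinations,", grid_size(grid) * assumed_folds, "fits")
```

The SVC grid alone has 4 x 4 x 4 = 64 combinations, so trimming one list from four values to two halves the work.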
See also
For detailed parameter descriptions, see the scikit-learn documentation for the corresponding estimator.
Quantum Model Hyperparameters
For quantum models, hyperparameter tuning requires generating separate config files for each combination.
Important
QML Grid Search:
Quantum model grid search is handled differently than classical models. Use the generate_experiments.ipynb notebook in archive/tutorial_notebooks/qml_experiment_generators/ to generate individual config files for each parameter combination.
This approach is necessary because:
Quantum jobs are submitted to the IBM Quantum queue
Each configuration may take hours to complete
Separate configs allow parallel job submission
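The idea of one config per parameter combination can be sketched as follows. This is not the code from generate_experiments.ipynb; the function `expand_configs`, the numbered file-name suffix, and the swept values (including 'full' entanglement) are assumptions chosen for illustration.

```python
import copy
from itertools import product

base_config = {
    "config_file_name": "qsvc_sweep",
    "model": ["qsvc"],
    "backend": "simulator",
    "qsvc_args": {"feature_map": "ZZFeatureMap", "reps": 2, "entanglement": "linear"},
}
# Hypothetical sweep values
sweep = {"reps": [1, 2, 3], "entanglement": ["linear", "full"]}


def expand_configs(base, args_key, sweep):
    """Return one independent config dict per combination of swept values."""
    keys = list(sweep)
    configs = []
    for number, combo in enumerate(product(*(sweep[k] for k in keys)), start=1):
        cfg = copy.deepcopy(base)  # keep the base config untouched
        cfg[args_key].update(dict(zip(keys, combo)))
        cfg["config_file_name"] = f"{base['config_file_name']}_{number:03d}"
        configs.append(cfg)
    return configs


configs = expand_configs(base_config, "qsvc_args", sweep)
for cfg in configs:
    print(cfg["config_file_name"], cfg["qsvc_args"])
```

Each resulting dict could then be written to its own YAML file and submitted as a separate job.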
Example: Quantum SVC (QSVC)
qsvc_args:
  feature_map: 'ZZFeatureMap'
  reps: 2
  entanglement: 'linear'
Example: Variational Quantum Classifier (VQC)
vqc_args:
  feature_map: 'ZZFeatureMap'
  ansatz: 'RealAmplitudes'
  reps: 3
  optimizer: 'COBYLA'
Complete Example Configuration
Here’s a comprehensive example combining all sections:
# Experiment identification
config_file_name: 'comprehensive_experiment'
# Input data
folder_path: 'datasets/'
file_dataset: ['cancer_data.csv', 'diabetes_data.csv']
output_dir: 'results/experiment_001/'
# Reproducibility
seed: 42
q_seed: 42
# Quantum backend - Noisy simulation example
backend: 'noisy_ibm_cleveland'
sim_method: 'matrix_product_state'
shots: 1024
resil_level: 1
qiskit_json_path: '~/.qiskit/qiskit-ibm.json'
name: 'my_ibm_account'
# Dimensionality reduction
embeddings: ['none', 'pca']
n_components: 5
# Data splitting
test_size: 0.2
stratify: ['y']
scaling: ['True']
# Models to evaluate
model: ['rf', 'svc', 'mlp', 'xgb', 'qsvc', 'pqk']
# Classical model parameters
rf_args:
  n_estimators: 100
  max_depth: 10
gridsearch_rf_args:
  n_estimators: [50, 100, 200]
  max_depth: [5, 10, 15]
svc_args:
  C: 1.0
  kernel: 'rbf'
gridsearch_svc_args:
  C: [0.1, 1, 10]
  kernel: ['linear', 'rbf']
# Quantum model parameters
qsvc_args:
  feature_map: 'ZZFeatureMap'
  reps: 2
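A rough estimate of the workload implied by this configuration can be obtained by counting dataset, embedding, and model combinations. The sketch below assumes that every dataset is evaluated with every embedding and every model; this guide does not state that explicitly, so treat the numbers as an upper-bound estimate.

```python
from itertools import product

datasets = ["cancer_data.csv", "diabetes_data.csv"]
embeddings = ["none", "pca"]
models = ["rf", "svc", "mlp", "xgb", "qsvc", "pqk"]
quantum_models = {"qsvc", "vqc", "qnn", "pqk"}

# Assumption: a full cross product of datasets, embeddings and models.
runs = list(product(datasets, embeddings, models))
quantum_runs = [run for run in runs if run[2] in quantum_models]

print(len(runs), "runs in total,", len(quantum_runs), "of them on the quantum backend")
```

Under that assumption the example produces 24 runs, 8 of which use the noisy simulation backend and are likely to dominate the runtime.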
Alternative Backend Configurations:
# For exact noiseless simulation
backend: 'simulator'
# For AerSimulator with GPU acceleration
backend: 'simulator_aer'
sim_method: 'tensor_network'
# For actual IBM Quantum hardware
backend: 'ibm_kyoto'
shots: 4096
resil_level: 2
Best Practices
Tip
Configuration Tips:
Start Simple: Begin with a minimal config and add complexity gradually
Use Descriptive Names: Name configs by experiment purpose (e.g., cancer_baseline.yaml)
Version Control: Keep configs in git to track experiment history
Document Changes: Add comments in YAML to explain non-obvious choices
Test Locally First: Use backend: 'simulator' before submitting to quantum hardware
Warning
Common Pitfalls:
Missing Seeds: Always set seed and q_seed for reproducibility
Too Many Grid Search Combinations: Start with small grids to estimate runtime
Troubleshooting
Problem: “Config file not found”
Ensure the config file is in the configs/ directory
Check that the file name matches the --config-name argument
Use a relative path from the project root
Problem: “Invalid backend”
Verify IBM Quantum credentials are configured
Check device name spelling (use ibm_<device> format)
Ensure you have access to the specified instance
Problem: “Grid search taking too long”
Reduce number of parameter combinations
Use fewer cross-validation folds
Consider using RandomizedSearchCV for large grids
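The idea behind a randomized search is to evaluate only a fixed number of randomly chosen combinations instead of the full grid. The standard-library sketch below illustrates that sampling step on the SVC grid from this guide. It is a conceptual illustration, not scikit-learn's RandomizedSearchCV, and the helper `sample_grid` is hypothetical.

```python
import random
from itertools import product

svc_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
}


def sample_grid(grid, n_iter, seed):
    """Draw n_iter distinct parameter combinations from the full grid."""
    keys = list(grid)
    all_points = [dict(zip(keys, combo)) for combo in product(*grid.values())]
    rng = random.Random(seed)
    return rng.sample(all_points, min(n_iter, len(all_points)))


candidates = sample_grid(svc_grid, n_iter=10, seed=42)
print(len(candidates), "of 64 combinations selected")
```

Here 10 candidates replace 64, cutting the number of model fits by more than a factor of six at the cost of possibly missing the best combination.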
Problem: “Out of memory”
Reduce n_components for embeddings
Use smaller test_size to reduce data size
Process datasets one at a time instead of batch
See Also
:doc:`QProfiler Usage Guide <profiler>` - How to run QProfiler
:doc:`QSage Configuration <sage>` - Meta-learning model selection
:doc:`Tutorial Notebooks <../tutorials>` - Step-by-step examples