Quantum Projection Learning (QPL) Tutorial#
Overview#
Quantum Projection Learning (QPL) is an advanced quantum machine learning technique that extends Projected Quantum Kernels (PQK) by evaluating quantum-projected data with multiple classical machine learning algorithms. Instead of using only Support Vector Machines, QPL systematically compares performance across:
Support Vector Classifier (SVC)
Random Forest (RF)
XGBoost (XGB)
Multi-Layer Perceptron (MLP)
Logistic Regression (LR)
This comprehensive approach helps identify which classical learner best exploits quantum feature representations for a given dataset.
What You’ll Learn#
Generate synthetic classification datasets with controlled complexity
Configure and run QPL experiments using QProfiler
Compare quantum-enhanced vs. classical baseline performance
Analyze results across multiple models and embeddings
Visualize performance metrics and identify quantum advantages
Key Concepts#
Quantum Feature Maps: Transform classical data into quantum states
Quantum Projections: Extract features from quantum circuits via expectation values
Ensemble Learning: Combine multiple classical learners on quantum features
Data Complexity Analysis: Understand which datasets benefit from quantum processing
1. Setup and Imports#
First, configure the environment and import necessary libraries.
[ ]:
%load_ext autoreload
%autoreload 2
import sys
import os
import re
import yaml
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set up paths
dir_home = re.sub('QBioCode.*', 'QBioCode', os.getcwd())
sys.path.append(dir_home)
sys.path.append(os.path.join(dir_home, 'apps'))
# Import QBioCode
import qbiocode as qbc
import qprofiler.qprofiler as profiler
# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.dpi'] = 100
print("✓ Environment configured successfully")
print(f"✓ Working directory: {os.getcwd()}")
2. Understanding Quantum Projection Learning#
What is QPL?#
Quantum Projection Learning works in three stages:
Quantum Encoding: Classical data is encoded into quantum states using parameterized quantum circuits (feature maps)
Quantum Projection: Expectation values of Pauli operators are measured, creating quantum-derived features
Classical Learning: Multiple classical ML models are trained on these quantum features
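The three stages can be sketched in plain NumPy for a single qubit. This is purely illustrative — the real pipeline uses qbiocode's parameterized feature maps on multi-qubit circuits — but it shows what "projection via Pauli expectation values" means concretely:

```python
import numpy as np

# Pauli matrices used as measurement operators.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def encode(x):
    """Stage 1: encode a scalar feature x as an RY(x) rotation of |0>."""
    return np.array([np.cos(x / 2), np.sin(x / 2)], dtype=complex)

def project(state):
    """Stage 2: quantum projection -- Pauli expectation values <P>."""
    return np.real([state.conj() @ P @ state for P in (X, Y, Z)])

# Stage 3 would train classical models on these projected features.
features = np.array([project(encode(x)) for x in [0.0, 0.5, 1.0]])
print(features.shape)  # one 3-dimensional projected vector per sample
```

Each input sample becomes a real-valued feature vector of expectation values, which any classical learner can consume.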
Why Multiple Classifiers?#
Different classifiers have different inductive biases:
SVC: Effective for high-dimensional, non-linear boundaries
Random Forest: Robust to noise, captures feature interactions
XGBoost: Excellent for structured data, handles imbalanced classes
MLP: Can learn complex non-linear patterns
Logistic Regression: Simple, interpretable baseline
By testing all of them, we identify which best exploits quantum features for each dataset.
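A self-contained sketch of this model comparison on ordinary (non-quantum) features, using scikit-learn only — `GradientBoostingClassifier` stands in for XGBoost here so the example carries no extra dependency:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

# One synthetic dataset; QPL would use quantum-projected features instead.
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    'svc': SVC(),
    'rf': RandomForestClassifier(random_state=0),
    'xgb-like': GradientBoostingClassifier(random_state=0),
    'mlp': MLPClassifier(max_iter=1000, random_state=0),
    'lr': LogisticRegression(max_iter=1000),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f'{name:10s} {s:.3f}')
```

The ranking typically changes from dataset to dataset — which is exactly why QPL runs all five rather than committing to one.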
3. Generate Synthetic Test Data#
We’ll create artificial classification datasets with controlled properties to test QPL performance.
[ ]:
# Dataset configuration
type_of_data = 'classes'
save_path = os.path.join('data', 'qpl_tutorial_data')
# Dataset parameters
N_SAMPLES = [100, 200] # Number of samples per dataset
N_FEATURES = [10, 15] # Number of features
N_INFORMATIVE = [5, 8] # Number of informative features
N_REDUNDANT = [2, 3] # Number of redundant features
N_CLASSES = [2] # Binary classification
N_CLUSTERS_PER_CLASS = [2] # Clusters per class
WEIGHTS = [[0.3, 0.7], [0.5, 0.5]] # Class imbalance scenarios
print("Generating synthetic datasets...")
print(f" - Data type: {type_of_data}")
print(f" - Sample sizes: {N_SAMPLES}")
print(f" - Feature dimensions: {N_FEATURES}")
print(f" - Class weights: {WEIGHTS}")
# Generate datasets
qbc.generate_data(
    type_of_data=type_of_data,
    save_path=save_path,
    n_samples=N_SAMPLES,
    n_features=N_FEATURES,
    n_informative=N_INFORMATIVE,
    n_redundant=N_REDUNDANT,
    n_classes=N_CLASSES,
    n_clusters_per_class=N_CLUSTERS_PER_CLASS,
    weights=WEIGHTS,
)
print(f"\n✓ Datasets generated and saved to: {save_path}")
# List generated files
if os.path.exists(save_path):
    files = [f for f in os.listdir(save_path) if f.endswith('.csv')]
    print(f"✓ Generated {len(files)} dataset files")
    for f in files[:5]:  # Show first 5
        print(f"  - {f}")
    if len(files) > 5:
        print(f"  ... and {len(files) - 5} more")
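For intuition, `qbc.generate_data` presumably sweeps a grid over scikit-learn's `make_classification`; a single grid point from the configuration above can be reproduced by hand (the parameter names below are scikit-learn's, not necessarily qbiocode's):

```python
import numpy as np
from sklearn.datasets import make_classification

# One grid point: 100 samples, 10 features, ~30/70 class imbalance.
X, y = make_classification(
    n_samples=100, n_features=10, n_informative=5, n_redundant=2,
    n_classes=2, n_clusters_per_class=2, weights=[0.3, 0.7],
    random_state=0,
)
print(X.shape, np.bincount(y))  # class counts are approximately 30/70
```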
4. Configure QPL Experiment#
QProfiler uses YAML configuration files to specify experimental parameters. Let’s examine the QPL configuration.
[ ]:
# Load and display QPL configuration
qpl_config_path = 'configs/qpl.yaml'
with open(qpl_config_path, 'r') as f:
    qpl_config = yaml.safe_load(f)
print("QPL Configuration:")
print("=" * 60)
print(yaml.dump(qpl_config, default_flow_style=False, sort_keys=False))
print("=" * 60)
print("\nKey Configuration Parameters:")
print(f" - Model: {qpl_config.get('model', 'N/A')}")
print(f" - Backend: {qpl_config.get('backend', 'N/A')}")
print(f" - Data directory: {qpl_config.get('data_dir', 'N/A')}")
print(f" - Embeddings: {qpl_config.get('embeddings', 'N/A')}")
if 'pqk' in qpl_config:
    print("\n  PQK-specific parameters:")
    for key, value in qpl_config['pqk'].items():
        print(f"   - {key}: {value}")
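For readers without the repository at hand, here is a hypothetical sketch of what `configs/qpl.yaml` might contain, using only the keys this cell reads; the values are illustrative, not the shipped defaults:

```yaml
# Hypothetical shape of configs/qpl.yaml (illustrative values)
model: qpl
backend: simulator
data_dir: data/qpl_tutorial_data
embeddings: [zz]
pqk:
  reps: 2
  entanglement: linear
```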
5. Run QPL Experiment#
Execute the QPL experiment using QProfiler. This will:
Load the generated datasets
Apply quantum feature maps
Extract quantum projections
Train all 5 classical models on quantum features
Evaluate performance and save results
[ ]:
print("=" * 80)
print("RUNNING QUANTUM PROJECTION LEARNING EXPERIMENT")
print("=" * 80)
print("\nThis may take several minutes depending on:")
print(" - Number of datasets")
print(" - Dataset size")
print(" - Quantum backend (simulator vs. hardware)")
print(" - Number of models to train\n")
# Run QPL experiment
profiler.main(qpl_config)
print("\n" + "=" * 80)
print("✓ QPL EXPERIMENT COMPLETED")
print("=" * 80)
6. Run Classical Baseline (XGBoost)#
For comparison, we’ll run a classical baseline using XGBoost on the original (non-quantum) features.
[ ]:
print("=" * 80)
print("RUNNING CLASSICAL BASELINE (XGBoost)")
print("=" * 80)
# Load classical baseline configuration
xgb_config_path = 'configs/xgb.yaml'
with open(xgb_config_path, 'r') as f:
    xgb_config = yaml.safe_load(f)
print("\nXGBoost Configuration:")
print(f" - Model: {xgb_config.get('model', 'N/A')}")
print(f" - Data directory: {xgb_config.get('data_dir', 'N/A')}")
print(f" - Embeddings: {xgb_config.get('embeddings', 'N/A')}\n")
# Run classical baseline
profiler.main(xgb_config)
print("\n" + "=" * 80)
print("✓ CLASSICAL BASELINE COMPLETED")
print("=" * 80)
7. Load and Compile Results#
Collect all results from the experiments and compile them into comprehensive DataFrames.
[ ]:
print("Loading experimental results...\n")
# Find all result files
data_eval_files = [
    os.path.join(dp, f)
    for dp, dn, filenames in os.walk(os.getcwd())
    for f in filenames
    if f == 'RawDataEvaluation.csv'
]
model_result_files = [
    os.path.join(dp, f)
    for dp, dn, filenames in os.walk(os.getcwd())
    for f in filenames
    if f == 'ModelResults.csv'
]
print(f"Found {len(data_eval_files)} data evaluation files")
print(f"Found {len(model_result_files)} model result files\n")
# Load and compile data complexity evaluations
if data_eval_files:
    rawevals_df = pd.concat([pd.read_csv(f) for f in data_eval_files], ignore_index=True)
    rawevals_df.to_csv('compiled_raw_data_evaluations.csv', index=False)
    print(f"✓ Data evaluations compiled: {rawevals_df.shape}")
    print("  Saved to: compiled_raw_data_evaluations.csv")
else:
    print("⚠ No data evaluation files found")
    rawevals_df = pd.DataFrame()

# Load and compile model results
if model_result_files:
    results_df = pd.concat([pd.read_csv(f) for f in model_result_files], ignore_index=True)

    # Add useful columns for analysis
    results_df['datatype'] = results_df['Dataset'].str.replace('-.*', '', regex=True)
    results_df['model_embed_datatype'] = (
        results_df['model'] + '_' +
        results_df['embeddings'] + '_' +
        results_df['datatype']
    )
    results_df['model_datatype'] = (
        results_df['model'] + '_' + results_df['datatype']
    )

    results_df.to_csv('compiled_results.csv', index=False)
    print(f"\n✓ Model results compiled: {results_df.shape}")
    print("  Saved to: compiled_results.csv")

    # Display summary statistics
    print("\nResults Summary:")
    print(f" - Unique datasets: {results_df['Dataset'].nunique()}")
    print(f" - Models tested: {results_df['model'].unique().tolist()}")
    print(f" - Embeddings used: {results_df['embeddings'].unique().tolist()}")
    print(f" - Metrics available: {[col for col in results_df.columns if 'score' in col or 'accuracy' in col or 'auc' in col]}")
else:
    print("⚠ No model result files found")
    results_df = pd.DataFrame()
print("\n" + "=" * 80)
8. Performance Visualization#
Create comprehensive visualizations comparing quantum and classical performance.
[ ]:
if results_df.empty:
    print("⚠ No results to visualize. Please run the experiments first.")
else:
    # Create output directory
    output_dir = 'performance_summary_and_spearman_correlation_plots'
    os.makedirs(output_dir, exist_ok=True)
    tag = 'qpl_tutorial'
    print("Generating performance visualizations...\n")

    # Plot performance metrics
    metrics = ['f1_score', 'accuracy', 'auc']
    for metric in metrics:
        if metric not in results_df.columns:
            print(f"⚠ Metric '{metric}' not found in results")
            continue

        plt.figure(figsize=(12, 6))

        # Create boxplot
        sns.boxplot(
            data=results_df,
            x='model_datatype',
            y=metric,
            hue='embeddings',
            palette='Set2'
        )
        plt.ylim(0, 1)
        plt.xticks(rotation=45, ha='right')
        plt.xlabel('Model & Data Type', fontsize=12)
        plt.ylabel(metric.replace('_', ' ').title(), fontsize=12)
        plt.title(f'{metric.replace("_", " ").title()} Comparison: Quantum vs Classical',
                  fontsize=14, fontweight='bold')
        plt.legend(title='Embeddings', bbox_to_anchor=(1.05, 1), loc='upper left')
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()

        # Save figure
        filename = os.path.join(output_dir, f'{tag}_{metric}_boxplot.png')
        plt.savefig(filename, dpi=300, bbox_inches='tight')
        print(f"✓ Saved: {filename}")
        plt.show()
        plt.close()

    print(f"\n✓ All visualizations saved to: {output_dir}/")
print(f"\n✓ All visualizations saved to: {output_dir}/")
9. Quantum Advantage Analysis#
Identify datasets where quantum methods outperform classical baselines.
[ ]:
if results_df.empty:
    print("⚠ No results to analyze")
else:
    print("Analyzing Quantum Advantage...\n")
    print("=" * 80)

    # Define quantum models (all classical models on PQK features)
    qml_models = ['pqk_lr', 'pqk_svc', 'pqk_rf', 'pqk_mlp', 'pqk_xgb']

    # Calculate median F1 score across splits for each dataset/model combination
    df_median = results_df.groupby(['Dataset', 'embeddings', 'model'])['f1_score'].median().reset_index()

    # Find best model for each dataset
    best_per_dataset = df_median.loc[df_median.groupby('Dataset')['f1_score'].idxmax()]

    # Identify datasets where quantum models won
    qml_winners = best_per_dataset[best_per_dataset['model'].isin(qml_models)]

    print(f"Total datasets analyzed: {df_median['Dataset'].nunique()}")
    print(f"Datasets where QML won: {len(qml_winners)}")
    print(f"Quantum win rate: {len(qml_winners) / df_median['Dataset'].nunique() * 100:.1f}%\n")

    if len(qml_winners) > 0:
        print("Datasets with Quantum Advantage:")
        print("-" * 80)
        for idx, row in qml_winners.iterrows():
            print(f"  {row['Dataset']:40s} | {row['model']:12s} | F1: {row['f1_score']:.4f}")

        # Create comparison plot
        qml_winner_data = df_median[df_median['Dataset'].isin(qml_winners['Dataset'])].copy()
        qml_winner_data['model_type'] = qml_winner_data['model'].apply(
            lambda x: 'Quantum' if x in qml_models else 'Classical'
        )

        plt.figure(figsize=(14, 6))
        ax = sns.boxplot(
            data=qml_winner_data,
            x='Dataset',
            y='f1_score',
            hue='model_type',
            palette={'Quantum': '#1f77b4', 'Classical': '#ff7f0e'}
        )
        plt.xticks(rotation=45, ha='right')
        plt.xlabel('Dataset', fontsize=12)
        plt.ylabel('F1 Score', fontsize=12)
        plt.title('Quantum vs Classical Performance on Quantum-Winning Datasets',
                  fontsize=14, fontweight='bold')
        plt.legend(title='Model Type', fontsize=11)
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()

        filename = os.path.join(output_dir, f'{tag}_quantum_advantage.png')
        plt.savefig(filename, dpi=300, bbox_inches='tight')
        print(f"\n✓ Quantum advantage plot saved: {filename}")
        plt.show()
        plt.close()
    else:
        print("⚠ No datasets found where quantum models outperformed classical baselines.")
        print("  This could indicate:")
        print("  - Datasets are too simple for quantum advantage")
        print("  - Classical models are well-suited for these problems")
        print("  - More complex quantum feature maps may be needed")

print("\n" + "=" * 80)
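The "best model per dataset" logic above reduces to a small pandas pattern — median per (Dataset, model) group, then `idxmax` per Dataset. Isolated on toy numbers:

```python
import pandas as pd

# Toy results table (not real experiment output).
toy = pd.DataFrame({
    'Dataset':  ['d1', 'd1', 'd2', 'd2'],
    'model':    ['pqk_svc', 'xgb', 'pqk_svc', 'xgb'],
    'f1_score': [0.9, 0.8, 0.6, 0.7],
})

# Median F1 per (Dataset, model), then the winning row per Dataset.
med = toy.groupby(['Dataset', 'model'])['f1_score'].median().reset_index()
best = med.loc[med.groupby('Dataset')['f1_score'].idxmax()]
print(best)  # d1 -> pqk_svc (quantum wins), d2 -> xgb (classical wins)
```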
10. Model Performance Summary#
Generate a comprehensive summary table of all model performances.
[ ]:
if not results_df.empty:
    print("Model Performance Summary")
    print("=" * 80)

    # Calculate average performance by model
    summary = results_df.groupby('model').agg({
        'f1_score': ['mean', 'std', 'min', 'max'],
        'accuracy': ['mean', 'std'],
        'auc': ['mean', 'std']
    }).round(4)

    print("\nAverage Performance by Model:")
    print(summary.to_string())

    # Identify best models
    print("\n" + "-" * 80)
    print("Best Performing Models:")
    print("-" * 80)
    for metric in ['f1_score', 'accuracy', 'auc']:
        if metric in results_df.columns:
            best_model = results_df.groupby('model')[metric].mean().idxmax()
            best_score = results_df.groupby('model')[metric].mean().max()
            print(f"  {metric:15s}: {best_model:15s} ({best_score:.4f})")

    print("\n" + "=" * 80)
else:
    print("⚠ No results available for summary")
Key Takeaways#
What We Learned#
QPL Workflow: Successfully applied quantum projection learning with multiple classical learners
Model Comparison: Systematically compared 5+ models on quantum-projected features
Quantum Advantage: Identified scenarios where quantum features provide benefits
Comprehensive Analysis: Used data complexity metrics to understand performance patterns
Best Practices#
Multiple Models: Test various classifiers to find the best match for quantum features
Baseline Comparison: Always compare against strong classical baselines
Data Complexity: Analyze dataset characteristics to predict quantum advantage
Systematic Evaluation: Use cross-validation and multiple metrics for robust assessment
When to Use QPL#
QPL is most effective when:
Dataset has complex, non-linear structure
Feature interactions are important
Classical methods plateau in performance
You want to explore multiple learning algorithms
Next Steps#
Experiment with different quantum feature maps (Z, ZZ, Pauli)
Adjust entanglement strategies (linear, full, circular, pairwise)
Try different numbers of repetitions (reps)
Test on real-world datasets (genomics, medical imaging, etc.)
Explore quantum hardware execution
Combine with other quantum algorithms (VQC, QNN)
Configuration Tips#
For faster experiments:
backend: simulator
reps: 2
n_components: 5
For better accuracy:
backend: simulator
reps: 8
n_components: 10
entanglement: pairwise
For quantum hardware:
backend: ibm_quantum
device: ibm_brisbane # or your preferred device
Questions or Issues? Open an issue on GitHub