Quantum Projection Learning (QPL) Tutorial#

Overview#

Quantum Projection Learning (QPL) is an advanced quantum machine learning technique that extends Projected Quantum Kernels (PQK) by evaluating quantum-projected data with multiple classical machine learning algorithms. Instead of using only Support Vector Machines, QPL systematically compares performance across:

  • Support Vector Classifier (SVC)

  • Random Forest (RF)

  • XGBoost (XGB)

  • Multi-Layer Perceptron (MLP)

  • Logistic Regression (LR)

This comprehensive approach helps identify which classical learner best exploits quantum feature representations for a given dataset.

What You’ll Learn#

  • Generate synthetic classification datasets with controlled complexity

  • Configure and run QPL experiments using QProfiler

  • Compare quantum-enhanced vs. classical baseline performance

  • Analyze results across multiple models and embeddings

  • Visualize performance metrics and identify quantum advantages

Key Concepts#

  • Quantum Feature Maps: Transform classical data into quantum states

  • Quantum Projections: Extract features from quantum circuits via expectation values

  • Ensemble Learning: Combine multiple classical learners on quantum features

  • Data Complexity Analysis: Understand which datasets benefit from quantum processing


1. Setup and Imports#

First, configure the environment and import necessary libraries.

[ ]:
%load_ext autoreload
%autoreload 2

import sys
import os
import re
import yaml
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set up paths
dir_home = re.sub('QBioCode.*', 'QBioCode', os.getcwd())
sys.path.append(dir_home)
sys.path.append(os.path.join(dir_home, 'apps'))

# Import QBioCode
import qbiocode as qbc
import qprofiler.qprofiler as profiler

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.dpi'] = 100

print("✓ Environment configured successfully")
print(f"✓ Working directory: {os.getcwd()}")

2. Understanding Quantum Projection Learning#

What is QPL?#

Quantum Projection Learning works in three stages:

  1. Quantum Encoding: Classical data is encoded into quantum states using parameterized quantum circuits (feature maps)

  2. Quantum Projection: Expectation values of Pauli operators are measured, creating quantum-derived features

  3. Classical Learning: Multiple classical ML models are trained on these quantum features
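The projection stage can be illustrated with a minimal, purely classical simulation (this is a pedagogical sketch, not the QBioCode implementation): if each feature x is encoded as an RY(x) rotation on its own qubit, the expectation value ⟨Z⟩ of that qubit is cos(x), so a data row maps analytically to a vector of per-qubit expectation values.

```python
import numpy as np

def ry_z_expectations(X):
    """Toy 'quantum projection': encode each feature x as an RY(x)
    rotation on its own qubit and measure <Z>. For RY(x)|0>,
    <Z> = cos^2(x/2) - sin^2(x/2) = cos(x), so the projection is
    computed analytically here instead of simulating a circuit."""
    return np.cos(X)

X = np.array([[0.0, np.pi / 2],
              [np.pi, np.pi / 3]])
Z = ry_z_expectations(X)
print(Z)  # each entry is the <Z> expectation for one qubit
```

Real feature maps entangle qubits, so the projected features are generally not simple per-feature functions like this; the point is only that expectation values of Pauli operators become ordinary real-valued features a classical learner can consume.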

Why Multiple Classifiers?#

Different classifiers have different inductive biases:

  • SVC: Effective for high-dimensional, non-linear boundaries

  • Random Forest: Robust to noise, captures feature interactions

  • XGBoost: Excellent for structured data, handles imbalanced classes

  • MLP: Can learn complex non-linear patterns

  • Logistic Regression: Simple, interpretable baseline

By testing all of them, we identify which best exploits quantum features for each dataset.
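The comparison loop itself is plain scikit-learn; a minimal sketch of fitting several classifiers on the same (here randomly generated stand-in) feature matrix and ranking them by held-out accuracy, omitting XGBoost and MLP to keep the dependencies light:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for a quantum-projected feature matrix
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    'svc': SVC(),
    'rf': RandomForestClassifier(random_state=0),
    'lr': LogisticRegression(max_iter=1000),
}
# Fit each model and score it on the held-out split
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f'{name}: {acc:.3f}')
```

QProfiler automates exactly this kind of loop across datasets and embeddings; the winner typically varies from dataset to dataset, which is the motivation for testing all of them.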


3. Generate Synthetic Test Data#

We’ll create artificial classification datasets with controlled properties to test QPL performance.

[ ]:
# Dataset configuration
type_of_data = 'classes'
save_path = os.path.join('data', 'qpl_tutorial_data')

# Dataset parameters
N_SAMPLES = [100, 200]              # Number of samples per dataset
N_FEATURES = [10, 15]               # Number of features
N_INFORMATIVE = [5, 8]              # Number of informative features
N_REDUNDANT = [2, 3]                # Number of redundant features
N_CLASSES = [2]                     # Binary classification
N_CLUSTERS_PER_CLASS = [2]          # Clusters per class
WEIGHTS = [[0.3, 0.7], [0.5, 0.5]]  # Class imbalance scenarios

print("Generating synthetic datasets...")
print(f"  - Data type: {type_of_data}")
print(f"  - Sample sizes: {N_SAMPLES}")
print(f"  - Feature dimensions: {N_FEATURES}")
print(f"  - Class weights: {WEIGHTS}")

# Generate datasets
qbc.generate_data(
    type_of_data=type_of_data,
    save_path=save_path,
    n_samples=N_SAMPLES,
    n_features=N_FEATURES,
    n_informative=N_INFORMATIVE,
    n_redundant=N_REDUNDANT,
    n_classes=N_CLASSES,
    n_clusters_per_class=N_CLUSTERS_PER_CLASS,
    weights=WEIGHTS,
)

print(f"\n✓ Datasets generated and saved to: {save_path}")

# List generated files
if os.path.exists(save_path):
    files = [f for f in os.listdir(save_path) if f.endswith('.csv')]
    print(f"✓ Generated {len(files)} dataset files")
    for f in files[:5]:  # Show first 5
        print(f"  - {f}")
    if len(files) > 5:
        print(f"  ... and {len(files) - 5} more")
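Under the hood, these parameter grids presumably expand into scikit-learn `make_classification` calls; assuming that mapping (the actual `qbc.generate_data` implementation may differ), a single parameter combination corresponds to:

```python
from sklearn.datasets import make_classification

# One combination from the grids above: 100 samples, 10 features,
# 5 informative, 2 redundant, binary labels with a 30/70 imbalance
X, y = make_classification(
    n_samples=100,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    n_classes=2,
    n_clusters_per_class=2,
    weights=[0.3, 0.7],
    random_state=42,
)
print(X.shape, y.shape)
```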

4. Configure QPL Experiment#

QProfiler uses YAML configuration files to specify experimental parameters. Let’s examine the QPL configuration.

[ ]:
# Load and display QPL configuration
qpl_config_path = 'configs/qpl.yaml'

with open(qpl_config_path, 'r') as f:
    qpl_config = yaml.safe_load(f)

print("QPL Configuration:")
print("=" * 60)
print(yaml.dump(qpl_config, default_flow_style=False, sort_keys=False))
print("=" * 60)

print("\nKey Configuration Parameters:")
print(f"  - Model: {qpl_config.get('model', 'N/A')}")
print(f"  - Backend: {qpl_config.get('backend', 'N/A')}")
print(f"  - Data directory: {qpl_config.get('data_dir', 'N/A')}")
print(f"  - Embeddings: {qpl_config.get('embeddings', 'N/A')}")

if 'pqk' in qpl_config:
    print("\n  PQK-specific parameters:")
    for key, value in qpl_config['pqk'].items():
        print(f"    - {key}: {value}")
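The exact schema is defined by QProfiler; a hypothetical `qpl.yaml` built only from the keys queried above might look like the following (the `pqk` subkeys are illustrative, not confirmed):

```yaml
model: qpl
backend: simulator
data_dir: data/qpl_tutorial_data
embeddings: [zz, pauli]
pqk:
  reps: 2
  entanglement: linear
```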

5. Run QPL Experiment#

Execute the QPL experiment using QProfiler. This will:

  1. Load the generated datasets

  2. Apply quantum feature maps

  3. Extract quantum projections

  4. Train all 5 classical models on quantum features

  5. Evaluate performance and save results

[ ]:
print("=" * 80)
print("RUNNING QUANTUM PROJECTION LEARNING EXPERIMENT")
print("=" * 80)
print("\nThis may take several minutes depending on:")
print("  - Number of datasets")
print("  - Dataset size")
print("  - Quantum backend (simulator vs. hardware)")
print("  - Number of models to train\n")

# Run QPL experiment
profiler.main(qpl_config)

print("\n" + "=" * 80)
print("✓ QPL EXPERIMENT COMPLETED")
print("=" * 80)

6. Run Classical Baseline (XGBoost)#

For comparison, we’ll run a classical baseline using XGBoost on the original (non-quantum) features.

[ ]:
print("=" * 80)
print("RUNNING CLASSICAL BASELINE (XGBoost)")
print("=" * 80)

# Load classical baseline configuration
xgb_config_path = 'configs/xgb.yaml'
with open(xgb_config_path, 'r') as f:
    xgb_config = yaml.safe_load(f)


print("\nXGBoost Configuration:")
print(f"  - Model: {xgb_config.get('model', 'N/A')}")
print(f"  - Data directory: {xgb_config.get('data_dir', 'N/A')}")
print(f"  - Embeddings: {xgb_config.get('embeddings', 'N/A')}\n")

# Run classical baseline
profiler.main(xgb_config)

print("\n" + "=" * 80)
print("✓ CLASSICAL BASELINE COMPLETED")
print("=" * 80)

7. Load and Compile Results#

Collect the result files from both experiments and compile them into DataFrames for analysis.

[ ]:
print("Loading experimental results...\n")

# Find all result files
data_eval_files = [
    os.path.join(dp, f)
    for dp, dn, filenames in os.walk(os.getcwd())
    for f in filenames
    if f == 'RawDataEvaluation.csv'
]

model_result_files = [
    os.path.join(dp, f)
    for dp, dn, filenames in os.walk(os.getcwd())
    for f in filenames
    if f == 'ModelResults.csv'
]

print(f"Found {len(data_eval_files)} data evaluation files")
print(f"Found {len(model_result_files)} model result files\n")

# Load and compile data complexity evaluations
if data_eval_files:
    rawevals_df = pd.concat([pd.read_csv(f) for f in data_eval_files], ignore_index=True)
    rawevals_df.to_csv('compiled_raw_data_evaluations.csv', index=False)
    print(f"✓ Data evaluations compiled: {rawevals_df.shape}")
    print(f"  Saved to: compiled_raw_data_evaluations.csv")
else:
    print("⚠ No data evaluation files found")
    rawevals_df = pd.DataFrame()

# Load and compile model results
if model_result_files:
    results_df = pd.concat([pd.read_csv(f) for f in model_result_files], ignore_index=True)

    # Add useful columns for analysis
    results_df['datatype'] = results_df['Dataset'].str.replace('-.*', '', regex=True)
    results_df['model_embed_datatype'] = (
        results_df['model'] + '_' +
        results_df['embeddings'] + '_' +
        results_df['datatype']
    )
    results_df['model_datatype'] = (
        results_df['model'] + '_' + results_df['datatype']
    )

    results_df.to_csv('compiled_results.csv', index=False)
    print(f"\n✓ Model results compiled: {results_df.shape}")
    print(f"  Saved to: compiled_results.csv")

    # Display summary statistics
    print("\nResults Summary:")
    print(f"  - Unique datasets: {results_df['Dataset'].nunique()}")
    print(f"  - Models tested: {results_df['model'].unique().tolist()}")
    print(f"  - Embeddings used: {results_df['embeddings'].unique().tolist()}")
    print(f"  - Metrics available: {[col for col in results_df.columns if 'score' in col or 'accuracy' in col or 'auc' in col]}")
else:
    print("⚠ No model result files found")
    results_df = pd.DataFrame()

print("\n" + "=" * 80)
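The `datatype` column derived above strips everything from the first hyphen onward in the dataset name; the same pattern on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({
    'Dataset': ['classes-100-10-a', 'classes-200-15-b'],
    'model': ['pqk_svc', 'xgb'],
})
# Remove everything from the first '-' onward to recover the data type
df['datatype'] = df['Dataset'].str.replace('-.*', '', regex=True)
df['model_datatype'] = df['model'] + '_' + df['datatype']
print(df[['datatype', 'model_datatype']])
```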

8. Performance Visualization#

Create comprehensive visualizations comparing quantum and classical performance.

[ ]:
if results_df.empty:
    print("⚠ No results to visualize. Please run the experiments first.")
else:
    # Create output directory
    output_dir = 'performance_summary_and_spearman_correlation_plots'
    os.makedirs(output_dir, exist_ok=True)

    tag = 'qpl_tutorial'

    print("Generating performance visualizations...\n")

    # Plot performance metrics
    metrics = ['f1_score', 'accuracy', 'auc']

    for metric in metrics:
        if metric not in results_df.columns:
            print(f"⚠ Metric '{metric}' not found in results")
            continue

        plt.figure(figsize=(12, 6))

        # Create boxplot
        sns.boxplot(
            data=results_df,
            x='model_datatype',
            y=metric,
            hue='embeddings',
            palette='Set2'
        )

        plt.ylim(0, 1)
        plt.xticks(rotation=45, ha='right')
        plt.xlabel('Model & Data Type', fontsize=12)
        plt.ylabel(metric.replace('_', ' ').title(), fontsize=12)
        plt.title(f'{metric.replace("_", " ").title()} Comparison: Quantum vs Classical',
                  fontsize=14, fontweight='bold')
        plt.legend(title='Embeddings', bbox_to_anchor=(1.05, 1), loc='upper left')
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()

        # Save figure
        filename = os.path.join(output_dir, f'{tag}_{metric}_boxplot.png')
        plt.savefig(filename, dpi=300, bbox_inches='tight')
        print(f"✓ Saved: {filename}")

        plt.show()
        plt.close()

    print(f"\n✓ All visualizations saved to: {output_dir}/")

9. Quantum Advantage Analysis#

Identify datasets where quantum methods outperform classical baselines.

[ ]:
if results_df.empty:
    print("⚠ No results to analyze")
else:
    print("Analyzing Quantum Advantage...\n")
    print("=" * 80)

    # Define quantum and classical models
    qml_models = ['pqk_lr', 'pqk_svc', 'pqk_rf', 'pqk_mlp', 'pqk_xgb']

    # Calculate median F1 score across splits for each dataset/model combination
    df_median = results_df.groupby(['Dataset', 'embeddings', 'model'])['f1_score'].median().reset_index()

    # Find best model for each dataset
    best_per_dataset = df_median.loc[df_median.groupby('Dataset')['f1_score'].idxmax()]

    # Identify datasets where quantum models won
    qml_winners = best_per_dataset[best_per_dataset['model'].isin(qml_models)]

    print(f"Total datasets analyzed: {df_median['Dataset'].nunique()}")
    print(f"Datasets where QML won: {len(qml_winners)}")
    print(f"Quantum win rate: {len(qml_winners) / df_median['Dataset'].nunique() * 100:.1f}%\n")

    if len(qml_winners) > 0:
        print("Datasets with Quantum Advantage:")
        print("-" * 80)
        for idx, row in qml_winners.iterrows():
            print(f"  {row['Dataset']:40s} | {row['model']:12s} | F1: {row['f1_score']:.4f}")

        # Create comparison plot
        qml_winner_data = df_median[df_median['Dataset'].isin(qml_winners['Dataset'])].copy()
        qml_winner_data['model_type'] = qml_winner_data['model'].apply(
            lambda x: 'Quantum' if x in qml_models else 'Classical'
        )

        plt.figure(figsize=(14, 6))
        ax = sns.boxplot(
            data=qml_winner_data,
            x='Dataset',
            y='f1_score',
            hue='model_type',
            palette={'Quantum': '#1f77b4', 'Classical': '#ff7f0e'}
        )

        plt.xticks(rotation=45, ha='right')
        plt.xlabel('Dataset', fontsize=12)
        plt.ylabel('F1 Score', fontsize=12)
        plt.title('Quantum vs Classical Performance on Quantum-Winning Datasets',
                  fontsize=14, fontweight='bold')
        plt.legend(title='Model Type', fontsize=11)
        plt.grid(axis='y', alpha=0.3)
        plt.tight_layout()

        filename = os.path.join(output_dir, f'{tag}_quantum_advantage.png')
        plt.savefig(filename, dpi=300, bbox_inches='tight')
        print(f"\n✓ Quantum advantage plot saved: {filename}")

        plt.show()
        plt.close()
    else:
        print("⚠ No datasets found where quantum models outperformed classical baselines.")
        print("   This could indicate:")
        print("   - Datasets are too simple for quantum advantage")
        print("   - Classical models are well-suited for these problems")
        print("   - More complex quantum feature maps may be needed")

    print("\n" + "=" * 80)
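The winner-selection logic above (median over splits, then `idxmax` per dataset) can be checked on a toy frame; this simplified version drops the `embeddings` column from the grouping:

```python
import pandas as pd

df = pd.DataFrame({
    'Dataset':  ['d1', 'd1', 'd1', 'd1', 'd2', 'd2'],
    'model':    ['pqk_svc', 'pqk_svc', 'xgb', 'xgb', 'pqk_rf', 'xgb'],
    'f1_score': [0.80, 0.90, 0.70, 0.75, 0.60, 0.95],
})
# Median F1 across splits per (dataset, model) ...
med = df.groupby(['Dataset', 'model'])['f1_score'].median().reset_index()
# ... then keep the row with the best median per dataset
best = med.loc[med.groupby('Dataset')['f1_score'].idxmax()]
print(best)
```

Here `pqk_svc` wins dataset `d1` (median 0.85 vs 0.725) while the classical `xgb` wins `d2`, mirroring the quantum-vs-classical tally computed above.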

10. Model Performance Summary#

Generate a comprehensive summary table of all model performances.

[ ]:
if not results_df.empty:
    print("Model Performance Summary")
    print("=" * 80)

    # Calculate average performance by model
    summary = results_df.groupby('model').agg({
        'f1_score': ['mean', 'std', 'min', 'max'],
        'accuracy': ['mean', 'std'],
        'auc': ['mean', 'std']
    }).round(4)

    print("\nAverage Performance by Model:")
    print(summary.to_string())

    # Identify best models
    print("\n" + "-" * 80)
    print("Best Performing Models:")
    print("-" * 80)

    for metric in ['f1_score', 'accuracy', 'auc']:
        if metric in results_df.columns:
            best_model = results_df.groupby('model')[metric].mean().idxmax()
            best_score = results_df.groupby('model')[metric].mean().max()
            print(f"  {metric:15s}: {best_model:15s} ({best_score:.4f})")

    print("\n" + "=" * 80)
else:
    print("⚠ No results available for summary")

Key Takeaways#

What We Learned#

  1. QPL Workflow: Successfully applied quantum projection learning with multiple classical learners

  2. Model Comparison: Systematically compared five classifiers (plus a classical XGBoost baseline) on quantum-projected features

  3. Quantum Advantage: Identified scenarios where quantum features provide benefits

  4. Comprehensive Analysis: Used data complexity metrics to understand performance patterns

Best Practices#

  • Multiple Models: Test various classifiers to find the best match for quantum features

  • Baseline Comparison: Always compare against strong classical baselines

  • Data Complexity: Analyze dataset characteristics to predict quantum advantage

  • Systematic Evaluation: Use cross-validation and multiple metrics for robust assessment

When to Use QPL#

QPL is most effective when:

  • Dataset has complex, non-linear structure

  • Feature interactions are important

  • Classical methods plateau in performance

  • You want to explore multiple learning algorithms

Next Steps#

  • Experiment with different quantum feature maps (Z, ZZ, Pauli)

  • Adjust entanglement strategies (linear, full, circular, pairwise)

  • Try different numbers of repetitions (reps)

  • Test on real-world datasets (genomics, medical imaging, etc.)

  • Explore quantum hardware execution

  • Combine with other quantum algorithms (VQC, QNN)

Configuration Tips#

For faster experiments:

backend: simulator
reps: 2
n_components: 5

For better accuracy:

backend: simulator
reps: 8
n_components: 10
entanglement: pairwise

For quantum hardware:

backend: ibm_quantum
device: ibm_brisbane  # or your preferred device


Questions or Issues? Open an issue on GitHub