qbiocode.data_generation.generator module
Main data generation interface for QBioCode.
This module provides a unified interface to generate various types of synthetic datasets for machine learning benchmarking and evaluation.
Summary
Functions:
generate_data | Generate synthetic datasets for machine learning benchmarking.
Reference
- generate_data(type_of_data=None, save_path=None, n_samples=[100, 120, 140, 160, 180, 200, 220, 240, 260, 280], noise=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], hole=[True, False], n_classes=[2], dim=[3, 6, 9, 12], rad=[3, 6, 9, 12], n_features=[10, 30, 50], n_informative=[2, 6], n_redundant=[2, 6], n_clusters_per_class=[1], weights=[[0.3, 0.7], [0.4, 0.6], [0.5, 0.5]], random_state=42)
Generate synthetic datasets for machine learning benchmarking.
Unified interface to generate various types of synthetic datasets with configurable parameters. Each dataset type creates multiple configurations by varying the specified parameters.
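To give a feel for how "multiple configurations" can arise from list-valued parameters, here is a minimal sketch. It assumes the configurations are the Cartesian product of the supplied lists and that only n_samples and noise vary for a given dataset type; neither assumption is confirmed by this page, so the actual number of datasets written by generate_data may differ.

```python
from itertools import product

# Default values copied from the generate_data signature.
n_samples = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280]
noise = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# ASSUMPTION: one configuration per (n_samples, noise) pair.
# This illustrates a parameter grid; it is not the library's code.
configs = [{"n_samples": n, "noise": z} for n, z in product(n_samples, noise)]

print(len(configs))  # 90
print(configs[0])    # {'n_samples': 100, 'noise': 0.1}
```

Under these assumptions the two default lists alone would yield 10 x 9 = 90 configurations, which is why passing shorter lists is a reasonable way to keep a run small.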
- Parameters:
type_of_data (str) – Type of dataset to generate. Options: ‘circles’, ‘moons’, ‘classes’, ‘s_curve’, ‘spheres’, ‘spirals’, ‘swiss_roll’.
save_path (str) – Directory path where datasets will be saved.
n_samples (list of int, default=range(100, 300, 20)) – Sample sizes for dataset configurations.
noise (list of float, default=[0.1, 0.2, ..., 0.9]) – Noise levels to apply.
hole (list of bool, default=[True, False]) – Whether to include a hole in the generated data (for swiss_roll only).
n_classes (list of int, default=[2]) – Number of classes (for spirals and classes).
dim (list of int, default=[3, 6, 9, 12]) – Dimensionalities (for spheres and spirals).
rad (list of float, default=[3, 6, 9, 12]) – Radii (for spheres only).
n_features (list of int, default=range(10, 60, 20)) – Feature counts (for classes only).
n_informative (list of int, default=range(2, 8, 4)) – Informative feature counts (for classes only).
n_redundant (list of int, default=range(2, 8, 4)) – Redundant feature counts (for classes only).
n_clusters_per_class (list of int, default=range(1, 2, 3)) – Clusters per class (for classes only).
weights (list of list of float, default=[[0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]) – Class weight distributions (for classes only).
random_state (int, default=42) – Random seed for reproducibility.
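Several defaults above are written as range(...) expressions, while the signature shows explicit lists. The following check, which uses only the values printed on this page, confirms that the two notations describe the same defaults.

```python
# Defaults as they appear in the generate_data signature.
signature_defaults = {
    "n_samples": [100, 120, 140, 160, 180, 200, 220, 240, 260, 280],
    "n_features": [10, 30, 50],
    "n_informative": [2, 6],
    "n_redundant": [2, 6],
    "n_clusters_per_class": [1],
}

# Defaults as they appear in the parameter descriptions.
documented_ranges = {
    "n_samples": range(100, 300, 20),
    "n_features": range(10, 60, 20),
    "n_informative": range(2, 8, 4),
    "n_redundant": range(2, 8, 4),
    "n_clusters_per_class": range(1, 2, 3),
}

# Expanding each range must reproduce the list from the signature.
for name, documented in documented_ranges.items():
    assert list(documented) == signature_defaults[name], name
```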
- Returns:
Nothing is returned; the generated datasets are saved to the directory given by save_path.
- Return type:
None
- Raises:
ValueError – If type_of_data is not one of the supported types.
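Because type_of_data defaults to None, which is not among the listed options, a caller may want to validate the value before starting a long generation run. The sketch below is a caller-side guard; check_type_of_data and SUPPORTED_TYPES are hypothetical names introduced for illustration and are not part of qbiocode.

```python
# Options listed for type_of_data in the parameter description.
SUPPORTED_TYPES = (
    "circles", "moons", "classes", "s_curve",
    "spheres", "spirals", "swiss_roll",
)


def check_type_of_data(type_of_data):
    """Hypothetical helper: fail early on an unsupported dataset type."""
    if type_of_data not in SUPPORTED_TYPES:
        raise ValueError(
            f"Unsupported type_of_data {type_of_data!r}; "
            f"expected one of {', '.join(SUPPORTED_TYPES)}"
        )
    return type_of_data


check_type_of_data("moons")  # passes silently

try:
    check_type_of_data("donut")
except ValueError as err:
    print(err)
```

The guard mirrors the documented ValueError, so a misspelled dataset name is caught before any files are written.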
Examples
>>> from qbiocode.data_generation import generate_data
>>> generate_data(type_of_data='circles', save_path='data/circles')
Generating circles dataset...
Dataset generation complete.