Class AutoPeptideML¶
Overview¶
AutoPeptideML is a configurable machine learning workflow class designed for peptide modeling. It integrates data pipelines, representations, model training (with HPO), evaluation, and export.
Class: AutoPeptideML¶
Constructor¶
AutoPeptideML(config: dict)
- Initializes the AutoPeptideML workflow with a provided configuration dictionary.
- Creates output directories and stores pipeline, representation, training, and database settings.
Public Methods¶
get_pipeline¶
get_pipeline(pipe_config: Optional[dict] = None) -> Pipeline
Load or construct the preprocessing pipeline.
get_database¶
get_database(db_config: Optional[dict] = None) -> Database
Create or load the peptide database with optional negative data support.
get_reps¶
get_reps(rep_config: Optional[dict] = None) -> Tuple[Dict[str, RepEngineBase], Dict[str, np.ndarray]]
Load or compute representations for the data.
get_test¶
get_test(test_config: Optional[Dict] = None) -> HestiaGenerator
Partition the dataset into training/validation/test using HestiaGenerator.
get_train¶
get_train(train_config: Optional[Dict] = None) -> BaseTrainer
Load and return the trainer based on the configuration (supports Optuna and Grid).
run_hpo¶
run_hpo() -> Dict
Perform hyperparameter optimization across dataset partitions.
run_evaluation¶
run_evaluation(models) -> pd.DataFrame
Run evaluation on the trained models and return a DataFrame of results.
save_experiment¶
save_experiment(model_backend: str = 'onnx', save_reps: bool = False, save_test: bool = True, save_all_models: bool = True)
Save the full experiment including models, test partitions, and configuration.
save_database¶
save_database()
Export the database to CSV.
save_models¶
save_models(ensemble_path: str, backend: str = 'onnx', save_all: bool = True)
Save models using onnx or joblib backends.
save_reps¶
save_reps(rep_dir: str)
Save precomputed representations to disk.
predict¶
predict(df: pd.DataFrame, feature_field: str, experiment_dir: str, backend: str = 'onnx') -> np.ndarray
Load a saved experiment and predict using the trained ensemble on new data.
Configuration Keys¶
The config dictionary passed to the constructor must include the following keys:
outputdir: strpipeline: dict or strrepresentation: dict or strtrain: dict or strdatabases: dicttest: dict
Dependencies¶
- pandas, numpy
- yaml, json
- hestia
- sklearn
- skl2onnx, onnxmltools, joblib (optional)
Example Usage¶
from autopipeline.autopeptideml import AutoPeptideML
config = yaml.safe_load(open('config.yml'))
runner = AutoPeptideML(config)
pipeline = runner.get_pipeline()
db = runner.get_database()
reps, x = runner.get_reps()
test = runner.get_test()
trainer = runner.get_train()
models = runner.run_hpo()
evaluation = runner.run_evaluation(models)
runner.save_experiment()
For detailed config templates and supported options, see the corresponding YAML schema documentation.