RepEngineBase Class Documentation
Module: rep_engine_base
Purpose
RepEngineBase is an abstract base class for molecular representation engines. It defines a standard interface and utilities for computing representations from a list of molecules (e.g., SMILES strings), processed in batches. The class is intended to be subclassed: core functionality such as preprocessing and representation computation is implemented in derived classes.
Attributes
- engine (str): Name of the representation engine. Typically defined in a subclass or passed during instantiation.
- rep (str): Type of molecular representation (e.g., 'fingerprint', 'embedding').
- properties (dict): A deep copy of the instance's dictionary at initialization. Captures configuration state.
Constructor
def __init__(self, rep: str, **args)
Parameters:
- rep (str): Type of molecular representation.
- **args (dict): Additional configuration options stored as attributes.
Effect:
Initializes the object, stores rep, and adds all additional keyword arguments to the instance. Also creates a deep copy of all these attributes in self.properties for serialization.
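The effect described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: it assumes extra keyword arguments are attached by updating the instance dictionary and that the snapshot is taken with copy.deepcopy. The class name RepEngineSketch and the average_pooling and layers arguments are hypothetical.

```python
import copy


class RepEngineSketch:
    """Illustrative stand-in for RepEngineBase.__init__ (not the library's code)."""

    def __init__(self, rep: str, **args):
        self.rep = rep
        # Assumption: every extra keyword argument becomes an instance attribute.
        self.__dict__.update(args)
        # Snapshot the configuration so later mutations do not leak into it.
        self.properties = copy.deepcopy(self.__dict__)


# Hypothetical configuration options, used only for illustration.
engine = RepEngineSketch("embedding", average_pooling=False, layers=[1, 2])
engine.layers.append(3)  # mutates the live attribute, not the snapshot
```

Because the snapshot is a deep copy, engine.properties still holds the original layers list after the live attribute is changed, which is what makes it suitable for serialization.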
Public Methods
compute_reps
def compute_reps(self, mols: List[str], verbose: Optional[bool] = False, batch_size: Optional[int] = 12) -> Union[np.ndarray, List[np.ndarray]]
Description:
Computes molecular representations in batches using _preprocess_batch and _rep_batch.
Parameters:
- mols (List[str]): List of molecular inputs (e.g., SMILES strings).
- verbose (bool, optional): If True, shows a progress bar.
- batch_size (int, optional): Number of molecules per batch.
Returns:
- np.ndarray if average_pooling is True or unset.
- List[np.ndarray] if average_pooling is explicitly set to False.
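A rough sketch of how such a batching loop might look, written as a standalone function so it runs on its own. It rests on assumptions not confirmed by this page: that pooled representations share a shape and can be stacked into one array, and that unpooled ones are kept as a list. The names compute_reps_sketch, toy_preprocess, and toy_rep are hypothetical, and the verbose progress bar is left out.

```python
from typing import Callable, List, Sequence, Union

import numpy as np


def compute_reps_sketch(
    mols: List[str],
    preprocess_batch: Callable[[List[str]], List[str]],
    rep_batch: Callable[[List[str]], Sequence[np.ndarray]],
    batch_size: int = 12,
    average_pooling: bool = True,
) -> Union[np.ndarray, List[np.ndarray]]:
    """Hypothetical batching loop; the real method may differ in detail."""
    reps: List[np.ndarray] = []
    for start in range(0, len(mols), batch_size):
        batch = mols[start:start + batch_size]
        # Preprocess first, then collect one representation per molecule.
        reps.extend(rep_batch(preprocess_batch(batch)))
    # Assumption: pooled representations share a shape and can be stacked,
    # while unpooled ones may vary in length and stay a list.
    return np.stack(reps) if average_pooling else reps


# Toy stand-ins: strip whitespace, then count 'C' and 'O' characters.
def toy_preprocess(batch: List[str]) -> List[str]:
    return [smiles.strip() for smiles in batch]


def toy_rep(batch: List[str]) -> np.ndarray:
    return np.array([[s.count("C"), s.count("O")] for s in batch], dtype=float)


pooled = compute_reps_sketch(["CCO ", "CC", "O=C=O"], toy_preprocess, toy_rep, batch_size=2)
unpooled = compute_reps_sketch(["CCO", "CC"], toy_preprocess, toy_rep, average_pooling=False)
```

Here pooled is a single array with one row per molecule, while unpooled is a plain list of per-molecule arrays.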
dim
def dim(self) -> int
Description:
Abstract method. Must return the dimensionality of the computed representation.
Raises:
- NotImplementedError
_rep_batch
def _rep_batch(self, batch: List[str]) -> np.ndarray
Description:
Abstract method. Must compute and return the representation for a batch of molecules.
Raises:
- NotImplementedError
_preprocess_batch
def _preprocess_batch(self, batch: List[str]) -> List[str]
Description:
Abstract method. Must return a preprocessed version of the batch for representation.
Raises:
- NotImplementedError
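To illustrate how the three abstract methods fit together, here is a toy subclass. Since the import path and constructor details of RepEngineBase are not shown here, the example uses a hypothetical stand-in base class, BaseSketch, with the same three hooks. AtomCountEngine is likewise invented, and its character counting is deliberately crude (for instance, it would count the 'C' in 'Cl').

```python
from typing import List

import numpy as np


class BaseSketch:
    """Stand-in for RepEngineBase: the three hooks raise until overridden."""

    def dim(self) -> int:
        raise NotImplementedError

    def _rep_batch(self, batch: List[str]) -> np.ndarray:
        raise NotImplementedError

    def _preprocess_batch(self, batch: List[str]) -> List[str]:
        raise NotImplementedError


class AtomCountEngine(BaseSketch):
    """Hypothetical engine: counts occurrences of a few element symbols."""

    symbols = ("C", "N", "O")

    def dim(self) -> int:
        # One output column per counted symbol.
        return len(self.symbols)

    def _preprocess_batch(self, batch: List[str]) -> List[str]:
        # Minimal cleanup: drop surrounding whitespace.
        return [smiles.strip() for smiles in batch]

    def _rep_batch(self, batch: List[str]) -> np.ndarray:
        # One row per molecule, one column per symbol.
        return np.array([[s.count(a) for a in self.symbols] for s in batch])


engine = AtomCountEngine()
reps = engine._rep_batch(engine._preprocess_batch([" CCO", "NCC(=O)O"]))
```

The resulting array has one row per input molecule and engine.dim() columns, which is the contract a batching method such as compute_reps would rely on.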
save
def save(self, filename: str)
Description:
Serializes and saves the engine’s properties to a YAML file.
Parameters:
- filename (str): Destination path for the YAML file.
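A minimal sketch of this kind of serialization, assuming the PyYAML package is installed and that the properties dictionary contains only plain Python values. The helper name save_properties and the example dictionary are hypothetical; the actual method may use a different dumper or options.

```python
import os
import tempfile

import yaml  # PyYAML, assumed to be installed


def save_properties(properties: dict, filename: str) -> None:
    """Write a configuration dictionary to a YAML file."""
    with open(filename, "w") as handle:
        yaml.safe_dump(properties, handle)


# Write to a temporary location, then read the file back.
path = os.path.join(tempfile.mkdtemp(), "engine.yml")
save_properties({"rep": "embedding", "average_pooling": True}, path)

with open(path) as handle:
    restored = yaml.safe_load(handle)
```

Reading the file back with yaml.safe_load recovers the same dictionary, so a saved configuration could later be passed to a constructor as keyword arguments.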
Design Notes
- This class provides batch processing support and optional average pooling control.
- batched from itertools is used where available (it was added to the standard library in Python 3.12); a fallback implementation is included for older versions.
- Intended for extension: subclasses must implement _rep_batch, _preprocess_batch, and dim.
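The import-with-fallback pattern mentioned in the notes above might look like the sketch below. The fallback body is an assumption modelled on the documented behaviour of itertools.batched, not a copy of this module's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

try:
    from itertools import batched  # in the standard library from Python 3.12
except ImportError:
    def batched(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
        """Yield successive tuples of at most n items, like itertools.batched."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        # islice returns an empty tuple once the iterator is exhausted.
        while batch := tuple(islice(iterator, n)):
            yield batch


batches = list(batched(["CCO", "CC", "O=C=O", "c1ccccc1", "N"], 2))
```

The last batch is shorter when the input length is not a multiple of the batch size, so downstream code should not assume every batch is full.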