RepEngineLM¶
Bases: RepEngineBase
RepEngineLM is a subclass of RepEngineBase designed to compute molecular representations using pre-trained language models (LMs) such as T5, ESM, or ChemBERTa. This engine generates vector-based embeddings for input sequences, typically protein or peptide sequences, by leveraging transformer-based models.
Attributes:
:type engine: str
:param engine: The name of the engine. Default is 'lm', indicating a language model-based representation.
:type device: str
:param device: The device on which the model runs, either `'cuda'` for GPU or `'cpu'`.
:type model: object
:param model: The pre-trained model used for generating representations. The model is loaded from a repository based on the `model` parameter.
:type name: str
:param name: The full name of the engine, combining the engine type with the model name.
:type dimension: int
:param dimension: The dimensionality of the output representation, corresponding to the model's embedding size.
:type model_name: str
:param model_name: The specific model name used for generating representations.
:type tokenizer: object
:param tokenizer: The tokenizer associated with the model, used for converting sequences into tokenized input.
:type lab: str
:param lab: The laboratory or organization that released the model (e.g., 'Rostlab', 'facebook').
Initializes the RepEngineLM with the specified model and pooling options. The model and its associated tokenizer are loaded based on the given model name.
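A minimal instantiation sketch is shown below. The import path, the model identifier 'esm2-8m', and the `average_pooling` argument are assumptions for illustration; check the installed package for the exact names and options it supports.

```python
from autopeptideml.reps.lms import RepEngineLM  # assumed import path

# 'esm2-8m' and 'average_pooling' are illustrative arguments; the actual
# constructor signature may differ in the installed package.
engine = RepEngineLM('esm2-8m', average_pooling=True)

print(engine.name)       # engine type combined with the model name
print(engine.dimension)  # embedding size of the chosen model
print(engine.lab)        # organization that released the model, e.g. 'facebook'
```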
dim()¶
Returns the dimensionality of the output representation generated by the model.
get_num_params()¶
Returns the total number of parameters in the model.
max_len()¶
Returns the maximum sequence length allowed by the model; some models impose a specific limit on input length.
move_to_device(device)¶
Moves the model to the specified device (e.g., 'cuda' or 'cpu').
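A minimal sketch exercising the methods above, under the same assumptions about the import path and constructor arguments as the instantiation example:

```python
import torch

from autopeptideml.reps.lms import RepEngineLM  # assumed import path

engine = RepEngineLM('esm2-8m', average_pooling=True)  # illustrative arguments

print(engine.dim())             # dimensionality of the output representation
print(engine.get_num_params())  # total number of parameters in the model
print(engine.max_len())         # maximum sequence length accepted by the model

# Move the model to GPU when one is available; fall back to CPU otherwise.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
engine.move_to_device(device)
```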