RepEngineLM

Bases: RepEngineBase

`RepEngineLM` is a subclass of `RepEngineBase` designed to compute molecular representations using pre-trained language models (LMs) such as T5, ESM, or ChemBERTa. The engine generates vector embeddings for input sequences, typically protein or peptide sequences, by leveraging transformer-based models.

Attributes:

:type engine: str
:param engine: The name of the engine. Default is `'lm'`, indicating a language model-based representation.

:type device: str
:param device: The device on which the model runs, either `'cuda'` for GPU or `'cpu'`.

:type model: object
:param model: The pre-trained model used for generating representations, loaded from a repository based on the `model` parameter.

:type name: str
:param name: The name of the model engine combined with the model type.

:type dimension: int
:param dimension: The dimensionality of the output representation, corresponding to the model's embedding size.

:type model_name: str
:param model_name: The specific model name used for generating representations.

:type tokenizer: object
:param tokenizer: The tokenizer associated with the model, used for converting sequences into tokenized input.

:type lab: str
:param lab: The laboratory or organization associated with the model (e.g., 'Rostlab', 'facebook', etc.).

Initializes the `RepEngineLM` with the specified model and pooling options. The pre-trained model and its associated tokenizer are loaded based on the given model name.
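
A minimal construction sketch. The import path, the model identifier `'esm2-8m'`, and the `average_pooling` keyword are assumptions for illustration; only the attributes and methods documented on this page are taken from the source.

```python
# Import path and constructor arguments are assumptions, not confirmed API.
from autopeptideml.reps.lms import RepEngineLM

# Load a (hypothetical) small ESM-2 checkpoint with mean pooling over tokens.
engine = RepEngineLM('esm2-8m', average_pooling=True)

print(engine.name)       # engine name combined with the model type
print(engine.dimension)  # embedding size of the loaded model
```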

dim()

Returns the dimensionality of the output representation generated by the model.
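
For example, continuing with the `engine` created above, the returned dimensionality can be used to pre-allocate an embedding matrix (the sequence list is illustrative):

```python
import numpy as np

seqs = ['MKTAYIAKQR', 'GLSDGEWQLV']  # illustrative peptide sequences
embeddings = np.zeros((len(seqs), engine.dim()), dtype=np.float32)
```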

get_num_params()

Returns the total number of parameters in the model.
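
A quick way to report model size, continuing with the same `engine`:

```python
n_params = engine.get_num_params()
print(f'Model size: {n_params / 1e6:.1f}M parameters')
```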

max_len()

Returns the maximum sequence length allowed by the model; some models impose a fixed limit on input length.
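
This can be used to truncate inputs before tokenization. A sketch, continuing with the `seqs` list above and assuming (unconfirmed) that `max_len()` returns `None` when the model has no fixed limit:

```python
max_len = engine.max_len()
truncated = [s[:max_len] if max_len is not None else s for s in seqs]
```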

move_to_device(device)

Moves the model to the specified device (e.g., 'cuda' or 'cpu').
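
A typical pattern is to pick the device based on GPU availability and move the model accordingly; the use of `torch.cuda.is_available()` assumes a PyTorch backend:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
engine.move_to_device(device)
```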