Introduction
The repository supports:

- Training of language models from:
  - scratch
  - existing pre-trained language models
- Fine-tuning of pre-trained language models for:
  - Sequence-labelling tasks such as POS tagging, NER, etc.
  - Text-classification tasks such as Sentiment Analysis, etc.
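The two fine-tuning task families differ mainly in their data shape: sequence labelling assigns one label per token, while text classification assigns one label per sentence. The sketch below illustrates this with the common CoNLL-style token-per-line layout; this is a generic example, not necessarily the toolkit's exact input specification.

```python
# Illustration of the two fine-tuning data formats (generic sketch,
# not the toolkit's exact input spec).

def parse_conll(text):
    """Parse token<TAB>label lines into (tokens, labels) sentence pairs."""
    sentences, tokens, labels = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # blank line ends a sentence
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        tok, lab = line.split("\t")
        tokens.append(tok)
        labels.append(lab)
    if tokens:
        sentences.append((tokens, labels))
    return sentences

# Sequence labelling (e.g. NER): one label per token
ner_data = "John\tB-PER\nlives\tO\nin\tO\nParis\tB-LOC\n"
print(parse_conll(ner_data))
# [(['John', 'lives', 'in', 'Paris'], ['B-PER', 'O', 'O', 'B-LOC'])]

# Text classification (e.g. sentiment): one label per sentence
cls_data = [("the movie was great", "positive")]
```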
The following model types and training objectives are supported for language-model training:

- Encoder-only models (BERT-like models):
  - Masked Language Modeling
  - Whole-word Masked Language Modeling
- Auto-regressive models (GPT-like models):
  - Causal Language Modeling
- Encoder-decoder models (mBART- and mT5-like models):
  - Denoising objective
  - Whole-word Denoising objective
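The "whole-word" variants of the objectives above differ from the plain variants in what gets masked: token-level masking can hide a single subword piece, while whole-word masking hides every subword of the selected word. The sketch below shows the difference, assuming WordPiece-style `##` continuation markers; masked positions are chosen deterministically here for clarity, whereas real training samples them at random (commonly around 15% of tokens).

```python
# Token-level vs whole-word masking (conceptual sketch; assumes
# WordPiece-style "##" subword continuation markers).

MASK = "[MASK]"

def token_level_mask(tokens, positions):
    """Mask individual subword tokens at the given positions."""
    return [MASK if i in positions else t for i, t in enumerate(tokens)]

def whole_word_mask(tokens, positions):
    """Mask every subword of any word that overlaps the given positions."""
    # Group subword indices into whole words: "##" continues the previous word.
    words, current = [], []
    for i, t in enumerate(tokens):
        if t.startswith("##") and current:
            current.append(i)
        else:
            if current:
                words.append(current)
            current = [i]
    if current:
        words.append(current)
    masked = set()
    for word in words:
        if any(i in positions for i in word):
            masked.update(word)
    return [MASK if i in masked else t for i, t in enumerate(tokens)]

tokens = ["the", "trans", "##form", "##er", "architecture"]
print(token_level_mask(tokens, {2}))
# ['the', 'trans', '[MASK]', '##er', 'architecture']
print(whole_word_mask(tokens, {2}))
# ['the', '[MASK]', '[MASK]', '[MASK]', 'architecture']
```

The same grouping idea underlies the whole-word denoising objective for encoder-decoder models, where contiguous spans rather than single pieces are corrupted.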
This portal provides detailed documentation of NL-FM-Toolkit. It describes how to use the PyTorch-based project and how it works.