Introduction

The repository supports:

  1. Training of language models from

    1. scratch

    2. existing pre-trained language models

  2. Fine-tuning of pre-trained language models for

    1. Sequence-labelling tasks like POS tagging, NER, etc.

    2. Text classification tasks like sentiment analysis, etc.
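
To give a feel for what sequence-labelling fine-tuning involves, the sketch below aligns word-level BIO tags to subword tokens, the standard preprocessing step when fine-tuning a BERT-style model for token classification. The tokenization, tag scheme, and function name here are toy illustrations, not the toolkit's actual API.

```python
# Toy illustration: align word-level BIO tags to subword tokens.
# Continuation pieces receive the ignore index -100, the conventional
# value excluded from the token-classification loss in PyTorch.
def align_labels(word_labels, subword_groups, ignore_index=-100):
    """word_labels: one BIO tag per word.
    subword_groups: for each word, its list of subword tokens."""
    tokens, labels = [], []
    for label, pieces in zip(word_labels, subword_groups):
        for i, piece in enumerate(pieces):
            tokens.append(piece)
            # Only the first subword of a word keeps the real label.
            labels.append(label if i == 0 else ignore_index)
    return tokens, labels

# "Washington" splits into two pieces; only the first keeps B-LOC.
tokens, labels = align_labels(
    ["B-LOC", "O"],
    [["Wash", "##ington"], ["said"]],
)
# tokens == ["Wash", "##ington", "said"]
# labels == ["B-LOC", -100, "O"]
```

In a real pipeline the subword groups come from the model's tokenizer; the loss function then skips every position labelled with the ignore index.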

The following model types and objectives are supported for training language models:

  1. Encoder-only models (BERT-like models)

    1. Masked Language Modeling

    2. Whole-word Masked Language Modeling

  2. Auto-regressive models (GPT-like models)

    1. Causal Language Modeling

  3. Encoder-decoder models (mBART- and mT5-like models)

    1. Denoising objective

    2. Whole-word Denoising objective
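
The difference between token-level and whole-word masking can be sketched as follows: instead of masking subword tokens independently, whole-word masking selects words and masks all of their subword pieces together. The `##` continuation convention is that of WordPiece tokenizers; the masking probability and function name are toy assumptions, not the toolkit's configuration.

```python
import random

def whole_word_mask(tokens, mask_prob=0.3, mask_token="[MASK]", seed=1):
    """Mask all subword pieces of each selected word together.
    A token starting with '##' continues the previous word (WordPiece)."""
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    # Group token indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token  # mask every piece of the word
    return masked

# With seed=1 the first word ("Wash" + "##ington") is selected, and
# both of its pieces are masked as a unit.
print(whole_word_mask(["Wash", "##ington", "said", "hello"]))
# → ['[MASK]', '[MASK]', 'said', 'hello']
```

Plain masked language modeling would instead draw an independent decision per token, so "Wash" could be masked while "##ington" stays visible, making the masked piece easy to guess from its neighbour.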

This portal provides detailed documentation of NL-FM-Toolkit. It describes how to use the PyTorch-based project and how it works.