Training parameters for a given model.

Properties

Number of steps to be used for gradient accumulation. Gradient accumulation is a method of collecting gradients for a configured number of steps, instead of updating the model variables at every step, and then applying the accumulated update to the model variables. It can be used to overcome the limitations of a small batch size and is often discussed in terms of the 'effective batch size', as sketched below.
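As a rough sketch (the variable names below are illustrative, not part of the API), the effective batch size is the per-step batch size multiplied by the number of accumulation steps:

```typescript
// Illustrative values only: gradients are collected for accumulateSteps
// steps before a single update is applied to the model variables.
const batchSize = 8;        // samples processed per step
const accumulateSteps = 4;  // steps of gradient accumulation

// The update therefore reflects this many samples:
const effectiveBatchSize = batchSize * accumulateSteps; // 32
```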

The batch size is the number of samples processed before the model is updated.

Initialization method to be used for training.

Initialization text to be used if init_method is set to text; otherwise this value is ignored.
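A minimal sketch of how these two properties combine; the object shape is an assumption, and only init_method and init_text come from the descriptions above:

```typescript
// Assumed parameter object for illustration. When init_method is 'text',
// init_text seeds the soft prompt; with any other method it is ignored.
const parameters = {
  init_method: 'text',
  init_text: 'Classify the sentiment of the following review.',
};
```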

Learning rate to be used for training.

Maximum number of input tokens to be considered.

Maximum number of output tokens to be predicted.

The number of epochs is the number of complete passes through the training dataset. Training quality depends on the number of epochs.

num_virtual_tokens?: WatsonXAI.TrainingNumVirtualTokens

Number of virtual tokens to be used for training. In prompt tuning, we essentially learn the embedding representations for soft prompts, known as virtual tokens, via backpropagation for a specific task (or tasks) while keeping the rest of the model fixed. num_virtual_tokens is the number of these virtual tokens.
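A hypothetical prompt-tuning configuration pulling together the properties described in this section; apart from num_virtual_tokens and init_method, the field names and example values are assumptions rather than confirmed API identifiers:

```typescript
// Hypothetical training parameters; values are illustrative only.
const trainingParameters = {
  accumulate_steps: 16,    // assumed name: gradient accumulation steps
  batch_size: 16,          // assumed name: samples per model update
  init_method: 'random',   // or 'text' together with init_text
  learning_rate: 0.3,      // assumed name
  max_input_tokens: 256,   // assumed name
  max_output_tokens: 128,  // assumed name
  num_epochs: 20,          // assumed name: passes through the dataset
  num_virtual_tokens: 100, // length of the learned soft prompt
};
```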

Datatype to use for training of the underlying text generation model. If no value is provided, it is pulled from torch_dtype in the config. If an in-memory resource is provided that does not match the specified data type, the model underpinning the resource will be converted in place to the correct torch dtype.
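A small sketch of setting this property explicitly; 'bfloat16' is only an illustrative dtype string, and the surrounding object shape is assumed:

```typescript
// Hypothetical: explicitly requesting a dtype for training. If torch_dtype
// is omitted, the value from the model's config is used instead.
const parameters = { torch_dtype: 'bfloat16' };
```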

Verbalizer template to be used for formatting data at train and inference time. This template may use brackets to indicate where fields from the data model TrainGenerationRecord must be rendered.
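A hypothetical verbalizer; the double-brace placeholder syntax and the input field name are assumptions and should be checked against the TrainGenerationRecord data model:

```typescript
// Hypothetical template: {{input}} marks where the input field of a
// TrainGenerationRecord would be rendered at train and inference time.
const verbalizer = 'Review: {{input}}\nSentiment:';
```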