Optional
accumulate_steps: Number of steps to be used for gradient accumulation. Gradient accumulation is a technique in which gradients are collected over a configured number of steps, rather than updating the model variables at every step, and the accumulated update is then applied to the model variables. It can be used to work around small batch size limitations and is often discussed together with the 'effective batch size'.
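For illustration only, the following minimal PyTorch sketch shows how gradient accumulation relates batch_size and accumulate_steps to the effective batch size; the toy model, optimizer, and data are placeholders and not part of this API.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup purely for illustration; the real training loop lives inside the library.
model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)   # batch_size: samples per micro-batch

accumulate_steps = 4                         # effective batch size = 8 * 4 = 32
loss_fn = nn.CrossEntropyLoss()

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    loss = loss_fn(model(inputs), labels)
    # Scale so the summed gradient matches one update over the effective batch.
    (loss / accumulate_steps).backward()
    if (step + 1) % accumulate_steps == 0:
        optimizer.step()       # apply the accumulated update
        optimizer.zero_grad()  # start the next accumulation window
```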
Optional
batch_size: The batch size is the number of samples processed before the model is updated.
Optional
init_method: Initialization method to be used for training.
Optional
init_text: Initialization text to be used if init_method is set to text; otherwise this value is ignored.
Optional
learning_rate: Learning rate to be used for training.
Optional
max_input_tokens: Maximum length of input tokens being considered.
Optional
max_output_tokens: Maximum length of output tokens being predicted.
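As a hedged sketch of how such limits typically map onto tokenizer truncation, the example below uses the Hugging Face transformers library; the model name and the use of transformers are assumptions for illustration, not part of this parameter list.

```python
from transformers import AutoTokenizer

# Illustrative tokenizer choice; any compatible tokenizer would behave the same way.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

max_input_tokens = 256   # longest input sequence the model will consider
max_output_tokens = 128  # longest target sequence used during training

inputs = tokenizer(
    "Summarize: The quick brown fox jumps over the lazy dog.",
    truncation=True,
    max_length=max_input_tokens,
)
labels = tokenizer(
    "A fox jumps over a dog.",
    truncation=True,
    max_length=max_output_tokens,
)
```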
Optional
num_epochs: The number of epochs is the number of complete passes through the training dataset. Training quality depends on the number of epochs.
Optional
num_virtual_tokens: Number of virtual tokens to be used for training. In prompt tuning, the embedded representations of soft prompts, known as virtual tokens, are learned via backpropagation for a specific task (or tasks) while the rest of the model is kept fixed. num_virtual_tokens is the number of these virtual tokens.
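To make the relationship between init_method, init_text, and num_virtual_tokens concrete, here is a minimal sketch using the Hugging Face PEFT library. The assumption that the tuning is PEFT-style prompt tuning, and the specific base model name, are illustrative and not guaranteed by this parameter list.

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model_name = "bigscience/bloom-560m"  # illustrative choice of base model
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# init_method == "text" means the virtual-token embeddings are seeded from
# init_text; with a random init_method, the init_text value would be ignored.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of this review:",
    num_virtual_tokens=16,                 # number of learned soft-prompt tokens
    tokenizer_name_or_path=base_model_name,
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()  # only the virtual tokens are trainable
```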
Optional
torch_dtype: Datatype to use for training of the underlying text generation model. If no value is provided, the value of torch_dtype in the config is used. If an in-memory resource is provided whose data type does not match the specified one, the model underpinning the resource will be converted in place to the correct torch dtype.
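As a hedged illustration of what such a dtype choice looks like in PyTorch terms, the sketch below loads a model in a requested dtype and also converts an already-loaded model in place; the model name and the use of transformers are assumptions for illustration.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the underlying text generation model directly in the requested dtype...
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",           # illustrative model name
    torch_dtype=torch.bfloat16,
)

# ...or convert an already-loaded (in-memory) model in place to the target dtype.
model = model.to(dtype=torch.bfloat16)
print(next(model.parameters()).dtype)  # torch.bfloat16
```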
Optional
verbalizer: Verbalizer template to be used for formatting data at train and inference time. This template may use brackets to indicate where fields from the data model TrainGenerationRecord must be rendered.
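The exact placeholder syntax is not specified here; the sketch below assumes a hypothetical "{{field}}" bracket style and a dict-shaped record purely to show how a verbalizer might format a TrainGenerationRecord-like example at train or inference time.

```python
# Hypothetical placeholder syntax and record shape, for illustration only.
verbalizer = "Tweet text: {{input}}\nLabel:"

record = {"input": "This product broke after two days.", "output": "complaint"}

def render(template: str, rec: dict) -> str:
    """Replace {{field}} placeholders with values from the record."""
    text = template
    for field, value in rec.items():
        text = text.replace("{{" + field + "}}", str(value))
    return text

print(render(verbalizer, record))
# Tweet text: This product broke after two days.
# Label:
```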
Training parameters for a given model.