Number of steps over which gradients are accumulated. Gradient accumulation collects gradients
for the configured number of steps and then applies a single update to the model variables,
instead of updating them at every step. This can be used to work around a limit on the per-step
batch size (for example, one imposed by device memory). The resulting "effective batch size" is
the per-step batch size multiplied by the number of accumulation steps.
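
The idea can be illustrated with a minimal sketch in plain Python. This is not the library's
implementation: the names `grad` and `train_step_accumulated`, the one-parameter squared-error
model, and the choice to average (rather than sum) the accumulated gradients are all assumptions
made for illustration.

```python
def grad(w, batch):
    """Mean gradient of 0.5 * (w*x - y)**2 over a batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def train_step_accumulated(w, micro_batches, lr):
    """Accumulate gradients over all micro-batches, then apply one update.

    Hypothetical helper: averaging the accumulated gradient is an assumption
    of this sketch; some implementations sum instead.
    """
    accumulated = 0.0
    for batch in micro_batches:
        accumulated += grad(w, batch)  # collect only; w is not updated here
    mean_grad = accumulated / len(micro_batches)
    return w - lr * mean_grad  # single update after all accumulation steps


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# 2 accumulation steps with batch size 2 -> effective batch size 4
micro_batches = [data[0:2], data[2:4]]
w_accumulated = train_step_accumulated(0.0, micro_batches, lr=0.1)

# Reference: one ordinary step on the full batch of size 4
w_full_batch = 0.0 - 0.1 * grad(0.0, data)
```

With equally sized micro-batches and averaged gradients, `w_accumulated` matches `w_full_batch`,
which is why the accumulation step count is usually discussed together with the effective batch
size.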