Hi,

I have a question about data generation when a training run is resumed from a checkpoint.

In the training config file:
- `batch_type` value is `tokens`
- `seed` value is fixed to make the result reproducible

`step` and `batches` are observed in a line of `trainer.py` (by printing to a logging file). I found that the value of `step` changes according to the checkpoint being resumed, but this is not the case for `batches`. It seems that training always begins with the data generated from the first step, even when it is resumed from a checkpoint. Is there any way to change the configs to have a data generator that adapts to the checkpoint being resumed?

Thank you
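Roughly, the behaviour I am looking for is something like the sketch below, assuming the iterator order is fully deterministic under the fixed seed (the names `build_data_iter`, `resume_step`, and `accum_count` are made up for illustration, they are not actual OpenNMT-py options or APIs):

```python
# Sketch of a possible workaround: fast-forward a deterministic, seeded data
# iterator so a resumed run does not replay the batches consumed before the
# checkpoint. `build_data_iter`, `resume_step`, and `accum_count` are made-up
# names for illustration, not actual OpenNMT-py APIs.
import itertools


def build_resumed_iter(build_data_iter, seed, resume_step, accum_count=1):
    """Rebuild the data iterator with the same seed, then skip the batches
    that were already consumed before the checkpoint was written.

    Assumes the iterator order is fully deterministic under `seed` and that
    each training step consumes `accum_count` batches.
    """
    data_iter = build_data_iter(seed=seed)
    n_batches_seen = resume_step * accum_count
    # islice lazily drops the first n_batches_seen batches (they are still
    # read and discarded, just never returned to the training loop).
    return itertools.islice(data_iter, n_batches_seen, None)
```

The obvious drawback of such a skip-ahead approach is that all the skipped batches still have to be read and processed, so restoring an explicit position would be preferable.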
Duplicate of #2006?

I recently added an index to the processed data, look here: #2496

Actually I have to change the type (it doesn't need to be a tensor), but this can be used to restore the proper index in the datasets.

Do you want to work on this?
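Roughly, the idea would be something like the following sketch (the `index` key on each example and a `last_index` field in the checkpoint are placeholders for illustration, not the actual fields):

```python
# Rough sketch of the index-based restoration idea: if every processed example
# carries a running index (see #2496), the checkpoint can record the last index
# that was trained on, and the resumed run skips examples until it passes that
# point. The "index" key and the `last_index` checkpoint field are assumptions
# for illustration, not the actual implementation.
from typing import Dict, Iterable, Iterator


def skip_to_index(examples: Iterable[Dict], last_index: int) -> Iterator[Dict]:
    """Yield only the examples whose stored index is beyond the checkpointed one."""
    for ex in examples:
        if ex["index"] > last_index:  # index attached during data processing
            yield ex


# Hypothetical usage when resuming:
#   last_index = checkpoint.get("last_index", -1)
#   data_iter = skip_to_index(data_iter, last_index)
```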