My dataset consists of 8,000 grayscale images of size 256 × 256. The following is my training script:
MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3"
Strangely, when I do not specify a checkpoint (i.e., without the resume_checkpoint option), the model trains normally on two V100s, but when I pass a checkpoint to resume training, the run fails with an error.