Replies: 9 comments
-
Any others you're interested in?
-
Thanks for sharing. How many steps / seen pairs were used for phase 1 and phase 2? Did phase 2 resume the optimizer state from phase 1?
-
I think that's covered in the blog post, but from memory it was ~32B samples seen in phase 1 and ~7B in phase 2. And yes, the optimizer state was resumed.
-
OK, that's clear, thanks for the quick reply; I'll implement one. How many gradient accumulation steps were used, or is it just a matter of keeping the seen pairs per update similar to phase 1?
-
Sorry, I don't understand.
-
Sorry for the confusion. The blog post (https://laion.ai/blog/giant-openclip/) mentions that gradient accumulation was used in phase 2, but I'm not sure how many steps were accumulated per weight update.
-
Moving to discussions for future reference...
-
Will using gradient accumulation hurt accuracy if I am only using 256 GPUs (compared to 800 or 1600)?
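Since the question is about matching the effective batch size with fewer GPUs, here is a minimal sketch of the gradient accumulation idea (plain-Python scalar SGD, not the actual OpenCLIP training loop; `sgd_with_accumulation` and `mse_grad` are illustrative names, not library functions). Averaging gradients over `accum_steps` micro-batches before each update makes one update mathematically equivalent (for SGD) to a single step on a batch that is `accum_steps` times larger:

```python
def sgd_with_accumulation(w, grad_fn, micro_batches, accum_steps, lr):
    """Average gradients over `accum_steps` micro-batches, then apply one
    weight update, so each update sees accum_steps * micro_batch_size
    samples -- the same effective batch as a run on accum_steps times
    more GPUs."""
    accum = 0.0
    updates = 0
    for i, batch in enumerate(micro_batches):
        # scale each micro-batch gradient so the running sum is an average
        accum += grad_fn(w, batch) / accum_steps
        if (i + 1) % accum_steps == 0:
            w -= lr * accum  # one optimizer step per accumulation window
            accum = 0.0
            updates += 1
    return w, updates

# toy gradient: d/dw of 0.5 * (w - x)^2, averaged over a micro-batch
def mse_grad(w, batch):
    return sum(w - x for x in batch) / len(batch)

# 4 micro-batches of 1 sample, accumulated in windows of 2:
# 2 weight updates, each with an effective batch of 2 samples
w, updates = sgd_with_accumulation(0.0, mse_grad, [[1.0], [3.0], [1.0], [3.0]],
                                   accum_steps=2, lr=1.0)
# w -> 2.0 (the data mean), updates -> 2
```

Note the caveat for contrastive losses like CLIP's: the loss couples all pairs in a batch, so naively averaging micro-batch losses is not identical to computing the loss over the full effective batch; the blog post's phase 2 setup would need to account for that (e.g. by caching features before the loss computation).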
-
After reading this post (https://laion.ai/blog/giant-openclip/), do we have the exact training hyperparameters of ViT-G/14 somewhere? I cannot find them on rom1504's wandb.