Best practice with distributed training? #895
Hello.
Thanks!
Replies: 1 comment
@songtianhui pretty much all models featured here that were trained with OpenCLIP use `--local-loss --gather-with-grad`; it's the only option that scales. Back when we first implemented it, we verified that with the gradient flowing through the gather, the local loss was equivalent to computing the global loss.
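To see why the local loss matches the global loss, here is a minimal NumPy sketch (not OpenCLIP code) that simulates the gathered batch on a single process, assuming equal per-rank batch sizes. Each "rank" only builds its local-rows × all-columns logits, with labels offset by its position in the gathered batch; the function names are illustrative, not from the library:

```python
import numpy as np

def cross_entropy(logits, labels):
    # row-wise softmax cross-entropy against integer labels
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def global_clip_loss(img, txt):
    # full NxN similarity matrix; diagonal entries are the positives
    labels = np.arange(len(img))
    logits = img @ txt.T
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

def local_clip_loss(img_local, txt_local, img_all, txt_all, offset):
    # local rows vs. all gathered columns; labels shifted by this rank's offset
    labels = offset + np.arange(len(img_local))
    return 0.5 * (cross_entropy(img_local @ txt_all.T, labels)
                  + cross_entropy(txt_local @ img_all.T, labels))

rng = np.random.default_rng(0)
world, per_rank, dim = 4, 8, 16
img = rng.standard_normal((world * per_rank, dim))
txt = rng.standard_normal((world * per_rank, dim))

# average of the per-rank local losses equals the global loss
local = np.mean([local_clip_loss(img[r*per_rank:(r+1)*per_rank],
                                 txt[r*per_rank:(r+1)*per_rank],
                                 img, txt, r*per_rank)
                 for r in range(world)])
print(np.isclose(local, global_clip_loss(img, txt)))  # True
```

The averaged local losses reproduce the global value because each local mean covers a disjoint slice of the same N rows. In real training, `img_all`/`txt_all` come from an all-gather, and `--gather-with-grad` is what keeps gradients flowing through those gathered features so the equivalence holds for the backward pass too, while each rank only materializes an (N/world) × N logit block instead of the full N × N matrix.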