Replies: 9 comments
-
Any others you're interested in?
-
Thanks for sharing. How many steps / seen pairs were used for phase 1 and phase 2? Did phase 2 resume the optimizer state from phase 1?
-
I think that's covered in the blog post, but from memory it was ~32B samples seen in phase 1 and ~7B in phase 2. And yes, the optimizer state was resumed.
-
OK, that's clear, thanks for the quick reply; I'll implement one. How many gradient accumulation steps were used, or is it just a matter of keeping the seen pairs per update similar to phase 1?
-
Sorry, I don't understand.
-
Sorry for the confusion. The blog post (https://laion.ai/blog/giant-openclip/) mentions that gradient accumulation was used in phase 2, but I'm not sure how many steps were accumulated per weight update.
-
Moving to discussions for future reference...
-
Will using gradient accumulation hurt accuracy if I am only using 256 GPUs (compared to 800 or 1600)?
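Since the question is about matching the effective batch size with fewer GPUs, here is a minimal sketch of the gradient accumulation idea (plain-Python scalar SGD, not the actual OpenCLIP training loop; `sgd_with_accumulation` and `mse_grad` are illustrative names, not library functions). Averaging gradients over `accum_steps` micro-batches before each update makes one update mathematically equivalent (for SGD) to a single step on a batch that is `accum_steps` times larger:

```python
def sgd_with_accumulation(w, grad_fn, micro_batches, accum_steps, lr):
    """Average gradients over `accum_steps` micro-batches, then apply one
    weight update, so each update sees accum_steps * micro_batch_size
    samples -- the same effective batch as a run on accum_steps times
    more GPUs."""
    accum = 0.0
    updates = 0
    for i, batch in enumerate(micro_batches):
        # scale each micro-batch gradient so the running sum is an average
        accum += grad_fn(w, batch) / accum_steps
        if (i + 1) % accum_steps == 0:
            w -= lr * accum  # one optimizer step per accumulation window
            accum = 0.0
            updates += 1
    return w, updates

# toy gradient: d/dw of 0.5 * (w - x)^2, averaged over a micro-batch
def mse_grad(w, batch):
    return sum(w - x for x in batch) / len(batch)

# 4 micro-batches of 1 sample, accumulated in windows of 2:
# 2 weight updates, each with an effective batch of 2 samples
w, updates = sgd_with_accumulation(0.0, mse_grad, [[1.0], [3.0], [1.0], [3.0]],
                                   accum_steps=2, lr=1.0)
# w -> 2.0 (the data mean), updates -> 2
```

Note the caveat for contrastive losses like CLIP's: the loss couples all pairs in a batch, so naively averaging micro-batch losses is not identical to computing the loss over the full effective batch; the blog post's phase 2 setup would need to account for that (e.g. by caching features before the loss computation).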
-
After reading this post (https://laion.ai/blog/giant-openclip/), do we have the exact training hyperparameters of ViT-G/14 somewhere? I cannot find them on rom1504's wandb.