Speed up pre-training #197

Open
yandachen opened this issue Mar 12, 2023 · 0 comments
@yandachen

Hello, I'm working on a project that involves pre-training GPT-2 Medium. With your code (DeepSpeed + bf16 + FlashAttention), it currently takes around 15 days to pre-train for the full 400K steps on 4 A100 GPUs. Do you have any suggestions for approaches that could further speed up pre-training by, say, 2x?
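
For reference, here is the rough throughput arithmetic behind that 15-day figure (these are just the numbers above, rounded; nothing beyond them is measured):

```python
# Back-of-the-envelope throughput for the current run (numbers from above).
total_steps = 400_000          # full pre-training schedule
wall_clock_days = 15           # observed with DeepSpeed + bf16 + FlashAttention on 4x A100

seconds_per_step = wall_clock_days * 24 * 3600 / total_steps
print(f"current: ~{seconds_per_step:.2f} s/step")   # ~3.24 s/step

# A 2x end-to-end speedup means roughly half that per-step time,
# i.e. finishing the same 400K steps in about 7.5 days.
print(f"target:  ~{seconds_per_step / 2:.2f} s/step, ~{wall_clock_days / 2:.1f} days")
```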

One possible solution I'm considering is increasing the learning rate. It looks like GPT-2 Medium uses a learning rate of 1.5e-4. Did you experiment with a larger learning rate? Was the model able to converge faster during pre-training without losing too much perplexity?
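
To make the question concrete, this is the kind of change I have in mind: keep a standard linear-warmup + cosine-decay schedule but raise the peak learning rate. The 3e-4 value, warmup length, and placeholder model below are purely illustrative (just 2x the 1.5e-4 baseline), not something I've validated:

```python
import math
import torch

# Placeholder standing in for GPT-2 Medium; only the optimizer/schedule matters here.
model = torch.nn.Linear(1024, 1024)

base_lr = 1.5e-4      # GPT-2 Medium's default peak LR
scaled_lr = 3e-4      # illustrative 2x larger peak LR (the thing I'm asking about)
warmup_steps = 10_000 # assumed warmup length, not taken from the repo
total_steps = 400_000

optimizer = torch.optim.AdamW(model.parameters(), lr=scaled_lr, weight_decay=0.01)

def warmup_cosine(step: int) -> float:
    """Linear warmup then cosine decay, returned as a multiplier on the peak LR."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)
```

The open question is whether the larger peak LR actually reaches comparable perplexity in fewer steps, or whether it destabilizes training.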

Any suggestions would be greatly appreciated!
