(4/n) Data Refactor - Finetuning Scripts #950
Conversation
When running with 1 epoch and an epoch size of 50,000 on Alpaca (the default global batch size of 64 and a micro-batch size of 1), this is how the learning rate currently evolves over the course of training:

[learning-rate plot]

This looks good and is exactly what I would expect. When I reduced the epoch size to 1000 and adjusted the warmup steps from 100 to 10, it seems to do something weird:

[learning-rate plot]

I think we need to adjust the code so that the scheduler steps here

```python
steps_per_epoch = len(train_dataloader) // train.gradient_accumulation_iters(devices)
lr_max_steps = train.epochs * steps_per_epoch
```

are perhaps computed by the …
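For reference, here is a rough back-of-the-envelope sketch of how these quantities come out under the settings above. It assumes a single device, that `len(train_dataloader)` is `epoch_size // (micro_batch_size * devices)`, and that gradient accumulation iters are `global_batch_size // (micro_batch_size * devices)`; the helper function below is purely illustrative, not the exact code in the repo:

```python
# Illustrative arithmetic only -- mirrors the two quoted lines above.
def lr_schedule_steps(epoch_size, epochs, global_batch_size, micro_batch_size, devices, warmup_steps):
    # Assumed: accumulation iters derived from global vs. micro batch size.
    grad_accum_iters = global_batch_size // (micro_batch_size * devices)
    # Assumed: dataloader length equals samples per device per epoch.
    batches_per_epoch = epoch_size // (micro_batch_size * devices)
    steps_per_epoch = batches_per_epoch // grad_accum_iters
    lr_max_steps = epochs * steps_per_epoch
    return steps_per_epoch, lr_max_steps, warmup_steps

# Epoch size 50,000: ~781 optimizer steps, so warmup=100 is a small fraction of training.
print(lr_schedule_steps(50_000, 1, 64, 1, 1, 100))  # -> (781, 781, 100)
# Epoch size 1,000: only ~15 optimizer steps, so warmup=10 dominates the whole run,
# which would explain the odd-looking schedule.
print(lr_schedule_steps(1_000, 1, 64, 1, 1, 10))    # -> (15, 15, 10)
```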
My comments would apply to all the other files too.
Co-authored-by: rasbt <[email protected]> Co-authored-by: Carlos Mocholí <[email protected]>
Fixes #954
Fixes #951