Don't shuffle the dataset when num_epochs=1 #118
This problem is solved by migrating to AllenNLP>=2.0.0. I will close this once I have merged the migration.
Still a question: so you DO shuffle between epochs - great. But why is it that the loss significantly drops when I restart? (EDIT: I just checked that you use AllenNLP 1.1.0 still... does that mean that my assumption is - still - correct?)
RE dataset shuffling: with the current code, shuffling should happen at the beginning of every epoch.

RE big drop in loss after first epoch: I think the big drop you are seeing has to do with how AllenNLP presents the loss. I believe it is a running average during that epoch. Loss starts off very high, naturally, so during the first epoch the average loss is correspondingly high. When a new epoch begins, the loss is much lower because a new running average (for that epoch only) is being computed. Just a note that I am going off memory here and could be wrong.
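A quick illustration of that point (my own sketch, not code from this repo): if the displayed loss is the average of the batch losses seen *within the current epoch*, the number shown at the end of epoch 1 can be far higher than the number shown early in epoch 2, even when the underlying batch losses decline smoothly across the boundary.

```python
# Hypothetical batch losses that decline smoothly from 10.0 across two epochs.
batch_losses = [10.0 - 0.1 * i for i in range(100)]

def epoch_averages(losses, batches_per_epoch):
    """Average the batch losses within each epoch, mimicking a per-epoch
    running-average display that resets at every epoch boundary."""
    averages = []
    for start in range(0, len(losses), batches_per_epoch):
        chunk = losses[start:start + batches_per_epoch]
        averages.append(sum(chunk) / len(chunk))
    return averages

epoch_display = epoch_averages(batch_losses, batches_per_epoch=50)
# epoch_display[0] is much larger than epoch_display[1], so the display
# shows a "big drop" at the epoch boundary despite no jump in batch losses.
```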
Hi @JohnGiorgi, thanks for your quick response and help! Regarding the loss, I am not quite sure I understand correctly. What I mean is (example-wise): I start with loss = 10 in epoch 0, then it reaches 7 at the end of epoch 9 (10 epochs in total), and the model is saved. I have now run for 30 epochs in total, but divided into 3 runs with 10 epochs each, and every time I saw this behaviour (as described above). This cannot have happened by accident. To my understanding, it should not make a difference that I run 3x10 epochs instead of 30 epochs in one go. OK, I agree that due to stochastic behaviour (gradient descent etc.) we can get different results, but not to this extent. Where am I wrong?
Did you confirm that the loss is significantly different when you train for 30 straight epochs vs 3x10-epoch runs? I never broke training up into multiple steps and continued using
No, I have not tried a full 30-epoch run yet. But due to overfitting or local minima, for instance, the gap between run 1, epoch 9 and run 2, epoch 0 is way too big to be chance. The same applies to run 2 -> run 3. The "advantage" is/might be that I have more fine-grained control over training runtime (cloud GPU is not for free :)) and can decide between intermediate runs whether to continue or not. Btw, why is there no validation set, or do I use
Ah, I just remembered that there is a learning rate scheduler that is likely being reset every time training restarts. This could explain the big drop in loss across restarts. It's kind of annoying, but you may need to manually set some of the parameters of the learning rate scheduler so it works as expected for restarting runs (like
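To make the scheduler-reset effect concrete, here is a pure-Python sketch (not the AllenNLP scheduler itself) of a simple step-decay schedule. The point is only that a freshly constructed scheduler that does not know how many epochs were already completed restarts the decay from step 0, so each resumed run trains under a different learning rate than an uninterrupted run would at the same total epoch count.

```python
def step_decay_lr(initial_lr, gamma, step_size, epochs_completed):
    """Learning rate after `epochs_completed` epochs of step decay:
    multiply by `gamma` once every `step_size` epochs."""
    return initial_lr * gamma ** (epochs_completed // step_size)

# Training 30 epochs straight: by epoch 29 the LR has decayed twice.
lr_straight = step_decay_lr(0.1, gamma=0.5, step_size=10, epochs_completed=29)

# Restarting every 10 epochs without carrying the epoch count over:
# each run's scheduler thinks it is at epoch 0-9, so no decay ever happens.
lr_restarted = step_decay_lr(0.1, gamma=0.5, step_size=10, epochs_completed=9)
```

The fix the comment alludes to is telling the resumed scheduler how far training has already progressed (in PyTorch-style schedulers this is the role of the `last_epoch` constructor argument).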
There is no validation set. We validate on the development sets of SentEval after training is complete. See #190

Ok, thanks a lot - I'll look into it.
Currently, the dataset reader will shuffle the dataset during every epoch. In order to do this, it reads the entire dataset into memory, shuffles it, then yields instances one-by-one. This was the only way I could figure out how to shuffle a lazy AllenNLP dataset reader.
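The materialize-then-shuffle pattern described above can be sketched as follows (a minimal standalone sketch; the function and argument names are hypothetical, not the repo's actual `DatasetReader` API):

```python
import random

def shuffled_read(read_lazy, seed=None):
    """Read an entire lazy stream of instances into memory, shuffle it,
    then yield instances one-by-one. `read_lazy` is any zero-argument
    callable returning an iterable of instances."""
    instances = list(read_lazy())  # this list is the O(dataset) memory cost
    random.Random(seed).shuffle(instances)
    yield from instances

# Usage: wrap a generator that would otherwise yield instances in file order.
epoch = list(shuffled_read(lambda: iter(range(5)), seed=0))
```

Because the whole stream is consumed before the first instance is yielded, the reader is no longer truly lazy, which is exactly the memory problem raised below.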
Unfortunately, for large datasets this means we need a lot of memory. Fortunately, for large datasets, really good performance can be achieved in only 1 epoch (as we found in the paper). Therefore, I think the `DatasetReader` should be updated such that shuffling only happens when `num_epochs > 1`. I am not sure how the `DatasetReader` could get access to `num_epochs`, so the user may just have to provide a `shuffle` argument.