
Feat/sort FSRSItem by length to speed up training #32

Merged
merged 14 commits into main from Feat/sort-FSRSItem-by-length on Aug 28, 2023

Conversation

L-M-Sherlock
Member

Note: this code removes the shuffling of items, which would break stochastic gradient descent. I wonder whether it is possible to shuffle at the level of batches rather than items.
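A minimal sketch of the batch-level idea (the helper name and the tiny inline LCG are illustrative only, not the crate's actual API): shuffle which batch comes next while keeping each batch's items in their original order.

```rust
// Sketch: shuffle indices at batch granularity. Each batch's internal
// order is preserved; only the order of the batches is randomized.
// A tiny LCG keeps the example self-contained; real code would use a
// proper RNG such as the `rand` crate.
fn batch_shuffled_indices(len: usize, batch_size: usize, mut seed: u64) -> Vec<usize> {
    let num_batches = (len + batch_size - 1) / batch_size;
    let mut batch_order: Vec<usize> = (0..num_batches).collect();
    // Fisher-Yates over batch indices, driven by the LCG.
    for i in (1..num_batches).rev() {
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (seed >> 33) as usize % (i + 1);
        batch_order.swap(i, j);
    }
    // Expand each batch back into its item indices, in original order.
    batch_order
        .into_iter()
        .flat_map(|b| b * batch_size..((b + 1) * batch_size).min(len))
        .collect()
}

fn main() {
    // 10 items in batches of 3: batches are {0,1,2}, {3,4,5}, {6,7,8}, {9}.
    println!("{:?}", batch_shuffled_indices(10, 3, 42));
}
```

The output is always a permutation of the item indices, so every item is still visited once per epoch; only the batch order changes between runs.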

@L-M-Sherlock L-M-Sherlock added the enhancement New feature or request label Aug 25, 2023
@L-M-Sherlock L-M-Sherlock linked an issue Aug 25, 2023 that may be closed by this pull request
@dae
Collaborator

dae commented Aug 26, 2023

Is the change I pushed what you had in mind? For a given review length, the items should now appear in a random order.

@L-M-Sherlock
Member Author

Its effect on training is minor, because the order of items within the same batch doesn't matter. I plan to implement batch-level shuffling.

https://github.com/burn-rs/burn/blob/main/burn-core/src/data/dataloader/strategy.rs

@dae
Collaborator

dae commented Aug 26, 2023

So you want a batch of size 3, then a batch of size 10, then a batch of size 2, etc?

@L-M-Sherlock
Member Author

> So you want a batch of size 3, then a batch of size 10, then a batch of size 2, etc?

I want a batch with seq_len = 3, then a batch with seq_len = 10, then a batch with seq_len = 2.
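To illustrate the distinction, here is a hedged sketch (function name and `Vec<u32>` stand-ins for FSRSItem are hypothetical): items are grouped by sequence length first, then each group is cut into batches, so every batch has a uniform seq_len and needs no padding.

```rust
use std::collections::BTreeMap;

// Sketch: bucket items by their sequence length, then chunk each
// bucket into batches. Every resulting batch contains items of one
// uniform length, so the batch size varies but the padding does not.
fn batches_by_len(items: Vec<Vec<u32>>, batch_size: usize) -> Vec<Vec<Vec<u32>>> {
    let mut by_len: BTreeMap<usize, Vec<Vec<u32>>> = BTreeMap::new();
    for item in items {
        by_len.entry(item.len()).or_default().push(item);
    }
    let mut batches = Vec::new();
    for (_len, group) in by_len {
        for chunk in group.chunks(batch_size) {
            batches.push(chunk.to_vec());
        }
    }
    batches
}

fn main() {
    let items = vec![vec![1, 2], vec![1, 2, 3], vec![4, 5], vec![4, 5, 6], vec![7, 8, 9]];
    println!("{:?}", batches_by_len(items, 2));
}
```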

@dae dae force-pushed the Feat/sort-FSRSItem-by-length branch 3 times, most recently from 0a1f15a to d99d925 Compare August 26, 2023 07:01
@dae
Collaborator

dae commented Aug 26, 2023

Ok, how about now? A test is currently failing due to #31, as I rebased this PR on the main branch.

@L-M-Sherlock
Member Author

I have seen your code, but I think we should write a new dataloader rather than manipulate the dataset before building the dataloader. I'm coding it now.

@L-M-Sherlock
Member Author

In your implementation the shuffle is only applied once (if I'm wrong, please feel free to correct me). Ideally, the shuffle should be applied before each training epoch.

@dae
Collaborator

dae commented Aug 26, 2023

Ok, I see. Hopefully the code will be useful as a reference anyway. :-)

@L-M-Sherlock
Member Author

L-M-Sherlock commented Aug 26, 2023

@dae, I made a weird observation while coding. The training results always vary between runs, even when I remove the shuffle. Then I figured it out: it's caused by grouped.into_values() in group_by_cid(), whose iteration order is unstable. So I apply sorted_by_cached_key(|revlog| revlog.get(0).unwrap().cid) to fix it. I'm not sure it's the best practice, but it works.
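The fix can be sketched as follows (a stand-in only: the struct, field, and function names are hypothetical, and std's `sort_by_key` is used here in place of itertools' `sorted_by_cached_key` so the example has no external dependency):

```rust
// Sketch: HashMap::into_values() yields groups in unspecified order,
// which made training runs non-deterministic. Sorting the groups by
// the card id of their first revlog entry restores a stable order.
#[derive(Clone, Debug, PartialEq)]
struct RevlogEntry {
    cid: i64,
}

fn stable_groups(mut groups: Vec<Vec<RevlogEntry>>) -> Vec<Vec<RevlogEntry>> {
    // The actual PR uses itertools' sorted_by_cached_key; the effect
    // is the same for a cheap key like this.
    groups.sort_by_key(|revlogs| revlogs[0].cid);
    groups
}

fn main() {
    let groups = vec![
        vec![RevlogEntry { cid: 3 }],
        vec![RevlogEntry { cid: 1 }],
        vec![RevlogEntry { cid: 2 }],
    ];
    println!("{:?}", stable_groups(groups));
}
```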

Then I implemented BatchShuffleDataset, which shuffles the indices by batch_size. But I'm not sure whether the training loop shuffles the dataset before each epoch. I need to ask @nathanielsimard.

@L-M-Sherlock
Member Author

Wait for tracel-ai/burn#703

@dae
Collaborator

dae commented Aug 27, 2023

In most languages, HashMaps don't retain the order items are put into them - Python's dictionaries are a bit special in that regard. I think we can do this a bit more efficiently by sorting on the SQL end and then using .group_by() - I've pushed an update and some other tidyups.
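The efficiency point can be sketched like this (the `Row` struct and function name are illustrative, not the PR's actual code): if the SQL query already returns rows sorted by card id, grouping becomes a single linear pass over adjacent rows, with no HashMap and no post-hoc sort.

```rust
// Sketch: rows arrive already ordered by cid (e.g. via `ORDER BY cid`
// in SQL), so adjacent rows with the same cid form a group. One pass,
// deterministic order, no hashing.
#[derive(Debug, PartialEq)]
struct Row {
    cid: i64,
    payload: u32,
}

fn group_sorted(rows: Vec<Row>) -> Vec<Vec<Row>> {
    let mut out: Vec<Vec<Row>> = Vec::new();
    for row in rows {
        // Start a new group whenever the cid changes.
        let start_new = match out.last() {
            Some(group) => group[0].cid != row.cid,
            None => true,
        };
        if start_new {
            out.push(vec![row]);
        } else {
            out.last_mut().unwrap().push(row);
        }
    }
    out
}

fn main() {
    let rows = vec![
        Row { cid: 1, payload: 10 },
        Row { cid: 1, payload: 11 },
        Row { cid: 2, payload: 20 },
    ];
    println!("{:?}", group_sorted(rows));
}
```

This mirrors what itertools' `group_by` does over a sorted iterator, which is presumably what the pushed update relies on.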

@dae
Collaborator

dae commented Aug 27, 2023

Nice work on the batch shuffling! And thanks @nathanielsimard for the upstream fix.

@dae dae mentioned this pull request Aug 27, 2023
@dae
Collaborator

dae commented Aug 27, 2023

Would you like me to rebase this or would you like to?

@L-M-Sherlock
Member Author

L-M-Sherlock commented Aug 27, 2023

Would you mind doing it? Thanks. I'm out right now.

@dae dae force-pushed the Feat/sort-FSRSItem-by-length branch from f6e2c90 to d6c71f6 Compare August 27, 2023 06:31
@dae
Collaborator

dae commented Aug 28, 2023

If you guys are happy with the current state of this, I'd suggest we merge it in, even though the PR mentioned above hasn't landed yet.

@L-M-Sherlock
Member Author

Should we update the dependencies of burn in Cargo.toml before merging?

@L-M-Sherlock
Member Author

Ignore my last comment. I confused this PR with another one.

@L-M-Sherlock L-M-Sherlock requested a review from dae August 28, 2023 03:42
@L-M-Sherlock L-M-Sherlock merged commit ee7f6be into main Aug 28, 2023
1 check passed
@L-M-Sherlock L-M-Sherlock deleted the Feat/sort-FSRSItem-by-length branch August 28, 2023 03:44
Successfully merging this pull request may close these issues.

[TODO] feature: custom sampler for batch