As noted in PR #681 ("Potential future work"), the way NN ensemble handles batches could be improved:
> I'm not quite happy with how the NN ensemble handles suggestion results from other projects, both during training and suggest operations. For example, the training samples are stored in LMDB one document at a time, but now it would be easier to store them as whole batches instead, which could be more efficient. But I decided that this PR is already much too big and it would make sense to try to improve batching in the NN ensemble in a separate follow-up PR. There is already an attempt to do part of this in PR #676; that could be a possible starting point.

In particular:
- training data is currently stored in LMDB one document at a time; it would make sense to store it in batches instead (and perhaps use another data storage mechanism, e.g. TF Data / Dataset)
- `_merge_source_batches` could perform calculations using sparse arrays and only convert to NumPy arrays at the end (and transpose if necessary)
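As a rough illustration of the second point, the sketch below keeps per-document suggestion vectors sparse while assembling a batch and densifies only once at the end. The function name and data layout are assumptions for illustration, not the actual `_merge_source_batches` signature:

```python
import numpy as np
from scipy.sparse import csr_array, vstack

def merge_source_batches(per_source_doc_vectors):
    """Illustrative sketch: per_source_doc_vectors is a list (one entry
    per source) of lists of 1 x n_subjects sparse score vectors, one
    per document. Stacking happens while the data is still sparse; the
    conversion to a dense NumPy array is done once per source, and the
    sources are then stacked along a new last axis."""
    dense_per_source = [
        vstack(doc_vectors).toarray()  # densify once, at the end
        for doc_vectors in per_source_doc_vectors
    ]
    # resulting shape: (n_docs, n_subjects, n_sources)
    return np.stack(dense_per_source, axis=-1)
```

This avoids allocating a dense array per document; whether it is actually faster in practice depends on the density of the score vectors.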
Of course the changes need to be properly benchmarked.
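As a sketch of the kind of micro-benchmark that could guide this, the snippet below times densifying a sparse score batch one document (row) at a time against densifying it in a single call. The data shape and density are made up for illustration:

```python
import timeit

import numpy as np
from scipy.sparse import random as sparse_random

# hypothetical data: 1000 documents, 5000 subjects, ~1% nonzero scores
batch = sparse_random(1000, 5000, density=0.01, format="csr")

def per_document():
    # densify one document (row) at a time, as with per-document storage
    return np.vstack([batch[i].toarray() for i in range(batch.shape[0])])

def whole_batch():
    # densify the whole batch in a single call
    return batch.toarray()

print("per-document:", timeit.timeit(per_document, number=10))
print("whole batch: ", timeit.timeit(whole_batch, number=10))
```

A real benchmark would of course run against actual Annif training data and cover the LMDB read/write path as well, not just the in-memory conversion.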