As noted in PR #681 ("Potential future work"), the way NN ensemble handles batches could be improved:
> I'm not quite happy with how the NN ensemble handles suggestion results from other projects, both during training and suggest operations. For example, the training samples are stored in LMDB one document at a time, but now it would be easier to store them as whole batches instead, which could be more efficient. But I decided that this PR is already much too big and it would make sense to try to improve batching in the NN ensemble in a separate follow-up PR. There is already an attempt to do part of this in PR #676; that could be a possible starting point.

In particular:
- training data is currently stored in LMDB one document at a time; it would make sense to store it in batches instead (and perhaps use another data storage mechanism, e.g. TF Data / Dataset)
- `_merge_source_batches` could perform calculations using sparse arrays and only convert to NumPy arrays at the end (and transpose if necessary)
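As a rough illustration of the second point, the sketch below keeps per-document suggestion vectors sparse while assembling a batch and densifies only once at the end. The function name and data layout are assumptions for illustration, not the actual `_merge_source_batches` signature:

```python
import numpy as np
from scipy.sparse import csr_array, vstack

def merge_source_batches(per_source_doc_vectors):
    """Illustrative sketch: per_source_doc_vectors is a list (one entry
    per source) of lists of 1 x n_subjects sparse score vectors, one
    per document. Stacking happens while the data is still sparse; the
    conversion to a dense NumPy array is done once per source, and the
    sources are then stacked along a new last axis."""
    dense_per_source = [
        vstack(doc_vectors).toarray()  # densify once, at the end
        for doc_vectors in per_source_doc_vectors
    ]
    # resulting shape: (n_docs, n_subjects, n_sources)
    return np.stack(dense_per_source, axis=-1)
```

This avoids allocating a dense array per document; whether it is actually faster in practice depends on the density of the score vectors.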
Of course the changes need to be properly benchmarked.
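As a sketch of the kind of micro-benchmark that could guide this, the snippet below times densifying a sparse score batch one document (row) at a time against densifying it in a single call. The data shape and density are made up for illustration:

```python
import timeit

import numpy as np
from scipy.sparse import random as sparse_random

# hypothetical data: 1000 documents, 5000 subjects, ~1% nonzero scores
batch = sparse_random(1000, 5000, density=0.01, format="csr")

def per_document():
    # densify one document (row) at a time, as with per-document storage
    return np.vstack([batch[i].toarray() for i in range(batch.shape[0])])

def whole_batch():
    # densify the whole batch in a single call
    return batch.toarray()

print("per-document:", timeit.timeit(per_document, number=10))
print("whole batch: ", timeit.timeit(whole_batch, number=10))
```

A real benchmark would of course run against actual Annif training data and cover the LMDB read/write path as well, not just the in-memory conversion.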