[Feature] compare/replace cell matching code with scipy.optimize implementation #77
I do agree that we shouldn't re-implement stuff and should reuse as much as possible. The main reason I re-implemented it is that every existing implementation that solves this problem expects a cost matrix input of size NxM. For our example with 250k cells, this would have required a 200+GB matrix for float32. There may be a way to hack a fake numpy array that calls a function to compute the distance for each element access.

I'll be honest that while I spent some time reading the algorithm and the ideas behind it, and I think I understand it at a cursory level, I don't understand it deeply enough to properly debug it. However, given this kind of algorithm and where it comes from, I also expect it should be set-and-forget as long as no one changes any of the logic!?
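To make the memory constraint concrete, here is a rough back-of-the-envelope check, plus a hypothetical sketch of the "fake numpy array" idea (the class name, fields, and distance metric are illustrative assumptions, not code from the repository):

```python
import numpy as np

# Back-of-the-envelope: a dense 250k x 250k float32 cost matrix.
n = m = 250_000
print(n * m * np.dtype(np.float32).itemsize / 1e9)  # -> 250.0 (GB)

# Hypothetical lazy "cost matrix" that computes each distance on access
# instead of storing it. Existing solvers expect a real dense NxM array,
# so something like this would only help if they accepted array-likes.
class LazyCostMatrix:
    def __init__(self, cells_a, cells_b):
        self.cells_a = np.asarray(cells_a)  # (N, 3) cell coordinates
        self.cells_b = np.asarray(cells_b)  # (M, 3) cell coordinates
        self.shape = (len(self.cells_a), len(self.cells_b))

    def __getitem__(self, idx):
        i, j = idx
        # Euclidean distance between cell i in A and cell j in B.
        return float(np.linalg.norm(self.cells_a[i] - self.cells_b[j]))
```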
I opened an issue with scipy: scipy/scipy#20729.
This is a fair point. As long as no one changes any of the logic and
But we should question point 1. in that case. 😆
Writing down some related thoughts that might help down the line (this is probably not a priority for the core team right now)
It's a fair worry that issues with the algorithm could lead to misjudging changes in e.g. cellfinder. Perhaps we should think of the algorithm as another input to consider when looking at changes in cell counting, rather than expecting the histogram to look a certain way!?
The issue there is that you'd still need a large hard drive just for this computation, and it would probably take a while to compute and write the cost data to disk. But even then, given that the algorithm needs to constantly query the cost of different indices, having the CPU look each value up on a hard drive, even a fast SSD, would significantly reduce performance. Compared to in-memory access, I don't think it would be fast enough!?
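For illustration, a disk-backed version of that idea might look like the sketch below (the function name, chunk size, and Euclidean cost are assumptions for the example, not the project's actual code). Even if it worked, every random access the solver makes would still hit the disk, which is exactly the performance concern raised above.

```python
import numpy as np

def build_cost_memmap(cells_a, cells_b, path="cost.dat", chunk=256):
    """Write an NxM float32 cost matrix to disk with np.memmap, chunk by chunk."""
    n, m = len(cells_a), len(cells_b)
    cost = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, m))
    for start in range(0, n, chunk):
        block = np.asarray(cells_a[start:start + chunk], dtype=np.float32)
        # Pairwise Euclidean distances for this chunk of rows; the chunk
        # size bounds peak memory at roughly chunk * m * 3 floats.
        diff = block[:, None, :] - np.asarray(cells_b, dtype=np.float32)[None, :, :]
        cost[start:start + chunk] = np.sqrt((diff ** 2).sum(axis=-1))
    cost.flush()
    return cost
```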
I added the zero-distance optimization you mentioned to the original PR!
I think with the improvements made to #74 after this was opened, we can close this as not planned (for now). Our cell matching is now tailored to BrainGlobe's needs (it is fast, it allows "pre-matching", and it is tested). We can revisit this if the scipy issue is addressed.
Is your feature request related to a problem? Please describe.
The core developer team is not confident we'll be able to maintain the cell matching code long-term (mainly because I don't feel like I entirely understand each line 😅!)
Describe the solution you'd like
Unless it's a lot less performant, we should consider replacing the core logic of our cell matching functionality with `scipy.optimize.linear_sum_assignment`, to reduce the maintenance burden. This is not exactly the same algorithm, but it is related and useful for the same class of problems, AFAICT. A solution like this would also fit well with the idea of BrainGlobe bringing the neuroanatomical context to the Python ecosystem, rather than re-implementing parts of it.
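For reference, here is a minimal sketch of how `scipy.optimize.linear_sum_assignment` is typically used; the toy coordinates are made up for illustration. Note that it requires the dense NxM cost matrix up front, which is the memory constraint discussed in the comments above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

# Toy data: two small sets of 3-D cell coordinates.
cells_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
cells_b = np.array([[0.1, 0.0, 0.0], [0.0, 5.2, 0.0], [9.8, 0.1, 0.0]])

# Dense NxM matrix of pairwise Euclidean distances.
cost = cdist(cells_a, cells_b)

# Optimal one-to-one assignment minimising the total matching cost.
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows, cols)))        # matched index pairs
print(cost[rows, cols].sum())       # total cost of the matching
```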
Describe alternatives you've considered
\
Additional context
\