SVD fails in multi-threaded operation #64
Comments
Hmm, some of the results here sound like they might be sporadic, and you're not seeing some things fail that should fail. e.g. "reference native" should be equivalent to "debian system liblapack3". I can't guarantee that the reference implementation is thread safe, but it would surprise me if it were not. If you're seeing threading problems on some system implementations (e.g. ATLAS) but not others (veclib), I'd be tempted to say that ATLAS is simply not thread safe. What concerns me more is that you see threading issues on the reference implementation. I'll have a look at your standalone project later tonight, thank you for putting it together!
btw, on another topic, I assume you've seen #45?
incidentally, I'd be surprised if you ever see a performance improvement by running these sorts of things in parallel. See my talk (linked from the netlib-java homepage) at ScalaX last December, which explains the low level of what's happening. Adding more threads/cores will probably slow things down.
My memory of the JVM memory boundaries is a little shaky. Can you confirm that you still get errors (just test one of the failing cases) if you do the matrix equality assertion inside a synchronized block?
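For reference, a minimal sketch of what such a check might look like, in plain Java. The class name, lock object, and tolerance are all illustrative, and plain `double[][]` arrays stand in for MTJ matrices:

```java
public class SyncAssert {
    // Shared lock: synchronizing on it creates a happens-before edge between
    // the thread that wrote the result matrix and the thread checking it.
    static final Object LOCK = new Object();

    // Tolerance-based element-wise comparison, done inside a synchronized
    // block so the asserting thread is guaranteed to see the worker's writes.
    static boolean equalsSynchronized(double[][] expected, double[][] actual, double tol) {
        synchronized (LOCK) {
            if (expected.length != actual.length) return false;
            for (int i = 0; i < expected.length; i++) {
                for (int j = 0; j < expected[i].length; j++) {
                    if (Math.abs(expected[i][j] - actual[i][j]) > tol) return false;
                }
            }
            return true;
        }
    }
}
```

If the failures disappear with this in place, that points at visibility rather than the native code.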
It is certainly somewhat sporadic. The test runs things for multiple repeats so the error surfaces; in fact, for smaller test matrices we don't see the issue. We just double-checked the Debian reference vs system and it is still fail vs non-fail. What we want to do isn't to process one matrix across multiple threads, but to calculate the SVD of multiple different matrices at the same time in different threads. That's what the test attempts to replicate. I hadn't seen the pseudo-inverse implementation, thanks. I'll try the synchronized thing and let you know.
I can confirm the errors still occur (on Windows) with the synchronized block around the matrix equals assertion.
Another data point. We tried downloading, compiling and installing ATLAS via `apt-get source atlas; apt-get build-dep atlas` and then `fakeroot debian/rules custom` in the ATLAS source directory. The tests then work. The ATLAS version that comes packaged and installed via `aptitude install libatlas3-base` on Debian fails.
ok, I'm happy calling ATLAS broken. The really concerning thing for me is that the reference implementation fails; I didn't think it had threading problems at all. BTW, I mean that running multiple SVDs at the same time won't give you a performance boost. The BLAS will optimise the hell out of all your CPU caches and cores, so trying to do more than one at once will just result in page swapping. The best thing you can do is arrange a single queue for accessing the BLAS layer. Watch my talk for more details. But... that's all about performance. Certainly it should give the correct numbers no matter how parallel it is being used. SVD basically only calls LAPACK's underlying SVD routine.
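One way to arrange such a single queue is a single-threaded executor that every native call goes through; each decomposition then gets the machine's caches and cores to itself. A minimal sketch in plain Java (the class and method names are hypothetical, not part of MTJ):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlasQueue {
    // One worker thread: all BLAS/LAPACK work is serialized through this
    // queue, so concurrent jobs never thrash each other's CPU caches.
    private static final ExecutorService QUEUE = Executors.newSingleThreadExecutor();

    // Submit a job (e.g. one SVD via MTJ) and block until it completes.
    static <T> T compute(Callable<T> job) {
        try {
            return QUEUE.submit(job).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static void shutdown() {
        QUEUE.shutdown();
    }
}
```

Callers in any thread invoke `BlasQueue.compute(...)`; the executor guarantees at most one native call runs at a time.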
Yeah, it's been surprising to us also. Hopefully it's just something idiotic I'm doing in my Java, but I don't think so; that would be the easy solution, though. I had a quick look through the code for SVD and couldn't see anything obvious. I will take a look at the talk. We're plugging MTJ into our own library for processing graphs/networks, calculating properties based on the adjacency matrix. That is used in a system for distributed and parallel computing over a grid of machines (it uses GridGain). We'll obviously need to figure out the optimal approach, as the default will be multi-threaded.
you might want to look at using the sparse stuff, especially the ARPACK example 😉 that is the kind of problem domain (in an industrial setting) that got me started with these projects years ago. FWIW, I don't see any obvious problems in your Java. One thing you might want to try is hitting some kind of memory barrier before the copy operation, e.g. maybe do the copy in a synchronized block.
also, does the problem fix itself if you synchronize access to your PseudoInverseMTJ instance? Try this with both sharing and not sharing the instance.
and one more: can you find some norm distance (even for the one failing element) between the failing and expected matrices? It would be good to see if the failure is systemic.
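A Frobenius-norm distance is one simple way to quantify this. A sketch in plain Java over `double[][]` arrays (the class name is illustrative; MTJ's own norm methods could be used instead):

```java
public class NormDistance {
    // Frobenius norm of (expected - actual): sqrt of the sum of squared
    // element-wise differences, a single number summarising how far the
    // failing matrix is from the expected one.
    static double frobeniusDistance(double[][] expected, double[][] actual) {
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            for (int j = 0; j < expected[i].length; j++) {
                double d = expected[i][j] - actual[i][j];
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }
}
```

A distance comparable to the norm of the matrix itself would suggest a systemic failure rather than rounding noise.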
I can certainly try those, but it'll be a week or so before I'm back in front of my computer. I'll let you know what I find. Thanks for the pointers on ARPACK; some of the things we do are based on the eigenspectra of either the adjacency or Laplacian matrix, so it should be useful.
Got a few minutes to look at this before I leave for my conference. I tried synchronizing access to the pseudoInverse method. When the PseudoInverseMTJ instance is created fresh in each task/callable, it still fails. If I share the instance and reuse it across all tasks, the tests pass. This is the opposite of what I'd expect: no shared state fails but shared state passes? I hope you have more idea about what's going on here than me. Given you suggested the approach, I guess you have something in mind.
I committed some code to illustrate. There are now two, very similar, test methods that only differ in whether they share the PseudoInverseMTJ instance.
😕 my gut keeps telling me this is about visibility and CPU caches... I wonder if something inside MTJ or netlib-java is being initialised in the
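For illustration, the kind of visibility hazard being hinted at can be sketched in plain Java; the class names here are hypothetical and this is not MTJ's actual code:

```java
public class SafePublication {
    // Without a happens-before edge, a second thread may observe this
    // object in a partially initialised state: 'data' can appear null
    // or stale even though the constructor has finished.
    static class Unsafe {
        double[] data;                 // plain field: no visibility guarantee
        Unsafe() { data = new double[]{1, 2, 3}; }
    }

    // A final field guarantees that any thread which sees a reference to
    // the object also sees the fully initialised array (JMM final-field
    // semantics), with no explicit synchronization needed.
    static class Safe {
        final double[] data;           // final field: safely published
        Safe() { data = new double[]{1, 2, 3}; }
    }
}
```

If some internal buffer or native handle were published the "Unsafe" way, fresh-per-thread instances could misbehave while a single shared, safely published instance works.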
I just made a similar test class, but looking at SVD on its own without the additional logic of the pseudo-inverse. But there's no shared state in that class at all; it's just a helper method? Also, we first noticed the issue using the Sandia wrapper over MTJ, which was being used in certain places in our code. By the way, all this morning's tests were done on OSX.
I'm possibly more confused than you are. I can't see anything weird in your wrapper class at all. I'll have a third look.
Got a little time to look at this again this afternoon. So, ignore what I said about SVD: it does fail on its own, I just wasn't testing everything. The s vector is always the same, but the U and V matrices have different values when calculated in threaded mode. So, it's not the wrapper. That's now less confusing.
Just read through the netlib-java issue. You think this could just be rounding errors and such? I should say, this isn't stopping us using the library; it's more just an itch to scratch now. And thanks for the work, it's been very useful.
One other data point: when I look at the two versions of the test, with or without sharing of the `PseudoInverseMTJ` instance, I see a big difference in the performance. The time taken is obviously related to your comments above about performance, so I thought this might give you a clue.
this just gets weirder, especially on performance. Can you maybe reduce your tolerance to errors and see if the problem persists?
Yep, I tried a bunch of that this afternoon and saw nothing useful. I changed both my definition of equals for a double in the tests and the threshold used in pseudo-inverse for defining zero. I tried the code from EJML that calculates a dynamic zero threshold during pseudo-inverse. Nothing really changed. I did this after my comment on tolerances discussed in the other issue. The magnitude of the differences suggests it isn't a tolerance thing either; they are big differences, easily similar in size to the actual values.
it's just got to be memory barrier stuff. Maybe you don't need to do the calculations in a synchronized block, but can you do a synchronized block before and after (and in the comparison) to make sure all the right data is seen in the threads doing the work?
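A sketch of that pattern in plain Java, with a trivial doubling loop standing in for the real computation (all names are illustrative):

```java
public class BarrierPattern {
    // One shared gate object for all workers and the comparing thread.
    static final Object GATE = new Object();

    // Synchronize before the work (to see the caller's writes) and after
    // it (to publish the result), without holding the lock during the
    // computation itself.
    static double[] computeWithBarriers(double[] input) {
        double[] local;
        synchronized (GATE) {                     // barrier in: read inputs
            local = input.clone();
        }
        for (int i = 0; i < local.length; i++) {  // the real work, e.g. an SVD
            local[i] = local[i] * 2;
        }
        synchronized (GATE) {                     // barrier out: publish result
            return local.clone();
        }
    }
}
```

The comparing thread would likewise read the result inside a `synchronized (GATE)` block, so every write is visible before the assertion runs.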
We noticed a threading issue with some of our code that uses MTJ and I've isolated the issue to SVD failing under certain conditions.
The project https://bitbucket.org/jwray/matrix-library-investigation contains isolated, minimal code illustrating the issue. It's calculating the pseudo-inverse, as that's where we noticed the issue.
There are two test classes, and one of them illustrates the issue with multi-threading. The test fails as above but will pass if the size of the thread pool is set to 1.
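The shape of such a test can be sketched with a plain thread pool; here a squaring lambda stands in for one SVD computation, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizeTest {
    // Submit 'repeats' independent jobs to a pool of the given size and
    // collect the results in submission order. With poolSize = 1 the jobs
    // run serially; with a larger pool they run concurrently.
    static List<Double> runRepeats(int poolSize, int repeats) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < repeats; i++) {
                final double x = i;
                futures.add(pool.submit(() -> x * x));  // stand-in for one SVD
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In the real test the results from the concurrent run would be compared against known-good values computed serially.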