This repository has been archived by the owner on Nov 10, 2017. It is now read-only.

Nondeterministic behaviour of SVD for some blas implementations #82

Closed

marcelluethi opened this issue Apr 16, 2015 · 6 comments

@marcelluethi

Hi,

I hope this is the right place to report issues (I haven't found a mailing list).

We are experiencing non-deterministic behaviour when computing an SVD with Breeze.
Calling SVD several times on the same matrix yields (sometimes drastically) different results (see the test code below).
We know, however, that it is related to the underlying BLAS library that is used, as we only experience the problem when we use:

  • the native libraries bundled with netlib-java
  • atlas on Ubuntu 14.04

The problem does not occur if we use:

  • F2jBLAS
  • OpenBLAS

Here is the code to reproduce it:

```scala
import breeze.linalg.DenseMatrix
import breeze.linalg.svd.SVD

object Test {

  def main(args: Array[String]): Unit = {
    // System.setProperty("com.github.fommil.netlib.BLAS", "com.github.fommil.netlib.F2jBLAS")
    // System.setProperty("com.github.fommil.netlib.LAPACK", "com.github.fommil.netlib.F2jLAPACK")
    // System.setProperty("com.github.fommil.netlib.ARPACK", "com.github.fommil.netlib.F2jARPACK")

    val n = 1000
    val K = DenseMatrix.zeros[Float](n, n)
    val xs = 0 until 1000 by 10
    for ((x, i) <- xs.zipWithIndex;
         (y, j) <- xs.zipWithIndex) {
      K(i, j) = Math.exp(-Math.pow(x - y, 2) / 1000).toFloat
    }
    K += DenseMatrix.eye[Float](n) * 1e-6f

    for (i <- 0 until 100) {
      val SVD(u, d, v) = breeze.linalg.svd(K)
      println(s"u ${u.sum} d ${d.min} v ${v.sum}")
    }
  }
}
```

And this is the output it produces:

```text
...
u 916.70703 d 7.860268E-7 v 916.7068
u 918.3815 d 6.009904E-7 v 918.38116
u 916.70703 d 7.860268E-7 v 916.7069
...
```

While this example uses a Float matrix, similar problems also occur with double precision.
Do you have any idea what could be the problem?

Thanks,

Marcel

@fommil
Owner

fommil commented Apr 16, 2015

interesting, we had a similar report at fommil/matrix-toolkits-java#64 which is ultimately using the same core libs.

Fundamentally, the results are always going to be prone to error (see my scalax talk's section about errors) and quality can vary wildly between implementations.

However, the more concerning thing from my perspective is whether there is a multi-threaded aspect to these reports. The jury is still out, but it looks like some threads see the arrays in a different state. Try wrapping your tests in synchronized blocks to force a memory resync (roughly equivalent to putting volatile on everything). Ok, you only have one thread, so that's irrelevant here.

@fommil
Owner

fommil commented Apr 16, 2015

http://fommil.github.io/scalax14/#/ => go to video bit

in particular skip through to http://fommil.github.io/scalax14/#/6 (key down in the web slides)

@marcelluethi
Author

@fommil As you say, the problem can be reproduced from a single thread, but it seems to be more prominent when we run things in parallel. In one case (a large example that is too big to reproduce here), it gave horrible results when we ran things in parallel and worked without problems when we ran it single-threaded. Does ATLAS also parallelise internally? Using libatlas3gf-base (on Ubuntu 14.04) also seems to cause the problem.

@fommil
Owner

fommil commented Apr 16, 2015

the implementations most certainly will parallelise under the hood. If you are able to do some error tracking using the BLAS/LAPACK error tracing functions then it would be incredibly useful to confirm that the results are coming back within the tolerances claimed by the implementation.

Honestly, it is not surprising that you're seeing variation and I recommend using Intel MKL over ATLAS / OpenBLAS. If accuracy is more important to you than speed, then you might even want to look at using a specialist high precision implementation (e.g. the xxdg... libs) directly.

@marcelluethi
Author

Unfortunately, I will personally not find time to dig deep into the problem in the next few weeks, but I can follow it up a bit later if needed (from early June things should look better, time-wise).

Neither accuracy nor speed is a big issue for us at the moment, but the behaviour should be deterministic. The code is part of an evaluation metric that assesses the quality of a statistical model, and we realized that different runs on the same model resulted in very different metric values.
For the moment we will therefore use F2jBLAS to guarantee that we get consistent results.
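(For reference, the F2j fallback can be forced via the netlib-java system properties shown commented out in the repro above; they must be set before the BLAS/LAPACK/ARPACK singletons are first loaded. A minimal sketch, with the actual SVD workload elided:)

```scala
// Force the pure-Java (deterministic, single-threaded) reference implementations.
// The property names are the ones used in the repro code above; setting them
// after the native backend has already been loaded has no effect.
object ForceF2j {
  def main(args: Array[String]): Unit = {
    System.setProperty("com.github.fommil.netlib.BLAS",   "com.github.fommil.netlib.F2jBLAS")
    System.setProperty("com.github.fommil.netlib.LAPACK", "com.github.fommil.netlib.F2jLAPACK")
    System.setProperty("com.github.fommil.netlib.ARPACK", "com.github.fommil.netlib.F2jARPACK")
    // ... run the SVD-based evaluation here ...
    println(System.getProperty("com.github.fommil.netlib.BLAS"))
  }
}
```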

BTW, thanks a lot for providing these libraries. So far they worked very well for us.

@fommil
Owner

fommil commented Apr 17, 2015

Ok, if you need determinism then you can't use any parallelism. Go with F2J.

Rounding errors alone depend on the order of operations, so any backend that reorders or parallelises a reduction can produce different results from run to run.
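(To illustrate why ordering matters: float addition is not associative, so summing the same numbers in a different order can give a different result. A tiny self-contained example, unrelated to any specific BLAS:)

```scala
// Float has ~24 bits of mantissa, so at magnitude 1e8 adding 1.0f is absorbed.
// The two expressions sum the same three numbers in different orders.
object FpOrder {
  val left: Float  = (1e8f + 1f) - 1e8f  // 1f absorbed first => 0.0f
  val right: Float = (1e8f - 1e8f) + 1f  // cancellation first => 1.0f

  def main(args: Array[String]): Unit =
    println(s"left=$left right=$right")   // prints left=0.0 right=1.0
}
```

A multi-threaded BLAS that splits a dot product across threads effectively changes this summation order nondeterministically, which is enough to explain run-to-run variation even on one input.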

Closing, but I'd still love to hear the results of your analysis here when you do it.

@fommil fommil closed this as completed Apr 17, 2015