This repository has been archived by the owner on Nov 10, 2017. It is now read-only.

Nondeterministic behaviour of SVD for some blas implementations #82

Closed

marcelluethi opened this issue Apr 16, 2015 · 6 comments

@marcelluethi

Hi,

I hope this is the right place to report issues (I haven't found a mailing list).

We are experiencing non-deterministic behaviour when computing an SVD with Breeze.
Calling SVD several times on the same matrix yields (sometimes drastically) different results (see the test code below).
We know, however, that it is related to the underlying BLAS library that is used, as we only experience the problem when we use:

  • the native libraries bundled with netlib-java
  • atlas on Ubuntu 14.04

The problem does not occur if we use:

  • F2jBLAS
  • OpenBLAS

Here is the code to reproduce it:

```scala
import breeze.linalg.DenseMatrix
import breeze.linalg.svd.SVD

object Test {

  def main(args: Array[String]): Unit = {
    // System.setProperty("com.github.fommil.netlib.BLAS", "com.github.fommil.netlib.F2jBLAS")
    // System.setProperty("com.github.fommil.netlib.LAPACK", "com.github.fommil.netlib.F2jLAPACK")
    // System.setProperty("com.github.fommil.netlib.ARPACK", "com.github.fommil.netlib.F2jARPACK")

    val n = 1000
    val K = DenseMatrix.zeros[Float](n, n)
    val xs = 0 until 1000 by 10
    for ((x, i) <- xs.zipWithIndex;
         (y, j) <- xs.zipWithIndex) {
      K(i, j) = Math.exp(-Math.pow(x - y, 2) / 1000).toFloat
    }
    K += DenseMatrix.eye[Float](n) * 1e-6f

    for (i <- 0 until 100) {
      val SVD(u, d, v) = breeze.linalg.svd(K)
      println(s"u ${u.sum} d ${d.min} v ${v.sum}")
    }
  }
}
```

And this is the output it produces:

```text
...
u 916.70703 d 7.860268E-7 v 916.7068
u 918.3815 d 6.009904E-7 v 918.38116
u 916.70703 d 7.860268E-7 v 916.7069
...
```

While this example uses a Float matrix, similar problems also occur with double precision.
Do you have any idea what could be the problem?

Thanks,

Marcel

@fommil
Owner

fommil commented Apr 16, 2015

interesting, we had a similar report at fommil/matrix-toolkits-java#64 which is ultimately using the same core libs.

Fundamentally, the results are always going to be prone to error (see my scalax talk's section about errors) and quality can vary wildly between implementations.

However, the more concerning thing from my perspective is whether there is a multi-threaded aspect to these reports. The jury is still out, but it looks like some threads see the arrays in a different state. Try wrapping your tests in synchronized blocks to force a memory resync (roughly equivalent to putting volatile on everything). Ok, you only have one thread, so that's irrelevant here.

@fommil
Owner

fommil commented Apr 16, 2015

http://fommil.github.io/scalax14/#/ => go to video bit

in particular skip through to http://fommil.github.io/scalax14/#/6 (key down in the web slides)

@marcelluethi
Author

@fommil As you say, the problem can be reproduced from a single thread, but it seems to be more prominent when we run things in parallel. In one case (a large example that is too big to reproduce here), it gave horrible results when we ran things in parallel and worked without problems when we ran it single-threaded. Does ATLAS also parallelise internally? Using libatlas3gf-base (on Ubuntu 14.04) also seems to cause the problem.

@fommil
Owner

fommil commented Apr 16, 2015

the implementations most certainly will parallelise under the hood. If you are able to do some error tracking using the BLAS/LAPACK error tracing functions then it would be incredibly useful to confirm that the results are coming back within the tolerances claimed by the implementation.

Honestly, it is not surprising that you're seeing variation and I recommend using Intel MKL over ATLAS / OpenBLAS. If accuracy is more important to you than speed, then you might even want to look at using a specialist high precision implementation (e.g. the xxdg... libs) directly.

@marcelluethi
Author

Unfortunately, I will personally not find time to dig deep into the problem in the next few weeks, but I can follow it up a bit later if needed (from early June things should look better, time-wise).

Neither accuracy nor speed is a big issue for us at the moment, but the behaviour should be deterministic. The code is part of an evaluation metric that assesses the quality of a statistical model, and we realized that different runs on the same model resulted in very different metric values.
For the moment we will therefore use F2jBLAS to guarantee that we get consistent results.
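(For reference, the F2j fallback can be forced via the netlib-java system properties shown commented out in the repro above; they must be set before the BLAS/LAPACK/ARPACK singletons are first loaded. A minimal sketch, with the actual SVD workload elided:)

```scala
// Force the pure-Java (deterministic, single-threaded) reference implementations.
// The property names are the ones used in the repro code above; setting them
// after the native backend has already been loaded has no effect.
object ForceF2j {
  def main(args: Array[String]): Unit = {
    System.setProperty("com.github.fommil.netlib.BLAS",   "com.github.fommil.netlib.F2jBLAS")
    System.setProperty("com.github.fommil.netlib.LAPACK", "com.github.fommil.netlib.F2jLAPACK")
    System.setProperty("com.github.fommil.netlib.ARPACK", "com.github.fommil.netlib.F2jARPACK")
    // ... run the SVD-based evaluation here ...
    println(System.getProperty("com.github.fommil.netlib.BLAS"))
  }
}
```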

BTW, thanks a lot for providing these libraries. So far they worked very well for us.

@fommil
Owner

fommil commented Apr 17, 2015

Ok, if you need determinism then you can't use any parallelism. Go with F2J.

Rounding errors alone depend on the order of operations, so any backend that reorders or parallelises a reduction can produce different results from run to run.
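(To illustrate why ordering matters: float addition is not associative, so summing the same numbers in a different order can give a different result. A tiny self-contained example, unrelated to any specific BLAS:)

```scala
// Float has ~24 bits of mantissa, so at magnitude 1e8 adding 1.0f is absorbed.
// The two expressions sum the same three numbers in different orders.
object FpOrder {
  val left: Float  = (1e8f + 1f) - 1e8f  // 1f absorbed first => 0.0f
  val right: Float = (1e8f - 1e8f) + 1f  // cancellation first => 1.0f

  def main(args: Array[String]): Unit =
    println(s"left=$left right=$right")   // prints left=0.0 right=1.0
}
```

A multi-threaded BLAS that splits a dot product across threads effectively changes this summation order nondeterministically, which is enough to explain run-to-run variation even on one input.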

Closing, but I'd still love to hear the results of your analysis here when you do it.

@fommil fommil closed this as completed Apr 17, 2015