Some mysterious precision loss going on #823

Closed
adampauls opened this issue Nov 8, 2021 · 10 comments

Comments

@adampauls
Contributor

After upgrading to Breeze 2.0, we have found some strange issues where we need to switch from Float to Double to get the right accuracy where we really shouldn't. Here is a somewhat minimized example:

object Test {
  def main(args: Array[String]): Unit = {
    import breeze.linalg._
    import breeze.stats.distributions.RandBasis
    val outputDim = 2
    val inputDim = 4
    val projection =
      convert(DenseMatrix.rand(outputDim, inputDim, RandBasis.withSeed(1).gaussian) / math.sqrt(outputDim), Float)
    val data = DenseMatrix.eye[Float](inputDim)
    val transformed = projection * data
    val corrMatrixFloat = transformed.t * transformed
    val transformedDouble = convert(transformed, Double)
    val corrMatrixDouble = convert(transformedDouble.t * transformedDouble, Float)
    println((corrMatrixDouble - corrMatrixFloat).toString(1000, 1000))
/* prints
-2.9802322E-8  0.0       -7.4505806E-9  0.0  
0.0            1.195261  1.020851       0.0  
-7.4505806E-9  0.0       0.0            0.0  
0.0            0.0       0.0            0.0 
*/
  }
}

As far as I can tell, entry (2, 2) in corrMatrixFloat is simply wrong, and by far more than Float rounding error alone could explain.

I'm on a 2019 Macbook Pro.

@adampauls
Contributor Author

Hmm, I suppose this could be caused by cancellation issues in more complex matrix multiplication algorithms that I've never really understood. Is this just expected behavior of the underlying native libraries?
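
One rough way to test the cancellation hypothesis without involving Breeze or any BLAS backend is to compute the same kind of product (A^T * A for a 2 x 4 matrix) with naive loops, once accumulating in Float and once in Double, and look at the largest difference. This is only a sketch: the object and method names are made up for illustration, and the input comes from scala.util.Random rather than Breeze's RandBasis, so the values will not match the example above. With only two products summed per entry, the difference should be on the order of Float rounding error, not on the order of 1.

```scala
object GramPrecisionCheck {
  // Naive A^T * A for a matrix stored as an array of rows, accumulating in Float.
  def gramFloat(a: Array[Array[Float]]): Array[Array[Float]] = {
    val cols = a(0).length
    Array.tabulate(cols, cols) { (i, j) =>
      var acc = 0.0f
      for (row <- a) acc += row(i) * row(j)
      acc
    }
  }

  // Same product, but each term is widened to Double before accumulating.
  def gramDouble(a: Array[Array[Float]]): Array[Array[Double]] = {
    val cols = a(0).length
    Array.tabulate(cols, cols) { (i, j) =>
      var acc = 0.0
      for (row <- a) acc += row(i).toDouble * row(j).toDouble
      acc
    }
  }

  // Largest absolute entrywise difference between the Float and Double results.
  def maxAbsDiff(a: Array[Array[Float]]): Double = {
    val f = gramFloat(a)
    val d = gramDouble(a)
    (for (i <- f.indices; j <- f(i).indices)
      yield math.abs(f(i)(j).toDouble - d(i)(j))).max
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a stand-in for the Gaussian projection in the report, not the same values.
    val rng = new scala.util.Random(1)
    val a = Array.fill(2, 4)((rng.nextGaussian() / math.sqrt(2)).toFloat)
    println(s"max |float - double| = ${maxAbsDiff(a)}")
  }
}
```

If a naive loop like this agrees with the Double result to within rounding error while the library's Float product is off by about 1, that would suggest a bug in the Float code path rather than ordinary precision loss.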

@adampauls
Contributor Author

adampauls commented Nov 9, 2021

I should add that this is a regression: it does not happen on 1.2, but happens on 1.3 and 2.0.

@dlwh
Member

dlwh commented Nov 11, 2021

I can't repro on Linux, and I don't have a macOS dev environment these days...

This is what I get:

-2.9802322E-8  0.0           -2.2351742E-8  1.4901161E-8  
0.0            1.1920929E-7  0.0            0.0           
-2.2351742E-8  0.0           0.0            1.1920929E-7  
1.4901161E-8   0.0           1.1920929E-7   0.0           

Given that this is in 1.3 and 2.0, I suspect this is an issue with the new netlib library we're using. IIRC basically the only change from 1.2 to 1.3 was the switch to the new netlib library. (Tagging @luhenry so it's on his radar.)

Can you see if you can repro on Linux (with CI or whatever)? If you can't repro on Linux we can proceed to minimizing it to a sequence of BLAS calls...

@adampauls
Contributor Author

adampauls commented Nov 11, 2021

A few new bits of information:
• This bug is probably relevant: luhenry/netlib#6.
• The native libraries are not loading:

Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$NativeBLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$NativeBLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.blas.ForeignLinkerBLAS
Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$BLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.NativeBLAS
Nov 11, 2021 8:00:10 AM dev.ludovic.netlib.InstanceBuilder$JavaBLAS getInstanceImpl
WARNING: Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS

• A colleague was able to reproduce on Windows, though for him the VectorBLAS warning did not appear. I also tried making sure that VectorBLAS was loading, and that did not fix the issue either.

So I think this points to a bug in the Java implementations of some of the BLAS routines? I'll post an issue in netlib.

@adampauls
Contributor Author

This is definitely a bug in netlib: luhenry/netlib#7

@luhenry
Contributor

luhenry commented Nov 15, 2021

This should be fixed in https://github.com/luhenry/netlib/releases/tag/v2.2.1. I've just pushed the release through Sonatype, so it should show up in Maven repositories in the next hour or so at https://repo1.maven.org/maven2/dev/ludovic/netlib/blas/2.2.1/

@dlwh
Member

dlwh commented Nov 15, 2021 via email

@dlwh dlwh closed this as completed in db5827d Nov 15, 2021
@dlwh
Member

dlwh commented Nov 16, 2021

2.0.1-RC1 is being pushed to maven now, lemme know if it fixes it

@delenius

delenius commented Dec 17, 2021

I'm still seeing the first 3 of the warnings above on 2.0.1-RC1 (not sure about the precision loss). I don't see the VectorBLAS warning. I am on macOS.

@dlwh
Member

dlwh commented Dec 17, 2021

The warning just means you're not getting natives, so performance is degraded, but it shouldn't affect the precision issue. Not sure what's going on, though.
