A minimal serial benchmark against Keras + Tensorflow is in my paper from earlier this year; it shows comparable serial performance in terms of elapsed time, and superior performance in terms of memory use: https://arxiv.org/abs/1902.06714.
This is highly encouraging because:
neural-fortran still uses the simplest out-of-the-box implementation (the intrinsic `matmul`, no special optimization tricks), while Tensorflow is mature and highly optimized.
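To illustrate the kind of plain-`matmul` forward pass I mean, here is a minimal sketch of a dense layer with a tanh activation. This is not neural-fortran's actual source; the module and routine names (`dense_sketch`, `dense_forward`) and the array names are made up for illustration:

```fortran
! A minimal sketch of a dense-layer forward pass, assuming a tanh activation.
! Not neural-fortran's actual source; names are illustrative only.
module dense_sketch
  implicit none
contains
  pure function dense_forward(w, b, x) result(a)
    real, intent(in) :: w(:,:)   ! weights, shape (n_out, n_in)
    real, intent(in) :: b(:)     ! biases, shape (n_out)
    real, intent(in) :: x(:)     ! input activations, shape (n_in)
    real :: a(size(b))
    ! plain intrinsic matmul -- no blocking, no vendor BLAS, no other tricks
    a = tanh(matmul(w, x) + b)
  end function dense_forward
end module dense_sketch

program demo
  use dense_sketch
  implicit none
  real :: w(3,2), b(3), x(2)
  w = 0.1
  b = 0.0
  x = [1.0, 2.0]
  print *, dense_forward(w, b, x)
end program demo
```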
It wasn't obvious how to control the number of shared-memory cores for Tensorflow; Google searches and the TF docs didn't provide immediate answers. While it was possible to constrain CPU use to 100%, allowing it to use more would always saturate at ~170%, even on a 12-core machine. In contrast, running neural-fortran at 1200% CPU was trivial thanks to its transparent parallel model.
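The transparent parallel model is essentially this: each coarray image works on its own share of the data, and the partial results are combined with a collective. Below is a minimal coarray sketch of that pattern, assuming the data-parallel approach described in the paper; the gradient array and all names are hypothetical, not neural-fortran's API:

```fortran
! A minimal coarray sketch of the data-parallel pattern: every image computes
! a partial result on its share of the data, then the results are summed
! across images with the co_sum collective. Names are hypothetical.
program coarray_sketch
  implicit none
  real :: grad(10)
  integer :: me, n

  me = this_image()
  n  = num_images()

  ! stand-in for a per-image gradient computed on that image's slice of data
  grad = real(me)

  ! sum the partial gradients across all images, then average
  call co_sum(grad)
  grad = grad / real(n)

  if (me == 1) print '(a, i0, a, f6.2)', 'images: ', n, '   grad(1): ', grad(1)
end program coarray_sketch
```

With OpenCoarrays this can be built with `caf coarray_sketch.f90` and launched on 12 cores with `cafrun -n 12 ./a.out`; each image is a separate process, so scaling the CPU use is just a matter of the `-n` argument.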
Further performance improvements could be made, for example by adopting some of the approaches from #18. I will consider them for neural-fortran.
For example, Keras + Tensorflow.