NemoLite2D benchmarking results on HPC-level hardware #41
Currently PSyclone can generate an OpenACC version. In both OpenACC and OpenMP we have yet to really 'go to town' to see how well we can do - we've only done fairly vanilla implementations. If we're going to write a paper then, in keeping with our self-proclaimed "era of performance" we will want to do better! (i.e. we don't want "it works but it's slow".) We have a manual MPI version working. No PSyclone support for that yet (it's on @rupertford's list :-) ). GPU-wise, I think SKFP and glados have the same V100s and therefore we can just use one or other of them. Currently the OpenCL we generate is aimed at FPGA. There may be some infrastructure work to do in order to get it working on the GPU (although, now I come to think about it, I think @sergisiso has run on a GPU recently so we may be OK). Finally, and slightly bigger-picture, we need to think how this relates to 'the PSyclone paper' that we've been threatening to write for about 2 years... It feels like there's a lot to discuss...
Leaving the paper considerations aside, I think being able to generate these performance snapshots programmatically, with a record of the commit, compiler version and architecture, would be very useful, and I have been trying to start this with the common Makefile infrastructure. In #37 the compiler column of make summary also includes the version. I have been experimenting with providing architecture details in the table as well, but the table can only hold a limited number of fields before it becomes unreadable, so we may need something else that stores big tables in a file (with flags, parameters, ...). Regarding point 4, it would be good to add a Makefile rule or a common script to provide scalability tables (which also include the mentioned parameters) and maybe the PU as well, something like:
And then gnuplot can easily draw plots from some of these tables. @LonelyCat124 If that would be useful for you, we can coordinate this work. Regarding OpenCL, it can run on CPU, GPU and FPGA and return correct results, but I am not claiming yet that it does a sensible thing on each platform :)
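As an illustration of the kind of scalability table a common script could emit, here is a minimal sketch. It assumes the timings for one build are already available as a mapping from thread count to wall-clock seconds; the function names scalability_table and write_gnuplot_table are hypothetical, not part of the existing Makefile infrastructure. Speedup is taken relative to the smallest thread count present, and efficiency is that speedup scaled by the thread ratio.

```python
def scalability_table(times):
    """Turn {threads: wall-clock seconds} into (threads, time, speedup, efficiency) rows.

    Assumption: the smallest thread count present is the baseline (normally 1).
    """
    base_p = min(times)
    base_t = times[base_p]
    rows = []
    for p in sorted(times):
        t = times[p]
        speedup = base_t / t
        # Parallel efficiency relative to the baseline run: S * p_base / p.
        efficiency = speedup * base_p / p
        rows.append((p, t, speedup, efficiency))
    return rows


def write_gnuplot_table(rows, path):
    """Write whitespace-separated columns; gnuplot skips lines starting with '#'."""
    with open(path, "w") as fh:
        fh.write("# threads time_s speedup efficiency\n")
        for p, t, s, e in rows:
            fh.write(f"{p:d} {t:.6f} {s:.4f} {e:.4f}\n")
```

A file written this way (say scaling.dat, a hypothetical name) can then be plotted with something like plot "scaling.dat" using 1:3 with linespoints title "speedup", and the same plain columns load just as easily into matplotlib.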
That would probably be useful for most of these - I could probably do something like that even for Regent, though I don't have the checksums implemented yet (I probably should do this soon). Maybe a script would be easier? If we always save the executable to I'm personally not a fan of gnuplot plots (vs matplotlib) 😂 but happy to go with them if everyone else prefers them.
I like gnuplot for its ubiquity and speed of use. I've never managed to get paper-quality images out of it though so am happy to use something else. (I like xmgrace but that's showing my age.) Although I'm all in favour of automation where possible, I don't think we should get too hung up on it if it proves complicated (especially once batch systems become involved). The key thing is to capture all the necessary data in one place and in a format we can plot.
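One way to capture everything in one place is to append every run, together with its provenance, to a single CSV file. The sketch below is only an assumption about how that could look: collect_metadata, append_result and the column names (including the problem-size columns nx and ny) are hypothetical, and the default compiler gfortran is just an example (ifort also accepts --version). Anything that cannot be queried is recorded as "unknown" rather than aborting the run.

```python
import csv
import os
import platform
import subprocess
from datetime import datetime, timezone


def _first_line(cmd):
    """Return the first line of a command's output, or 'unknown' if it fails."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else "unknown"


def collect_metadata(compiler="gfortran"):
    """Provenance for one run: date, git commit, compiler version, architecture."""
    return {
        "date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "commit": _first_line(["git", "rev-parse", "--short", "HEAD"]),
        "compiler": _first_line([compiler, "--version"]),
        "arch": platform.machine() or "unknown",
        "cpu": platform.processor() or "unknown",
    }


# Hypothetical column set: provenance first, then per-run parameters and timing.
FIELDS = ["date", "commit", "compiler", "arch", "cpu",
          "version", "threads", "nx", "ny", "time_s"]


def append_result(path, metadata, **run):
    """Append one run to a CSV file, writing the header only if the file is new."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({**metadata, **run})
```

Keeping one row per run (rather than one wide table) sidesteps the readability limit of the make summary table: the file can hold as many columns as needed and be filtered later before plotting.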
I think with Python it could be pretty straightforward to have something that you can submit with bsub/qsub/whatever and go from there (as opposed to running at the top level), or even just a bash script for most of it, except maybe the plotting, depending on what's used. Once I've finished my non-ECP project properly I could have a go if no one else wants to bite the bullet.
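A minimal sketch of the Python-generates-a-batch-script idea, assuming an OpenMP thread sweep on a single node. make_job_script and the executable name ./nemolite2d are hypothetical; only generic LSF (#BSUB -J/-W/-o) and PBS (#PBS -N/-l walltime/-o) directives are emitted, and queue, project and node-selection options are deliberately left out because they differ per site.

```python
def make_job_script(scheduler, executable, threads, walltime_min=30, job_name="nemolite2d"):
    """Return a bash batch script running `executable` once per OpenMP thread count."""
    hours, minutes = divmod(walltime_min, 60)
    if scheduler == "lsf":
        header = [f"#BSUB -J {job_name}",
                  f"#BSUB -W {hours:02d}:{minutes:02d}",
                  f"#BSUB -o {job_name}.%J.out"]  # %J expands to the LSF job ID
        prologue = []
    elif scheduler == "pbs":
        header = [f"#PBS -N {job_name}",
                  f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
                  f"#PBS -o {job_name}.out"]
        # PBS jobs typically start in $HOME, so return to the submission directory.
        prologue = ['cd "$PBS_O_WORKDIR"']
    else:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")

    body = ["set -e"] + prologue
    for n in threads:
        body.append(f'echo "threads={n}"')
        body.append(f"OMP_NUM_THREADS={n} {executable}")
    return "\n".join(["#!/bin/bash", *header, "", *body]) + "\n"
```

The returned text would be written to a file such as job.sh and submitted with bsub < job.sh (LSF reads the #BSUB lines from standard input) or qsub job.sh; parsing the output and plotting can then stay on the front end.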
I think I (and @sergisiso?) may be reaching some level of maturity in the newer versions of NemoLite2D, and I think it would be best for us to begin to collate results. As previously discussed, one idea was a paper discussing comparative benchmarks of various parallel systems applied to the NemoLite2D benchmark.
As far as I understand it, we have the following versions:
Manual versions:
PSyclone-generated:
My proposal would then be the following benchmarking results:
Fortran OpenMP version with gcc9 on Skylake
Fortran OpenMP version with intel/? on Skylake
Fortran Serial version with gcc9 on Skylake
Fortran Serial version with intel/? on Skylake
Regent version on Skylake
C++ OpenMP version with gcc9 on Skylake
C++ OpenMP version with Intel on Skylake
C++ Kokkos version on Skylake
PSyclone-generated OpenMP version on Skylake
OpenCL version on appropriate hardware (@arporter) - I assume we want both the GPU in Glados and on ScafellPike? FPGA is also an option.
OpenACC version on GPU
PSyclone-generated OpenACC version on GPU
I think what we should record for each set of results is:
install.py
Does this all seem reasonable?