NemoLite2D benchmarking results on HPC-level hardware #41

LonelyCat124 · 2020-06-09T14:32:58Z

arporter · 2020-06-09T14:43:20Z

Currently PSyclone can generate an OpenACC version. In both OpenACC and OpenMP we have yet to really 'go to town' to see how well we can do - we've only done fairly vanilla implementations. If we're going to write a paper then, in keeping with our self-proclaimed "era of performance" we will want to do better! (i.e. we don't want "it works but it's slow".)

We have a manual MPI version working. No PSyclone support for that yet (it's on @rupertford's list :-) ).

GPU-wise, I think SKFP and glados have the same V100s and therefore we can just use one or other of them. Currently the OpenCL we generate is aimed at FPGA. There may be some infrastructure work to do in order to get it working on the GPU (although, now I come to think about it, I think @sergisiso has run on a GPU recently so we may be OK).

Finally, and slightly bigger-picture, we need to think how this relates to 'the PSyclone paper' that we've been threatening to write for about 2 years... It feels like there's a lot to discuss...

sergisiso · 2020-06-09T16:11:51Z

Leaving the paper considerations aside, I think being able to generate this performance snapshots programatically were it records the commit/compiler-version/architecture would be very useful and I have been trying to start this with the common makefile infrastructure. In #37 the compiler column of make summary also includes the version.

I have been experimenting with providing architecture details in the table as well. But the table has some limitations in the number of fields it remains readable. So we may need something else that store big tables in a file (with flags, parameters, ...)

Regarding point 4, it will be good to add is a Makefile rule or a common script to provide scalability tables (which also include the mentioned parameters) and maybe adding PU, something like:

Implementation         PU   Compiler        Arch    ua checksum     uv checksum     time/step
psykal_omp:             1      gcc-7.4        cpu     0.41022150E-01  0.50252378E+01  0.35053E-01
psykal_omp:             2      
psykal_omp:             4      
psykal_omp:             8

And then a gnuplot can easily draw plots form some of this tables. @LonelyCat124 If that will be useful for you we can coordinate this work.

Regarding OpenCL, it can run on CPU, GPU and FPGA, and return correct results, but I am not claiming yet that it does a sensible thing in each platform :)

LonelyCat124 · 2020-06-09T16:30:55Z

That would probably be useful for most of these - I could probably do something like that even for Regent, though I don't have the checksums implemented yet (I probably should do this soon). Maybe a script would be easier? If we always save the executable to nemolite2d.exe (or similar, which regent can also do now I worked out how) we could point it at a directory and it could run the appropriate executable, and move/create namelists accordingly for different sizes too.

I'm personally not a fan of gnuplot plots (vs matplotlib) 😂 but happy to go with them if everyone else prefers them.

arporter · 2020-06-10T07:54:41Z

I like gnuplot for its ubiquity and speed of use. I've never managed to get paper-quality images out of it though so am happy to use something else. (I like xmgrace but that's showing my age.) Although I'm all in favour of automation where possible, I don't think we should get too hung up on it if it proves complicated (especially once batch systems become involved). The key thing is to capture all the necessary data in one place and in a format we can plot.

LonelyCat124 · 2020-06-10T09:04:15Z

I think with python it could be pretty straightforward to have something that you can send into bsub/qsub/whatever and go from there (as opposed to running on the top level), or even just a bash script for most of it except maybe plotting depending on whats used. Once I've finished my non-ECP project properly I could have a go if noone else wants to bite the bullet

LonelyCat124 added the enhancement New feature or request label Jun 9, 2020

LonelyCat124 added this to the Comparative Benchmarking milestone Jun 9, 2020

LonelyCat124 assigned sergisiso and LonelyCat124 Jun 9, 2020

arporter assigned rupertford Jun 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NemoLite2D benchmarking results on HPC-level hardware #41

NemoLite2D benchmarking results on HPC-level hardware #41

LonelyCat124 commented Jun 9, 2020 •

edited

Loading

arporter commented Jun 9, 2020

sergisiso commented Jun 9, 2020 •

edited

Loading

LonelyCat124 commented Jun 9, 2020 •

edited

Loading

arporter commented Jun 10, 2020

LonelyCat124 commented Jun 10, 2020

NemoLite2D benchmarking results on HPC-level hardware #41

NemoLite2D benchmarking results on HPC-level hardware #41

Comments

LonelyCat124 commented Jun 9, 2020 • edited Loading

arporter commented Jun 9, 2020

sergisiso commented Jun 9, 2020 • edited Loading

LonelyCat124 commented Jun 9, 2020 • edited Loading

arporter commented Jun 10, 2020

LonelyCat124 commented Jun 10, 2020

LonelyCat124 commented Jun 9, 2020 •

edited

Loading

sergisiso commented Jun 9, 2020 •

edited

Loading

LonelyCat124 commented Jun 9, 2020 •

edited

Loading