Auto-generated OpenACC and OpenMP #12

Open
arporter opened this issue Jun 5, 2018 · 5 comments
@arporter
Member

arporter commented Jun 5, 2018

In PSyclone issue 170 (stfc/PSyclone#170) we are adding support for OpenACC to the GOcean 1.0 API. The NEMOLite2D application in PSycloneBench requires some updating in order for this to work and we shall do that here.

@arporter
Member Author

arporter commented Jun 5, 2018

Auto-generation of the OpenACC version of NEMOLite2D is now working.

@arporter
Member Author

I've updated the OpenACC transformation script so that it now adds the !$acc routine directive to every kernel. I've also updated the Makefile to ensure that it builds all of the generated kernels. This was difficult and will have broken the build for all the other targets, as I've altered the way we get the list of kernel names.
The generated code compiles without OpenACC enabled. However, when OpenACC is enabled the code does not compile, because some of the kernels use modules from the GOcean infrastructure (in order to get some parameter values).
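To illustrate the effect described above, here is a minimal text-level sketch of adding the directive to every kernel subroutine. It is not how PSyclone does it (PSyclone works on its internal representation through a transformation script); the function name add_acc_routine is hypothetical, and the sketch assumes each subroutine statement sits on a single line with no prefix such as "pure". Placing the directive directly after the subroutine statement is also a simplification.

```python
import re

# Matches the opening statement of a Fortran subroutine, e.g.
# "  subroutine continuity_code(ji, jj, ssha)". Lines such as
# "end subroutine ..." do not match because they start with "end".
_SUBROUTINE = re.compile(r"^(\s*)subroutine\s+\w+", re.IGNORECASE)


def add_acc_routine(source: str) -> str:
    """Return `source` with '!$acc routine' added after each subroutine statement.

    Hypothetical helper for illustration only: assumes single-line
    subroutine statements and does no real Fortran parsing.
    """
    out = []
    for line in source.splitlines():
        out.append(line)
        match = _SUBROUTINE.match(line)
        if match:
            # Indent the directive two spaces deeper than the statement.
            out.append(f"{match.group(1)}  !$acc routine")
    return "\n".join(out)
```

Any module that such a kernel brings in with a use statement would then also have to be usable in device code, which is the compilation problem noted above.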

@arporter
Member Author

Compiling the manual OpenACC version with version 18.4 of the PGI compiler at -O2 fails with:

pgf90 -O2 -g -Minfo=all -acc -ta=tesla:cc70 -I../../../../../shared/dl_esm_inf/finite_difference/src -I../../../../../shared/dl_timer/src -c initialisation_mod.f90
initialisation:
     27, Memory set idiom, loop replaced by call to __c_mset8
     28, Memory set idiom, loop replaced by call to __c_mset8
     29, Memory set idiom, loop replaced by call to __c_mset8
     32, Memory zero idiom, loop replaced by call to __c_mzero8
     37, FMA (fused multiply-add) instruction(s) generated
     49, FMA (fused multiply-add) instruction(s) generated
     59, Memory zero idiom, loop replaced by call to __c_mzero8
     62, Memory zero idiom, loop replaced by call to __c_mzero8
/tmp/pgf90ftFbpOhEjuSF.s: Assembler messages:
/tmp/pgf90ftFbpOhEjuSF.s:3549: Error: unsupported instruction `vmovd'
/tmp/pgf90ftFbpOhEjuSF.s:3599: Error: unsupported instruction `vmovd'
/tmp/pgf90ftFbpOhEjuSF.s:3649: Error: unsupported instruction `vmovd'
make[1]: *** [Makefile:70: initialisation_mod.o] Error 2

but compiling at -O1 works. According to pgfortran -help -O2, -O2 is equivalent to -Mvect=sse -Mcache_align -Mpre, so I tried -O2 -Mnovect -Mnocache_align -Mnopre, but that didn't change the error.
I was hoping to get -O2 working because I wanted to see whether IPA would remove the need for us to module-inline accelerated kernels. However, requesting IPA automatically raises the optimisation level to -O2 and the compiler falls over.
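The flag experiment above can be written down as a small sketch: take the options that -O2 is reported to expand to and turn each one into its -Mno form. The helper name negate_pgi_flags is hypothetical, and the only flags it is known to apply to are the three quoted in this comment.

```python
def negate_pgi_flags(flags: str) -> str:
    """Turn '-Mname[=value]' options into the matching '-Mnoname' options.

    Hypothetical helper: it only reproduces the mapping used in this
    comment (-Mvect=sse -> -Mnovect, and so on) and does not check that
    the compiler accepts the result.
    """
    negated = []
    for flag in flags.split():
        if not flag.startswith("-M"):
            raise ValueError(f"not a -M option: {flag}")
        # Drop any '=value' suffix, then add the 'no' prefix to the name.
        name = flag[2:].split("=", 1)[0]
        negated.append(f"-Mno{name}")
    return " ".join(negated)


# Flags that 'pgfortran -help -O2' reports as the expansion of -O2.
o2_expansion = "-Mvect=sse -Mcache_align -Mpre"
command_flags = "-O2 " + negate_pgi_flags(o2_expansion)
```

Since the error persisted with these flags, whatever triggers it at -O2 is presumably not controlled by these three options alone.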

@sergisiso changed the title from "Auto-generated OpenACC" to "Auto-generated OpenACC and OpenMP" on Jan 11, 2023
@sergisiso
Collaborator

I will reuse this old issue to track generating the OpenACC and OpenMP versions with PSyclone; both are almost there.

@arporter
Member Author

arporter commented Feb 1, 2024

As discussed, I've just tried turning OpenACC on for #100 and it very nearly works, although the resulting Fortran doesn't compile. The only reason for this is that the module-inlined versions of the bc_flather kernels still have a wildcard import from a module that brings in g, which is now also being passed as an argument to the kernel (thanks to KernelImportsToArguments). I've stepped through that transformation in the debugger and it does remove the Container symbol from the Kernel's symbol table, so I don't understand how the import still appears in the generated Fortran.
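One way to spot the leftover import described above is to scan the generated Fortran for use statements that have no only: clause. The sketch below is a plain-text check, not part of PSyclone; the function name wildcard_imports and the module and kernel names in the usage example are hypothetical, and the check assumes each use statement fits on one line.

```python
import re

# 'use [, intrinsic ::] <module> <rest>'; group 1 is the module name and
# group 2 is whatever follows it on the line.
_USE = re.compile(r"^\s*use\b(?:\s*,\s*\w+\s*)?(?:\s*::)?\s*(\w+)(.*)$", re.IGNORECASE)
# An 'only:' clause restricts the import, so it is not a wildcard.
_ONLY = re.compile(r"^\s*,\s*only\s*:", re.IGNORECASE)


def wildcard_imports(source: str):
    """Return (line_number, module_name) for each use statement without 'only:'."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("!", 1)[0]  # ignore trailing comments
        match = _USE.match(code)
        if match and not _ONLY.match(match.group(2)):
            found.append((lineno, match.group(1)))
    return found


# Hypothetical kernel in which g is both an argument and visible
# through a wildcard import.
example = "\n".join([
    "subroutine bc_flather_demo_code(ji, jj, g)",
    "  use kind_params_mod, only: go_wp",
    "  use physical_params_mod",
    "  real(go_wp), intent(in) :: g",
    "end subroutine bc_flather_demo_code",
])
hits = wildcard_imports(example)
```

Running such a check on the generated file would at least show which module-inlined kernels still carry the import, even if it does not explain why it survives the transformation.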
