clubb_intr GPUization #1175

Open

huebleruwm wants to merge 12 commits into ESCOMP:cam_development from huebleruwm:clubb_intr_gpuization

huebleruwm commented Oct 18, 2024

This only modifies clubb_intr.F90 and doesn't require a new version of clubb. The purpose is to add acc directives so that computations can be offloaded to GPUs. Besides the directives, this mainly consists of replacing vector notation with explicit loops, combining loops with the same bounds where possible, and moving non-GPUized function calls outside of the GPU section. I also added new notation for the number of vertical levels (nzm_clubb and nzt_clubb), which improves readability and will make it easier to merge in future versions of clubb. I also included some timing statements, similar to the ones added in the Earthworks ew-develop branch, which this version of clubb_intr is also compatible with.
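As a minimal sketch of the kind of rewrite described above (not the actual code from this PR), the program below replaces three array-section assignments with one explicit loop carrying an OpenACC directive, then checks that both forms give identical results. The array names come from the diff further down; the sizes, fill values, and the specific directive clauses are assumptions made for illustration.

```fortran
program loop_rewrite_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  ! Hypothetical sizes, chosen only for this illustration
  integer, parameter :: ncol = 4, pver = 3, pverp = pver + 1
  real(r8) :: wp3(ncol,pverp), up2(ncol,pverp), vp2(ncol,pverp)
  real(r8) :: wp3_ref(ncol,pverp), up2_ref(ncol,pverp), vp2_ref(ncol,pverp)
  integer :: i, k

  ! Fill the arrays with arbitrary, distinct values
  do k = 1, pverp
    do i = 1, ncol
      wp3(i,k) = real(i + 10*k, r8)
      up2(i,k) = real(i + 20*k, r8)
      vp2(i,k) = real(i + 30*k, r8)
    end do
  end do
  wp3_ref = wp3
  up2_ref = up2
  vp2_ref = vp2

  ! Vector (array-section) notation: three separate implicit loops
  wp3_ref(1:ncol,pverp) = wp3_ref(1:ncol,pver)
  up2_ref(1:ncol,pverp) = up2_ref(1:ncol,pver)
  vp2_ref(1:ncol,pverp) = vp2_ref(1:ncol,pver)

  ! Explicit notation: one combined loop over the shared bounds.
  ! The directive is treated as a comment when OpenACC is not enabled.
  !$acc parallel loop gang vector copy(wp3, up2, vp2)
  do i = 1, ncol
    wp3(i,pverp) = wp3(i,pver)
    up2(i,pverp) = up2(i,pver)
    vp2(i,pverp) = vp2(i,pver)
  end do

  ! Bit-for-bit comparison of the two forms
  if (any(wp3 /= wp3_ref) .or. any(up2 /= up2_ref) .or. any(vp2 /= vp2_ref)) then
    error stop "explicit loop does not match array notation"
  end if
  print *, "explicit loop matches array notation"
end program loop_rewrite_sketch
```

Combining the three assignments into one loop gives the compiler a single kernel to launch instead of three, which is the usual motivation for merging loops with the same bounds.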

This is BFB on CPUs (tested with intel), runs with intel+debugging, and passes the ECT test when comparing CPU results to GPU results (using cam7). There are some options that I didn't GPUize or test (do_clubb_mf, do_rainturb, do_cldcool, clubb_do_icesuper, single_column), so I left the code for them untouched and added checks to stop the run if they're set when the code is compiled with OpenACC.
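One way such a check could look is sketched below; it is an illustration, not the code in this PR. It relies on the `_OPENACC` macro that OpenACC compilers define, so the file must go through the preprocessor (as `.F90` files normally do). The option names are the ones listed above, their values here are hypothetical, and `error stop` stands in for whatever abort routine the real code uses.

```fortran
program openacc_option_guard_sketch
  implicit none
  ! Option names from the PR description; the values are hypothetical
  logical :: do_clubb_mf       = .false.
  logical :: do_rainturb       = .false.
  logical :: do_cldcool        = .false.
  logical :: clubb_do_icesuper = .false.
  logical :: single_column     = .false.

#ifdef _OPENACC
  ! These code paths were left untouched, so refuse to run them on the GPU
  if (do_clubb_mf .or. do_rainturb .or. do_cldcool .or. &
      clubb_do_icesuper .or. single_column) then
    error stop "option not supported when compiled with OpenACC"
  end if
#endif

  print *, "option check passed"
end program openacc_option_guard_sketch
```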

If there ends up being something wrong with these changes, then this version, an earlier commit that contains only a new OpenACC data statement and some timer additions, would be nice to get in at least.

huebleruwm added 12 commits

October 11, 2024 22:28


          Manually merging in changes from ew-develop

37430e1


          Adding acc data statement outside substepping loop and preventing some options from running when compiled with OpenACC


          Replacing vector notation with loops, rearranging some things, general cleanup.

6a38ec2


          More small changes

94ba629


          Some GPUization.

7b94963


          Small bug fixes

7786cca


          Tweaks to acc data statement. This version passed the GPU ect test.

0d391b1


          More GPUization and small changes. ECT passed on CPU and GPU.

cf58834


          Bug fix that scarily did not break the CPU ECT test, but was detected by breaking BFBness. All these values are ~0 but not always exactly.

e39e8a3


          More GPUization. ECT passes for CPU and GPU, CPU is BFB.

e5e81fb


          Adding/moving timing statements

87af330


          Small cleanup and a little more GPUization. Passes ECT CPU vs GPU test and BFB on CPUs.

c85778e

Katetc self-requested a review

October 21, 2024 21:43

cacraigucar requested a review from nusbaume

October 22, 2024 15:10

nusbaume approved these changes

Collaborator

nusbaume left a comment

Looks good to me! I had some questions and change requests but none of them are required, and of course if you have any concerns with any of my requests then just let me know. Thanks!

src/physics/cam/clubb_intr.F90

-                     call init_pdf_implicit_coefs_terms_api( pverp+1-top_lev, ncol, sclr_dim, &
-                                                             pdf_implicit_coefs_terms_chnk(lchnk) )
-                   end if
+                  !  Initialize physics tendency arrays, copy the state to state1 array to use in this routine

Collaborator

nusbaume Oct 23, 2024

I might change this comment to just say:

Suggested change

-                !  Initialize physics tendency arrays, copy the state to state1 array to use in this routine
+                !  Initialize physics tendency arrays

As you have another comment below in the location where you are copying the state.

src/physics/cam/clubb_intr.F90

Comment on lines +2769 to +2772

+                  ! Determine number of vertical levels used in clubb, thermo variables are nzt_clubb
+                  ! and momentum variables are nzm_clubb
+                  nzt_clubb = pver + 1 - top_lev
+                  nzm_clubb = pverp + 1 - top_lev

Collaborator

nusbaume Oct 23, 2024

I don't know if this will impact GPU performance, but from a readability standpoint I wonder if it makes sense to make these two quantities module-level variables, set them once at the start of clubb_ini_cam (similar to where nlev was), and then use them everywhere you are otherwise computing pver + 1 - top_lev and pverp + 1 - top_lev.

This would also be beneficial because these quantities will never actually change during a CAM run, so they only need to be set once.
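A minimal sketch of that idea, assuming a standalone module rather than the real clubb_intr module: `clubb_levels_sketch` and `init_clubb_levels` are hypothetical names, and the level counts passed in are made-up values. Whether module-level scalars need extra handling on the GPU (for example an `!$acc declare` if they are read inside `!$acc routine` procedures) is not addressed here.

```fortran
module clubb_levels_sketch
  implicit none
  private
  public :: init_clubb_levels, nzt_clubb, nzm_clubb

  ! Set once at initialization; "protected" makes them read-only outside the module
  integer, protected :: nzt_clubb = -1   ! number of thermodynamic levels
  integer, protected :: nzm_clubb = -1   ! number of momentum levels

contains

  ! Hypothetical stand-in for the part of clubb_ini_cam that would set the counts
  subroutine init_clubb_levels(pver, pverp, top_lev)
    integer, intent(in) :: pver, pverp, top_lev
    nzt_clubb = pver  + 1 - top_lev
    nzm_clubb = pverp + 1 - top_lev
  end subroutine init_clubb_levels

end module clubb_levels_sketch

program clubb_levels_demo
  use clubb_levels_sketch, only: init_clubb_levels, nzt_clubb, nzm_clubb
  implicit none

  ! Made-up level counts, for illustration only
  call init_clubb_levels(pver=32, pverp=33, top_lev=2)

  if (nzt_clubb /= 31 .or. nzm_clubb /= 32) error stop "unexpected level counts"
  print *, "nzt_clubb =", nzt_clubb, " nzm_clubb =", nzm_clubb
end program clubb_levels_demo
```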

src/physics/cam/clubb_intr.F90

-                    wp3(1:ncol,pverp)     = wp3(1:ncol,pver)
-                    up2(1:ncol,pverp)     = up2(1:ncol,pver)
-                    vp2(1:ncol,pverp)     = vp2(1:ncol,pver)
+                  ! Initialize the apply_const variable (note special logic is due to eularian backstepping)

Collaborator

nusbaume Oct 23, 2024

Typo:

Suggested change

-                ! Initialize the apply_const variable (note special logic is due to eularian backstepping)
+                ! Initialize the apply_const variable (note special logic is due to eulerian backstepping)

src/physics/cam/clubb_intr.F90

-                  ! to zero.
-                  fcor(:) = 0._r8
+                  !  Set the ztodt timestep in pbuf for SILHS
+                  ztodtptr(:) = 1.0_r8*hdtime

Collaborator

nusbaume Oct 23, 2024

Maybe this is a question for @Katetc, but why multiply by one instead of just using hdtime directly?

src/physics/cam/clubb_intr.F90

Comment on lines +2863 to +2864

		!$acc state1, state1%q, state1%u, state1%v, state1%t, state1%pmid, state1%s, state1%pint, &
		!$acc state1%zm, state1%zi, state1%pdeldry, state1%pdel, state1%omega, state1%phis, &

Collaborator

nusbaume Oct 24, 2024 (edited)

Just curious as to why you have to bring in state1 in addition to the relevant state1 variables (e.g. state1%q)?
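For context, one common reason in OpenACC (not confirmed as the author's) is manual deep copy: when the members of a derived type are allocatable or pointer arrays, the parent object is placed on the device first so that each member copied afterwards can be attached to it. The sketch below assumes a hypothetical `state_sketch` type with two allocatable members; the sizes and values are made up.

```fortran
program derived_type_data_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: ncol = 4, pver = 3   ! hypothetical sizes

  ! Hypothetical stand-in for the real state type
  type :: state_sketch
    real(r8), allocatable :: t(:,:)
    real(r8), allocatable :: q(:,:)
  end type state_sketch

  type(state_sketch) :: state1
  integer :: i, k

  allocate(state1%t(ncol,pver), state1%q(ncol,pver))
  state1%t = 280._r8
  state1%q = 0._r8

  ! Parent first, then its dynamic members, so the device copy of state1
  ! ends up pointing at the device copies of t and q.
  !$acc data copyin(state1, state1%t) copy(state1%q)
  !$acc parallel loop collapse(2) default(present)
  do k = 1, pver
    do i = 1, ncol
      state1%q(i,k) = 1.e-3_r8 * state1%t(i,k)
    end do
  end do
  !$acc end data

  if (any(abs(state1%q - 0.28_r8) > 1.e-12_r8)) error stop "unexpected q"
  print *, "q updated through the derived-type members"
end program derived_type_data_sketch
```

Without OpenACC enabled the directives are plain comments and the loop runs on the host, so the same source serves both builds.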

src/physics/cam/clubb_intr.F90

+                    !  Compute exner at the surface for converting the sensible heat fluxes
+                    !  to a flux of potential temperature for use as clubb's boundary conditions
+                    inv_exner_clubb_surf(i) = 1._r8 / ( ( state1%pmid(i,pver) / p0_clubb )**( rairv(i,pver,lchnk) / cpairv(i,pver,lchnk) ) )

Collaborator

nusbaume Oct 25, 2024

Couldn't you just do this instead?

Suggested change

-                  inv_exner_clubb_surf(i) = 1._r8 / ( ( state1%pmid(i,pver) / p0_clubb )**( rairv(i,pver,lchnk) / cpairv(i,pver,lchnk) ) )
+                  inv_exner_clubb_surf(i) = inv_exner_clubb(i,pver)

Of course if this changes answers on CPUs then feel free to ignore this request.

src/physics/cam/clubb_intr.F90

                   do i=1, ncol
                     clubbtop(i) = top_lev
                     do while ((rtp2(i,clubbtop(i)) <= 1.e-15_r8 .and. rcm(i,clubbtop(i))  ==  0._r8) .and. clubbtop(i) <  pver)
-                      clubbtop(i) = clubbtop(i) + 1
+                     clubbtop(i) = clubbtop(i) + 1

Collaborator

nusbaume Oct 25, 2024

Why remove the indent here?
