clubb_intr GPUization #1175
base: cam_development
…e options from running when compiled with OpenACC
… by breaking BFBness. All these values are ~0 but not always exactly.
…t and BFB on CPUs.
Looks good to me! I had some questions and change requests but none of them are required, and of course if you have any concerns with any of my requests then just let me know. Thanks!
call init_pdf_implicit_coefs_terms_api( pverp+1-top_lev, ncol, sclr_dim, &
                                        pdf_implicit_coefs_terms_chnk(lchnk) )
end if
! Initialize physics tendency arrays, copy the state to state1 array to use in this routine
I might change this comment to just say:
- ! Initialize physics tendency arrays, copy the state to state1 array to use in this routine
+ ! Initialize physics tendency arrays
As you have another comment below in the location where you are copying the state.
! Determine number of vertical levels used in clubb, thermo variables are nzt_clubb
! and momentum variables are nzm_clubb
nzt_clubb = pver + 1 - top_lev
nzm_clubb = pverp + 1 - top_lev
I don't know if this will impact the GPU performance, but from a readability standpoint I wonder if it makes sense to make these two quantities module-level variables, set them once at the start of clubb_ini_cam, similar to where nlev was, and then use them everywhere you are otherwise doing pver + 1 - top_lev and pverp + 1 - top_lev.
This would also be beneficial because these quantities will never actually change during a CAM run, so they only need to be set once.
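A minimal sketch of that suggestion, assuming a simplified stand-in module: clubb_levels_sketch and init_clubb_levels are hypothetical names, and pver, pverp and top_lev are given example values here rather than coming from CAM.

```fortran
module clubb_levels_sketch
  implicit none
  private
  public :: init_clubb_levels, nzt_clubb, nzm_clubb

  ! Module-level level counts, set once at initialization.
  ! "protected" lets other modules read them but not modify them.
  integer, protected :: nzt_clubb = -1   ! number of thermodynamic levels
  integer, protected :: nzm_clubb = -1   ! number of momentum levels

contains

  ! Hypothetical stand-in for the part of the init routine that sets the counts
  subroutine init_clubb_levels(pver, pverp, top_lev)
    integer, intent(in) :: pver, pverp, top_lev
    nzt_clubb = pver  + 1 - top_lev
    nzm_clubb = pverp + 1 - top_lev
  end subroutine init_clubb_levels

end module clubb_levels_sketch

program demo_levels
  use clubb_levels_sketch
  implicit none
  ! Example values only; in CAM these come from the model configuration
  integer, parameter :: pver = 32, pverp = pver + 1, top_lev = 1

  call init_clubb_levels(pver, pverp, top_lev)
  print *, 'nzt_clubb =', nzt_clubb, ' nzm_clubb =', nzm_clubb
end program demo_levels
```

Whether module-level scalars interact well with the OpenACC regions in clubb_intr.F90 would still need to be checked; the sketch only shows the set-once pattern.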
wp3(1:ncol,pverp) = wp3(1:ncol,pver)
up2(1:ncol,pverp) = up2(1:ncol,pver)
vp2(1:ncol,pverp) = vp2(1:ncol,pver)
! Initialize the apply_const variable (note special logic is due to eularian backstepping)
Typo:
- ! Initialize the apply_const variable (note special logic is due to eularian backstepping)
+ ! Initialize the apply_const variable (note special logic is due to eulerian backstepping)
! to zero.
fcor(:) = 0._r8
! Set the ztodt timestep in pbuf for SILHS
ztodtptr(:) = 1.0_r8*hdtime
Maybe this is a question for @Katetc, but why multiply by one instead of just using hdtime directly?
!$acc state1, state1%q, state1%u, state1%v, state1%t, state1%pmid, state1%s, state1%pint, &
!$acc state1%zm, state1%zi, state1%pdeldry, state1%pdel, state1%omega, state1%phis, &
Just curious as to why you have to bring in state1 in addition to the relevant state1 variables (e.g. state1%q)?
! Compute exner at the surface for converting the sensible heat fluxes
! to a flux of potential temperature for use as clubb's boundary conditions
inv_exner_clubb_surf(i) = 1._r8 / ( ( state1%pmid(i,pver) / p0_clubb )**( rairv(i,pver,lchnk) / cpairv(i,pver,lchnk) ) )
Couldn't you just do this instead?
- inv_exner_clubb_surf(i) = 1._r8 / ( ( state1%pmid(i,pver) / p0_clubb )**( rairv(i,pver,lchnk) / cpairv(i,pver,lchnk) ) )
+ inv_exner_clubb_surf(i) = inv_exner_clubb(i,pver)
Of course if this changes answers on CPUs then feel free to ignore this request.
  do i=1, ncol
    clubbtop(i) = top_lev
    do while ((rtp2(i,clubbtop(i)) <= 1.e-15_r8 .and. rcm(i,clubbtop(i)) == 0._r8) .and. clubbtop(i) < pver)
-     clubbtop(i) = clubbtop(i) + 1
+   clubbtop(i) = clubbtop(i) + 1
Why remove the indent here?
This only modifies clubb_intr.F90 and doesn't require a new version of clubb. The purpose of this is the addition of acc directives, added in order to offload computations to GPUs. Besides the directives, this mainly consists of replacing vector notation with explicit loops, combining loops with the same bounds where possible, and moving non-GPUized function calls outside of the GPU section. I also added some new notation for the number of vertical levels (nzm_clubb and nzt_clubb) that improves readability and will make it easier to merge in future versions of clubb. I also included some timing statements, similar to the ones added in the Earthworks ew-develop branch, which this version of clubb_intr is also compatible with.

This is BFB on CPUs (tested with intel), runs with intel+debugging, and passes the ECT test when comparing CPU results to GPU results (using cam7). There are some options that I didn't GPUize or test (do_clubb_mf, do_rainturb, do_cldcool, clubb_do_icesuper, single_column), so I left the code for them untouched and added some checks to stop the run if they're set when the code is compiled with OpenACC.

If there ends up being something wrong with these changes then this version, which is an earlier commit that contains only a new OpenACC data statement and some timer additions, would be nice to get in at least.
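As an illustration of the kind of change described above (not code taken from this PR), the sketch below rewrites an array-syntax assignment as explicit loops under an OpenACC directive. The program name, array names and sizes are placeholders. When compiled without OpenACC the directive is treated as a comment and the loop runs on the CPU.

```fortran
program acc_loop_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: ncol = 16, nz = 32       ! placeholder sizes
  real(r8) :: a(ncol,nz), b(ncol,nz)
  real(r8) :: c_vec(ncol,nz), c_loop(ncol,nz)
  integer  :: i, k

  call random_number(a)
  call random_number(b)

  ! Original style: array (vector) notation, evaluated on the host
  c_vec(1:ncol,1:nz) = a(1:ncol,1:nz) + 2._r8*b(1:ncol,1:nz)

  ! GPU-oriented style: explicit loops that a directive can parallelize.
  ! collapse(2) merges both loops into one iteration space.
  !$acc parallel loop gang vector collapse(2) copyin(a, b) copyout(c_loop)
  do k = 1, nz
    do i = 1, ncol
      c_loop(i,k) = a(i,k) + 2._r8*b(i,k)
    end do
  end do

  ! Compare with a small tolerance: device results are not guaranteed
  ! to be bit-for-bit identical to the host results
  if (maxval(abs(c_vec - c_loop)) > 1.e-12_r8) error stop 'results differ'
  print *, 'array-syntax and explicit-loop results agree'
end program acc_loop_sketch
```

In a real routine the data clauses would more likely live in an enclosing data region, so that arrays stay on the device across several loops rather than being copied for each one.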