(Closes #2716) Add support for module-inlining calls to polymorphic kernels/routines #2732

arporter · 2024-10-03T09:12:02Z

No description provided.

codecov · 2024-10-03T13:45:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.86%. Comparing base (8367de1) to head (98daf23).

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #2732    +/-   ##
========================================
  Coverage   99.86%   99.86%            
========================================
  Files         354      354            
  Lines       49010    49112   +102     
========================================
+ Hits        48946    49048   +102     
  Misses         64       64


arporter · 2024-10-03T14:06:53Z

As well as addressing the coverage, I need to extend the LFRic integration tests to use this new functionality.

arporter · 2024-10-07T16:05:05Z

Added the KernelModuleInlineTrans back into the LFRic transformation script and now get a crash when trying to add the interface symbol into the symbol table.

arporter · 2024-10-08T14:19:25Z

Build and run with updated transformation script works:

This doesn't seem very different from the profile of the version before this change so I need to check exactly what PSyclone is making of things.

sergisiso · 2024-10-08T14:52:25Z

If you merge with master, I added multiple prints to the output, so you can grep and count the number of cases of each kind. The number of cases where we could not get the kernel_schedule before due to polymorphic kernels was 331.

arporter · 2024-10-08T20:43:32Z

The OMP offload LFRic integration test failed with an ICE:

17:47:36 Pre-process and compile inventory/id_r32_field_array_pair_mod.F90
inventory/id_r32_field_array_pair_mod.F90:
NVFORTRAN-S-0000-Internal compiler error. flowgraph: node is zero       3  (lfric_xios_setup_mod_psy.f90: 85)
NVFORTRAN-F-0000-Internal compiler error. Invalid key for hash       0  (lfric_xios_setup_mod_psy.f90: 85)
NVFORTRAN/x86-64 Linux 24.5-1: compilation aborted

It seems odd that there's a _psy.f90 for xios setup??

EDIT: L85 of that file corresponds to the end of the (module-inlined?) nodal_coordinates_code kernel (subroutine) which is called from within an offloaded region. However, the routine itself does not contain an offload directive?

arporter · 2024-10-08T21:22:05Z

Doing the build manually for OpenACC (which doesn't give an ICE), things are looking better:

(note: I've marked MATMUL as being available on the GPU now.)
I think @sergisiso has seen that module-inlining a routine is sufficient for the compiler to do the right thing, even if it's not had e.g. acc routine added to it?

arporter · 2024-10-08T21:32:41Z

I need to figure out why we get an inlined routine without an offload directive added to it. The build log says:

PSy name = 'lfric_xios_setup_mod_psy'
Transforming invoke 'invoke_0_nodal_xyz_coordinates_kernel_type' ...
Failed to annotate 'nodal_xyz_coordinates_code' with GPU-enabled directive due to:
Transformation Error: Kernel 'nodal_xyz_coordinates_code' accesses the symbol 'chi2xyz: RoutineSymbol<NoType, pure=unknown, elemental=unknown>' which is imported. If this symbol represents data then it must first be converted to a Kernel argument using the KernelImportsToArguments transformation.
Failed to annotate 'nodal_xyz_coordinates_code' with GPU-enabled directive due to:
Transformation Error: Kernel 'nodal_xyz_coordinates_code' accesses the symbol 'chi2xyz: RoutineSymbol<NoType, pure=unknown, elemental=unknown>' which is imported. If this symbol represents data then it must first be converted to a Kernel argument using the KernelImportsToArguments transformation.
Transforming invoke 'invoke_1_nodal_coordinates_kernel_type' ...
Failed to module-inline kernel 'nodal_coordinates_code' due to:
Transformation Error: Cannot inline subroutine 'nodal_coordinates_code' because another, different, subroutine with the same name already exists and versioning of module-inlined subroutines is not implemented yet.
Successfully offloaded loop with ['nodal_coordinates_code']
Successfully offloaded loop with ['nodal_coordinates_code']
Transforming invoke 'invoke_2_nodal_coordinates_kernel_type' ...
Failed to module-inline kernel 'nodal_coordinates_code' due to:
Transformation Error: Cannot inline subroutine 'nodal_coordinates_code' because another, different, subroutine with the same name already exists and versioning of module-inlined subroutines is not implemented yet.
Successfully offloaded loop with ['nodal_coordinates_code']
Successfully offloaded loop with ['nodal_coordinates_code']

arporter · 2024-10-09T08:04:27Z

Part of the problem was that the optimisation script wasn't checking that it hadn't already transformed a given kernel for a given invoke. Fixing that removes the 'failed to inline' messages but we still end up with an inlined kernel that doesn't have a directive added to it.
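The missing check can be sketched in a few lines. This is a toy illustration only: `transform_invoke` and its arguments are hypothetical names, not part of the PSyclone API or the actual optimisation script.

```python
# Toy sketch: none of these names are PSyclone API.
def transform_invoke(kernel_names, transform):
    """Apply `transform` once per distinct kernel name within one invoke."""
    already_done = set()
    applied = []
    for name in kernel_names:
        if name in already_done:
            # This kernel was handled earlier in the same invoke.
            continue
        transform(name)
        already_done.add(name)
        applied.append(name)
    return applied


calls = []
result = transform_invoke(
    ["nodal_coordinates_code", "nodal_coordinates_code"], calls.append)
```

With the set in place, the second occurrence of the kernel within the invoke is skipped rather than re-transformed.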

arporter · 2024-10-09T08:42:48Z

The problem was that, having successfully module-inlined the kernel routine, we proceed to apply the annotation transformation to the original Kern and that does not update the Routine that has been inlined. It probably should? For now, I can search for the newly inlined routine and apply the transformation to that and we get the expected code.
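A minimal sketch of the situation described here, using stand-in classes rather than real PSyIR nodes (`Routine`, `Container`, `find_routine` and `module_inline` are all hypothetical names invented for the illustration):

```python
import copy


class Routine:
    """Hypothetical stand-in for a kernel subroutine."""
    def __init__(self, name):
        self.name = name
        self.directives = []


class Container:
    """Hypothetical stand-in for the PSy-layer module."""
    def __init__(self):
        self.routines = {}

    def find_routine(self, name):
        return self.routines.get(name)


def module_inline(container, original):
    """Insert a *copy* of the routine into the container."""
    container.routines[original.name] = copy.deepcopy(original)


original = Routine("nodal_coordinates_code")
psy_layer = Container()
module_inline(psy_layer, original)

# Annotating the original leaves the inlined copy untouched...
original.directives.append("acc routine")
inlined = psy_layer.find_routine("nodal_coordinates_code")
copy_was_missed = inlined.directives == []

# ...so look the copy up by name and annotate that instead.
inlined.directives.append("acc routine")
```

The lookup by name plays the role of the "search for the newly inlined routine" workaround; whether the annotation transformation should do this itself is the open question raised above.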

arporter · 2024-10-09T08:43:20Z

Now get a compilation failure:

NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure matrix_vector_code (algorithm/norm_alg_mod_psy.f90: 338)

This seems to be because we have successfully inlined this interface and its routines but subsequent PSy-layer subroutines are still importing it from a module. In turn, this is because I now check whether or not we've already transformed a kernel of a given name. Essentially, we need multiple LFRicKern objects to all point to the same bit of PSyIR.

arporter · 2024-10-09T09:11:57Z

Kern.get_kernel_schedule() (which is to be replaced/removed if/when we migrate Kern to subclass Call) currently caches the PSyIR of the kernel. KernelModuleInlineTrans creates a copy of this PSyIR and then inserts it into the PSy-layer. Therefore, get_kernel_schedule() should now return that copy of the PSyIR. In fact, that would happen automatically if I undid my changes to KernelModuleInlineTrans so that it doesn't copy the PSyIR. I can't remember why I made that change.

I think I made that change because, without copying, a routine gets removed from its original Container but that breaks any Interface that refers to it. This becomes a problem in the (unlikely) event that a routine is called directly as well as via an interface.
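The trade-off between moving and copying can be illustrated with a toy model. Everything here is an assumption made for the sketch: the `Container` class, the `kern`, `kern_r32` and `kern_r64` names, and the two helper functions are hypothetical and do not reflect how PSyIR stores routines or interfaces.

```python
import copy


class Container:
    """Hypothetical module: routine bodies plus generic interfaces."""
    def __init__(self, routines, interfaces):
        self.routines = dict(routines)      # routine name -> body
        self.interfaces = dict(interfaces)  # interface name -> routine names

    def dangling(self):
        """Interface members that no longer exist in this container."""
        return [name for members in self.interfaces.values()
                for name in members if name not in self.routines]


def inline_by_move(source, target, name):
    # The routine leaves its original container.
    target.routines[name] = source.routines.pop(name)


def inline_by_copy(source, target, name):
    # The original container is left intact.
    target.routines[name] = copy.deepcopy(source.routines[name])


def make_kernel_module():
    return Container(
        {"kern_r32": ["body"], "kern_r64": ["body"]},
        {"kern": ["kern_r32", "kern_r64"]})


moved_from = make_kernel_module()
inline_by_move(moved_from, Container({}, {}), "kern_r32")

copied_from = make_kernel_module()
inline_by_copy(copied_from, Container({}, {}), "kern_r32")
```

In this model, moving leaves the `kern` interface referring to a routine that is no longer in its module, while copying keeps the source module consistent at the cost of having two versions of the routine to keep track of.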

arporter · 2024-10-10T09:16:10Z

If we have an Algorithm layer that contains two invokes that each call the same kernel then we attempt to apply KernelModuleInlineTrans to each kernel call. If the first one succeeds then we immediately (in global.py) proceed to add e.g. acc routine to it. That done, we move on to the next invoke where we try to do the same thing. We then find that the body of the routine we want to inline does not match the one we've already inlined because we've added acc routine. This results in:

The second kernel is not flagged as being module inlined and thus is handled by _rename_and_write.
The result of _rename_and_write is a renamed kernel for which the PSy layer does not contain a symbol.

This shouldn't happen. However, we also want to mark all kernels of the same name as being module-inlined and pointing to a single implementation. Since global.py works invoke by invoke, it's not simple (or natural) to alter it so that it works kernel by kernel.
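The sequence of events can be reproduced with a small stand-alone model. It assumes, purely for illustration, that routine bodies are lists of strings and that inlining is refused whenever an existing routine of the same name differs; `module_inline`, `InlineError` and `kern_code` are hypothetical names.

```python
class InlineError(Exception):
    """Raised when a different routine of the same name already exists."""


def module_inline(psy_routines, name, source_body):
    """Inline `source_body` under `name`, refusing on a mismatch."""
    existing = psy_routines.get(name)
    if existing is None:
        psy_routines[name] = list(source_body)
    elif existing != source_body:
        raise InlineError(
            f"a different routine named '{name}' already exists")


kernel_source = ["subroutine body"]
psy_routines = {}

# Invoke 0: inline, then annotate the inlined copy straight away.
module_inline(psy_routines, "kern_code", kernel_source)
psy_routines["kern_code"].insert(0, "!$acc routine")

# Invoke 1: same kernel, but the pristine source no longer matches.
try:
    module_inline(psy_routines, "kern_code", kernel_source)
    second_inline_ok = True
except InlineError:
    second_inline_ok = False
```

In this model the second attempt fails only because the comparison is made against the unmodified kernel source rather than the already-inlined, already-annotated copy.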


arporter · 2024-10-10T13:10:39Z

Thinking about this a bit more, I belatedly realise that if a given PSy layer routine calls the same kernel more than once then either all of them must be module inlined or none of them (unless we attempt to rename the inlined version and that gets complicated). I think the reason that LFRic is still not working is that we fail to inline a second instance of the same kernel (because the first instance has been transformed) but then we do proceed to transform it. Therefore, rename_and_write() is called in order to generate a new version of that kernel on disk. I need a reproducer for this.

arporter · 2024-10-11T09:58:25Z

I've realised that Kern.module_inline sets this flag for all Kernels with that name in the current InvokeSchedule. This seems sensible but is causing me problems because we end up with a Kernel marked as 'module inlined' but which actually isn't because the transformation on it failed and so it was written to file and renamed instead.

The solution might therefore be as simple as not attempting to transform a Kernel that is already marked as module inlined.
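A rough sketch of that idea follows. `ToyKern` and `apply_inline` are hypothetical stand-ins, not the real `Kern` class or `KernelModuleInlineTrans.apply`; the only behaviour taken from the discussion is that setting the flag on one kernel marks every kernel of that name in the same schedule.

```python
class ToyKern:
    """Hypothetical kernel call living in a shared schedule list."""
    def __init__(self, name, schedule):
        self.name = name
        self._schedule = schedule
        self._inlined = False
        schedule.append(self)

    @property
    def module_inline(self):
        return self._inlined

    @module_inline.setter
    def module_inline(self, value):
        # Flag every kernel with this name in the same schedule.
        for kern in self._schedule:
            if kern.name == self.name:
                kern._inlined = value


def apply_inline(kern, log):
    """Inline `kern` unless it is already marked as module inlined."""
    if kern.module_inline:
        return  # already covered by an earlier kernel of the same name
    log.append(kern.name)
    kern.module_inline = True


schedule, log = [], []
first = ToyKern("kern_code", schedule)
second = ToyKern("kern_code", schedule)
apply_inline(first, log)
apply_inline(second, log)
```

Here the second call returns early, so the inlining work is done once and both kernels end up flagged consistently.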

arporter · 2024-10-11T11:04:07Z

The trouble with the LFRic example is that we have two, separate invoke calls, each with the same Kernel. The first Kernel gets module inlined but has to be modified to have some imports made local to it. That then means we refuse to inline the second Kernel because it is no longer identical. The question is, why does the search for the source of the second kernel not return the module-inlined source of the first one?


arporter added 13 commits September 17, 2024 17:11

#2716 add initial fix and test [skip ci]

3bbdef5

#2716 fix linting

459e26d

#2716 rm check for polymorphic kernels, ensure renamed kern is public

37188bf

#2716 fix linting

4aa1c0c

Merge branch 'master' into 2716_transform_interface_bug

dd29ca3

#2716 WIP exploring options

4383c7c

Merge branch 'master' into 2716_transform_interface_bug

60f4add

#2716 WIP plumbing-in inlining of multiple kernel routines

62b5da5

#2716 more fixes [skip ci]

5a10b88

Merge branch 'master' into 2716_transform_interface_bug

d217661

#2716 get KernelModuleInlineTrans tests working [skip ci]

cc657a9

#2716 fix linting

08f18c8

#2716 more linting

53147c6

arporter self-assigned this Oct 3, 2024

arporter marked this pull request as draft October 3, 2024 09:12

arporter added in progress NG-ARCH Issues relevant to the GPU parallelisation of LFRic and other models expected to be used in NG-ARCH labels Oct 3, 2024

arporter added 5 commits October 3, 2024 11:46

#2716 fix a lot of tests

4198608

#2716 fix remaining tests

e079988

#2716 fix examples

ca153a4

Merge branch 'master' into 2716_transform_interface_bug

e35bcb2

#2716 revert some unnecessary changes

71a2630

arporter added 5 commits October 4, 2024 17:03

#2716 tidying and improving comments/docstrings

ba58c82

#2716 add tests for KernelModuleInlineTrans

2ac83b5

#2716 fix coverage of gocean_move_iteration_boundaries_inside

e5d5699

#2716 rm need for polymorphic checks for GOcean Kernels

f026e21

#2716 improve coverage

6925697

arporter and others added 4 commits October 8, 2024 11:57

#2716 add InterfaceDeclGen to f2pygen

a00f367

#2716 fixes for the transformation in LFRic

2b82201

Merge branch 'master' into 2716_transform_interface_bug

b9987fc

#2716 fix tests broken by merge

0fd9464

arporter added 3 commits October 8, 2024 15:54

#2716 update opt script in repo and fix OMPDeclareTargetTrans

2e76d3a

#2716 mark MATMUL as available on GPU

721ff5d

#2716 fix test for matmul on gpu

8319e95

arporter temporarily deployed to integration October 8, 2024 15:59 — with GitHub Actions Inactive

#2716 ensure Kern points to inlined PSyIR after transformation [skip ci]

33376ff

arporter added 3 commits October 10, 2024 11:18

#2716 improvements to validation of calls that resolve to multiple ro…

e8b3c0b

…utines

#2716 add new inlining test

0903b0a

#2716 add new test source file

a8d357d

arporter added 2 commits October 11, 2024 11:30

#2716 return early if PSyKAl kernel already module inlined

48bac41

Merge branch 'master' into 2716_transform_interface_bug

df74591

arporter and others added 3 commits October 11, 2024 20:50

#2716 improve apply() so that it returns early if routine already inl…

c26b8cc

…ined

#2716 update lfric inlining example (eg2)

4568467

Merge branch 'master' into 2716_transform_interface_bug

98daf23
