Add compute_shared_memory_aggs used by shared memory groupby #17162

PointKernel · 2024-10-24T00:11:02Z

Description

This work is part of splitting the original bulk shared memory groupby PR #16619.

This PR introduces the compute_shared_memory_aggs API, which is used by the shared memory groupby. The shared memory groupby consists of two main steps. The first step was introduced in #17147; this PR implements the second step, which performs the actual aggregations using the offsets computed in the first step. Each thread block is designed to handle up to 128 unique keys. If a block exceeds this limit, shared memory does not have enough space to store the temporary aggregation results, so a flag is set to indicate that follow-up global memory aggregations are needed to complete the remaining aggregation requests.
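As a rough illustration of the 128-key limit and the fallback flag described above, here is a minimal host-side C++ sketch. It is not the kernel from this PR: `aggregate_block`, `BlockResult`, and `kMaxUniqueKeysPerBlock` are hypothetical names, a hash map stands in for the shared-memory scratch space, and the sketch assumes that a block over the limit discards its partial results and leaves all of its rows to the global memory pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Limit taken from the PR description: one thread block handles at most 128 unique keys.
constexpr std::size_t kMaxUniqueKeysPerBlock = 128;

// Hypothetical result type: partial sums stand in for the temporary
// aggregation results that the real kernel keeps in shared memory.
struct BlockResult {
  std::unordered_map<int, std::int64_t> partial_sums;
  bool needs_global_fallback = false;
};

// Hypothetical host-side stand-in for the work of a single thread block.
// Assumption: once a 129th unique key shows up, the block gives up, clears
// its partial results, and sets the flag so a later global memory pass
// handles these rows instead.
BlockResult aggregate_block(const std::vector<int>& keys,
                            const std::vector<std::int64_t>& values)
{
  BlockResult result;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    auto it = result.partial_sums.find(keys[i]);
    if (it == result.partial_sums.end()) {
      if (result.partial_sums.size() == kMaxUniqueKeysPerBlock) {
        // No scratch space left for another unique key.
        result.partial_sums.clear();
        result.needs_global_fallback = true;
        return result;
      }
      it = result.partial_sums.emplace(keys[i], 0).first;
    }
    it->second += values[i];  // SUM aggregation as the running example
  }
  return result;
}
```

A caller would run `aggregate_block` once per block of rows and schedule the global memory aggregation only if any block reports `needs_global_fallback`.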

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


davidwendt · 2024-10-25T18:25:58Z

cpp/src/groupby/hash/compute_shared_memory_aggs.cu

+
+  size_t dynamic_shmem_size = 0;
+  CUDF_CUDA_TRY(cudaOccupancyAvailableDynamicSMemPerBlock(
+    &dynamic_shmem_size, single_pass_shmem_aggs_kernel, active_blocks_per_sm, GROUPBY_BLOCK_SIZE));


I don't think this will change within a process. I wonder if it could be cached by making the variable static?

The return value of this function will be used in another TU as well: https://github.com/PointKernel/cudf/blob/ed9243b83181d15646b46e15e1aa42963131c5f6/cpp/src/groupby/hash/compute_aggregations.cuh#L69

I've updated the code to calculate the available shared memory only once in compute_aggregations and pass available_shmem_size as an argument, rather than repeatedly calling the API.
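The change described above can be sketched on the host side as follows. This is only an illustration under stated assumptions: `query_available_shmem_size` is a hypothetical stand-in for the `cudaOccupancyAvailableDynamicSMemPerBlock` call shown in the diff (it returns a placeholder of 48 KiB and counts its invocations), and `fits_in_shmem`, `ShmemPlan`, and `plan_aggregations` are made-up names rather than functions from this PR.

```cpp
#include <cstddef>

// Counts how often the (hypothetical) query runs, so the effect of the
// refactor is observable.
int g_query_calls = 0;

// Hypothetical stand-in for the occupancy query. The returned size is a
// placeholder, not a value taken from the PR.
std::size_t query_available_shmem_size()
{
  ++g_query_calls;
  return 48 * 1024;
}

// Hypothetical callee: receives available_shmem_size as an argument
// instead of querying it again.
bool fits_in_shmem(std::size_t available_shmem_size, std::size_t bytes_needed)
{
  return bytes_needed <= available_shmem_size;
}

struct ShmemPlan {
  bool first_fits;
  bool second_fits;
};

// Hypothetical caller: queries the available size once and passes the same
// value to every consumer.
ShmemPlan plan_aggregations(std::size_t first_bytes, std::size_t second_bytes)
{
  std::size_t const available_shmem_size = query_available_shmem_size();
  return ShmemPlan{fits_in_shmem(available_shmem_size, first_bytes),
                   fits_in_shmem(available_shmem_size, second_bytes)};
}
```

Compared with a function-local static, passing the value down keeps the callees free of hidden state and lets code in another translation unit reuse the same number.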


PointKernel added 2 commits October 23, 2024 16:46

Add functors used by compute_shared_memory_aggs

df8a561

Add compute_shared_memory_aggs

ec5b705

PointKernel added the CMake label (CMake build issue) Oct 24, 2024

github-actions bot added the libcudf label (affects libcudf C++/CUDA code) Oct 24, 2024

PointKernel added the non-breaking (non-breaking change) and feature request (new feature or request) labels Oct 24, 2024

PointKernel marked this pull request as ready for review October 24, 2024 00:18

PointKernel requested review from a team as code owners October 24, 2024 00:18

PointKernel requested review from karthikeyann and kingcrimsontianyu October 24, 2024 00:18

PointKernel commented Oct 24, 2024

cpp/src/groupby/hash/single_pass_functors.cuh (outdated, resolved)

PointKernel self-assigned this Oct 24, 2024

PointKernel added 2 commits October 24, 2024 10:38

Update comments

5e85f23

Merge remote-tracking branch 'upstream/branch-24.12' into compute-shm-aggs

1412330

PointKernel commented Oct 24, 2024

cpp/src/groupby/hash/single_pass_functors.cuh (resolved)

PointKernel added the 3 - Ready for Review label (ready for review by team) Oct 24, 2024

davidwendt reviewed Oct 25, 2024

cpp/src/groupby/hash/compute_shared_memory_aggs.cu (outdated, resolved)

davidwendt reviewed Oct 25, 2024

cpp/src/groupby/hash/compute_shared_memory_aggs.cu (outdated, resolved)

davidwendt reviewed Oct 25, 2024


PointKernel and others added 6 commits October 25, 2024 11:32

Update cpp/src/groupby/hash/compute_shared_memory_aggs.cu

00ed4b2

Co-authored-by: David Wendt

Merge remote-tracking branch 'upstream/branch-24.12' into compute-shm-aggs

dd9e3fd

Leverage existing utilities to eliminate duplication

8c3f192

Minor cleanup

f0d9c5a

Fix typos

30decde

Pass available_shmem_size as argument to avoid redundant invocation

f117774

PointKernel requested a review from davidwendt October 25, 2024 20:39
