Useful kernels for parallel programming.
ScanKernel implements prefix sum for uint32_t values.
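For orientation, here is a minimal sequential CPU sketch of what a prefix sum computes. It is not the ScanKernel interface: the function name `exclusive_scan_reference` is hypothetical, and the choice of an exclusive scan (each output excludes its own input) is an assumption, since the text above does not say whether the kernel is inclusive or exclusive.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of the library's API.
// Assumption: exclusive scan, i.e. out[i] = in[0] + ... + in[i-1] and out[0] = 0.
std::vector<uint32_t> exclusive_scan_reference(const std::vector<uint32_t>& in) {
    std::vector<uint32_t> out(in.size());
    uint32_t running = 0;  // unsigned arithmetic wraps modulo 2^32 on overflow
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of all elements before position i
        running += in[i];
    }
    return out;
}
```

A reference like this can be useful for checking a parallel kernel's output on small inputs, for example `{3, 1, 4, 1, 5}`, whose exclusive scan is `{0, 3, 4, 8, 9}`.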
CompactKernel implements stream compaction for values of user-specified size.
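As an illustration, stream compaction keeps only the flagged elements of an array and packs them contiguously, preserving their order. The sketch below is a sequential CPU version under stated assumptions: the name `compact_reference`, the byte-buffer layout, and the use of a `uint32_t` flag per element are all hypothetical and are not taken from the CompactKernel interface. It follows the common scan-then-scatter formulation, with `elem_size` standing in for the user-specified value size.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU reference, not part of the library's API.
// `data` is assumed to hold flags.size() elements of `elem_size` bytes each;
// element i is kept when flags[i] is nonzero.
std::vector<unsigned char> compact_reference(const std::vector<unsigned char>& data,
                                             std::size_t elem_size,
                                             const std::vector<uint32_t>& flags) {
    // Step 1: an exclusive prefix sum over the flags assigns each kept
    // element its slot in the output.
    std::vector<uint32_t> offsets(flags.size());
    uint32_t kept = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        offsets[i] = kept;
        kept += (flags[i] != 0) ? 1u : 0u;
    }

    // Step 2: scatter every kept element to its slot.
    std::vector<unsigned char> out(static_cast<std::size_t>(kept) * elem_size);
    if (elem_size == 0) {
        return out;  // nothing to copy
    }
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i] != 0) {
            std::memcpy(out.data() + offsets[i] * elem_size,
                        data.data() + i * elem_size,
                        elem_size);
        }
    }
    return out;
}
```

In a parallel setting, step 1 is typically where a prefix sum kernel is used, and step 2 runs independently per element because every kept element already knows its destination.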
RadixSortKernel implements radix sort for uint32_t values. (WIP. Not yet optimized.)
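For reference, a least-significant-digit radix sort on uint32_t keys can be sketched on the CPU as follows. This is only an assumed formulation: the text above does not state the digit width, the pass order, or the interface of RadixSortKernel, so the name `radix_sort_reference` and the choice of four 8-bit passes are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of the library's API.
// Assumption: LSD radix sort with 8-bit digits, i.e. four stable passes.
void radix_sort_reference(std::vector<uint32_t>& keys) {
    std::vector<uint32_t> tmp(keys.size());
    for (unsigned shift = 0; shift < 32; shift += 8) {
        // Histogram of the current 8-bit digit.
        std::array<std::size_t, 256> count{};
        for (uint32_t k : keys) {
            ++count[(k >> shift) & 0xFFu];
        }
        // Exclusive prefix sum turns counts into starting offsets per digit.
        std::size_t running = 0;
        for (std::size_t& c : count) {
            std::size_t n = c;
            c = running;
            running += n;
        }
        // Stable scatter: keys with equal digits keep their relative order.
        for (uint32_t k : keys) {
            tmp[count[(k >> shift) & 0xFFu]++] = k;
        }
        keys.swap(tmp);  // the result of this pass is the input of the next
    }
}
```

Each pass is a histogram, a prefix sum, and a scatter, which is why radix sort is often built on top of a scan primitive in parallel implementations.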