[Feat] Suggest using MappingUtils to compute coordinates automatically for different warpSize #512

yiakwy-xpu-ml-framework-team · 2024-09-27T09:12:54Z

MappingUtils has been integrated into ROCm SDK 6.2. It defines wave coordinates <waveRows, waveCols> in the form of

blockDim = (waveRows * warpSize, waveCols) // warpSize is 64 on AMD GPUs (CDNA) and 32 on NVIDIA GPUs

Here <waveRows, waveCols> are the warp coordinates within each thread block, and thread blocks are distributed to SMs (NVIDIA) or CUs (AMD).
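As a rough illustration of this layout, the host-side sketch below derives a wave coordinate and a lane ID from a thread index, with the warp size passed in as a parameter instead of being hard coded. The names `WaveCoord`, `waveCoordOf`, and `wavesPerBlock` are hypothetical and not part of MappingUtils; the mapping only assumes the blockDim convention above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of the MappingUtils API.
struct WaveCoord
{
    uint32_t row;  // wave index along blockDim.x
    uint32_t col;  // wave index along blockDim.y
    uint32_t lane; // lane index inside the wave
};

// Assumes blockDim = (waveRows * warpSize, waveCols).
inline WaveCoord waveCoordOf(uint32_t threadIdxX, uint32_t threadIdxY, uint32_t warpSize)
{
    assert(warpSize > 0);
    return WaveCoord{threadIdxX / warpSize, threadIdxY, threadIdxX % warpSize};
}

// Total number of waves in one thread block under the same assumption.
inline uint32_t wavesPerBlock(uint32_t blockDimX, uint32_t blockDimY, uint32_t warpSize)
{
    assert(warpSize > 0 && blockDimX % warpSize == 0);
    return (blockDimX / warpSize) * blockDimY;
}
```

With a block of 128 x 2 threads, the same code yields 4 waves when warpSize is 64 and 8 waves when it is 32, which is the kind of difference a hard-coded warp size hides.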

This feature can eliminate the hard-coded warp size and the hard-coded partition hierarchy transformation, both of which depend on the hardware memory hierarchy, and helps make sure the code works correctly across platforms.

Note that the partition hierarchy transformation and the hardware memory hierarchy can change between hardware generations. For example, the L2 cache may have a different number of memory banks (4 banks) than LDS (64 banks), which means the best swizzling hyperparameters (if any exist) for memory level_{i} differ from those for memory level_{i+1}.
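To make the bank-count argument concrete, here is a minimal host-side sketch that counts how many lanes collide on the same bank for a strided access. It assumes simple word-interleaved banking (bank = word index modulo bank count), which is a simplification and may not match real hardware; `bankOf` and `conflictDegree` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Bank hit by a word index, assuming word-interleaved banks (an assumption).
inline uint32_t bankOf(uint32_t wordIndex, uint32_t numBanks)
{
    assert(numBanks > 0);
    return wordIndex % numBanks;
}

// Largest number of lanes that land in the same bank when lane i reads
// the word at index i * strideWords.
inline uint32_t conflictDegree(uint32_t lanes, uint32_t strideWords, uint32_t numBanks)
{
    std::vector<uint32_t> hits(numBanks, 0);
    for(uint32_t lane = 0; lane < lanes; ++lane)
    {
        ++hits[bankOf(lane * strideWords, numBanks)];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under this model a stride of 4 words is the worst case for 4 banks (every lane hits bank 0) but only a mild conflict for 64 banks, so a swizzle tuned for one level is not automatically right for the other.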

The MappingUtils code for a thread block looks like:

    template <uint32_t BlockHeight, uint32_t BlockWidth, typename DataT, typename DataLayout>
    struct MappingUtil {
        // Lane index of the current thread within its wave
        static inline uint32_t laneId();

        // Local wave coordinate relative to the workgroup, i.e. <waveRows, waveCols> above,
        // for warp-level programming with warp sync APIs
        static inline WaveCoordT waveCoord();

        // Global block (grid) coordinate of current wave
        static inline BlockCoordT blockCoord();

        // Matrix coordinate of current wave
        static inline MatrixCoordT matrixCoord();
    };
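One plausible reading of how these coordinates relate is sketched below on the host side, assuming each wave owns one BlockHeight x BlockWidth tile. The formulas and the names `Coord2d`, `blockCoordOf`, and `matrixCoordOf` are assumptions for illustration and are not taken from the library source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2D coordinate type for illustration.
struct Coord2d
{
    uint32_t row;
    uint32_t col;
};

// Global tile coordinate of a wave: workgroup offset (counted in waves)
// plus the wave's local coordinate inside the workgroup.
inline Coord2d blockCoordOf(Coord2d workgroupIdx, Coord2d wavesPerWorkgroup, Coord2d waveCoord)
{
    assert(waveCoord.row < wavesPerWorkgroup.row);
    assert(waveCoord.col < wavesPerWorkgroup.col);
    return Coord2d{workgroupIdx.row * wavesPerWorkgroup.row + waveCoord.row,
                   workgroupIdx.col * wavesPerWorkgroup.col + waveCoord.col};
}

// Element coordinate of the top-left corner of the wave's tile.
template <uint32_t BlockHeight, uint32_t BlockWidth>
inline Coord2d matrixCoordOf(Coord2d blockCoord)
{
    return Coord2d{blockCoord.row * BlockHeight, blockCoord.col * BlockWidth};
}
```

For example, wave (1, 0) of workgroup (1, 2) in a 2 x 2 wave layout gets tile (3, 4), which starts at element (48, 64) for 16 x 16 tiles. Because the wave coordinate is the only input that depends on warpSize, the rest of the chain stays the same across platforms.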

Moreover, the warp-level partition depends on the instruction used.

For example, the partition for the m8n8.x4 instruction (four 8x8 matrix fragments) must be different from the partition for the m16n16.x4 instruction.
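Simple arithmetic shows why both the instruction shape and the warp size matter. The sketch below assumes the fragment elements are spread evenly across the lanes of one wave, which is a simplification; `elementsPerLane` is a hypothetical helper, not an existing API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of elements each lane holds when `count`
// fragments of fragRows x fragCols are spread evenly across one wave.
inline uint32_t elementsPerLane(uint32_t fragRows, uint32_t fragCols, uint32_t count, uint32_t warpSize)
{
    const uint32_t total = fragRows * fragCols * count;
    assert(warpSize > 0 && total % warpSize == 0);
    return total / warpSize;
}
```

Under that assumption, m8n8.x4 gives 8 elements per lane with 32 lanes and 4 with 64 lanes, while four 16x16 fragments give 32 and 16 respectively, so the per-lane layout cannot be shared between the two shapes or between the two warp sizes.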
