However, in some cases `threadIdx.x` has already been bound with an extent of 32 in other blocks, which leads to a contradiction.
Another way is to fuse `i` and `j` and then bind the fused loop to the threads of a warp:
Nevertheless, in this case the `LowerThreadAllReduce` pass cannot recognize the sub-warp reduction structure and will emit code that reduces across all 32 threads of the warp together.
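To make the failure mode concrete, here is a hedged plain-Python sketch (no TVM; all names are illustrative) contrasting the intended eight independent 4-element reductions with the single 32-element reduction that a pass unaware of sub-warp structure would compute:

```python
# Illustrative sketch, not TVM code: model a warp of 32 lanes, where
# lane t holds vals[t]. The intended semantics is eight independent
# 4-element reductions; a pass that ignores the sub-warp structure
# instead reduces all 32 lanes into one value.

def subwarp_reduce(vals, width=4):
    """Intended result: one partial sum per group of `width` lanes."""
    assert len(vals) == 32 and 32 % width == 0
    return [sum(vals[g * width:(g + 1) * width]) for g in range(32 // width)]

def fullwarp_reduce(vals):
    """What a full-warp all-reduce computes: a single sum over 32 lanes."""
    return sum(vals)

vals = list(range(32))          # lane t holds the value t
print(subwarp_reduce(vals))     # eight group sums: [6, 22, 38, 54, 70, 86, 102, 118]
print(fullwarp_reduce(vals))    # one sum over the whole warp: 496
```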
There should be an abstraction to denote sub-warp reduction in TensorIR.
In apache/tvm#10207 we introduced sub-warp reduction. Users can use one warp (32 threads) to perform eight 4-element reductions in parallel:
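A hedged sketch of what such a sub-warp reduction computes, modeled in plain Python: each shuffle-down step exchanges values only within a group of `width = 4` lanes, in the spirit of CUDA's width-limited `__shfl_down_sync`. The emulation below is illustrative, not the actual generated CUDA:

```python
def shfl_down(vals, offset, width):
    """Emulate a width-limited shuffle-down across 32 lanes: lane t reads
    from lane t + offset if both lie in the same `width`-sized group,
    otherwise it keeps its own value."""
    out = []
    for t in range(32):
        src = t + offset
        # stay inside the lane's own sub-warp group
        if src < 32 and src // width == t // width:
            out.append(vals[src])
        else:
            out.append(vals[t])
    return out

def subwarp_sum(vals, width=4):
    """Butterfly reduction restricted to groups of `width` lanes;
    afterwards lane g*width holds the sum of its group."""
    offset = width // 2
    while offset > 0:
        shifted = shfl_down(vals, offset, width)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals

result = subwarp_sum(list(range(32)))
print([result[g * 4] for g in range(8)])   # [6, 22, 38, 54, 70, 86, 102, 118]
```

Reading only the base lane of each 4-lane group yields the eight independent partial sums, which is exactly the structure a sub-warp abstraction in TensorIR would need to express.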