When running a particular vLLM test on an A100 GPU, flashinfer appears to generate NaNs for one specific combination of test parameters. That combination fails on the A100, while all combinations pass on both an H100 and an L4. We are using flashinfer-0.1.6+cu124torch2.4.
The failure occurs only when three of the parameters take specific values at the same time:
- block_size = 32
- head_size = 256
- num_heads = (32, 8), where 32 is assigned to num_query_heads and 8 to num_kv_heads
If any one of these parameters takes one of its other tested values, the test passes on the A100.
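To make the parameter mapping concrete, here is a small hypothetical sketch (DecodeParams and is_reported_failure are illustrative names, not part of vLLM or flashinfer). It assumes num_heads unpacks as (num_query_heads, num_kv_heads) as described above, and that with 32 query heads and 8 KV heads the kernel runs grouped-query attention, so each KV head is shared by 4 query heads.

```python
# Hypothetical helper (not part of vLLM or flashinfer): it only restates the
# reported failing combination and the head layout that combination implies.
from typing import NamedTuple, Tuple


class DecodeParams(NamedTuple):
    block_size: int
    head_size: int
    num_heads: Tuple[int, int]  # (num_query_heads, num_kv_heads)

    @property
    def num_query_heads(self) -> int:
        return self.num_heads[0]

    @property
    def num_kv_heads(self) -> int:
        return self.num_heads[1]

    @property
    def queries_per_kv_head(self) -> int:
        # Assumed grouped-query attention: each KV head serves this many query heads.
        return self.num_query_heads // self.num_kv_heads


# The only combination reported to fail on the A100.
FAILING = DecodeParams(block_size=32, head_size=256, num_heads=(32, 8))


def is_reported_failure(p: DecodeParams) -> bool:
    # All three parameters must match; changing any single one made the test pass.
    return p == FAILING


print(FAILING.num_query_heads, FAILING.num_kv_heads, FAILING.queries_per_kv_head)
```

Using a NamedTuple keeps the combination hashable and comparable, so it could also serve as a key if one wanted to tabulate pass/fail results per GPU.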
The failure message suggests that NaNs are being generated in this scenario:
AssertionError: Tensor-likes are not close!
Mismatched elements: 1024 / 24576 (4.2%)
Greatest absolute difference: nan at index (0, 0, 0) (up to 0.02 allowed)
Greatest relative difference: nan at index (0, 0, 0) (up to 0.01 allowed)
The error message is the same across failures. The only variations are the total number of elements (either 24576 or 32768; the number of mismatched elements is always 1024) and the index of the greatest difference (either (0, 0, 0) or (3, 0, 0)).
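The element counts can be sanity-checked with some arithmetic. This is a hedged sketch that assumes the compared output tensor has shape (num_seqs, num_query_heads, head_size); the report does not state the shape or the number of sequences, so num_seqs is inferred here rather than known.

```python
# Back-of-the-envelope check on the reported mismatch counts.
# Assumption: the compared output has shape (num_seqs, num_query_heads, head_size)
# and num_seqs is not given in the report, so it is inferred from the totals.
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
HEAD_SIZE = 256
MISMATCHED = 1024


def infer_num_seqs(total_elements: int) -> int:
    per_seq = NUM_QUERY_HEADS * HEAD_SIZE  # elements produced per sequence
    if total_elements % per_seq:
        raise ValueError("total is not a whole number of sequences")
    return total_elements // per_seq


# Elements covered by the query heads that share a single KV head (assumed GQA).
elements_per_kv_group = (NUM_QUERY_HEADS // NUM_KV_HEADS) * HEAD_SIZE

for total in (24576, 32768):
    print(f"{total} elements -> {infer_num_seqs(total)} sequences, "
          f"{MISMATCHED / total:.1%} mismatched")
print(f"one KV-head group of one sequence = {elements_per_kv_group} elements")
```

If the shape assumption holds, the two totals correspond to 3 and 4 sequences, and the constant 1024 equals 4 query heads × 256, i.e. exactly the query heads served by one KV head within one sequence. This is only a hypothesis suggested by the numbers; the failure output alone does not confirm which heads contain the NaNs.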
For reference, the failing test is test_flashinfer_decode_with_paged_fp8_kv.