Hey guys! Have you ever noticed that the outputs from vllm and HF are misaligned even when you use the greedy search strategy?
I tried to run the test scripts with pytest in the vllm project:
https://github.com/vllm-project/vllm/blob/main/tests/kernels/test_attention.py
https://github.com/vllm-project/vllm/blob/main/tests/kernels/test_layernorm.py
Yes, they all passed.
Then I found the assert expressions look like this:
```python
assert torch.allclose(output, ref_output, atol=1e-3, rtol=1e-5)
assert torch.allclose(out, ref_out, atol=1e-2, rtol=1e-5)
```
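For context on why atol around 1e-3 may be as tight as these kernels can get: float16 addition is not associative, so a fused CUDA kernel that reduces in a different order than the PyTorch reference can legitimately produce slightly different results. Here is a small self-contained sketch (pure NumPy, not actual vllm code) showing two mathematically equivalent summation orders disagreeing badly in float16:

```python
import numpy as np

# Sum 65536 copies of float16(0.1) two ways. The true sum is exactly
# 65536 * 0.0999755859375 = 6552.0, which is representable in float16.
x = np.full(65536, np.float16(0.1), dtype=np.float16)

# 1) Sequential left-to-right accumulation in float16: once the running
#    sum reaches 256.0, the float16 grid spacing (0.25) is too coarse for
#    adding 0.1 to change it, so the sum stalls.
seq = np.float16(0)
for v in x:
    seq = np.float16(seq + v)

# 2) Pairwise (tree) reduction in float16, closer to how GPU kernels
#    typically reduce: each level doubles the value exactly.
a = x.copy()
while a.size > 1:
    a = a[0::2] + a[1::2]
tree = a[0]

print(float(seq))   # 256.0  -- sequential sum stalls far below the true value
print(float(tree))  # 6552.0 -- pairwise sum happens to be exact here
```

This is an extreme toy case, but it illustrates that an elementwise comparison of a CUDA kernel against a torch reference in half precision can genuinely need an atol on the order of 1e-3.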
But when I changed the atol to 1e-4 and 1e-3 respectively, the tests failed!
So I wonder: is it normal to have such a big numerical difference between the vllm CUDA operators and the torch operators?
That would explain why, when I used baichuan-13B to run the model test script, it failed: the front parts of the two generated sentences were the same, but they diverged in the middle.
I guess the numerical difference grows bigger and bigger as the inference steps accumulate?
So is there any way to solve this problem?
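Here's a toy sketch of what I think is happening (the numbers below are made up for illustration, not real model logits): once the two implementations' logits differ by ~1e-3 near a decision boundary, greedy search picks different tokens, and every later step then conditions on a different prefix, so the outputs never re-align.

```python
import numpy as np

# Hypothetical logits for the same decoding step from two implementations.
# They differ elementwise by only ~1e-3 (within the kernels' atol=1e-3) ...
logits_hf   = np.array([2.000, 1.999, -1.5])
logits_vllm = np.array([1.999, 2.000, -1.5])

print(float(np.max(np.abs(logits_hf - logits_vllm))))  # ~0.001

# ... yet greedy decoding (argmax) selects different tokens, and from this
# step onward the two generations diverge completely.
print(int(np.argmax(logits_hf)))    # 0
print(int(np.argmax(logits_vllm)))  # 1
```

If that's right, the per-operator differences aren't bugs by themselves; they just compound across layers and decoding steps until some near-tie flips.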