Replies: 6 comments 1 reply
-
Yes, I have also encountered this: with the same prompt, the same sentence gets generated.
-
Yes, I've encountered the same problem. I would also like to know how to solve it.
-
Same here.
-
#590 To be clear, the repetition/potential degradation issue is not related to special tokens such as `</s>`.
-
I have seen this behavior myself. It's blocking me.
-
I also noticed this issue. I think it could be solved by adding an option to use an HF-style sampler; it should be possible to match HF's behavior exactly.
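Not an official fix, just a minimal sketch of what an HF-style sampling step looks like (plain temperature sampling, as in transformers' `do_sample=True` path with no top-k/top-p), in case someone wants to compare it against vLLM's sampler:

```python
import torch

def hf_style_sample(logits: torch.Tensor, temperature: float = 0.7) -> int:
    # Scale logits by temperature, softmax, then draw one token id,
    # mirroring HF generate() with do_sample=True and only temperature set.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```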
-
Hi guys, great work!
I have been experimenting with the library for several weeks, and I immediately noticed that sampled tokens (with the same temperature and other settings) are significantly more deterministic with vLLM than with HF Transformers using the same models. With temperature below 0.7, the first 5-10 sampled tokens are often exactly the same across a few different generations, sometimes even recreating the original text from the dataset verbatim, as if greedy decoding were happening (when it is not). This unfortunately leads to a significant repetition issue I have never seen with HF.
Initially I thought this had to do with special tokens (such as `</s>`), but that was not the case. In the meantime, I have also been checking and modifying the codebase to see if there is any discrepancy in the sampling process, but I have not been able to pin down the difference.
Has anyone else experienced similar behavior?
Might be related: #450
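In case it helps anyone reproduce this, here is a rough side-by-side comparison one could run (model name and prompt are placeholders; assumes the public vLLM and transformers generation APIs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model
prompt = "The quick brown fox"         # placeholder prompt

# vLLM: sample the same prompt several times with temperature < 0.7
llm = LLM(model=model_id)
params = SamplingParams(temperature=0.5, top_p=0.9, max_tokens=32)
for out in llm.generate([prompt] * 5, params):
    print("vLLM:", out.outputs[0].text)

# HF Transformers: same model, same sampling settings
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
for _ in range(5):
    ids = model.generate(
        **inputs, do_sample=True, temperature=0.5, top_p=0.9, max_new_tokens=32
    )
    print("HF:  ", tok.decode(ids[0, inputs.input_ids.shape[1]:],
                              skip_special_tokens=True))
```

If vLLM's five completions start with the same 5-10 tokens while the HF runs diverge, that matches the behavior described above.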