-
It seems it's in the scheduler update class: it generates batches of output for the various prompts being processed, but each sequence is generated in blocks rather than one token at a time, so there may be some waste if a sequence doesn't satisfy the constraints.
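To illustrate the waste the comment describes, here is a toy sketch (not vLLM's actual code; all names are hypothetical): when tokens are produced a block at a time and a constraint is only enforced afterwards, everything generated past the first violating token is discarded.

```python
# Toy sketch: block-wise generation vs. a per-token constraint.
# Hypothetical names; this is NOT vLLM's scheduler, just the idea.

def generate_blockwise(tokens, is_allowed, block_size=4):
    """Consume tokens a block at a time, validating each against the
    constraint. Tokens already produced after the first violation
    within a block count as wasted work."""
    kept, wasted = [], 0
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        for i, tok in enumerate(block):
            if not is_allowed(kept, tok):
                # The rest of this block was generated but is discarded.
                wasted += len(block) - i
                return kept, wasted
            kept.append(tok)
    return kept, wasted

# Example: the constraint forbids token 9.
stream = [1, 2, 9, 3, 4, 5]
kept, wasted = generate_blockwise(stream, lambda ctx, t: t != 9)
# kept == [1, 2]; two tokens of the first block are wasted.
```

With a block size of 1 (one token per step), `wasted` would always be at most the single violating token, which is the trade-off the comment points at.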
-
Just dug into the code, and it looks like the way to obtain forward logits is via `llm.llm_engine.step()`. I am trying to integrate this API into Outlines, but I feel I would need to manage things like unfinished requests and the scheduler. Is there a simple way to just call something like `model.forward()` or `model.__call__()`, equivalent to HF Transformers?
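The bookkeeping in question can be sketched as a driver loop: add requests, call `step()` repeatedly, and track which requests are still unfinished. The engine below is a self-contained stub standing in for `llm.llm_engine`; the method names (`add_request`, `step`) mirror vLLM's engine only loosely and its real signatures differ across versions, so treat this purely as an illustration of the pattern.

```python
# Sketch of driving an engine-style API manually. StubEngine is a
# stand-in for llm.llm_engine; it is NOT vLLM code.

class StubEngine:
    def __init__(self):
        # request_id -> remaining tokens to emit, one per step()
        self.pending = {}

    def add_request(self, request_id, tokens):
        self.pending[request_id] = list(tokens)

    def step(self):
        """Advance every active request by one token.
        Returns (request_id, token, finished) tuples."""
        results = []
        for rid in list(self.pending):
            tok = self.pending[rid].pop(0)
            finished = not self.pending[rid]
            if finished:
                del self.pending[rid]
            results.append((rid, tok, finished))
        return results


def run_to_completion(engine, requests):
    """The bookkeeping the caller has to do: submit requests, then
    keep stepping until no request remains unfinished."""
    for rid, toks in requests.items():
        engine.add_request(rid, toks)
    completed, active = {}, set(requests)
    while active:
        for rid, tok, finished in engine.step():
            completed.setdefault(rid, []).append(tok)
            if finished:
                active.discard(rid)
    return completed


outs = run_to_completion(StubEngine(), {"a": [1, 2], "b": [7]})
```

This is exactly the state management the comment says it would rather avoid, which is why a single stateless `model.forward()`-style call, as in HF Transformers, would be more convenient for Outlines-style logit processing.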