"PLZ!"Is possible to customize the request schedule strategy "continuous batching" ? #9616
Unanswered
Noblezhong asked this question in Q&A
Replies: 0 comments
Hi, I am a graduate student interested in the request scheduling strategies of LLM serving systems. I have surveyed several frameworks such as Orca, DeepSpeed, and TensorRT-LLM, but found that they do not explain how their scheduling strategies (e.g., iteration-level scheduling, dynamic batching) are implemented. So I turned to vLLM, which uses a similar scheduling strategy that outperforms traditional static batching. However, after reading the documentation I found no tutorial on how it is implemented or how to customize it. What's more, when I searched the issues, it seems that continuous batching is enabled by default and there is no way to fall back to static batching.
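To show what I mean by iteration-level (continuous) batching, here is a toy sketch I wrote myself; it is only an illustration of the idea, not vLLM's actual scheduler code, and the `Request` class and `continuous_batching` function are made up for this example:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    prompt_len: int
    max_new_tokens: int
    generated: int = 0

    def is_finished(self) -> bool:
        return self.generated >= self.max_new_tokens


def continuous_batching(waiting: List[Request], max_batch_size: int) -> None:
    """Toy iteration-level scheduler: after every decode step, finished
    requests leave the batch and waiting requests are admitted immediately,
    instead of waiting for the whole batch to finish (static batching)."""
    running: List[Request] = []
    step = 0
    while waiting or running:
        # Admit new requests whenever a slot is free (iteration-level scheduling).
        while waiting and len(running) < max_batch_size:
            running.append(waiting.pop(0))
        # One decode iteration: every running request produces one token.
        for req in running:
            req.generated += 1
        # Finished requests are evicted right away, freeing slots for the queue.
        running = [r for r in running if not r.is_finished()]
        step += 1
    print(f"all requests finished after {step} decode iterations")


if __name__ == "__main__":
    queue = [Request(prompt_len=32, max_new_tokens=n) for n in (4, 16, 64)]
    continuous_batching(queue, max_batch_size=2)
```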
So I wonder whether there is any demo or tutorial for continuous batching, or at least some guidance on how to customize this excellent strategy. Sorry, I am a newcomer to both vLLM and LLM inference. orz
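So far, the closest thing to "customizing" the behavior that I have found is tuning the engine arguments that bound each scheduler iteration, e.g. `max_num_seqs` and `max_num_batched_tokens`; the values below are just placeholders I picked for experimentation, and I believe the scheduling logic itself lives in `vllm/core/scheduler.py` in the versions I looked at, though please correct me if that has moved:

```python
from vllm import LLM, SamplingParams

# Knobs that shape how many sequences / tokens each scheduler step may batch.
llm = LLM(
    model="facebook/opt-125m",     # small model just for experimentation
    max_num_seqs=8,                # cap on concurrently running sequences per step
    max_num_batched_tokens=2048,   # cap on tokens processed per scheduler iteration
)

outputs = llm.generate(
    ["Hello, my name is", "The future of LLM serving is"],
    SamplingParams(max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```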