"PLZ!"Is possible to customize the request schedule strategy "continuous batching" ? #9616
Unanswered
Noblezhong asked this question in Q&A
Replies: 0 comments
Hi, I am a graduate student interested in the request scheduling strategies of LLM serving systems. I have surveyed several frameworks such as Orca, DeepSpeed, and TensorRT-LLM, but found that they do not explain how their scheduling strategies (e.g., iteration-level scheduling, dynamic batching) are implemented. So I turned to vLLM, which uses a similar scheduling strategy that outperforms traditional static batching. However, after reading the documentation I found no tutorial on how it is implemented or how to customize it. What's more, when I searched the issues, it seems that continuous batching is enabled by default and there is no way to fall back to static batching.
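To show what I mean by iteration-level (continuous) batching, here is a toy sketch I wrote myself; it is only an illustration of the idea, not vLLM's actual scheduler code, and the `Request` class and `continuous_batching` function are made up for this example:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    prompt_len: int
    max_new_tokens: int
    generated: int = 0

    def is_finished(self) -> bool:
        return self.generated >= self.max_new_tokens


def continuous_batching(waiting: List[Request], max_batch_size: int) -> None:
    """Toy iteration-level scheduler: after every decode step, finished
    requests leave the batch and waiting requests are admitted immediately,
    instead of waiting for the whole batch to finish (static batching)."""
    running: List[Request] = []
    step = 0
    while waiting or running:
        # Admit new requests whenever a slot is free (iteration-level scheduling).
        while waiting and len(running) < max_batch_size:
            running.append(waiting.pop(0))
        # One decode iteration: every running request produces one token.
        for req in running:
            req.generated += 1
        # Finished requests are evicted right away, freeing slots for the queue.
        running = [r for r in running if not r.is_finished()]
        step += 1
    print(f"all requests finished after {step} decode iterations")


if __name__ == "__main__":
    queue = [Request(prompt_len=32, max_new_tokens=n) for n in (4, 16, 64)]
    continuous_batching(queue, max_batch_size=2)
```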
So I wonder whether there is any demo or tutorial for continuous batching, or at least some guidance on how to customize this excellent strategy. Sorry, I am a newcomer to both vLLM and LLM inference. orz
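So far, the closest thing to "customizing" the behavior that I have found is tuning the engine arguments that bound each scheduler iteration, e.g. `max_num_seqs` and `max_num_batched_tokens`; the values below are just placeholders I picked for experimentation, and I believe the scheduling logic itself lives in `vllm/core/scheduler.py` in the versions I looked at, though please correct me if that has moved:

```python
from vllm import LLM, SamplingParams

# Knobs that shape how many sequences / tokens each scheduler step may batch.
llm = LLM(
    model="facebook/opt-125m",     # small model just for experimentation
    max_num_seqs=8,                # cap on concurrently running sequences per step
    max_num_batched_tokens=2048,   # cap on tokens processed per scheduler iteration
)

outputs = llm.generate(
    ["Hello, my name is", "The future of LLM serving is"],
    SamplingParams(max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```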