Replies: 1 comment
-
You can use |
-
Hi team, thanks for making this product. I am glad to see that Hugging Face text-generation-inference (TGI) has support for it. As I understand it, TGI only uses vLLM's vllm_cache_ops and vllm_attention_ops. However, vLLM reserves a lot of GPU memory up front, which pushes the model out of memory. Is there any config I can set after the build, other than through the model's LLM interface, to specify the memory usage or GPU utilization rate? Thanks
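For reference, a minimal sketch of where that reservation is usually configured, assuming a vLLM version that exposes gpu_memory_utilization on its engine arguments (the fraction of GPU memory vLLM claims for weights plus the pre-allocated KV-cache block pool, commonly defaulting to 0.9). The model name and request below are placeholders:

```python
# Sketch only: assumes a vLLM build whose EngineArgs / LLMEngine expose
# gpu_memory_utilization; this is the engine-level path rather than the
# high-level LLM wrapper.
from vllm import EngineArgs, LLMEngine, SamplingParams

engine_args = EngineArgs(
    model="facebook/opt-125m",      # placeholder model
    gpu_memory_utilization=0.5,     # claim ~50% of the GPU instead of the usual 0.9 default
)
engine = LLMEngine.from_engine_args(engine_args)

# Drive the engine directly: queue one request and step until it finishes.
engine.add_request("req-0", "Hello, my name is", SamplingParams(max_tokens=16))
while engine.has_unfinished_requests():
    for request_output in engine.step():
        if request_output.finished:
            print(request_output.outputs[0].text)
```

Lowering the fraction shrinks the KV-cache block pool that vLLM pre-allocates at startup; if TGI only links the standalone CUDA kernels as described above, this engine-level setting would not come into play, so the sketch is only meant to show where vLLM itself exposes the knob.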