MTL platform with ARC 770 cannot allocate memory block with size larger than 4GB when running vLLM Qwen2-VL-2B #12136
Comments
vLLM 0.5.4 does not support the Qwen2-VL model yet. We will support it in the upcoming 0.6.1 version.
Thank you! But I need to double-confirm: I use IPEX to run Qwen2-VL-2B, not OpenVINO. vLLM 0.5.4 does not support that, right?
Yes, even the official version of vLLM 0.5.4 does not support it until 0.6.1.
Thanks again.
It is recommended to run the Llama, Qwen, and ChatGLM models.
Hi,
We are validating the 0.6.2 version with the Qwen2-VL model, and will notify you once it's ready. Thanks.
When I run a vLLM model such as Qwen2-VL-2B with an ARC 770 on the MTL platform, it reports the error message below:
RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 6.10 GiB (GPU 0; 15.11 GiB total capacity; 4.84 GiB already allocated; 5.41 GiB reserved in total by PyTorch)
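The error reflects a per-allocation cap on this platform: the runtime refuses any single memory block over 4 GiB, so the 6.10 GiB request fails even though the GPU has 15.11 GiB total. A minimal sketch of that arithmetic, plus a chunked-allocation idea for staying under the cap; the 4 GiB constant and the helper functions here are illustrative assumptions, not vLLM or IPEX APIs:

```python
# Sketch of the per-allocation limit from the traceback. MAX_BLOCK and the
# helpers are illustrative assumptions, not part of vLLM or IPEX.

GIB = 1024 ** 3
MAX_BLOCK = 4 * GIB  # assumed per-allocation cap reported by the runtime


def fits_in_one_block(nbytes: int) -> bool:
    """True if a single allocation of nbytes stays under the cap."""
    return nbytes <= MAX_BLOCK


def chunk_sizes(nbytes: int, chunk: int = MAX_BLOCK) -> list[int]:
    """Split a request into pieces no larger than `chunk` bytes."""
    full, rem = divmod(nbytes, chunk)
    return [chunk] * full + ([rem] if rem else [])


request = int(6.10 * GIB)  # the 6.10 GiB block from the error message

print(fits_in_one_block(request))  # False: 6.10 GiB exceeds the 4 GiB cap
print([round(s / GIB, 2) for s in chunk_sizes(request)])
```

In practice, the same idea is why smaller engine settings (e.g. a shorter `max-model-len` or lower GPU memory utilization) can sidestep the error: they shrink the largest single buffer vLLM tries to allocate.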