Intel Xeon CPU Max 9480 (OpenVINO) performance? #9486
Unanswered
randomqhacker
asked this question in
Q&A
Replies: 0 comments
I'm getting less than 4 tok/s on a 78B Qwen2.5 finetune quantized to ~43 GB, running on a single Intel Xeon CPU Max 9480 (HBM-only mode) with the OpenVINO backend. pcm-memory confirms that HBM throughput reaches ~170 GB/s. My understanding is that the practical HBM read bandwidth limit on these CPUs should be closer to 575 GB/s.
I also tested Qwen2.5 7B int8 and got ~21 tok/s at ~155 GB/s HBM throughput.
Is this typical throughput with vLLM + OpenVINO on Xeon CPU Max? Or Xeon in general? Any tips?
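For reference, here is a rough sanity check of the numbers above. It is only a sketch, assuming decode is memory-bandwidth-bound (every generated token streams all weights from HBM once) and ignoring KV cache and activation traffic. The ~7.6 GB size for the 7B int8 model is my assumption (roughly one byte per parameter), not a measured value, and `decode_tok_per_s` is just a helper name for this example.

```python
def decode_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode tokens/s if each token reads all weights once."""
    return bandwidth_gb_s / model_gb


# Assumed practical HBM read limit quoted in the post, in GB/s.
ASSUMED_PEAK_GB_S = 575.0

# (weight size in GB, HBM throughput observed with pcm-memory in GB/s)
runs = {
    "78B finetune, ~43 GB": (43.0, 170.0),
    "7B int8, ~7.6 GB (size assumed)": (7.6, 155.0),
}

for name, (size_gb, observed_gb_s) in runs.items():
    at_observed = decode_tok_per_s(observed_gb_s, size_gb)
    at_peak = decode_tok_per_s(ASSUMED_PEAK_GB_S, size_gb)
    print(f"{name}: {at_observed:.1f} tok/s at observed bandwidth, "
          f"{at_peak:.1f} tok/s if {ASSUMED_PEAK_GB_S:.0f} GB/s were reached")
```

Under these assumptions both measured rates (just under 4 tok/s and ~21 tok/s) are about what the observed bandwidth allows, so the open question is why HBM throughput stalls around 155-170 GB/s rather than anything specific to the model.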