Does the load_model_on_gpus method from ChatGLM2 for multi-GPU inference also work with ChatGLM3? #861
-
I copied utils.py over from the ChatGLM2 repo and used load_model_on_gpus to spread the model across 2 GPUs, then ran cli_batch_demo as a test. The total runtime did not change at all. Is that because the load_model_on_gpus approach does not apply to ChatGLM3, or because cli_batch_demo cannot do multi-GPU inference?
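For reference, this is roughly the setup being described, assuming utils.py is the one shipped with the ChatGLM2-6B repo (where `load_model_on_gpus(checkpoint_path, num_gpus=2)` is the actual signature); the checkpoint path here is illustrative:

```python
from transformers import AutoTokenizer
from utils import load_model_on_gpus  # utils.py copied over from the ChatGLM2-6B repo

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# Splits the transformer layers across the given number of GPUs via accelerate
model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
model = model.eval()
```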
Answered by zRzRzRzRzRzRzR on Feb 24, 2024
-
Yes, it works, but it's better to just use `auto`: for ChatGLM3, simply enabling `device_map` (i.e. `device_map="auto"`) when loading is enough now.
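A minimal sketch of what the answer suggests, assuming the standard Hugging Face transformers loading path for ChatGLM3 (the accelerate package must be installed for `device_map="auto"` to shard the model across GPUs):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True,
    device_map="auto",  # let accelerate place layers across all visible GPUs
).eval()

# ChatGLM3's remote code exposes a chat() helper
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```

With `device_map="auto"` the layer placement is computed automatically from available GPU memory, so no hand-written device map or copied utils.py is needed.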