When trying to scale to two nodes, I get "Error ignored in is_in_the_same_node: [../third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:133] " #6585
Unanswered
carljones3000 asked this question in Q&A
Hi, I'm trying to test scaling to two nodes.
I have output from two tests here. Both of them lead to the error "Error ignored in is_in_the_same_node: [../third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:133] ".
Test 1:
This test tries to run on two nodes with four 3090 GPUs each, 8 GPUs total.
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen1.5-32B-Chat --tensor-parallel-size 8 --max-model-len 12591 --swap-space 0
Test 2:
This test tries to run on two nodes with one 3090 GPU each, 2 GPUs total.
python -m vllm.entrypoints.openai.api_server --model facebook/opt-13b --tensor-parallel-size 2
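Since the message comes from gloo's TCP transport, one thing that may be worth ruling out is basic TCP reachability between the two nodes. Below is a minimal sketch of such a check using only the Python standard library. The helper name `can_connect` and the host and port in the usage comment are hypothetical placeholders, not values from either test run, and a successful connection would not by itself confirm that gloo selects the right network interface.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests plain TCP
    reachability and does not exercise gloo or vLLM.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage, run from the second node (address and port are assumptions):
# print(can_connect("192.0.2.10", 6379))
```

If the check fails in either direction, a firewall rule or a mismatched network interface between the nodes would be a plausible place to look first.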
Any help would be appreciated.
-Carl