
[Bug]: Using tensor parallel during offline inference causes the process to hang #220

Open
xinsu626 opened this issue Aug 30, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@xinsu626

Your current environment

docker: vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
branch: habana_main

🐛 Describe the bug

I attempted to run the meta-llama/Meta-Llama-3.1-70B-Instruct model with the offline inference API. After the program starts, it hangs for a long time and then aborts with the error below. Running the same model with tensor parallelism through the OpenAI-compatible server works fine (a rough launch command is included after the script below).

(RayWorkerWrapper pid=18930) *** SIGABRT received at time=1724981511 on cpu 144 ***
(RayWorkerWrapper pid=18930) PC: @     0x7fa8c2bd89fc  (unknown)  pthread_kill
(RayWorkerWrapper pid=18930)     @     0x7fa8c2b84520  (unknown)  (unknown)
(RayWorkerWrapper pid=18930) [2024-08-30 01:31:51,420 E 18930 20862] logging.cc:440: *** SIGABRT received at time=1724981511 on cpu 144 ***
(RayWorkerWrapper pid=18930) [2024-08-30 01:31:51,420 E 18930 20862] logging.cc:440: PC: @     0x7fa8c2bd89fc  (unknown)  pthread_kill
(RayWorkerWrapper pid=18930) [2024-08-30 01:31:51,420 E 18930 20862] logging.cc:440:     @     0x7fa8c2b84520  (unknown)  (unknown)
(RayWorkerWrapper pid=18930) Fatal Python error: Aborted

Here is my code:

import os

from vllm import LLM, SamplingParams

# Enable lazy collectives, needed for tensor parallel inference on HPU.
os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"

prompts = [
    "The president of the United States is",
    "The capital of France is",
]

sampling_params = SamplingParams(n=1, temperature=0, max_tokens=2000)

# Offline inference with tensor parallelism across 8 devices.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    block_size=128,
    dtype="bfloat16",
    tensor_parallel_size=8,
)

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
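
For comparison, serving the same model through the OpenAI-compatible server with tensor parallelism works for me. The launch looks roughly like this (flags reconstructed from my setup, so treat it as a sketch rather than an exact command):

PT_HPU_ENABLE_LAZY_COLLECTIVES=true python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8 \
    --dtype bfloat16 \
    --block-size 128
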
xinsu626 added the bug label on Aug 30, 2024
@michalkuligowski

Possible duplicate of #197

@michalkuligowski

Testing possible fix #379
