Your current environment

docker: vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
branch: habana_main

🐛 Describe the bug

I attempted to run the meta-llama/Meta-Llama-3.1-70B-Instruct model with offline inference. After the program starts, it hangs for a long time and then aborts with the error below. The same model runs fine as an OpenAI-compatible server with tensor parallelism.
```
(RayWorkerWrapper pid=18930) *** SIGABRT received at time=1724981511 on cpu 144 ***
(RayWorkerWrapper pid=18930) PC: @ 0x7fa8c2bd89fc (unknown) pthread_kill
(RayWorkerWrapper pid=18930) @ 0x7fa8c2b84520 (unknown) (unknown)
(RayWorkerWrapper pid=18930) [2024-08-30 01:31:51,420 E 18930 20862] logging.cc:440: *** SIGABRT received at time=1724981511 on cpu 144 ***
(RayWorkerWrapper pid=18930) [2024-08-30 01:31:51,420 E 18930 20862] logging.cc:440: PC: @ 0x7fa8c2bd89fc (unknown) pthread_kill
(RayWorkerWrapper pid=18930) [2024-08-30 01:31:51,420 E 18930 20862] logging.cc:440: @ 0x7fa8c2b84520 (unknown) (unknown)
(RayWorkerWrapper pid=18930) Fatal Python error: Aborted
```
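For comparison, serving the same model through the OpenAI-compatible entrypoint works. I am not reproducing my exact command here, but it was along these lines (flags may differ from my run):

```bash
# OpenAI-compatible server mode (works); exact flags may differ from my run.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3.1-70B-Instruct \
    --dtype bfloat16 \
    --block-size 128 \
    --tensor-parallel-size 8
```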
Here is the offline-inference script that fails:
```python
import os
import logging
from vllm import LLM, SamplingParams

os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"

prompts = [
    "The president of the United States is",
    "The capital of France is",
]
sampling_params = SamplingParams(n=1, temperature=0, max_tokens=2000)
llm = LLM(model="meta-llama/Meta-Llama-3.1-70B-Instruct", block_size=128,
          dtype="bfloat16", tensor_parallel_size=8)
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(generated_text)
```
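One thing I am not sure about: the SIGABRT comes from a RayWorkerWrapper process, and an environment variable set via os.environ in the driver may not propagate to Ray workers. Could that be the cause here? For reference, a more defensive version of the script that I would expect to behave the same way, setting the variable before any vllm import and guarding the entry point (a sketch, not a confirmed fix):

```python
import os

# Set before importing vllm so the setting is in place as early as possible.
# Note: this may still not reach the Ray workers; exporting
# PT_HPU_ENABLE_LAZY_COLLECTIVES=true in the shell before launch would
# guarantee that spawned processes inherit it.
os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The president of the United States is",
        "The capital of France is",
    ]
    sampling_params = SamplingParams(n=1, temperature=0, max_tokens=2000)
    llm = LLM(model="meta-llama/Meta-Llama-3.1-70B-Instruct", block_size=128,
              dtype="bfloat16", tensor_parallel_size=8)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.outputs[0].text)


# Guard the entry point so worker processes re-importing this module
# do not re-run the driver code.
if __name__ == "__main__":
    main()
```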