Replies: 3 comments 2 replies
-
I'm having the same issue.
1 reply
-
Same error here.
1 reply
-
Same for me.
0 replies
-
Hi everyone,
I am trying to perform inference with TheBloke/Mistral-7B-Instruct-v0.2-AWQ using the vLLM CPU installation via Docker, and I keep receiving the error described below.
I am able to build the CPU Docker image, and with the default facebook/opt-125m model I can also run the server and get a completions response from inference.
As for the Mistral model, AWQ is listed among the supported quantization kernels for CPU, and I am able to start the server with my Docker run command as follows:
docker run -it --rm -v Mistral:/mnt/models/Mistral --network=host --ipc=host -e VLLM_CPU_KVCACHE_SPACE=40 vllm-cpu-env --model="/mnt/models/Mistral/Mistral-7B-Instruct-v0.2-AWQ" --dtype="half" --quantization awq --device "cpu" --max-model-len 2048
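For context, the inference query I send against this server looks roughly like the following. This is a minimal sketch, assuming the server is listening on localhost:8000 (vLLM's default port) and using the OpenAI-compatible /v1/completions endpoint; the prompt text and max_tokens value are illustrative.

```python
# Minimal sketch of a completions request to the vLLM server started above.
# Assumptions: server on localhost:8000, OpenAI-compatible /v1/completions
# endpoint, and the model path matching the volume mount in the run command.
import json
import urllib.request


def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for the /v1/completions endpoint."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def send_completion(payload: dict,
                    url: str = "http://localhost:8000/v1/completions") -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = build_completion_request(
        "/mnt/models/Mistral/Mistral-7B-Instruct-v0.2-AWQ",
        "Hello, how are you?",
    )
    print(send_completion(body))
```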
When I send an inference query, I am also able to see the following log:
It is only after a few seconds that I receive
RuntimeError('Engine loop has died')
which kills the server and shuts down the Docker container. I have tried various values for VLLM_CPU_KVCACHE_SPACE, increased VLLM_ENGINE_ITERATION_TIMEOUT_S, and set VLLM_CPU_OMP_THREADS_BIND to my physical cores, but to no avail. I'm reaching out in the hopes that this error can be rectified. Thank you for your attention thus far. Cheers
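For anyone trying to reproduce this, the tuning attempts described above amount to a docker run variant along these lines. This is a sketch only: the timeout value and the core list passed to VLLM_CPU_OMP_THREADS_BIND are assumed for illustration and should be adapted to your machine.

```shell
# Sketch of the tuning attempts: same run command as above, with the
# engine iteration timeout raised and OpenMP threads bound to physical
# cores. The values 120 and 0-7 are illustrative assumptions.
docker run -it --rm -v Mistral:/mnt/models/Mistral --network=host --ipc=host \
  -e VLLM_CPU_KVCACHE_SPACE=40 \
  -e VLLM_ENGINE_ITERATION_TIMEOUT_S=120 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-7 \
  vllm-cpu-env \
  --model="/mnt/models/Mistral/Mistral-7B-Instruct-v0.2-AWQ" \
  --dtype="half" --quantization awq --device "cpu" --max-model-len 2048
```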