Demo gpu sharing for mps does not start inferencing after downloading pytorch_model.bin #56

Open
ltson4121994 opened this issue Jul 30, 2024 · 0 comments


ltson4121994 commented Jul 30, 2024

Starting Prometheus server on port 8000...
Running benchmark...
Downloading (…)cessor_config.json";: 100%|██████████| 292/292 [00:00<00:00, 27.0kB/s]
Downloading (…)"config.json";: 100%|██████████| 4.13k/4.13k [00:00<00:00, 244kB/s]
Downloading (…)"pytorch_model.bin";: 100%|██████████| 123M/123M [11:58<00:00, 171kB/s]

The line "Running inference..." is never printed, so I assume something goes wrong when the model is loaded onto the GPU. Here are the MPS server and control logs:

==> /tmp/nvidia-mps/server.log <==
[2024-07-30 02:31:59.303 Other   138] Initializing server process
[2024-07-30 02:31:59.339 Server   138] Creating server context on device 0 (NVIDIA GeForce RTX 2080 Ti)
[2024-07-30 02:31:59.401 Server   138] Creating server context on device 1 (NVIDIA GeForce RTX 2080 Ti)
[2024-07-30 02:31:59.456 Server   138] Created named shared memory region /cuda.shm.3e8.8a.1

==> /tmp/nvidia-mps/control.log <==
[2024-07-30 02:31:59.456 Control    58] NEW SERVER 138: Ignoring connection from user

==> /tmp/nvidia-mps/server.log <==
[2024-07-30 02:31:59.456 Server   138] Active Threads Percentage set to 0.0
[2024-07-30 02:32:36.506 Server   138] Server Priority set to 0
[2024-07-30 02:32:36.506 Server   138] Server has started
[2024-07-30 02:32:36.506 Server   138] Destroy server context on device 0
[2024-07-30 02:32:36.545 Server   138] Destroy server context on device 1

==> /tmp/nvidia-mps/control.log <==
[2024-07-30 02:32:36.581 Control    58] Server 138 exited with status 0
[2024-07-30 02:32:36.581 Control    58] Starting new server 144 for user 1000

==> /tmp/nvidia-mps/server.log <==
[2024-07-30 02:32:36.601 Other   144] Startup
[2024-07-30 02:32:36.601 Other   144] Connecting to control daemon on socket: /tmp/nvidia-mps/control

==> /tmp/nvidia-mps/control.log <==
[2024-07-30 02:32:36.601 Control    58] Accepting connection...

==> /tmp/nvidia-mps/server.log <==
[2024-07-30 02:32:36.601 Other   144] Initializing server process
[2024-07-30 02:32:36.641 Server   144] Creating server context on device 0 (NVIDIA GeForce RTX 2080 Ti)
[2024-07-30 02:32:36.704 Server   144] Creating server context on device 1 (NVIDIA GeForce RTX 2080 Ti)
[2024-07-30 02:32:36.768 Server   144] Created named shared memory region /cuda.shm.3e8.90.1

==> /tmp/nvidia-mps/control.log <==
[2024-07-30 02:32:36.768 Control    58] NEW SERVER 144: Ready

==> /tmp/nvidia-mps/server.log <==
[2024-07-30 02:32:36.768 Server   144] Active Threads Percentage set to 100.0
[2024-07-30 02:32:36.768 Server   144] Server Priority set to 0
[2024-07-30 02:32:36.768 Server   144] Server has started
[2024-07-30 02:32:36.768 Server   144] Received new client request
[2024-07-30 02:32:36.799 Server   144] Worker created
[2024-07-30 02:32:36.799 Server   144] Creating worker thread
[2024-07-30 02:32:36.799 Server   144] Waiting for current clients to finish

==> /tmp/nvidia-mps/control.log <==
[2024-07-30 02:32:36.847 Control    58] Accepting connection...
[2024-07-30 02:32:36.848 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:37:55.850 Control    58] Accepting connection...
[2024-07-30 02:37:55.850 Control    58] User did not send valid credentials
[2024-07-30 02:37:55.850 Control    58] Accepting connection...
[2024-07-30 02:37:55.851 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:41:25.952 Control    58] Accepting connection...
[2024-07-30 02:41:25.952 Control    58] User did not send valid credentials
[2024-07-30 02:41:25.952 Control    58] Accepting connection...
[2024-07-30 02:41:25.952 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:42:55.872 Control    58] Accepting connection...
[2024-07-30 02:42:55.872 Control    58] User did not send valid credentials
[2024-07-30 02:42:55.872 Control    58] Accepting connection...
[2024-07-30 02:42:55.872 Control    58] NEW CLIENT 0 from user 0: Server is not ready, push client to pending list
[2024-07-30 02:49:23.964 Control    58] Accepting connection...
[2024-07-30 02:49:23.964 Control    58] User did not send valid credentials
[2024-07-30 02:49:23.964 Control    58] Accepting connection...
[2024-07-30 02:49:23.964 Control    58] NEW CLIENT 0 from user 0: Server is not ready, push client to pending list
[2024-07-30 02:50:09.170 Control    58] Accepting connection...
[2024-07-30 02:50:09.247 Control    58] User did not send valid credentials
[2024-07-30 02:50:09.247 Control    58] Accepting connection...
[2024-07-30 02:50:09.247 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:51:05.370 Control    58] Accepting connection...
[2024-07-30 02:51:05.370 Control    58] User did not send valid credentials
[2024-07-30 02:51:05.370 Control    58] Accepting connection...
[2024-07-30 02:51:05.370 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:52:51.748 Control    58] Accepting connection...
[2024-07-30 02:52:51.749 Control    58] User did not send valid credentials
[2024-07-30 02:52:51.749 Control    58] Accepting connection...
[2024-07-30 02:52:51.749 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:54:55.658 Control    58] Accepting connection...
[2024-07-30 02:54:55.658 Control    58] User did not send valid credentials
[2024-07-30 02:54:55.658 Control    58] Accepting connection...
[2024-07-30 02:54:55.658 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:57:06.983 Control    58] Accepting connection...
[2024-07-30 02:57:06.984 Control    58] User did not send valid credentials
[2024-07-30 02:57:06.984 Control    58] Accepting connection...
[2024-07-30 02:57:06.984 Control    58] NEW CLIENT 0 from user 0: Server is not ready, push client to pending list
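
To make the control log above easier to read, here is a minimal sketch that counts how often a client was pushed to the pending list, grouped by user ID. The function name `summarize_pending` and both regular expressions are my own assumptions, inferred from the lines in this log; they are not part of any NVIDIA MPS tooling or a documented log format.

```python
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the control.log lines in this issue (an assumption,
# not a documented MPS format): "[<timestamp> Control <pid>] <message>".
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) Control\s+\d+\] (?P<msg>.*)$"
)
PENDING_RE = re.compile(
    r"NEW CLIENT \d+ from user (?P<uid>\d+): Server is not ready"
)


def summarize_pending(lines):
    """Return (pending count per user ID, first timestamp, last timestamp)."""
    counts = Counter()
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip server.log lines and "==> file <==" headers
        p = PENDING_RE.search(m.group("msg"))
        if p:
            counts[int(p.group("uid"))] += 1
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    first = min(stamps) if stamps else None
    last = max(stamps) if stamps else None
    return counts, first, last


# Three lines copied from the control log above, used as sample input.
SAMPLE = """\
[2024-07-30 02:32:36.848 Control    58] NEW CLIENT 0 from user 1000: Server is not ready, push client to pending list
[2024-07-30 02:37:55.850 Control    58] User did not send valid credentials
[2024-07-30 02:42:55.872 Control    58] NEW CLIENT 0 from user 0: Server is not ready, push client to pending list
""".splitlines()

if __name__ == "__main__":
    counts, first, last = summarize_pending(SAMPLE)
    print(dict(counts), first, last)
```

Run against the full /tmp/nvidia-mps/control.log (for example `summarize_pending(open("/tmp/nvidia-mps/control.log"))`), this should show at a glance whether the pending requests come from more than one user ID and over what time span they were queued.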