
Is it compatible with different driver versions and CUDA versions? #482

Open
15929482853 opened this issue Sep 9, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

@15929482853

What happened: All of the cluster's GPUs previously used driver version 515 and CUDA 11.7. Recently I added a machine with an L20 (which requires at least driver 535 and CUDA 12), and then I ran into a problem where the GPUs were not recognized correctly:
(Screenshots attached: two WeChat Work captures showing the GPUs not being recognized correctly.)

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

  • The output of nvidia-smi -a on your host
  • Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
  • The hami-device-plugin container logs
  • The hami-scheduler container logs
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
  • Any relevant kernel output lines from dmesg

Environment:

  • HAMi version:
  • nvidia driver or other AI device driver version:
  • Docker version from docker version
  • Docker command, image and tag used
  • Kernel version from uname -a
  • Others:
15929482853 added the kind/bug (Something isn't working) label on Sep 9, 2024
@archlitchi (Collaborator) commented on Sep 10, 2024

Can you re-submit the task with the env 'CUDA_DISABLE_CONTROL=true' and see if it reproduces this error?
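
For reference, a minimal sketch of how that environment variable could be set on the task pod. Only `CUDA_DISABLE_CONTROL` itself comes from the comment above; the pod name, image, and the `nvidia.com/gpu` resource name are illustrative placeholders and may differ from your actual HAMi configuration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                       # hypothetical task name
spec:
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative image
      env:
        - name: CUDA_DISABLE_CONTROL   # env var suggested above, to rule out HAMi's CUDA hooking
          value: "true"
      resources:
        limits:
          nvidia.com/gpu: 1            # assumed HAMi-managed GPU resource name
```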
