What happened: All existing GPUs in the cluster ran driver version 515 with CUDA 11.7. Recently I added a machine with an L20 GPU (which requires at least driver 535 and CUDA 12), and then ran into a problem where the GPUs were not recognized correctly:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The output of nvidia-smi -a on your host
Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
The hami-device-plugin container logs
The hami-scheduler container logs
The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
Any relevant kernel output lines from dmesg
Environment:
HAMi version:
nvidia driver or other AI device driver version:
Docker version from docker version
Docker command, image and tag used
Kernel version from uname -a
Others: