Could you please provide the exact HAMi image version to help trace the specific code line? It currently appears that certain map-type fields in the scheduler may be accessed concurrently without locks, causing the fatal error: concurrent map iteration and map write.
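For context, here is a minimal Go sketch (illustrative only, an assumption about the failure mode rather than HAMi's actual scheduler code) of how that error arises when one goroutine ranges over a plain map while another goroutine writes to it, and the usual mutex guard that prevents it:

```go
// Illustrative only: one goroutine ranges over a plain Go map while another
// writes to it. The Go runtime detects this and aborts the whole process with
// "fatal error: concurrent map iteration and map write" (exit code 2).
package main

import "sync"

// nodeDevices stands in for a scheduler-side map, e.g. node name -> device count.
var (
	nodeDevices = map[string]int{"node-1": 4, "node-2": 8}
	mu          sync.RWMutex // guards nodeDevices in the "safe" variants below
)

// unsafeIterate ranges over the map without any lock; if it runs while a
// writer is updating the map, the runtime throws the fatal error above.
func unsafeIterate() int {
	total := 0
	for _, n := range nodeDevices {
		total += n
	}
	return total
}

// addNodeUnsafe writes to the map without any lock.
func addNodeUnsafe(name string, count int) {
	nodeDevices[name] = count
}

// safeIterate and addNodeSafe show the usual fix: every reader takes the
// read lock and every writer takes the write lock on the same mutex.
func safeIterate() int {
	mu.RLock()
	defer mu.RUnlock()
	total := 0
	for _, n := range nodeDevices {
		total += n
	}
	return total
}

func addNodeSafe(name string, count int) {
	mu.Lock()
	defer mu.Unlock()
	nodeDevices[name] = count
}

func main() {
	done := make(chan struct{})
	go func() {
		for i := 0; i < 100000; i++ {
			addNodeSafe("node-x", i) // switch to addNodeUnsafe to reproduce the crash
		}
		close(done)
	}()
	for i := 0; i < 100000; i++ {
		_ = safeIterate() // switch to unsafeIterate to reproduce the crash
	}
	<-done
}
```

A Go runtime fatal error of this kind terminates the process with exit status 2, which matches the exit code reported below; the actual change in #418 may of course differ from this sketch.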
@jeonghyunkeem Got it. I checked, and I know where the problem is: it has already been fixed in #418, so it should no longer occur if you use the latest version, 2.4.0.
What happened:
The vgpu-scheduler-extender container (part of the hami-scheduler pod) keeps terminating with exit code 2.

What you expected to happen:
The vgpu-scheduler-extender container stays alive without terminating.

How to reproduce it (as minimally and precisely as possible):
I'm not sure, as it happens randomly.
Anything else we need to know?:
I'm using multiple GPU nodes in my cluster, and each node has the hami.io/node-nvidia-register annotation as follows: (…)

- The output of nvidia-smi -a on the host: (…)
- Docker configuration file (e.g. /etc/docker/daemon.json): (…)

Here are the final logs of the terminated vgpu-scheduler-extender container: (…)

- The kubelet logs on the node (e.g. sudo journalctl -r -u kubelet): (…)
- Relevant dmesg output: (…)
Environment:
- docker version: (…)
- uname -a: (…)