NOS MPS leaves GPUs on node in exclusive mode #27
Comments
You can try to add a shutdown command to the set-compute-mode container.
When I am able, I will add a preStop hook to the container and test whether this resolves the issue.
Have you seen this MR? NVIDIA/k8s-device-plugin#490
@Baenimyr Good that the device plugin supports MPS now. The problem is that it does not scale dynamically. Of course, NOS could use the NVIDIA plugin now. However, with the NVIDIA DRA driver on the horizon, it does not make sense for me personally to use NOS.
In my use case, I often enable and disable NOS on individual nodes by adding or removing the label nos.nebuly.com/gpu-partitioning=mps. After the node is labeled, NOS changes the GPU mode to exclusive. However, after the label is removed, the GPU remains in exclusive mode.
Expected behavior: NOS should revert the GPU mode to whatever it was when NOS started, or to default.
Workaround: Change back to default mode (or whatever mode you want) after removing the label. Do this for all GPUs. For example, to change the mode on GPU 0 back to default use the following.
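The command originally posted with this workaround is not preserved on this page. As a minimal sketch, assuming `nvidia-smi` is on the PATH of the node and the script runs with root privileges, resetting the compute mode could look like the following. The function names and the `DRY_RUN` switch are hypothetical helpers introduced for this illustration; they are not part of NOS.

```shell
#!/bin/sh
# Sketch only: reset GPU compute mode after removing the NOS MPS label.
# Assumptions: nvidia-smi is installed on the node and we run as root.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.

reset_compute_mode() {
    # $1 = GPU index, $2 = target compute mode (e.g. DEFAULT, EXCLUSIVE_PROCESS)
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "nvidia-smi -i $1 -c $2"
    else
        nvidia-smi -i "$1" -c "$2"
    fi
}

reset_all_gpus() {
    # $1 = target compute mode, DEFAULT when omitted
    mode="${1:-DEFAULT}"
    # Ask nvidia-smi for the index of every GPU on the node, one per line
    for idx in $(nvidia-smi --query-gpu=index --format=csv,noheader); do
        reset_compute_mode "$idx" "$mode"
    done
}

# Example invocations (uncomment on a GPU node):
# reset_compute_mode 0 DEFAULT   # GPU 0 only
# reset_all_gpus DEFAULT         # every GPU on the node
```

For the single-GPU case described above, this boils down to running `nvidia-smi -i 0 -c DEFAULT` on the node. Setting `DRY_RUN=1` first shows which commands would be issued without changing any GPU state.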