DRA does not support Tesla P4 model GPUs because it does not support setting time slices by nvidia-smi #41
Comments
Hi. Sorry @wawa0210. We have been focussed on other development for the past couple of weeks. It may make sense to not trigger the
It's called every time at the moment to ensure that, when sharing is not set, the time slice gets reset to the default (in case it had been set to something else previously). A better check might be to ensure that the architecture is Kepler+ before attempting to make the call.
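One way to picture the architecture check suggested above is the sketch below. It is not the driver's code: `ComputeCapability`, `maybeSetTimeSlice`, and the `set` callback are hypothetical names, and the minimum capability is left as a parameter because this thread does not establish which architectures accept the time-slice command. The only hardware fact assumed is that the Tesla P4 (Pascal) reports compute capability 6.1.

```go
package main

import "fmt"

// ComputeCapability is a CUDA compute capability, e.g. 6.1 for a Tesla P4.
// Hypothetical helper type for illustration only.
type ComputeCapability struct{ Major, Minor int }

// AtLeast reports whether c is the same as or newer than min.
func (c ComputeCapability) AtLeast(min ComputeCapability) bool {
	if c.Major != min.Major {
		return c.Major > min.Major
	}
	return c.Minor >= min.Minor
}

// maybeSetTimeSlice invokes set only when gpu meets the minimum capability.
// The minimum is an assumption supplied by the caller, not a documented value.
// It returns true when set was called and succeeded.
func maybeSetTimeSlice(gpu, min ComputeCapability, set func() error) (bool, error) {
	if !gpu.AtLeast(min) {
		// Skip the call entirely on GPUs assumed not to support it.
		return false, nil
	}
	if err := set(); err != nil {
		return false, fmt.Errorf("setting time slice: %w", err)
	}
	return true, nil
}
```

With a threshold of 7.0, for example, a 6.1 device would skip the call; with a threshold of 3.0 the same device would still attempt it, so the choice of threshold decides whether the P4 case is fixed.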
I have not been able to find accurate documentation describing which architectures support time-slice settings. Is there any authoritative information available for reference?
OK
When I ran DRA on the Tesla P4 node, I found that the pod failed to start.
environment
K8s version: v1.27.5
k8s-dra-driver: latest branch main
what happened
Deploying a pod that occupies one GPU in the Tesla P4 environment reports an error.
Digging in, I found that even when the GPU is not set to be shared,
nvidia-smi compute-policy -i uuid --set-timeslice 0
is still run. The Tesla P4 does not support this command, so an error is reported.
code ref
k8s-dra-driver/cmd/nvidia-dra-plugin/sharing.go
Lines 99 to 120 in 702a05b
Steps to reproduce
Test yaml information
other information
NAS info
In this case, if sharing is not set, is it possible not to call the setTimeSlice method?
Looking forward to hearing from the community, and then I can try to fix it.