-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support skip old architecture version GPU settings time slice #58
base: main
Are you sure you want to change the base?
Conversation
friendly ping Can you help me update this form?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
Instead of making this change in:
func (l deviceLib) setTimeSlice(uuids []string, timeSlice int) error {
We could perform the checks in:
func (t *TimeSlicingManager) SetTimeSlice(devices *PreparedDevices, config *nascrd.TimeSlicingConfig) error {
This already checks for MIG devices and adding a loop over the devices
to filter out those that do not support timeslicing would be simpler here instead of rediscovering the available devices.
Furthermore, PreparedDevices
already stores the brand
and other relevant information.
Another quesiton that I have is what we expect the behavior to be when timeslicing is configured on a GPU that doesn't support it? Do we expect an error to be raised, or do we want to continue assuming blocking operation if the GPU is shared?
I think we should error out and not start the plugin. |
f08aeb2
to
0dc4244
Compare
Do you mean that a gpu card that does not support settings time slice will not be able to use this function and will error out ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
It's definitely cleaner to do this at the higher level.
Also note, that we could consider just existing early if timeslicing is not supported by at least one device.
b46450f
to
a67b164
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the modifications.
It would be good to get @klueska's thoughts on my suggestions too.
@wawa0210 as already requested, it would be better if we return an error if timeslicing is configured on a GPU that doesn't support it. This will align with what is done for MIG. I understand that crashing the plugin at this stage may not be desireable, but that is a separate issue that we would need to address. |
Agree with you, updated,maybe we should discuss this scenario in a separate issue --> #81 |
Signed-off-by: wawa0210 <[email protected]>
@elezar PTAL |
The code base has undergone a major overhaul to accommodate the API changes in Kubernetes v1.31. I imagine this code needs a rebase. |
/kind feature
fix #41
Flexible judgment based on the GPU architecture series. If the model does not support time slices, ignore the settings to ensure normal operation of the function.
Test Results
Tesla P4 can run demo1 normally
https://github.com/NVIDIA/k8s-dra-driver/blob/main/demo/specs/quickstart/gpu-test1.yaml