Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit envvar to control if we mask /proc/driver/nvidia/params #198

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

klueska
Copy link
Collaborator

@klueska klueska commented Oct 29, 2024

No description provided.

@klueska klueska requested a review from elezar October 29, 2024 14:16
@@ -35,6 +35,7 @@ selectorLabelsOverride: {}
allowDefaultNamespace: false

deviceClasses: ["gpu", "mig", "imex"]
maskNvidiaDriverParams: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to add a description to the value to call out that this is for testing under kind with multiple nodes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should this be a top-level option?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add some comments around this in the values file. Should I remove the comments in the template file then or have them in both places?

Regarding top-level option vs. not -- I thought about that and don't really have a strong preference. It's only relevant on nodes that have GPUs (i.e. the kubeletplugin in this component), but all of the other options there are native k8s primitives, so it felt weird to add this one off one that was not. What is your preference?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that a user would first inspect the values file to determine what can be set and not necessarily go as deep as the template file.

I would summarise it as follows: the comment in the template file is for us, and the comment in the values file is for users.

What is your preference?

I prefer grouping options for readability, but I can also see that doing this early is not ideal -- especially if we don't have something that makes sense as a logical unit. Let's leave it top-level for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment in the template to be more geared towards us rather than the user.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some documentation on the helm value would be useful.

@klueska klueska force-pushed the explicit-envvar-for-mask branch 2 times, most recently from 0ebb992 to 8f9ea13 Compare October 30, 2024 08:38
Comment on lines +79 to +73
- name: MASK_NVIDIA_DRIVER_PARAMS
value: "{{ .Values.maskNvidiaDriverParams }}"
Copy link
Member

@elezar elezar Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My one final question would be whether we want to inject this conditionally -- i.e. only if the value is specified and have nil as a default? Not a blocker though.

The one disadvantage of this is that the type of the setting is not apparent from reading the default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its necessary. If / when we do that we should do it for all options to be symmetrical.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my remaining comments are minor.

Thanks @klueska.

@klueska klueska merged commit 32805fe into NVIDIA:main Oct 30, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants