AMD integration #132
Looked at a high level. Thanks for your work!
I think 0.5 seconds is just right, assuming it works.
Thanks for the work! Just a couple nits :)
zeus/device/gpu/amd.py
Outdated
"""Check if the GPU supports retrieving total energy consumption. Returns a future object of the result."""
wait_time = 0.5 # seconds
threshold = 0.8 # 80% threshold
Do we want to first check whether self._supportsGetTotalEnergyConsumption is not None and return the cached value for future invocations of this method?
With the current structure it's never called more than once, but caching would be better. I'll add it.
This method is part of the public API of GPU and GPUs, meaning people can always call it in their code. Thanks!
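For reference, the caching idea discussed above could look roughly like the sketch below. This is only an illustration, not the actual Zeus code: the class name `CachedSupportCheck` and the `probe` callable are hypothetical stand-ins for the GPU class and the real (slow) hardware check.

```python
from typing import Callable, Optional


class CachedSupportCheck:
    """Hypothetical stand-in for a GPU class; not the actual Zeus implementation."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        # `probe` is an assumed placeholder for the real, slow hardware check.
        self._probe = probe
        # None means "not checked yet". Declaring the attribute here also lets
        # a type checker know that it exists.
        self._supportsGetTotalEnergyConsumption: Optional[bool] = None

    def supportsGetTotalEnergyConsumption(self) -> bool:
        # Run the expensive probe only on the first call, then reuse the result.
        if self._supportsGetTotalEnergyConsumption is None:
            self._supportsGetTotalEnergyConsumption = self._probe()
        return self._supportsGetTotalEnergyConsumption
```

Since the method is public, repeated calls from user code would then cost one probe in total rather than one probe per call.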
Co-authored-by: Jae-Won Chung <[email protected]>
@parthraut Is this ready for review? Did stuff work properly on the AMD HPC cluster?
A few questions:
zeus/zeus/device/gpu/amd.py:247:13 - error: Operator "*" not supported for types "c_uint32" and "Literal[1000]" (reportOperatorIssue)
zeus/zeus/device/gpu/amd.py:258:9 - error: Method "supportsGetTotalEnergyConsumption" overrides class "GPU" in an incompatible manner
    Positional parameter count mismatch; base method has 1, but override has 2 (reportIncompatibleMethodOverride)
zeus/zeus/device/gpu/amd.py:280:26 - error: Cannot assign to attribute "_supportsGetTotalEnergyConsumption" for class "AMDGPU*"
    Attribute "_supportsGetTotalEnergyConsumption" is unknown (reportAttributeAccessIssue)
zeus/zeus/device/gpu/amd.py:282:26 - error: Cannot assign to attribute "_supportsGetTotalEnergyConsumption" for class "AMDGPU*"
    Attribute "_supportsGetTotalEnergyConsumption" is unknown (reportAttributeAccessIssue)
zeus/zeus/device/gpu/amd.py:293:26 - error: Cannot assign to attribute "_supportsGetTotalEnergyConsumption" for class "AMDGPU*"
    Attribute "_supportsGetTotalEnergyConsumption" is unknown (reportAttributeAccessIssue)
zeus/zeus/device/gpu/amd.py:344:19 - error: Cannot access attribute "value" for class "AmdSmiException"
    Attribute "value" is unknown (reportAttributeAccessIssue)
zeus/zeus/device/gpu/amd.py:346:37 - error: Cannot access attribute "msg" for class "AmdSmiException"
I can fix these, but do you think this is the right approach? I wanted to make sure before tweaking base class signatures.
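One possible way to address the first three kinds of errors without touching the base class signature is sketched below. It is a rough illustration under assumptions, not the project's code: `BaseGPU`, `ExampleAMDGPU`, and `to_scaled_int` are hypothetical names, and the placeholder result stands in for whatever the real check returns.

```python
import ctypes
from typing import Optional


def to_scaled_int(raw: ctypes.c_uint32, factor: int = 1000) -> int:
    # A c_uint32 instance does not support "*"; read the Python int via
    # .value first, then multiply.
    return raw.value * factor


class BaseGPU:
    """Hypothetical base class whose method takes only `self`."""

    def supportsGetTotalEnergyConsumption(self) -> bool:
        raise NotImplementedError


class ExampleAMDGPU(BaseGPU):
    """Hypothetical subclass; the real AMDGPU class may differ."""

    def __init__(self) -> None:
        # Declaring the attribute in __init__ avoids the "attribute is
        # unknown" report when it is assigned later.
        self._supportsGetTotalEnergyConsumption: Optional[bool] = None

    def supportsGetTotalEnergyConsumption(self) -> bool:
        # Same positional parameter count as the base method (just `self`),
        # so the override stays compatible.
        if self._supportsGetTotalEnergyConsumption is None:
            # Placeholder result; a real check would query the device here.
            self._supportsGetTotalEnergyConsumption = False
        return self._supportsGetTotalEnergyConsumption
```

Keeping any extra input as an instance attribute or module-level constant, rather than a new method parameter, is what lets the override match the base signature here.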