add memory usage variables for use on derecho #190

Open
wants to merge 1 commit into
base: main
from

Conversation

jedwards4b
Collaborator

Adds two new variables for memory usage control on Derecho: MEM_PER_TASK and MAX_MEM_PER_NODE.

jedwards4b self-assigned this Oct 9, 2024
@@ -13,6 +13,8 @@
 <BATCH_SYSTEM>pbs</BATCH_SYSTEM>
 <SUPPORTED_BY>cseg</SUPPORTED_BY>
 <MAX_TASKS_PER_NODE>128</MAX_TASKS_PER_NODE>
+<MEM_PER_TASK>10</MEM_PER_TASK>
+<MAX_MEM_PER_NODE>235</MAX_MEM_PER_NODE>
Collaborator

I think MAX_MEM_PER_NODE can increase to 470 for a GPU node on Derecho. Maybe add the gpu_type="!none" attribute for the value of a GPU node?
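For illustration, the attribute-based selection suggested here might look like the fragment below. This is a sketch only and is not part of this PR: it assumes the machine-config schema accepts a `gpu_type` attribute on `MAX_MEM_PER_NODE`, which nothing in this thread confirms, and it reuses the 235 and 470 values quoted in the conversation.

```xml
<!-- Hypothetical: assumes gpu_type is a valid selector on MAX_MEM_PER_NODE -->
<!-- CPU node: the value added by this PR -->
<MAX_MEM_PER_NODE gpu_type="none">235</MAX_MEM_PER_NODE>
<!-- GPU node: the value proposed in this comment -->
<MAX_MEM_PER_NODE gpu_type="!none">470</MAX_MEM_PER_NODE>
```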

Collaborator Author

We may have to fine-tune this for GPU nodes. Currently, memory usage on GPU nodes is hardcoded to 470, and this PR won't change that.

Collaborator

You are right that the memory usage on a GPU node is hardcoded to 470 now. I just wonder whether it could be replaced by the MAX_MEM_PER_NODE variable here as well, with a different value depending on whether it is a CPU or GPU node.

Collaborator Author

I think that it can - is it an issue we need to worry about?

Collaborator

I do not think it is an issue to worry about and we can address it later if it becomes a problem.

@@ -13,6 +13,8 @@
 <BATCH_SYSTEM>pbs</BATCH_SYSTEM>
 <SUPPORTED_BY>cseg</SUPPORTED_BY>
 <MAX_TASKS_PER_NODE>128</MAX_TASKS_PER_NODE>
+<MEM_PER_TASK>10</MEM_PER_TASK>
+<MAX_MEM_PER_NODE>235</MAX_MEM_PER_NODE>
 <MAX_GPUS_PER_NODE>4</MAX_GPUS_PER_NODE>
 <MAX_MPITASKS_PER_NODE>128</MAX_MPITASKS_PER_NODE>
 <MAX_CPUTASKS_PER_GPU_NODE>64</MAX_CPUTASKS_PER_GPU_NODE>
Collaborator

I think GPU_TYPE, GPU_OFFLOAD and MPI_GPU_WRAPPER_SCRIPT were removed in my last PR. Do you need to merge the latest main branch first?

Collaborator Author

It hasn't been removed - it wasn't in this PR and will be included in the merge.

2 participants