[BUG]: MRC Hangs if no NUMA nodes are configured #504

Closed
2 tasks done
drobison00 opened this issue Oct 8, 2024 · 0 comments · Fixed by #505
Labels
bug Something isn't working

Version

v24.06.01-runtime

Which installation method(s) does this occur on?

Docker

Describe the bug.

We've encountered the following error on some systems. The problem appears to be related to NUMA Affinity being unset: MRC raises the check failure below and then hangs during initialization.

F20241008 18:43:21.473876 19 partitions.cpp:83] Check failed: node_set.weight() == 1 (2 vs. 1)

MRC should be updated to fail gracefully or have a default fallback for node_set assignments.
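
For illustration only, here is a minimal sketch of what a "default fallback" could look like. It is written in Python for brevity and is not MRC's actual code or the fix that was merged. The names pick_numa_node and DEFAULT_NUMA_NODE are hypothetical, and treating node 0 as the default is an assumption. The idea is to replace a hard check that the node set has exactly one entry with logic that logs a warning and picks a node instead of aborting.

```python
import logging
from typing import Iterable

logger = logging.getLogger("numa_fallback_sketch")

# Hypothetical default; assumes node 0 is a safe choice when nothing is reported.
DEFAULT_NUMA_NODE = 0


def pick_numa_node(node_set: Iterable[int], default: int = DEFAULT_NUMA_NODE) -> int:
    """Return a single NUMA node id for a device.

    Instead of asserting that the set contains exactly one node,
    fall back to a deterministic choice and log what happened.
    """
    nodes = sorted(set(node_set))

    if len(nodes) == 1:
        # The expected case: affinity maps to exactly one node.
        return nodes[0]

    if not nodes:
        # No affinity reported at all: use the default node.
        logger.warning("No NUMA node reported; falling back to node %d", default)
        return default

    # Affinity spans several nodes (e.g. weight 2 instead of 1):
    # pick the lowest-numbered node rather than failing.
    logger.warning("Ambiguous NUMA node set %s; using node %d", nodes, nodes[0])
    return nodes[0]
```

With this sketch, a node set such as {0, 1} resolves to node 0 with a warning, where a strict check would have failed. Whether picking the lowest node is the right policy for MRC is a design question this sketch does not settle.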

Minimum reproducible example

This appears to be reproducible on systems where the nvidia-smi topology output shows NUMA Affinity as N/A.

Relevant log output

No response

Full env printout

No response

Other/Misc.

No response

Code of Conduct

  • I agree to follow MRC's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report