Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/cpuinfo: Increase the file descriptors limit to handle more CPUs #262

Closed
wants to merge 1 commit into from

Conversation

babumoger
Copy link
Contributor

@babumoger babumoger commented Feb 7, 2024

The pqos tool fails with the following errors on systems with 300 or more CPU cores.
$pqos
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
ERROR: Could not open /sys/fs/resctrl directory
ERROR: Failed to stop resctrl events
ERROR: Failed to start all selected OS monitoring events Monitoring start error on core(s) 339, status 1

By default, the file descriptor limit is set to 1024 for a session. pqos monitor uses 3 descriptors for each CPU for perf monitoring. So, it runs out of limit(1024) on systems with 300 or more CPUs.

Fix the issue by detecting the number of CPUs in the system and increasing the descriptor limit using system call getrlimit and setrlimit respectively. Increase the limit to 4 times the number of CPUs to take care of open files limit.

Description

By default, the file descriptor limit is set to 1024 for a session. pqos
monitor uses 3 descriptors for each CPU for perf monitoring. So, it runs
out of limit(1024) on systems with 300 or more CPUs.

Fix the issue by detecting the number of CPUs in the system and increasing
the descriptor limit using system call getrlimit and setrlimit respectively.
Increase the limit to 4 times the number of CPUs to take care of open files
limit.

Affected parts

  • library
  • pqos utility
  • rdtset utility
  • App QoS
  • other: (please specify)

Motivation and Context

How Has This Been Tested?

Tested on AMD system.

Types of changes

  • [x ] Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

The pqos tool fails with the following errors on systems with 300 or more
CPU cores.
$pqos
NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
ERROR: Could not open /sys/fs/resctrl directory
ERROR: Failed to stop resctrl events
ERROR: Failed to start all selected OS monitoring events
Monitoring start error on core(s) 339, status 1

By default, the file descriptor limit is set to 1024 for a session. pqos
monitor uses 3 descriptors for each CPU for perf monitoring. So, it runs
out of limit(1024) on systems with 300 or more CPUs.

Fix the issue by detecting the number of CPUs in the system and increasing
the descriptor limit using system call getrlimit and setrlimit respectively.
Increase the limit to 4 times the number of CPUs to take care of open files
limit.

Signed-off-by: Babu Moger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant