Job memory usage calculation -- (PSS calculation issue for new kernel) #11687
Comments
Thanks for creating this issue @z4027163
The long-term solution would require changing the mechanism by which we distribute runtime code, so even though this fix which is using the more robust I am closing this issue now and we should follow up on the original one.
A large number of Run3 Rereco WFs are failing at some of the sites that run newer kernel versions. Details: Failures in Run 3 data reprocessing
The detailed reason is in that ticket as well. In short, Linux v6.0+ includes an additional field in smaps, Pss_Dirty. Its value gets added to the PSS calculation on top of the regular Pss value, so the computed memory usage is inflated and jobs get killed early for exceeding the PSS threshold.
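To illustrate the overcount, here is a minimal sketch in Python. It assumes the extra field is picked up because the calculation matches every smaps key that starts with `Pss`; the issue only states that Pss_Dirty is added to the total, so that mechanism is an assumption. The function name `sum_pss_kb`, its `exact` flag, and the sample values are hypothetical and not taken from the project code.

```python
def sum_pss_kb(smaps_text, exact=True):
    """Sum PSS values (in kB) from the text of /proc/<pid>/smaps.

    exact=True  counts only the "Pss:" field.
    exact=False counts every key starting with "Pss" (assumed buggy
    behaviour), which also picks up "Pss_Dirty:" on Linux v6.0+.
    """
    total = 0
    for line in smaps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        key = fields[0]
        matched = (key == "Pss:") if exact else key.startswith("Pss")
        if matched:
            total += int(fields[1])
    return total


# Made-up excerpt of a single mapping as reported by a v6.0+ kernel.
SAMPLE = """\
Size:               1024 kB
Rss:                 800 kB
Pss:                 400 kB
Pss_Dirty:           300 kB
Shared_Clean:        100 kB
"""

print(sum_pss_kb(SAMPLE, exact=True))   # counts Pss only
print(sum_pss_kb(SAMPLE, exact=False))  # counts Pss and Pss_Dirty
```

With this sample, the prefix match reports 700 kB instead of 400 kB, which is the kind of inflation that would push a job over the threshold sooner.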
We would like to find a solution, such as using RSS as the memory metric for deciding when to kill jobs. Otherwise, sites with new kernels will constantly overestimate memory usage and end jobs early.