mcelog didn't catch the mce memory #91
Comments
Was your kernel built with CONFIG_RAS_CEC=y?
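One way to answer that question is to look at the kernel's build config. With CONFIG_RAS_CEC=y, the kernel's Correctable Errors Collector handles corrected memory errors itself, so they may never reach mcelog (booting with `ras=cec_disable` turns the collector off). The sketch below is only an illustration: `ras_cec_enabled` is a made-up helper name, and the `/boot/config-$(uname -r)` location is an assumption that varies by distribution.

```shell
# Sketch: report whether a kernel config file enables CONFIG_RAS_CEC.
# ras_cec_enabled is a hypothetical helper name, not part of mcelog.
ras_cec_enabled() {
    # $1: path to a plain-text kernel config file
    grep -q '^CONFIG_RAS_CEC=y' "$1"
}

cfg="/boot/config-$(uname -r)"   # assumed location; varies by distribution
if [ ! -r "$cfg" ]; then
    echo "kernel config not found at $cfg"
elif ras_cec_enabled "$cfg"; then
    echo "CONFIG_RAS_CEC=y"
else
    echo "CONFIG_RAS_CEC is not enabled"
fi
```

If the config file is not readable at that path, the script says so rather than guessing.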
Same problem here.
CPU Intel E5-2680 v3
All of the MCE errors in the log appeared only at boot (probably before mcelog started).
Older kernels didn't pass errors found early in boot to mcelog. This was fixed in v5.15; see commit 3bff147b187d ("x86/mce: Defer processing of early errors").
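To see which side of that fix a machine is on, the running kernel release can be compared against 5.15. This is a rough sketch: `kernel_at_least` is a hypothetical helper, it compares only the major and minor numbers, and it cannot tell whether a distribution backported the commit to an older kernel.

```shell
# Sketch: succeed if a kernel release string is at least MAJOR.MINOR.
# kernel_at_least is a hypothetical helper; it ignores patch level and backports.
kernel_at_least() {
    # $1: minimum version such as 5.15; $2: release string (default: uname -r)
    kal_want_major=${1%%.*}
    kal_want_minor=${1#*.}
    kal_want_minor=${kal_want_minor%%[!0-9]*}
    kal_have=${2:-$(uname -r)}
    kal_have_major=${kal_have%%.*}
    kal_have_minor=${kal_have#*.}
    kal_have_minor=${kal_have_minor%%[!0-9]*}
    [ "$kal_have_major" -gt "$kal_want_major" ] && return 0
    [ "$kal_have_major" -eq "$kal_want_major" ] &&
        [ "$kal_have_minor" -ge "$kal_want_minor" ]
}

if kernel_at_least 5.15; then
    echo "kernel $(uname -r) is 5.15 or newer"
else
    echo "kernel $(uname -r) predates 5.15; early boot errors may not reach mcelog"
fi
```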
I found the MCE error in dmesg, but mcelog didn't catch it and /var/log/mcelog is empty:
[root@test ~]# dmesg -T | grep mce
[Tue Apr 21 16:02:26 2020] mce: Using 22 MCE banks
[Sat May 1 08:56:53 2021] mce: [Hardware Error]: Machine check events logged
[root@test ~]# mcelog --client
[root@test ~]# cat /var/log/mcelog
[root@test ~]#
[root@test ~]# cat /etc/mcelog/mcelog.conf
# config file for mcelog
# For further options, see the mcelog manpage and documentation
# by default, disable extended error logging on newer Intel processors
#syslog = yes
logfile = /var/log/mcelog
no-imc-log = yes
# Filter out known broken events by default
filter = yes
# don't log memory errors individually
#filter-memory-errors = yes
# output in undecoded raw format to be easier machine readable
#raw = yes
[server]
# An upstream bug prevents this from being disabled
# Only allow root to connect by default
client-user = root
# Path to socket client uses to connect
socket-path = /var/run/mcelog-client
[dimm]
# Enable DIMM-tracking
dimm-tracking-enabled = yes
# Disable DIMM DMI pre-population unless supported on your system
dmi-prepopulate = no
# execute these triggers when the rate of corrected or uncorrected
# errors per DIMM exceeds the threshold
uc-error-trigger = dimm-error-trigger
uc-error-threshold = 1 / 24h
ce-error-trigger = dimm-error-trigger
ce-error-threshold = 10 / 24h
[socket]
# Memory error accounting per socket
socket-tracing-enabled = yes
mem-uc-error-threshold = 100 / 24h
mem-ce-error-trigger = socket-memory-error-trigger
mem-ce-error-threshold = 100 / 24h
mem-ce-error-log = yes
[cache]
# Attempt to off-line CPUs causing cache errors
cache-threshold-trigger = cache-error-trigger
cache-threshold-log = yes
[page]
# Try to soft-offline a 4K page if it exceeds the threshold
memory-ce-threshold = 10 / 24h
memory-ce-trigger = page-error-trigger
memory-ce-log = yes
memory-ce-action = soft
[trigger]
# Maximum number of running triggers
children-max = 2
directory = /etc/mcelog/triggers
[root@test ~]#
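Since `mcelog --client` printed nothing and the log file is empty, it may also be worth confirming that the paths set in the config actually exist on disk. The sketch below reads `logfile` and `socket-path` from the config and checks them. `mcelog_conf_get` is a hypothetical helper (not an mcelog command), it ignores section headers, and a present socket does not by itself prove the daemon is receiving events.

```shell
# Sketch: read one setting from an mcelog.conf-style file.
# mcelog_conf_get is a hypothetical helper, not an mcelog command.
mcelog_conf_get() {
    # $1: key name, $2: config file; prints the first uncommented value
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

conf=/etc/mcelog/mcelog.conf
if [ -r "$conf" ]; then
    logfile=$(mcelog_conf_get logfile "$conf")
    sock=$(mcelog_conf_get socket-path "$conf")
    if [ -e "$logfile" ]; then
        echo "logfile present: $logfile"
    else
        echo "logfile missing: $logfile"
    fi
    if [ -S "$sock" ]; then
        echo "client socket present: $sock"
    else
        echo "client socket missing: $sock"
    fi
fi
```

Commented-out settings such as `#syslog = yes` are skipped because the pattern only matches lines that begin with the key itself.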