Description/Reasoning
Our node ran into the 'too many open files' error. When this happened there was a consensus failure, but the health check did not fail during that time.
Current Behaviour
We relied on health checks and axelard status for our alerting. Even though consensus failed, we were not alerted.
We were jailed during this period.
Expected Behaviour
When there is a consensus failure, shouldn't the health check for axelard stop reporting passed?
Steps to reproduce (for bugs)
Relevant Logs or Files
2021-12-01T13:30:33Z INF received complete proposal block hash= 883699B4974EA2EFD8674FDBB58D5613197072A32BF364731321F51F66BED25D height= 189903 module= consensus
2021-12-01T13:30:33Z ERR CONSENSUS FAILURE!!! err= "open /root/.axelar_testnet/.core/data/write-file-atomic-07343618385475708275: too many open files" module= consensus stack= "goroutine 893 [running]:\nruntime/debug.Stack(0xc02687b050, 0x2015080, 0xc010132000)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x9f\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2(0xc000d72a80, 0x2521850)\n\t/go/pkg/mod/github.com/tendermint/[email protected]/consensus/state.go:726 +0x5b\npanic(0x2015080, 0xc010132000)\n\t/usr/local/go/src/runtime/panic.go:965 +0x1b9\ngithub.com/tendermint/tendermint/privval
The maximum number of open files was 1024. We should have increased it, as I think 4k is the recommended value for Tendermint (TM).
I am not sure whether hitting this limit is supposed to trigger a consensus failure.
The axelar core process did not crash, and it continued to report catching up as false.
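To make the open-files limit visible before it is hit, a small check along these lines could be run against the node process. This is only a sketch: it assumes a Linux host where `/proc/<pid>/limits` is readable, the helper names are hypothetical, and the 4096 threshold is just the "4k" figure mentioned above, not a confirmed Tendermint requirement.

```python
def parse_max_open_files(limits_text):
    """Extract (soft, hard) 'Max open files' limits from /proc/<pid>/limits text.

    A value of None means the limit is reported as 'unlimited'.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def read_process_limits(pid):
    """Read the limits file for a given PID (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return handle.read()


def soft_limit_too_low(soft_limit, threshold=4096):
    """Return True if the soft limit is below the assumed threshold."""
    return soft_limit is not None and soft_limit < threshold
```

For example, `soft_limit_too_low(parse_max_open_files(read_process_limits(pid))[0])` would have flagged the 1024 limit on this node, given the PID of the axelar core process.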
The health check continued to report passed.
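Until the health check reflects this state, one possible workaround for alerting is to watch the node logs for the failure marker directly. The sketch below assumes the log format shown above (a `CONSENSUS FAILURE` marker and `height= <n>` fields). The function name and the way log lines are supplied are hypothetical, and this is not part of axelard.

```python
import re

FAILURE_MARKER = "CONSENSUS FAILURE"
HEIGHT_PATTERN = re.compile(r"height=\s*(\d+)")


def scan_for_consensus_failure(lines):
    """Scan log lines and report the first consensus failure, if any.

    Also tracks the most recent block height seen before the failure,
    which helps when correlating the alert with missed blocks.
    """
    last_height = None
    for line in lines:
        match = HEIGHT_PATTERN.search(line)
        if match:
            last_height = int(match.group(1))
        if FAILURE_MARKER in line:
            return {"failed": True, "last_height": last_height, "line": line.rstrip()}
    return {"failed": False, "last_height": last_height, "line": None}
```

The function accepts any iterable of lines, so it can be fed from an open log file or from `sys.stdin` when the node's output is piped into it. A non-empty `failed` result can then be forwarded to whatever alerting channel is already in use.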