ncm-ncd: timeout on stuck NCM components #114

Open
msmark opened this issue May 15, 2017 · 0 comments

msmark commented May 15, 2017

As discussed on the mailing list, ncm-ncd should have a configurable timeout so that an NCM component that hangs indefinitely, for whatever reason, cannot prevent ncm-ncd from continuing with its job and completing with an exit status. ncm-ncd should behave in one of three ways:

  1. Current behaviour: a timeout of zero means never time out.
  2. Alert if a component times out, but continue to wait.
  3. Alert if a component times out, kill and clean up that component, and continue with the next components. Any components that depend on the killed one cannot be run, of course, and should also be reported as errors with a note that the parent component failed.
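
The three behaviours above could be sketched roughly as follows. This is only an illustration in Python, not ncm-ncd code: the function `run_components`, its `timeout` and `kill_on_timeout` parameters, and the representation of a component as a `(name, argv, dependencies)` tuple are all hypothetical, and the sketch assumes each component runs as its own child process and that the list is already sorted in dependency order.

```python
import subprocess
import sys


def run_components(components, timeout=0, kill_on_timeout=False):
    """Run components in order and return {name: status}.

    components: list of (name, argv, dependencies), assumed to be
    already sorted so that dependencies come first (hypothetical layout).
    timeout=0 means wait forever (behaviour 1).
    """
    status = {}
    for name, argv, deps in components:
        # A component cannot run if any of its dependencies did not succeed.
        failed = [d for d in deps if status.get(d) != "ok"]
        if failed:
            status[name] = "skipped: dependency %s failed" % ", ".join(failed)
            continue

        proc = subprocess.Popen(argv)
        try:
            rc = proc.wait(timeout=timeout or None)
        except subprocess.TimeoutExpired:
            print("ALERT: %s exceeded %ss" % (name, timeout), file=sys.stderr)
            if kill_on_timeout:
                # Behaviour 3: kill the component and move on.
                proc.kill()
                proc.wait()
                status[name] = "timed out"
                continue
            # Behaviour 2: alert only, then keep waiting.
            rc = proc.wait()
        status[name] = "ok" if rc == 0 else "failed (exit %d)" % rc
    return status
```

With `timeout=0` this waits forever as today; with a non-zero `timeout` and `kill_on_timeout=False` it only alerts; with `kill_on_timeout=True` the stuck component is killed and anything depending on it is reported as skipped. A real implementation would likely need more thorough clean-up than killing the direct child, for example terminating the whole process group.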

Without a timeout, if any component hangs indefinitely, so does ncm-ncd, and subsequent runs of ncm-ncd cannot take place because of the lock file. This has left one affected system in a state where nothing was updated for a month and nobody noticed.

ttyS4 pushed a commit to ttyS4/ncm-ncd that referenced this issue Dec 15, 2017
- Fixes quattor#114 (failure to match Perl-generated requirement on SL5)