Think about integrating distributed lock managers #125

jrha · 2018-05-22T18:54:51Z

We would like to ensure that hosts in HA clusters cannot run components at the same time. An obvious solution would be to use a distributed lock manager such as ZooKeeper or etcd. This is to prevent outages when services get restarted on multiple hosts simultaneously.

One day it would be really nice to be able to make use of aquilon's cluster metadata (e.g. down_hosts_threshold) with this functionality.

See locksmith for an example of a system used to control reboots of hosts – this is in fact very much what we would like to use, but we should be agnostic about the lock manager being used.

jrha · 2018-05-22T19:32:09Z

I guess pre_script could be used for this, but some care might be required to handle timeouts.
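
A minimal sketch of the timeout handling, assuming a hypothetical `try_acquire` callable that returns `True` once the lock is held. None of these names come from ncm-ncd or from any particular lock manager; the helper only illustrates giving up after a deadline instead of blocking forever.

```python
import time
from typing import Callable


def acquire_with_timeout(
    try_acquire: Callable[[], bool],  # hypothetical: one non-blocking lock attempt
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll try_acquire until it succeeds or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if try_acquire():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Deadline reached without getting the lock: report failure.
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

A pre-script built on this could exit non-zero when the function returns `False`. Whether ncm-ncd then skips the run is an assumption that would need checking.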

stdweird · 2018-05-22T19:37:03Z

ncm-ncd already takes a lock, maybe extend that to a "global" lock somehow.

ned21 · 2018-05-22T20:38:22Z

+1 to implementing this via the pre-hook. Does your HA software provide any way to take a lock within the cluster? That would remove the need for an external arbitrator.

jrha · 2018-05-22T20:43:41Z

Mostly no, and it would be more useful to us as a generic solution (i.e. prevent too many of any arbitrary class of node being interrupted – e.g. backends behind loadbalancers, preventing a degradation in service rather than an outage).
We already have etcd for other purposes, so we're quite happy to extend its reach.
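
To make the "not too many nodes of one class at once" idea concrete, here is a rough sketch of a counting semaphore that is agnostic about the lock manager. Everything in it is hypothetical: `HolderStore`, `InMemoryStore` and `Semaphore` are illustrative names, and the in-memory store is only a stand-in for a real backend offering an atomic compare-and-swap (for example an etcd transaction or a versioned ZooKeeper write).

```python
from typing import Dict, FrozenSet, Protocol


class HolderStore(Protocol):
    """Hypothetical backend interface: read a holder set, swap it atomically."""

    def read(self, key: str) -> FrozenSet[str]: ...

    def compare_and_swap(
        self, key: str, expected: FrozenSet[str], new: FrozenSet[str]
    ) -> bool: ...


class InMemoryStore:
    """Stand-in backend for illustration only; not shared between hosts."""

    def __init__(self) -> None:
        self._data: Dict[str, FrozenSet[str]] = {}

    def read(self, key: str) -> FrozenSet[str]:
        return self._data.get(key, frozenset())

    def compare_and_swap(
        self, key: str, expected: FrozenSet[str], new: FrozenSet[str]
    ) -> bool:
        # Only write if nobody changed the value since it was read.
        if self.read(key) != expected:
            return False
        self._data[key] = new
        return True


class Semaphore:
    """Allow at most `max_holders` hosts of one class to proceed at a time."""

    def __init__(self, store: HolderStore, key: str, max_holders: int) -> None:
        self.store = store
        self.key = key
        self.max_holders = max_holders

    def try_acquire(self, host: str) -> bool:
        while True:
            holders = self.store.read(self.key)
            if host in holders:
                return True  # already holding a slot
            if len(holders) >= self.max_holders:
                return False  # no free slot right now
            if self.store.compare_and_swap(self.key, holders, holders | {host}):
                return True
            # Lost a race with another host: re-read and retry.

    def release(self, host: str) -> None:
        while True:
            holders = self.store.read(self.key)
            if host not in holders:
                return
            if self.store.compare_and_swap(self.key, holders, holders - {host}):
                return
```

With `max_holders=1` this behaves like a plain mutex. A larger value could in principle be derived from cluster metadata, but that mapping is not something this sketch attempts.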

stdweird · 2018-10-29T14:27:37Z

@jrha will write up a blog post on how they used prescript and etcd to help manage haproxy/keepalived clusters

jrha added enhancement discuss at workshop labels May 22, 2018

stdweird removed the discuss at workshop label Oct 29, 2018
