I have many customers who are smart. However, when discussing etcd and quorum, they forget that once the cluster membership is set, the quorum size doesn't change when a member goes offline/down.
This leads to inaccurate discussions about 5 members versus 3. For example, when talking about HA etcd across 2 AZs, a customer says, "Well, let's go to 5 members; then I will always have quorum if one goes down."
Two details are missed in this scenario:
1. The customer only has 2 AZs, so the plan needs to account for the failure of an entire AZ, not just individual hosts. When an AZ goes offline, either 2 or 3 etcd members are lost. Planning for the worst case, the discussion should cover the loss of 3 members: only 2/5 members remain active, which is less than a majority, so etcd loses quorum and can no longer accept writes.
2. Quorum size does not automatically change when members go offline/down.
With 2 AZs and 5 etcd members, losing an AZ takes down either 2 or 3 etcd members. When discussing the loss of 3 members, the reasoning I hear from customers is: "I still have 2 members, so 2/3 is more than 50%." The customer forgets that the cluster size is still 5; it does not shrink to 3 unless the membership is changed manually. In their example, only 2/5 members are active, which is less than a majority, so etcd loses quorum and can no longer accept writes.
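A minimal sketch of the arithmetic, assuming the standard Raft majority rule that etcd uses (quorum = floor(n/2) + 1 of the configured members). The helpers `quorum` and `survives_az_loss` are hypothetical names for illustration only; they are not part of etcd or etcdctl.

```python
def quorum(cluster_size: int) -> int:
    """Members needed for a majority of the *configured* cluster size."""
    return cluster_size // 2 + 1


def survives_az_loss(members_per_az: list[int]) -> dict[int, bool]:
    """For each AZ index, report whether quorum survives if that AZ is lost.

    Quorum is computed from the total configured membership,
    not from the members that happen to still be running.
    """
    total = sum(members_per_az)
    needed = quorum(total)
    return {az: total - lost >= needed for az, lost in enumerate(members_per_az)}


# 5 members split 3 + 2 across two AZs (the customer's proposal).
print(quorum(5))                 # 3
print(survives_az_loss([3, 2]))  # {0: False, 1: True}

# The mistaken reasoning treats the 2 survivors as a 3-member cluster.
# Compared against the configured size of 5, they fall short of quorum.
survivors = 2
print(survivors >= quorum(5))    # False
```

Trying other splits, such as `[2, 1]` for 3 members or `[2, 2]` for 4, leads to the same conclusion: with only two AZs, losing the AZ that holds the most members always costs quorum, regardless of the total member count.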
My ask is this: either explicitly document the behavior of a 5-member cluster, or add text clarifying that the quorum size does not change automatically during an outage.