Mark the node as FAIL when the node is marked as NOADDR #1191

Open
wants to merge 3 commits into base: unstable

Commits on Oct 18, 2024

  1. Mark the node as FAIL when the node is marked as NOADDR

    Imagine a cluster, for example a three-shard cluster. If the
    primary of shard 1 runs CLUSTER RESET HARD, its node name
    changes, and the other nodes mark the old node as NOADDR
    because the node name received in PONG has changed.
    
    From the other nodes' point of view, there is still a working
    primary, but it has no address. The address reported in MOVED
    is then invalid and confuses clients. At the same time, the
    replica will not fail over, because its primary is not in the
    FAIL state, so the cluster looks OK to everyone.
    
    This leaves a cluster that appears OK but has no coverage for
    shard 1. Obviously we should do something like CLUSTER FORGET
    to remove the node and fix the cluster before using it.
    
    The point here is that we can mark the NOADDR node as FAIL to
    advance the cluster state. A NOADDR node does not have a valid
    address, so we won't reconnect to it, we won't send it PING,
    and we won't gossip about it. It seems reasonable to mark it
    as FAIL.
    
    Signed-off-by: Binbin <[email protected]>
    enjoy-binbin committed Oct 18, 2024
    c1bf0e6
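
The rule described in the commit message above can be sketched in C as follows. This is a minimal, self-contained illustration and not the code in this PR: `sketch_node`, the `NODE_FLAG_*` bits, and `mark_noaddr_node_as_fail` are hypothetical names invented here, and clearing PFAIL and recording a failure time are assumptions about how such a transition might be tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; they are not the
 * project's real definitions. */
enum {
    NODE_FLAG_PFAIL  = 1 << 0, /* suspected failure */
    NODE_FLAG_FAIL   = 1 << 1, /* confirmed failure */
    NODE_FLAG_NOADDR = 1 << 2  /* no valid address known for the node */
};

/* Hypothetical, heavily reduced view of a cluster node. */
typedef struct {
    unsigned flags;
    long long fail_time_ms; /* when the node was marked FAIL (assumed field) */
} sketch_node;

/* If the node is NOADDR and not yet FAIL, promote it to FAIL.
 * Returns true only when the node's state actually changed. */
bool mark_noaddr_node_as_fail(sketch_node *node, long long now_ms) {
    assert(node != NULL);

    /* Nodes with a valid address go through normal failure detection. */
    if (!(node->flags & NODE_FLAG_NOADDR)) return false;

    /* Already FAIL: nothing to do. */
    if (node->flags & NODE_FLAG_FAIL) return false;

    /* A NOADDR node is never reconnected to, pinged, or gossiped about,
     * so treat it as failed: drop any PFAIL suspicion and set FAIL. */
    node->flags &= ~(unsigned)NODE_FLAG_PFAIL;
    node->flags |= NODE_FLAG_FAIL;
    node->fail_time_ms = now_ms;
    return true;
}
```

In this sketch, a caller would invoke the helper at the point where a node is flagged NOADDR. Once the FAIL flag is set, the replica's usual failover conditions can be met, which is the "advance the cluster state" effect the commit message refers to.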

Commits on Oct 19, 2024

  1. update comment

    Signed-off-by: Binbin <[email protected]>
    enjoy-binbin committed Oct 19, 2024
    bc86365
  2. fix timing issue

    Signed-off-by: Binbin <[email protected]>
    enjoy-binbin committed Oct 19, 2024
    4d2780a