wsrep_sst_common check_port bug #655

Open
glbyers opened this issue Mar 13, 2024 · 0 comments

A common tuning for MariaDB running under systemd on Linux systems is to set LimitNOFILE to something larger than the default. In our case, we set it to infinity, which has a different meaning depending on the version of systemd:

  • 64k prior to systemd 234
  • the value of fs.nr_open from systemd 234 onward, which can be extraordinarily large (1073741816 on RHEL 9)
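
To see which of these cases applies on a given host, the limits can be printed from a shell. This is only a minimal sketch: the function name show_nofile_limits is made up for illustration, and it reports the limits of the shell it runs in, not those of the MariaDB service.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the wsrep_sst_* scripts): print the
# descriptor limits that decide how far a "close everything" loop has to go.
show_nofile_limits() {
    echo "soft=$(ulimit -S -n)"
    echo "hard=$(ulimit -H -n)"
    # fs.nr_open is Linux-specific; report 'unknown' where /proc lacks it.
    if [ -r /proc/sys/fs/nr_open ]; then
        echo "nr_open=$(cat /proc/sys/fs/nr_open)"
    else
        echo "nr_open=unknown"
    fi
}

show_nofile_limits
```

For the limits of the running server itself, the "Max open files" line in /proc/<pid>/limits of the server process is the thing to look at, since the SST scripts inherit their limits from it.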

In the check_port function, we have the following:

    if [ $lsof_available -ne 0 ]; then
        lsof -Pnl -i ":$port" 2>/dev/null | \
        grep -q -E "^($utils)[^[:space:]]*[[:space:]]+$pid[[:space:]].*\\(LISTEN\\)" && rc=0   

The problem is that lsof, on startup, closes every possible file handle other than stdin, stdout & stderr, all the way up to the nofile limit. When that limit is very high, this can take longer than some hard-coded timeouts. For example, in the wsrep_sst_mariabackup script, we have this in recv_joiner:

    local ltcmd="$tcmd"
    if [ $tmt -gt 0 ]; then
        if [ -n "$(commandex timeout)" ]; then
            if timeout --help | grep -qw -F -- '-k'; then
                ltcmd="timeout -k $(( tmt+10 )) $tmt $tcmd"
            else
                ltcmd="timeout -s9 $tmt $tcmd"
            fi
        fi
    fi

    if [ $wait -ne 0 ]; then
        wait_for_listen &
    fi

And in wait_for_listen:

    wait_for_listen()
    {
        for i in {1..150}; do
            if check_port "" "$SST_PORT" 'socat|nc'; then
                break
            fi
            sleep 0.2
        done
        echo "ready $ADDR:$SST_PORT/$MODULE/$lsn/$sst_ver"
    }

So the check_port call needs to complete before the timeout configured in recv_joiner expires, so that the joiner can signal to the donor that it's ready to receive the backup. This never occurs, because lsof is still busy closing file handles when the timeout expires. On rhel8 with LimitNOFILE=infinity set in the systemd unit file for mariadb, everything is peachy as it's really 64k. But the same config migrated to rhel9 will result in being unable to bootstrap a cluster, and there's very little in the way of logging to indicate why.
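
One rough way to confirm that the limit is what slows lsof down is to time the same command under different soft limits. The sketch below assumes that lsof sizes its close loop from the soft nofile limit. time_with_nofile is a hypothetical helper, not something from the SST scripts, and port 4444 in the usage note is only an example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command in a subshell with the soft nofile limit
# set to $1, discard its output, and print the elapsed whole seconds.
time_with_nofile() {
    local limit="$1"; shift
    local start end
    start=$(date +%s)
    # If the limit cannot be set (for example, it exceeds the hard limit),
    # the command still runs with the inherited limit.
    ( ulimit -S -n "$limit" 2>/dev/null; "$@" >/dev/null 2>&1 )
    end=$(date +%s)
    echo $(( end - start ))
}
```

Something like `time_with_nofile 4096 lsof -Pnl -i :4444` against `time_with_nofile "$(ulimit -H -n)" lsof -Pnl -i :4444` should show the difference, provided the shell it runs in has the same hard limit as the mariadb service.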

Would it be reasonable to set some sane limits within the code that calls the scripts associated with wsrep_sst_method, or perhaps to call ulimit -n 4096 or similar within the wsrep_sst_* scripts? It really is a nasty gotcha.
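
As a rough sketch of the second option, something along these lines could sit near the top of a wsrep_sst_* script. cap_nofile is a made-up name and 4096 is just the figure suggested above; nothing here is taken from the actual scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lower the soft nofile limit to at most $1 (default
# 4096). Only the soft limit is touched, and it is never raised.
cap_nofile() {
    local max="${1:-4096}"
    local cur
    cur=$(ulimit -S -n)
    if [ "$cur" = "unlimited" ] || [ "$cur" -gt "$max" ]; then
        ulimit -S -n "$max"
    fi
}
```

Calling `cap_nofile 4096` in the script itself would also limit every child it starts, including the backup tool, which may genuinely need a lot of open files. Wrapping only the lsof call in a subshell, e.g. `( cap_nofile 4096; lsof -Pnl -i ":$port" )`, would keep the cap local to the port check.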
