[fdbserver] Gray failure and simulator improvements related to remote processes #11717
…s classes in each DC
…latency awareness
bool CC_ENABLE_REMOTE_LOG_ROUTER_MONITORING; // When enabled, gray failure tries to detect whether
                                             // remote log routers are experiencing degradation
                                             // (latency) with their peers. Gray failure may trigger
                                             // recovery based on this.
The reason for not saying DISCONNECT in the name is to keep backwards compatibility with the previous knob.
fix compiler warning
testState = localState;
}
}
when(wait(clog)) {}
Remove this
state Future<Void> clog = self->clogRemoteTLog(self, db);
state TestState testState = TestState::TEST_INIT;
self->actualStatePath.push_back(testState);
loop choose {
Remove choose when
#include "fdbrpc/Locality.h"
#include "flow/NetworkAddress.h"
Move below
// Currently, only CC_ONLY_CONSIDER_INTRA_DC_LATENCY feature relies on addr_locality mapping. In the future, if
// needed, we can populate the mapping unconditionally.
if (SERVER_KNOBS->CC_ONLY_CONSIDER_INTRA_DC_LATENCY) {
const bool addrDcChanged = self->addr_locality.contains(w.address()) &&
Separately, understand why we cannot relax this condition.
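For readers unfamiliar with the pattern under review, here is a minimal sketch of an address-to-locality map that is updated on worker registration and reports whether a known address moved to a different DC. It is written in plain C++ rather than Flow, and `Locality`, `AddrLocalityMap`, `registerWorker` and `removeWorker` are hypothetical names for illustration only; the truncated condition in the diff above is not reproduced here, and the real check in the PR may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a process locality; not FoundationDB's actual type.
struct Locality {
    std::optional<std::string> dcId; // data center id, if known
};

// Hypothetical address -> locality map, keyed by "ip:port".
struct AddrLocalityMap {
    std::unordered_map<std::string, Locality> addrLocality;

    // Records the locality for an address. Returns true only if the address
    // was already present and its DC differs from the previously stored one.
    bool registerWorker(const std::string& addr, const Locality& loc) {
        assert(!addr.empty());
        auto it = addrLocality.find(addr);
        const bool addrDcChanged = it != addrLocality.end() && it->second.dcId != loc.dcId;
        addrLocality[addr] = loc; // insert or overwrite
        return addrDcChanged;
    }

    // Erases the entry when the worker goes away; returns whether one existed.
    bool removeWorker(const std::string& addr) { return addrLocality.erase(addr) > 0; }
};
```

In this sketch a first registration never counts as a DC change, because there is no earlier entry to compare against; only a re-registration with a different `dcId` does.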
incorporate Jingyu's feedback
LGTM
This PR adds a group of gray failure features related to remote processes. To support these features, the simulator is made more customizable and more in-memory state is added to the cluster controller (CC). Reviewers can go commit by commit if that is more convenient; here is a high-level overview of the commits:
addr_locality mapping is introduced. It's populated (insert, erase) similarly to the id_worker mapping.
100K Joshua:
20241018-002455-praza-40b0433b9b2f972dcc759bf2d19aaa43470f7d
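As a rough illustration of why an address-to-locality mapping helps with intra-DC latency awareness, the sketch below filters latency reports so that only peers in the reporter's own DC are considered. Everything in it is an assumption made for illustration: `PeerLatency`, `degradedIntraDcPeers`, the string-keyed map and the threshold parameter are hypothetical and are not taken from the PR or from the cluster controller's code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical latency sample reported by one process about one peer.
struct PeerLatency {
    std::string peerAddr;
    double latencySeconds;
};

// Returns the peers whose latency exceeds the threshold, skipping every peer
// that is not known to be in the same DC as the reporter.
std::vector<std::string> degradedIntraDcPeers(
    const std::string& reporterAddr,
    const std::vector<PeerLatency>& samples,
    const std::unordered_map<std::string, std::string>& addrToDc,
    double thresholdSeconds) {
    assert(thresholdSeconds >= 0.0);
    std::vector<std::string> degraded;
    auto reporter = addrToDc.find(reporterAddr);
    if (reporter == addrToDc.end()) return degraded; // reporter DC unknown: nothing to compare
    for (const auto& s : samples) {
        auto peer = addrToDc.find(s.peerAddr);
        // Ignore peers with unknown locality and peers in another DC.
        if (peer == addrToDc.end() || peer->second != reporter->second) continue;
        if (s.latencySeconds > thresholdSeconds) degraded.push_back(s.peerAddr);
    }
    return degraded;
}
```

The design choice in this sketch is to be conservative: a peer with no known DC is skipped rather than treated as degraded, so missing map entries cannot by themselves cause a peer to be flagged.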