-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Release-7.3] Check buildteam consistently failed #11685
base: release-7.3
Are you sure you want to change the base?
[Release-7.3] Check buildteam consistently failed #11685
Conversation
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
|
Result of foundationdb-pr on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux CentOS 7
|
Has it failed in any simulation test, without the changes in this PR? |
if (k * 2 > n) | ||
k = n - k; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this below the case k == 0
. Almost confused to think returning n -k
here.
for (auto& machine : machines) { | ||
int healthySSCount = 0; | ||
for (auto it : machine->serversOnMachine) { | ||
if (!server_status.get(it->getId()).isUnhealthy()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is isHealthy()
that can be used.
if (machineCount > maxCount) { | ||
maxCount = machineCount; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be a single line: maxCount = std::max(machineCount, maxCount)
if (machineCount < minCount) { | ||
minCount = machineCount; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minCount = std::min(minCount, machineCount);
minCount = machineCount; | ||
} | ||
} | ||
return { minCount, maxCount, machinesPerZone.size(), machines.size() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a vector, can you use a struct with named fields for better readability?
if (count > maxCount) { | ||
maxCount = count; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::max()
if (count < minCount) { | ||
minCount = count; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::min()
Simulation tests are frequently failed without deciding whether the layout is good. The reason is that the layout is randomly generated in the simulation. |
|
||
int result = n; | ||
for (int i = 2; i <= k; ++i) { | ||
result = result * (n - i + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the overflow issues considered here please? If we want to keep the code simple, at least a meaningful error when overflow happens?
In the PR #11678, we add a validation of an invariant after buildTeam: if latest buildTeam succeeds, then each server must have at least targetTeamNumPerServer serverTeams.
What if the buildTeam consistently failed? The simply idea is that keeping tracking the time span that buildTeam keeps failing. If the time span exceeds a threshold, then trace error.
What makes the question interesting is that: the machine, server, and zone layout may not satisfy the buildTeam. As a result, the buildTeam can be consistently failed. For example, if the machine count per zone is small, buildMachineTeam is likely to be failed.
In this PR, we add a logic to decide if the current machine, server, and zone is satisfiable to the targetTeamNumPerServer. We conduct the detection of buildTeam consistent failing only when the layout is satisfiable.
Conclusion
The PR passed 100K tests in release-7.3, indicating if the layout is good, each server eventually has target amount of ServerTeams.
100K correctness:
20240927-053135-zhewang-1921232a72b123a4 compressed=True data_size=35127263 duration=6485446 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:01:42 sanity=False started=100000 stopped=20240927-063317 submitted=20240927-053135 timeout=5400 username=zhewang
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch
ormain
if this is the youngest branch)