-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix send buffer misalignment [No 2] #1332
Fix send buffer misalignment [No 2] #1332
Conversation
This is the second attempt to mitigate send buffer misalignment. Previous one (private-attribution#1307) didn't handle all the edge cases and was abandoned in favour of this PR. What I believe makes this change work is the new requirement for active work to be a power of two. With this constraint, it is much easier to align the read size with it. Given that `total_capacity = active * record_size`, we can represent `read_size` as a multiple of `record_size` too: `read_size = X * record_size`. If X is a power of two and active_work is a power of two, then they will always be aligned with each other. For example, if active work is 16, read size is 10 bytes and record size is 3 bytes, then: ``` total_capacity = 16*3 read_size = X*3 (close to 10) X = 2 (power of two that satisfies the requirement) ``` when picking up the read size, we are rounding down to avoid buffer overflows. In the example above, setting X=3 would make it closer to the desired read size, but it is greater than 10, so we pick 2 instead.
While working on changing the gateway and parameters, I ran into several issues where the power of two constraint was not enforced and breakages were hard to find. A better model for me is to gate the active work at the config level, prohibiting invalid constructions at the caller side.
I want to run some Draft tests on this |
The latter failure is expected as we don't have a separate runtime for queries. This will be addressed later |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1332 +/- ##
==========================================
+ Coverage 93.55% 93.59% +0.03%
==========================================
Files 208 209 +1
Lines 33937 34265 +328
==========================================
+ Hits 31750 32069 +319
- Misses 2187 2196 +9 ☔ View full report in Codecov by Sentry. |
ipa-core/src/helpers/gateway/mod.rs
Outdated
@@ -84,6 +85,10 @@ pub struct GatewayConfig { | |||
/// payload may not be exactly this, but it will be the closest multiple of record size to this | |||
/// number. For instance, having 14 bytes records and batch size of 4096 will result in | |||
/// 4088 bytes being sent in a batch. | |||
/// | |||
/// The actual size for read chunks may be bigger or smaller, depending on the record size | |||
/// sent through each channel. Read size will be aligned with [`Self::active_work`] value to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining what to do if we want to align read size perfectly with the target read size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest rewriting this entire comment. It feels out of date.
- line 85 has a typo "side" instead of "size".
- line 86 refers to "batch size" but this comment is above "read_size".
- I think the example is wrong, because 4088 bytes is not a power of two multiple of 14. If I'm not wrong, the best you can do is now 14 * 2^8 < 4096 (it's 3,584).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right. Just did that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments.
ipa-core/src/helpers/gateway/mod.rs
Outdated
@@ -84,6 +85,10 @@ pub struct GatewayConfig { | |||
/// payload may not be exactly this, but it will be the closest multiple of record size to this | |||
/// number. For instance, having 14 bytes records and batch size of 4096 will result in | |||
/// 4088 bytes being sent in a batch. | |||
/// | |||
/// The actual size for read chunks may be bigger or smaller, depending on the record size | |||
/// sent through each channel. Read size will be aligned with [`Self::active_work`] value to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest rewriting this entire comment. It feels out of date.
- line 85 has a typo "side" instead of "size".
- line 86 refers to "batch size" but this comment is above "read_size".
- I think the example is wrong, because 4088 bytes is not a power of two multiple of 14. If I'm not wrong, the best you can do is now 14 * 2^8 < 4096 (it's 3,584).
ipa-core/src/helpers/gateway/send.rs
Outdated
// That is not what we want here: if read_size / record_size is a power | ||
// of two, then subsequent division will get us further away from desired target. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am so confused by this comment. Didn't you just say you do want the multiplier to be a power of two? But here you're saying it shouldn't be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea that was my poor attempt to explain the algorithm, but I think bitshift approach makes it very straightforward to understand what is going on , so I removed this comment entirely
ipa-core/src/helpers/gateway/send.rs
Outdated
// That is not what we want here: if read_size / record_size is a power | ||
// of two, then subsequent division will get us further away from desired target. | ||
// For example: if read_size / record_size = 4, then prev_power_of_two = 2. | ||
// In such cases, we want to stay where we are, so we add +1 for that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean "stay where we are"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is all gone now as logic to compute the previous power of two is simplified
ipa-core/src/helpers/gateway/send.rs
Outdated
let prev_power_of_two = | ||
(gateway_config.read_size.get() / record_size + 1).next_power_of_two() / 2; | ||
std::cmp::max(1, prev_power_of_two) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a really complicated way to get the previous power of two. Isn't it easier to just zero-out all but the most significant bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea that would be easier. Thanks for the feedback!
When using more than one thread, this test was failing if futures were scheduled out of order, because `Notify` couldn't wake up futures scheduled after `notify_all` call. Using barriers solves the issue
5dd1de2
to
279c306
Compare
Using bitshift turns out to be much easier to understand
There is no reason to do that.
There is no reason to do that.
Self { | ||
// define read size as a multiplier of record size. The multiplier must be | ||
// a power of two to align perfectly with total capacity. We don't want to exceed | ||
// the target read size, so we pick a power of two <= read_size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something still doesn't make sense here. The power of two is not the same thing as read_size. We might pick 2^8 or something, but 8 isn't comparable with "read_size".
let read_size_multiplier = | ||
// this computes the highest power of 2 less than or equal to read_size / record_size. | ||
// Note, that if record_size is greater than read_size, we round it to 1 | ||
1 << (std::cmp::max(1, usize::BITS - (gateway_config.read_size.get() / record_size).leading_zeros()) - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this statement is correct, but it's really hard to read. Can you split it up across a few lines for clarity?
let target_multiplier = gateway_config.read_size.get() / record_size;
let num_bits = usize::BITS - target_multiplier.leading_zeros();
1 << (std::cmp::max(1, num_bits) - 1);
assert!(this.total_capacity.get() >= record_size * gateway_config.active.get()); | ||
assert_eq!(0, this.total_capacity.get() % this.read_size.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Little comments here (with link to issues) would be helpful for readers to understand the necessity of these conditions.
Follow up on feedback from #1332
Overview
This is the second attempt to mitigate send buffer misalignment. Previous one (#1307) didn't handle all the edge cases and was abandoned in favour of this PR.
What I believe makes this change work is the new requirement for active work to be a power of two. With this constraint, it is much easier to align the read size with it. Given that
total_capacity = active * record_size
, we can representread_size
as a multiple ofrecord_size
too:read_size = X * record_size
. If X is a power of two and active_work is a power of two, then they will always be aligned with each other.For example, if active work is 16, read size is 10 bytes and record size is 3 bytes, then:
when picking up the read size, we are rounding down to avoid buffer overflows. In the example above, setting X=3 would make it closer to the desired read size, but it is greater than 10, so we pick 2 instead.
What is included in this change?
GatewayConfig