Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix send buffer misalignment [No 2] #1332

Merged

Conversation

akoshelev
Copy link
Collaborator

Overview

This is the second attempt to mitigate send buffer misalignment. Previous one (#1307) didn't handle all the edge cases and was abandoned in favour of this PR.

What I believe makes this change work is the new requirement for active work to be a power of two. With this constraint, it is much easier to align the read size with it. Given that total_capacity = active * record_size, we can represent read_size as a multiple of record_size too:
read_size = X * record_size. If X is a power of two and active_work is a power of two, then they will always be aligned with each other.

For example, if active work is 16, read size is 10 bytes and record size is 3 bytes, then:

total_capacity = 16*3
read_size = X*3 (close to 10)
X = 2 (power of two that satisfies the requirement)

when picking up the read size, we are rounding down to avoid buffer overflows. In the example above, setting X=3 would make it closer to the desired read size, but it is greater than 10, so we pick 2 instead.

What is included in this change?

  • Send buffer now enforces the active work to be a power of two
  • Subsequently, Gateway interface now puts the same restriction on GatewayConfig
  • Fix for unit tests that use ZKP - batch sizes now need to be a power of two as well (after Make active_work match records_per_batch #1316 active work and batch size must be aligned now)

This is the second attempt to mitigate send buffer misalignment. Previous one (private-attribution#1307) didn't handle all the edge cases and was abandoned in favour of this PR.

What I believe makes this change work is the new requirement for active work to be a power of two. With this constraint, it is much easier to align the read size with it. Given that `total_capacity = active * record_size`, we can represent `read_size` as a multiple of `record_size` too:
`read_size = X * record_size`. If X is a power of two and active_work is a power of two, then they will always be aligned with each other.

For example, if active work is 16, read size is 10 bytes and record size is 3 bytes, then:

```
total_capacity = 16*3
read_size = X*3 (close to 10)
X = 2 (power of two that satisfies the requirement)
```

when picking up the read size, we are rounding down to avoid buffer overflows. In the example above, setting X=3 would make it closer to the desired read size, but it is greater than 10, so we pick 2 instead.
While working on changing the gateway and parameters, I ran into
several issues where the power of two constraint was not enforced
and breakages were hard to find.

A better model for me is to gate the active work at the config level,
prohibiting invalid constructions at the caller side.
@akoshelev
Copy link
Collaborator Author

I want to run some Draft tests on this

@akoshelev akoshelev changed the title Fix send buffer misalignment Fix send buffer misalignment [No 2] Oct 2, 2024
@akoshelev
Copy link
Collaborator Author

akoshelev commented Oct 2, 2024

Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 97.90941% with 6 lines in your changes missing coverage. Please review.

Project coverage is 93.59%. Comparing base (27de807) to head (34838fe).
Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
ipa-core/src/utils/power_of_two.rs 94.23% 3 Missing ⚠️
ipa-core/src/helpers/gateway/mod.rs 98.63% 2 Missing ⚠️
ipa-core/src/app.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1332      +/-   ##
==========================================
+ Coverage   93.55%   93.59%   +0.03%     
==========================================
  Files         208      209       +1     
  Lines       33937    34265     +328     
==========================================
+ Hits        31750    32069     +319     
- Misses       2187     2196       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -84,6 +85,10 @@ pub struct GatewayConfig {
/// payload may not be exactly this, but it will be the closest multiple of record size to this
/// number. For instance, having 14 bytes records and batch size of 4096 will result in
/// 4088 bytes being sent in a batch.
///
/// The actual size for read chunks may be bigger or smaller, depending on the record size
/// sent through each channel. Read size will be aligned with [`Self::active_work`] value to
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining what to do if we want to align read size perfectly with the target read size

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest rewriting this entire comment. It feels out of date.

  • line 85 has a typo "side" instead of "size".
  • line 86 refers to "batch size" but this comment is above "read_size".
  • I think the example is wrong, because 4088 bytes is not a power of two multiple of 14. If I'm not wrong, the best you can do is now 14 * 2^8 < 4096 (it's 3,584).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. Just did that

Copy link
Collaborator

@benjaminsavage benjaminsavage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

@@ -84,6 +85,10 @@ pub struct GatewayConfig {
/// payload may not be exactly this, but it will be the closest multiple of record size to this
/// number. For instance, having 14 bytes records and batch size of 4096 will result in
/// 4088 bytes being sent in a batch.
///
/// The actual size for read chunks may be bigger or smaller, depending on the record size
/// sent through each channel. Read size will be aligned with [`Self::active_work`] value to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest rewriting this entire comment. It feels out of date.

  • line 85 has a typo "side" instead of "size".
  • line 86 refers to "batch size" but this comment is above "read_size".
  • I think the example is wrong, because 4088 bytes is not a power of two multiple of 14. If I'm not wrong, the best you can do is now 14 * 2^8 < 4096 (it's 3,584).

Comment on lines 265 to 266
// That is not what we want here: if read_size / record_size is a power
// of two, then subsequent division will get us further away from desired target.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so confused by this comment. Didn't you just say you do want the multiplier to be a power of two? But here you're saying it shouldn't be?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea that was my poor attempt to explain the algorithm, but I think bitshift approach makes it very straightforward to understand what is going on , so I removed this comment entirely

// That is not what we want here: if read_size / record_size is a power
// of two, then subsequent division will get us further away from desired target.
// For example: if read_size / record_size = 4, then prev_power_of_two = 2.
// In such cases, we want to stay where we are, so we add +1 for that.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "stay where we are"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is all gone now as logic to compute the previous power of two is simplified

Comment on lines 269 to 272
let prev_power_of_two =
(gateway_config.read_size.get() / record_size + 1).next_power_of_two() / 2;
std::cmp::max(1, prev_power_of_two)
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a really complicated way to get the previous power of two. Isn't it easier to just zero-out all but the most significant bit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea that would be easier. Thanks for the feedback!

When using more than one thread, this test was failing if futures were scheduled out of order, because `Notify` couldn't wake up futures scheduled after `notify_all` call.

Using barriers solves the issue
Self {
// define read size as a multiplier of record size. The multiplier must be
// a power of two to align perfectly with total capacity. We don't want to exceed
// the target read size, so we pick a power of two <= read_size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something still doesn't make sense here. The power of two is not the same thing as read_size. We might pick 2^8 or something, but 8 isn't comparable with "read_size".

let read_size_multiplier =
// this computes the highest power of 2 less than or equal to read_size / record_size.
// Note, that if record_size is greater than read_size, we round it to 1
1 << (std::cmp::max(1, usize::BITS - (gateway_config.read_size.get() / record_size).leading_zeros()) - 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this statement is correct, but it's really hard to read. Can you split it up across a few lines for clarity?

let target_multiplier = gateway_config.read_size.get() / record_size;
let num_bits = usize::BITS - target_multiplier.leading_zeros();
1 << (std::cmp::max(1, num_bits) - 1);

Comment on lines +282 to +283
assert!(this.total_capacity.get() >= record_size * gateway_config.active.get());
assert_eq!(0, this.total_capacity.get() % this.read_size.get());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little comments here (with link to issues) would be helpful for readers to understand the necessity of these conditions.

@benjaminsavage benjaminsavage merged commit 57951b0 into private-attribution:main Oct 7, 2024
12 checks passed
akoshelev added a commit to akoshelev/raw-ipa that referenced this pull request Oct 7, 2024
benjaminsavage added a commit that referenced this pull request Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants