Note about IPA wire format #65

Open
wants to merge 1 commit into
base: main

Conversation

akoshelev

We're getting close to running an in-market test, and it is worth debating what the input format for IPA should look like. Because a lot of information is included in the AAD for each event, it may be possible to get significant savings on the wire (10-30%), given some assumptions about the input ($N_{sites} \ll N_{events}$).

# Interoperable Private Attribution wire format


This documents provides clarification on the format IPA parties use to submit queries.
Collaborator

Suggested change
This documents provides clarification on the format IPA parties use to submit queries.
This documents the format that report collectors use to submit IPA queries to helper party networks.


The biggest savings from the custom format come from making each query to carry only one copy of unique site domain
and match key provider origin strings. This proposal suggests building two lookup tables (one for each entity) on the
caller site
Collaborator

incomplete

Collaborator

Given that source queries have a single source site and trigger queries have a single trigger site, you could start by indicating the type of query (which can be implicit or part of the query creation step), then you can have two tables: one for the "same" side (source configurations for source queries, trigger configurations for trigger queries) and one for the "other" side (the converse). Then you could concatenate the two tables, index rows starting from zero and refer to the configurations.

Each row in the table would then effectively be a configuration that lists:

  1. Site: length prefixed ASCII; 1 byte length. Optimization hack: length = 0 copies the previous value.
  2. Epoch: 2 bytes.
  3. Key identifier: 1 byte.

There are three implied values that fill out the common stuff:

  1. (implied) Event type is inferred from the table type, so this is effectively run-length encoded.
  2. (implied) The match key provider should be the same for all events, so that can be part of query configuration.
  3. (implied) The helper party should know its own name, so that can be omitted completely.

Indexing into this table shouldn't take too many bytes, but I don't think that 1 byte is going to work out in all cases. The table size is known before you start processing individual items, though, so we can base the index size on the table size ($\lceil \log_2(t)/8 \rceil$ bytes).
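
For illustration, here is a minimal Python sketch of the table layout described in this comment. It is not part of the proposal: the names `encode_table`, `decode_table`, and `index_width` are hypothetical, and the big-endian byte order for the epoch is an assumption, since the comment does not specify one. The field sizes follow the list above.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, int, int]  # (site, epoch, key_id)


def index_width(table_size: int) -> int:
    """Bytes needed to index a table of t rows: ceil(log2(t) / 8)."""
    if table_size < 1:
        raise ValueError("table must contain at least one row")
    bits = (table_size - 1).bit_length()  # equals ceil(log2(t)) for t >= 1
    return (bits + 7) // 8


def encode_table(rows: Iterable[Row]) -> bytes:
    """Serialize rows as: 1-byte site length, site, 2-byte epoch, 1-byte key id."""
    out = bytearray()
    previous_site: Optional[str] = None
    for site, epoch, key_id in rows:
        site_bytes = site.encode("ascii")
        if not 1 <= len(site_bytes) <= 255:
            raise ValueError("site must be 1..255 ASCII bytes")
        if site == previous_site:
            out.append(0)  # length 0: copy the previous row's site
        else:
            out.append(len(site_bytes))
            out += site_bytes
        out += epoch.to_bytes(2, "big")  # byte order is an assumption
        out.append(key_id)  # raises ValueError if key_id is outside 0..255
        previous_site = site
    return bytes(out)


def decode_table(data: bytes) -> List[Row]:
    """Inverse of encode_table."""
    rows: List[Row] = []
    pos = 0
    previous_site: Optional[str] = None
    while pos < len(data):
        length = data[pos]
        pos += 1
        if length == 0:
            if previous_site is None:
                raise ValueError("first row cannot copy a previous site")
            site = previous_site
        else:
            site = data[pos:pos + length].decode("ascii")
            pos += length
        epoch = int.from_bytes(data[pos:pos + 2], "big")
        key_id = data[pos + 2]
        pos += 3
        rows.append((site, epoch, key_id))
        previous_site = site
    return rows
```

Note that with this formula a single-row table needs a zero-byte index, since every event implicitly refers to row 0. Reserving length 0 as the "copy previous" marker also means an empty site string cannot be represented.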

Comment on lines +77 to +83
The list of supported parameters include:

| Header name | Type | Description | Accepted values | Default? | Mandatory? |
|------------------|------------------------------|----------------------------------------|-----------------|----------|------------|
| `x-ipa-field` | US-ASCII encoded string | Field type used to secret-share values | `fp32` | No | Yes |
| `x-ipa-query` | US-ASCII encoded string | Desired query to run in MPC | `ipa` | `ipa` | No |
| `x-ipa-version` | single byte unsigned integer | Version of the request | `1` | No | Yes |
Collaborator

See RFC 6648, which deprecates the `X-` prefix for header field names.

We need a format for creating a query, which needs to include these values somehow. I would not use header fields for this, but instead define a payload format. This doesn't need to be tightly packed, so JSON is probably where I would go.

Also, some of this is information that could be part of the resource identity. That is, you would have one URI that does IPA and another that does something different. That means that you don't need to include explicit versioning.

Parameters are only necessary if you think that something needs tuning, or there are things that need to be known in order to accept the query. I think that we should directly signal the query size in this request as that has a direct bearing on what is being requested.

IPA already has a bunch of parameters that we have built into our implementation:

  • The number of breakdown keys.
  • The maximum value of individual trigger values.
  • The per-user cap.
  • The attribution window.

These are what I would expect to see in the request that creates a query.
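
As a rough sketch of what such a JSON payload could look like, assuming Python on the report collector side: the class name `QueryConfig`, every field name, and the choice of seconds for the attribution window are hypothetical and not taken from the IPA implementation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueryConfig:
    """Hypothetical parameters sent in the request that creates a query."""

    query_size: int                  # number of events in the query
    breakdown_keys: int              # number of breakdown keys
    max_trigger_value: int           # maximum value of an individual trigger value
    per_user_cap: int                # per-user contribution cap
    attribution_window_seconds: int  # unit is an assumption

    def __post_init__(self) -> None:
        # Reject anything that is not a positive integer.
        for name, value in asdict(self).items():
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Because the query type and version would be carried by the resource URI under this suggestion, the payload only holds tunable parameters and the query size.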

by the implementations.

The following simulation assumes each event to take **112 bytes** on the wire, including encryption overhead
(see [encryption](.encryption.md)) and site origin to be a random 25-160 byte ASCII string. The overhead of sending
Collaborator

Suggested change
(see [encryption](.encryption.md)) and site origin to be a random 25-160 byte ASCII string. The overhead of sending
(see [encryption](./encryption.md)) and site origin to be a random 25-160 byte ASCII string. The overhead of sending

Note: The following estimations ignore the TCP/IP/Ethernet frame overhead as it remains the same regardless of the format chosen
by the implementations.

The following simulation assumes each event to take **112 bytes** on the wire, including encryption overhead
Collaborator

I think that you need to spell out assumptions here. Something like:

  • enc = 32
  • ciphertext = 2*40/8 = 10
  • tag = 16
  • site = 1 + 50 (say)
  • key id = 1
  • epoch = 2
  • breakdown key = 1 (assuming XOR shares here and a small space, not sure about state of the art)
  • trigger value = 4
  • ts = 4 (not sure here again)

That's a little more than you have.

But with a table, and if we make breakdown key and trigger value mutually exclusive (and the same size), then we have 68 bytes, plus the table size, which is trivial for a large data set.

The other thing with tables is that you can reuse them...
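
The arithmetic in this comment can be checked with a short Python sketch. The inline sizes are copied from the list above and sum to 121 bytes. The comment does not say how the 68-byte figure is derived; one reading that reproduces it, assumed here, is a 2-byte table index with site, key id, and epoch moved into the table, and breakdown key and trigger value sharing one 4-byte slot. All function names are hypothetical.

```python
# Per-event sizes in bytes, copied from the list in the comment.
INLINE_FIELDS = {
    "enc": 32,
    "ciphertext": 2 * 40 // 8,
    "tag": 16,
    "site": 1 + 50,
    "key_id": 1,
    "epoch": 2,
    "breakdown_key": 1,
    "trigger_value": 4,
    "timestamp": 4,
}

# Fields that move out of each event and into a shared table row.
TABLE_ROW_FIELDS = ("site", "key_id", "epoch")

# Fields assumed to be mutually exclusive and padded to the same size.
MERGED_FIELDS = ("breakdown_key", "trigger_value")


def inline_event_size() -> int:
    """Size of one event when every field is sent inline."""
    return sum(INLINE_FIELDS.values())


def table_event_size(index_bytes: int = 2) -> int:
    """Size of one event that refers to a table row (index width is an assumption)."""
    kept = sum(
        size
        for name, size in INLINE_FIELDS.items()
        if name not in TABLE_ROW_FIELDS and name not in MERGED_FIELDS
    )
    value = max(INLINE_FIELDS[name] for name in MERGED_FIELDS)
    return kept + value + index_bytes


def table_row_size() -> int:
    """Size of one table row, ignoring the length-0 site shortcut."""
    return sum(INLINE_FIELDS[name] for name in TABLE_ROW_FIELDS)


def total_bytes_with_table(events: int, rows: int, index_bytes: int = 2) -> int:
    """Total payload size for a query with the given event and table row counts."""
    return events * table_event_size(index_bytes) + rows * table_row_size()
```

Under these assumptions the table adds 54 bytes per distinct configuration, which is amortized across all events that refer to it.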


## Assumptions

* Report collector use HTTP over TLS to send queries to helper party networks.
Contributor

Why are we choosing http instead of tls here?

Collaborator

"over" here means that one protocol layer (HTTP) runs on top of the other (TLS). It doesn't mean "instead of"; even though that is another meaning that "over" can take, it isn't the usual assumption in this context.

Contributor

oh ok.. maybe we should reword it to say "on top of" TLS to avoid confusion?

Author

heh, I never read it that way - too used to this expression. I agree that it does sound confusing, so I'll just use HTTPS instead.

Collaborator

"HTTP over TLS" is the name of the protocol. Or "HTTPS". To call it something else would be far worse.

Contributor

Didn't know about that. How about we add a link to the RFC for it: https://www.rfc-editor.org/rfc/rfc2818
