Note about IPA wire format #65

akoshelev · 2023-04-12T00:01:03Z

We're getting close to running an in-market test, so it is worth debating what the input format for IPA should look like. Because a lot of information is included in the AAD for each event, it may be possible to get significant savings on the wire (10-30%), given some assumptions about the input ($N_{sites} \ll N_{events}$).

martinthomson · 2023-04-12T00:57:23Z

details/input.md

+# Interoperable Private Attribution wire format
+
+
+This documents provides clarification on the format IPA parties use to submit queries. 


Suggested change

-This documents provides clarification on the format IPA parties use to submit queries.
+This documents the format that report collectors use to submit IPA queries to helper party networks.

martinthomson · 2023-04-12T01:00:08Z

details/input.md

+
+The biggest savings from the custom format come from making each query to carry only one copy of unique site domain
+and match key provider origin strings. This proposal suggests building two lookup tables (one for each entity) on the 
+caller site


Given that source queries have a single source site and trigger queries have a single trigger site, you could start by indicating the type of query (which can be implicit or part of the query creation step), then you can have two tables: one for the "same" side (source configurations for source queries, trigger configurations for trigger queries) and one for the "other" side (the converse). Then you could concatenate the two tables, index rows starting from zero and refer to the configurations.

Each row in the table would then effectively be a configuration that lists:

Site: length prefixed ASCII; 1 byte length. Optimization hack: length = 0 copies the previous value.

Epoch: 2 bytes.

Key identifier: 1 byte.

There are three implied values that fill out the common stuff:

(implied) Event type is inferred from the table type, so this is effectively run-length encoded.

(implied) The match key provider should be the same for all events, so that can be part of query configuration.

(implied) The helper party should know its own name, so that can be omitted completely.
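A minimal sketch of the row layout described above, assuming network byte order for the epoch and assuming the table runs to the end of the buffer (neither is specified in this thread). The names `Config`, `encode_table`, and `decode_table` are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    site: str    # ASCII site name, 1..255 bytes once encoded
    epoch: int   # must fit in 2 bytes
    key_id: int  # must fit in 1 byte


def encode_table(rows: list[Config]) -> bytes:
    """Row layout: 1-byte site length, site bytes, 2-byte epoch, 1-byte key id.

    A site length of 0 means "same site as the previous row".
    """
    out = bytearray()
    prev_site = None
    for row in rows:
        site = row.site.encode("ascii")
        if not 1 <= len(site) <= 255:
            raise ValueError("site must be 1..255 ASCII bytes")
        if site == prev_site:
            out.append(0)  # length 0: copy the previous site
        else:
            out.append(len(site))
            out += site
        # Byte order is an assumption; "!" also disables struct padding.
        out += struct.pack("!HB", row.epoch, row.key_id)
        prev_site = site
    return bytes(out)


def decode_table(data: bytes) -> list[Config]:
    """Inverse of encode_table; reads rows until the buffer is exhausted."""
    rows, pos, prev_site = [], 0, None
    while pos < len(data):
        n = data[pos]
        pos += 1
        if n == 0:
            if prev_site is None:
                raise ValueError("first row cannot copy a previous site")
            site = prev_site
        else:
            site = data[pos:pos + n].decode("ascii")
            pos += n
        epoch, key_id = struct.unpack_from("!HB", data, pos)
        pos += 3
        rows.append(Config(site, epoch, key_id))
        prev_site = site
    return rows
```

With this layout, a row that repeats the previous site costs 4 bytes instead of 4 bytes plus the site length, which is where sorting rows by site would pay off.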

Indexing into this table shouldn't take too many bytes, but I don't think that 1 byte is going to work out in all cases. The table size is known before you start processing individual items, so we can base the index size on the table size $t$: $\lceil \log_2(t)/8 \rceil$ bytes.
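As a rough illustration of the index sizing, the formula can be evaluated with integer arithmetic only, since `(t - 1).bit_length()` equals $\lceil \log_2(t) \rceil$ for $t \ge 1$. The function names and the big-endian byte order are assumptions made for this sketch.

```python
def index_width(table_size: int) -> int:
    """Bytes needed to index a table of `table_size` rows: ceil(log2(t) / 8)."""
    if table_size < 1:
        raise ValueError("table must have at least one row")
    bits = (table_size - 1).bit_length()  # ceil(log2(t)) without floating point
    return (bits + 7) // 8                # round up to whole bytes


def encode_index(index: int, table_size: int) -> bytes:
    """Encode a zero-based row index using the width implied by the table size."""
    if not 0 <= index < table_size:
        raise ValueError("index out of range")
    return index.to_bytes(index_width(table_size), "big")  # byte order is an assumption
```

Note that a single-row table yields a width of zero under this formula, so the index would be omitted entirely.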

martinthomson · 2023-04-12T01:19:48Z

details/input.md

+The list of supported parameters include:
+
+| Header name      | Type                         | Description                            | Accepted values | Default? | Mandatory? |
+|------------------|------------------------------|----------------------------------------|-----------------|----------|------------| 
+| `x-ipa-field`    | US-ASCII encoded string      | Field type used to secret-share values | `fp32`          | No       | Yes        |
+| `x-ipa-query`    | US-ASCII encoded string      | Desired query to run in MPC            | `ipa`           | `ipa`    | No         |
+| `x-ipa-version`  | single byte unsigned integer | Version of the request                 | `1`             | No       | Yes        |


RFC 6648.

We need a format for creating a query, which needs to include these values somehow. I would not use header fields for this, but instead define a payload format. This doesn't need to be tightly packed, so JSON is probably where I would go.

Also, some of this is information that could be part of the resource identity. That is, you would have one URI that does IPA and another that does something different. That means that you don't need to include explicit versioning.

Parameters are only necessary if you think that something needs tuning, or there are things that need to be known in order to accept the query. I think that we should directly signal the query size in this request as that has a direct bearing on what is being requested.

IPA already has a bunch of parameters that we have built into our implementation:

The number of breakdown keys.

The maximum value of individual trigger values.

The per-user cap.

The attribution window.

These are what I would expect to see in the request that creates a query.
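To make the suggestion concrete, here is one possible shape for a JSON query-creation payload. Every field name below is hypothetical, as is the choice of seconds for the attribution window; the thread only names the parameters, not their encoding.

```python
import json


def build_query_request(
    query_size: int,
    num_breakdown_keys: int,
    max_trigger_value: int,
    per_user_cap: int,
    attribution_window_seconds: int,
) -> str:
    """Build a JSON body for a query-creation request (field names are illustrative)."""
    params = {
        "query_size": query_size,
        "num_breakdown_keys": num_breakdown_keys,
        "max_trigger_value": max_trigger_value,
        "per_user_cap": per_user_cap,
        "attribution_window_seconds": attribution_window_seconds,
    }
    for name, value in params.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    return json.dumps(params, sort_keys=True)
```

There is no version or query-type field here, on the assumption that both are carried by the URI the request is sent to, as suggested above.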

martinthomson · 2023-04-12T01:21:27Z

details/input.md

+by the implementations. 
+
+The following simulation assumes each event to take **112 bytes** on the wire, including encryption overhead 
+(see [encryption](.encryption.md)) and site origin to be a random 25-160 byte ASCII string. The overhead of sending


Suggested change

-(see [encryption](.encryption.md)) and site origin to be a random 25-160 byte ASCII string. The overhead of sending
+(see [encryption](./encryption.md)) and site origin to be a random 25-160 byte ASCII string. The overhead of sending

martinthomson · 2023-04-12T01:27:40Z

details/input.md

+Note: The following estimations ignore the TCP/IP/Ethernet frame overhead as it remains the same regardless of the format chosen
+by the implementations. 
+
+The following simulation assumes each event to take **112 bytes** on the wire, including encryption overhead 


I think that you need to spell out assumptions here. Something like:

enc = 32

ciphertext = 2*40/8 = 10

tag = 16

site = 1 + 50 (say)

key id = 1

epoch = 2

breakdown key = 1 (assuming XOR shares here and a small space, not sure about state of the art)

trigger value = 4

ts = 4 (not sure here again)

That's a little more than you have.

But with a table, and if we make breakdown key and trigger value mutually exclusive (and the same size), then we have 68 bytes, plus the table size, which is trivial for a large data set.

The other thing with tables is that you can reuse them...
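The arithmetic in this comment can be checked with a short script. The itemized sizes come straight from the list above; the 2-byte table index and the 4-byte shared breakdown-key/trigger-value field are assumptions chosen here because they reproduce the 68-byte figure, which the comment does not break down.

```python
# Per-event sizes in bytes, as itemized in the comment.
INLINE = {
    "enc": 32,
    "ciphertext": 2 * 40 // 8,
    "tag": 16,
    "site": 1 + 50,
    "key_id": 1,
    "epoch": 2,
    "breakdown_key": 1,
    "trigger_value": 4,
    "ts": 4,
}
inline_total = sum(INLINE.values())

# With a lookup table: site, key id, and epoch move into the table and are
# replaced by an index. The index width and shared-field size are assumptions.
WITH_TABLE = {
    "enc": 32,
    "ciphertext": 10,
    "tag": 16,
    "table_index": 2,
    "breakdown_key_or_trigger_value": 4,
    "ts": 4,
}
table_total = sum(WITH_TABLE.values())


def total_bytes(n_events: int, per_event: int, table_bytes: int = 0) -> int:
    """Total payload size; the table is sent once per query."""
    return n_events * per_event + table_bytes


print(inline_total, table_total)
```

For a large data set the one-off table cost is amortized across all events, so the per-event difference dominates the total.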

richajaindce · 2023-04-13T12:47:31Z

details/input.md

+
+## Assumptions
+
+* Report collector use HTTP over TLS to send queries to helper party networks.


Why are we choosing HTTP instead of TLS here?

"over" here means that one protocol layer (HTTP) runs on top of the other (TLS). It doesn't mean "instead of". Even though that is another meaning that "over" can take, it isn't the usual assumption in this context.

oh ok.. maybe we should reword it to say "on top of" TLS to avoid confusion?

heh, I never read it that way - too used to this expression. I agree that it does sound confusing, so I'll just use HTTPS instead.

"HTTP over TLS" is the name of the protocol. Or "HTTPS". To call it something else would be far worse.

Didn't know about that. How about we add a link to the RFC for it: https://www.rfc-editor.org/rfc/rfc2818

Add note explaining IPA input wire format

7d34dd7

