-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RequestMetadata: Add an additional_handlers
field for additional tool details
#306
base: main
Are you sure you want to change the base?
RequestMetadata: Add an additional_handlers
field for additional tool details
#306
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though Protobuf does support recursion, I think we should avoid it if we can, especially in this case where the recursion is linear.
Please make the existing tool details field repeated.
I had a memory of some discussion about potentially having a tree structure of tool details in the June monthly meeting, but just using a simple Is a |
For Protobuf’s binary format it is 100% safe to upgrade a singular field to a repeated one. In fact, a repeated field that only has a single element is encoded exactly the same way as a singular field. When an old client unmarshals a new message that contains multiple elements, the last one is retained. See: https://protobuf.dev/programming-guides/encoding/#last-one-wins The only annoyance is Protobuf’s JSON format. At least the Go protojson package is too stubborn to permit parsing scalars as lists or vice versa. But especially for RequestMetadata this should never be an issue. The protocol only documents circumstances in which this message is encoded in binary form, base64 encoded, and then attached to a HTTP header. So especially for that message I’d be far less concerned. Even if gRPC-Web is used, this message in particular is never JSON encoded. If we want to be super conservative, we could add a separate repeated field and at the same time deprecate the old one. This is also what we did for the multiple digest functions for remote execution in the capabilities response. But again, I don’t think that is worth the hassle. |
c11dd37
to
fbf86d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please run the git hook script to regenerate the .pb.go files.
fbf86d2
to
2de9d44
Compare
I think it looks good to go. But let’s give it a final round of discussion during the monthly gathering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a breaking change.
Instead of changing the existing field, could you please add a newer field and mark the old one as deprecated?
To clarify, any client/server implementations would need to be backward compatible because their counterpart could be using multiple different versions of remote apis. For example: our RBE server supports a wide range of Bazel versions: from 5.0 to the latest 8.x on HEAD. So we cannot have tool_details being a singular struct in some requests and an array in others.
Please see the discussion we had above, which specifically discusses the compatibility aspects of Protobuf when upgrading singular fields to repeated fields. |
Right. But the generated Go code would still flip around between a struct and a slice and thus, is a breaking change for the Go module. |
Which is completely acceptable, because that's what |
As I read that link, the "last one wins" behavior is only true for primitive types--strings and numbers. For nested messages, as is the case here, the subfields will actually be merged. Merging will almost certainly break whatever metrics/dashboards people have built around this data. |
Should we specify any concept of ordering in this repeated field? For example, if we're talking about Bazel plus one proxy, I'd probably want Bazel's information to be in position 0 as the "originator" of the request. I'm guessing that the easiest implementation will just have successive layers of handlers append to the end, but it may be useful to codify that in the API itself. Combined with the issues of merging fields for legacy clients, I'm wondering if we should keep the existing scalar field as-is, and add a "repeated ToolDetails additional_handlers" that can be populated with anything that proxied/modified the request along the way. |
It sounds like this might be the most frictionless way forward with this change, I'll update the MR. |
0b190b7
to
c47c044
Compare
additional_handlers
field for additional tool details
@@ -2141,4 +2141,7 @@ message RequestMetadata { | |||
// There is no expectation that this value will have any particular structure, | |||
// or equality across invocations, though some client tools may offer these guarantees. | |||
string configuration_id = 7; | |||
|
|||
// The details of other tools involved in handling the request, eg. client-side proxies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify whether or not this field should duplicate the information in the tool_details field above, and whether there's any ordering requirement (e.g., the order of additional_handlers implies what order the handlers actually executed in).
My personal opinion is that additional_handlers should NOT duplicate tool_details, and that there is no ordering requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your opinion on this, updated the comment to make this clear.
@@ -2141,4 +2141,7 @@ message RequestMetadata { | |||
// There is no expectation that this value will have any particular structure, | |||
// or equality across invocations, though some client tools may offer these guarantees. | |||
string configuration_id = 7; | |||
|
|||
// The details of other tools involved in handling the request, eg. client-side proxies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify whether tool_details is the handler closest to the user or closest to the server. For instance, if the request originates with Bazel and passes through proxy1 and proxy2 before reaching the server, should tool_details be Bazel or proxy2?
My opinion is that it should be Bazel as the originator of the request.
What's wrong with merging in this specific case? |
Ah, you are right. I had misread that documentation as "the string fields will be concatenated." On a second reading that's clearly not the case. I think there's still some awkwardness with ordering--IIRC there's no guarantee of the serialization order, so servers expecting a singular field could technically end up with either entry 0 or entry 1. It would also be possible to end up with a tool_name from one tool and a tool_version from the other. |
I don't think that's the case. It's literally "the last one wins". It is impossible to serialize Protobuf messages in such a way that list entries are stored in a different order in the byte stream as they were listed in the original array. So a legacy server will have
Yes, but this only happens if the last entry/entries leave one of |
c47c044
to
43408cc
Compare
It's not the end of the world, but it seems like the kind of bug that makes another bug harder to investigate. And this merging behaviour means that when clients using the repeated field are talking to a server using the old proto, the most useful information in the chain (ie. the original invoker) gets lost. Granted I doubt that many use-cases will ever end up in that usage pattern. I like the idea of having a separate field for additional related tools; it keeps this as strictly an additional feature, without impacting existing workflows. It also seems to map well onto the intended usage to me, the |
|
||
// The details of other tools involved in handling the request, eg. client-side proxies. | ||
// This field MUST NOT include the details of the tool which initially invoked the gRPC | ||
// request (ie. it MUST NOT duplicate `RequestMetadata.tool_details`). The order of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "i.e., it MUST NOT" (note punctuation of "i.e.,")
// client -> proxy1 -> proxy2 -> server | ||
// | ||
// `RequestMetadata.tool_details` SHOULD contain a ToolDetails message about `client`, | ||
// and `RequestMetadata.additional_handlers` MAY contain two ToolDetails messages, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that both tool_details and additional_handlers probably both SHOULD (not MAY) contain the required information. Technically all of this is optional, in an optional header, but within that scope I don't see a good rationale for making one SHOULD and the other MAY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments and then I'm good with this, pending approval from other reviewers.
It is possible for more than one tool to be involved in the sending of a gRPC request, for example when a client-side proxy is used it is interesting to know the name and version of both the proxy and the original client tool. To facilitate this in RequestMetadata, this commit adds a new repeated `additional_handlers` field to the message which can contain details of other tools involved in the handling of a request.
43408cc
to
e3db82c
Compare
I'd like to get a thumbs-up from Ed and Son before merging, if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of adding some guidance about ordering to this field.
We want to avoid cases where some implementations would implement this as ASC while others implement it as DESC, and the only way to figure out which was correct is to peek into the implementation.
If it's the current state, then I suspect folks will start cloning the last tool details in this field and append to tool_name
with a separator to determine ordering. I.e. something like
additional_handlers = [
{tool_name = "proxy1"},
{tool_name = "proxy1/proxy2"},
]
which can be quite ugly.
It is possible for more than one tool to be involved in the sending of a gRPC request, for example when a client-side proxy is used it is interesting to know the name and version of both the proxy and the original client tool.
To facilitate this in RequestMetadata, this PR adds a repeated
additional_handlers
field containing ToolDetails messages, allowing proxies and other related tools to also be recorded in the metadata sent with requests without modifying the original values.