-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't pause processing when send_local_response fails #423
base: main
Are you sure you want to change the base?
Conversation
+cc @keithmattix |
Seems reasonable to me! The change seems worth the risk IMO; from what I understand, the affected population are plugin authors who are already pausing envoy |
Yes, that's correct. Plugins affected by this change are already affected, but in a different way. |
Propagating returned status is definitely a good thing to do. But please note that Having said that:
Also, and that's a topic for a separate issue, but I question whether we should be calling the plugin for response processing of response it generated itself. |
For example, in Rust SDK's example HTTP authorization plugin that relies on this behavior (like every other authorization plugin), what should be behavior when this calls fails? The only solution that comes to mind is returning What am I missing? |
6cfb37d
to
173953e
Compare
For context see the Envoy issue envoyproxy/envoy#28826. Here is a shorter summary: 1. A wasm plugin calls proxy_send_local_response from both onRequestHeaders and onResponseHeaders 2. When proxy_send_local_reply is called from onRequestHeaders it triggers a local reply and that reply goes through the filter chain in Envoy 3. The same plugin is called again as part of the filter chain processing but this time onResponseHeaders is called 4. onResponseHeaders calls proxy_send_local_response which ultimately does not generate a local reply, but it stops filter chain processing. As a result we end up with a stuck connection on Envoy - no local reply and processing is stopped. I think that proxy wasm plugins shouldn't use proxy_send_local_response this way, so ultimately whoever created such a plugin shot themselves in the foot. That being said, I think there are a few improvements that could be made here on Envoy/proxy-wasm side to handle this situation somewhat better: 1. We can avoid stopping processing in such cases to prevent stuck connections on Envoy 2. We can return errors from proxy_send_local_response instead of silently ignoring them. Currently Envoy implementation of sendLocalResponse can detect when a second local response is requested and returns an error in this case without actually trying to send a local response. However, even though Envoy reports an error, send_local_response ignores the result of the host specific sendLocalResponse implementation and stops processing and returns success to the wasm plugin. With this change, send_local_response will check the result of the host-specific implementation of the sendLocalResponse. In cases when sendLocalResponse fails it will just propagate the error to the caller and do nothing else (including stopping processing). I think this behavior makes sense in principle because on the one hand we don't ignore the failure from sendLocalResponse and on the other hand, when the failure happens we don't trigger any side-effects expected from the successful proxy_send_local_response call. NOTE: Even though I do think that this is a more resonable behavior, it's still a change from the previous behavior and it might break existing proxy-wasm plugins. Specifically: 1. C++ plugins that proactively check the result of proxy_send_local_response will change behavior (e.g., before proxy_send_local_response failed silently) 2. Rust plugins, due to the way Rust SDK handles errors from proxy_send_local_response will crash in runtime in this case. On the bright side of things, the plugins that are affected by this change currently just cause stuck connections in Envoy, so we are changing one undesirable behavior for another, but more explicit. Signed-off-by: Mikhail Krinkin <[email protected]>
173953e
to
98fe532
Compare
@PiotrSikora you're correct that the plugins that do that are buggy and it's definitely not the intent here to create a workarounds for them. That being said, I think Envoy can do better in a presense of a buggy plugin and not leave a stuck connection behind. And a buggy plugin could benefit from a signal that would tell them that they are doing something wrong. As for the fallback, my thinking here is that plugins shouldn't need a fallback in this case - they should just stop calling proxy_send_local_response when processing a local response. In a way proxy_send_local_response remains infallible as long as it is used correctly. Rather than returning an error, I can try and find a way to "crash" the plugin (i.e., Rust SDK would just panic in this case and I can see if I can do the same for C++). I can also skip returning an error all together, though I don't think it's the best way forward, because plugins that do that basically get no clear indication that they are doing something wrong. I also agree with you that the whole situation when a plugin is processing its own local reply is confusing and seem to be causing subtle issues. So before moving forward with the review of this PR let me see if I can come up with a change on the Envoy side to address that behavior - maybe Envoy folks will be receptive to this change of Envoy behavior. |
For context see the Envoy issue envoyproxy/envoy#28826. Here is a shorter summary:
As a result we end up with a stuck connection on Envoy - no local reply and processing is stopped.
I think that proxy wasm plugins shouldn't use proxy_send_local_response this way, so ultimately whoever created such a plugin shot themselves in the foot. That being said, I think there are a few improvements that could be made here on Envoy/proxy-wasm side to handle this situation somewhat better:
Currently Envoy implementation of sendLocalResponse can detect when a second local response is requested and returns an error in this case without actually trying to send a local response.
However, even though Envoy reports an error, send_local_response ignores the result of the host specific sendLocalResponse implementation and stops processing and returns success to the wasm plugin.
With this change, send_local_response will check the result of the host-specific implementation of the sendLocalResponse. In cases when sendLocalResponse fails it will just propagate the error to the caller and do nothing else (including stopping processing).
I think this behavior makes sense in principle because on the one hand we don't ignore the failure from sendLocalResponse and on the other hand, when the failure happens we don't trigger any side-effects expected from the successful proxy_send_local_response call.
NOTE: Even though I do think that this is a more resonable behavior, it's still a change from the previous behavior and it might break existing proxy-wasm plugins. Specifically:
On the bright side of things, the plugins that are affected by this change currently just cause stuck connections in Envoy, so we are changing one undesirable behavior for another, but more explicit.
A couple of additional notes for reviewers: