Don't pause processing when send_local_response fails #423

krinkinmu · 2024-10-16T17:39:58Z

For context, see the Envoy issue envoyproxy/envoy#28826. Here is a shorter summary:

1. A wasm plugin calls proxy_send_local_response from both onRequestHeaders and onResponseHeaders.
2. When proxy_send_local_response is called from onRequestHeaders, it triggers a local reply, and that reply goes through the filter chain in Envoy.
3. The same plugin is called again as part of the filter chain processing, but this time onResponseHeaders is called.
4. onResponseHeaders calls proxy_send_local_response, which ultimately does not generate a local reply, but it does stop filter chain processing.

As a result, we end up with a stuck connection in Envoy: no local reply is sent, and processing is stopped.
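The four steps above can be sketched as a toy simulation of the old behavior. This is a deliberately simplified model under stated assumptions: `ToyStream`, `hostSendLocalResponse`, `oldSendLocalResponse`, and `runBuggyPlugin` are hypothetical names, not Envoy or proxy-wasm APIs, and a single `stop_next_iteration` flag stands in for however the host actually pauses the filter chain.

```cpp
#include <cassert>

// Hypothetical per-stream state in a toy host (not Envoy's real data model).
struct ToyStream {
  bool local_reply_in_progress = false;  // host already started a local reply
  bool stop_next_iteration = false;      // pause requested after the callback
  bool response_delivered = false;       // client actually got a response
};

// Host-specific part: refuses a second local reply on the same stream.
bool hostSendLocalResponse(ToyStream& s) {
  if (s.local_reply_in_progress) return false;
  s.local_reply_in_progress = true;
  return true;
}

// Old generic wrapper: drops the host result and always requests a pause.
void oldSendLocalResponse(ToyStream& s) {
  (void)hostSendLocalResponse(s);
  s.stop_next_iteration = true;
}

// The buggy plugin replies locally from both callbacks.
void pluginOnRequestHeaders(ToyStream& s) { oldSendLocalResponse(s); }
void pluginOnResponseHeaders(ToyStream& s) { oldSendLocalResponse(s); }

// Drives one request through the toy filter chain.
ToyStream runBuggyPlugin() {
  ToyStream s;
  pluginOnRequestHeaders(s);      // step 1-2: local reply is started
  s.stop_next_iteration = false;  // the paused request path is superseded by the local reply
  pluginOnResponseHeaders(s);     // step 3-4: host refuses, but a pause is still requested
  if (s.stop_next_iteration) {
    return s;                     // nothing resumes the response path: stuck
  }
  s.response_delivered = true;
  return s;
}
```

In this model the first local reply is accepted, the second is silently refused, and the stream ends up paused with no response delivered.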

I think that proxy-wasm plugins shouldn't use proxy_send_local_response this way, so ultimately whoever created such a plugin shot themselves in the foot. That being said, I think a few improvements could be made on the Envoy/proxy-wasm side to handle this situation somewhat better:

1. We can avoid stopping processing in such cases to prevent stuck connections in Envoy.
2. We can return errors from proxy_send_local_response instead of silently ignoring them.

Currently, Envoy's implementation of sendLocalResponse can detect when a second local response is requested and, in that case, returns an error without actually trying to send a local response.
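A host-side guard of this kind might look roughly like the following. This is a hypothetical sketch, not Envoy's code: `ToyHostContext` is an invented class, `WasmResult` borrows the name of the proxy-wasm result enum but models only two values, and returning `BadArgument` for the rejected second call is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Only the two result values this sketch needs (assumed, not the full enum).
enum class WasmResult { Ok, BadArgument };

// Hypothetical per-stream host context that allows one local reply per stream.
class ToyHostContext {
 public:
  WasmResult sendLocalResponse(uint32_t response_code, const std::string& body) {
    if (local_reply_sent_) {
      // A local reply is already in flight: refuse without sending anything.
      return WasmResult::BadArgument;
    }
    local_reply_sent_ = true;
    last_code_ = response_code;
    last_body_ = body;
    return WasmResult::Ok;
  }

  uint32_t lastCode() const { return last_code_; }
  const std::string& lastBody() const { return last_body_; }

 private:
  bool local_reply_sent_ = false;
  uint32_t last_code_ = 0;
  std::string last_body_;
};
```

The first call is accepted and recorded; a second call on the same context returns an error and leaves the recorded reply untouched.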

However, even though Envoy reports an error, send_local_response ignores the result of the host-specific sendLocalResponse implementation, stops processing anyway, and returns success to the wasm plugin.

With this change, send_local_response checks the result of the host-specific sendLocalResponse implementation. When sendLocalResponse fails, send_local_response just propagates the error to the caller and does nothing else (in particular, it does not stop processing).
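The difference between the old and new control flow can be sketched as follows. It is a simplified illustration, not the actual proxy-wasm-cpp-host source: `ToyVm`, `HostSendFn`, `sendLocalResponseOld`, and `sendLocalResponseNew` are hypothetical, and the `stop_next_iteration` flag stands in for whatever mechanism the host uses to pause the filter chain after the call.

```cpp
#include <cassert>
#include <functional>

// Only the two result values this sketch needs (assumed, not the full enum).
enum class WasmResult { Ok, BadArgument };

// Hypothetical stand-in for the VM-level "pause after this callback" flag.
struct ToyVm {
  bool stop_next_iteration = false;
};

// The host-specific sendLocalResponse, injected so the sketch is host-agnostic.
using HostSendFn = std::function<WasmResult()>;

// Before: the host result is dropped, processing is always paused,
// and the plugin is told the call succeeded.
WasmResult sendLocalResponseOld(ToyVm& vm, const HostSendFn& host_send) {
  (void)host_send();
  vm.stop_next_iteration = true;
  return WasmResult::Ok;
}

// After: a host failure is returned unchanged and has no side effects.
WasmResult sendLocalResponseNew(ToyVm& vm, const HostSendFn& host_send) {
  WasmResult result = host_send();
  if (result != WasmResult::Ok) {
    return result;  // nothing was sent, so do not pause
  }
  vm.stop_next_iteration = true;
  return WasmResult::Ok;
}
```

Injecting the host call keeps the sketch independent of any particular host; the point is only that the pause is requested after, and conditional on, a successful host call.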

I think this behavior makes sense in principle: on the one hand, we don't ignore the failure from sendLocalResponse; on the other hand, when the failure happens, we don't trigger any of the side effects expected from a successful proxy_send_local_response call.

NOTE: Even though I do think this is more reasonable behavior, it's still a change from the previous behavior, and it might break existing proxy-wasm plugins. Specifically:

1. C++ plugins that proactively check the result of proxy_send_local_response will change behavior (previously, proxy_send_local_response failed silently).
2. Rust plugins will crash at runtime in this case, due to the way the Rust SDK handles errors from proxy_send_local_response.

On the bright side, the plugins affected by this change currently just cause stuck connections in Envoy, so we are trading one undesirable behavior for another that is more explicit.

A couple of additional notes for reviewers:

1. If there is no disagreement with the overall approach, but you don't want to change user-visible behavior when sendLocalResponse fails, I can revert to silencing the error, though that would not be my first preference.
2. I created a utility function for unit tests to stringify a list of arguments, but I'm pretty sure similar functions already exist in libraries like Abseil. If reviewers are in favor of adding Abseil to proxy-wasm-cpp-host, I can spend some time working out how to make that happen without breaking Envoy in the meantime.

krinkinmu · 2024-10-16T17:40:27Z

+cc @keithmattix

keithmattix · 2024-10-16T17:51:26Z

Seems reasonable to me! The change seems worth the risk IMO; from what I understand, the affected population is plugin authors who are already pausing Envoy.

krinkinmu · 2024-10-16T18:00:35Z

> Seems reasonable to me! The change seems worth the risk IMO; from what I understand, the affected population are plugin authors who are already pausing envoy

Yes, that's correct. Plugins affected by this change are already affected, but in a different way.

PiotrSikora · 2024-10-18T05:53:45Z

Propagating the returned status is definitely a good thing to do. But please note that proxy_send_local_response was originally infallible, and the error codepath was added in envoyproxy/envoy#23049 to address the "double send local response" issue (breaking the ABI contract in the process, which is why the Rust SDK panics when this happens).

Having said that:

1. The plugin returning multiple local responses is clearly buggy, so why are we adding a workaround for it if it's not crashing hosts? We're not adding workarounds for plugins that consistently return Pause and never make any progress, which results in exactly the same "stuck" behavior.
2. What should be the fallback for a failed proxy_send_local_response (which in Envoy is pretty much exclusively used for generating short error responses) that plugins should implement? This adds extra complexity to all plugins in order to address a broken use case.

Also (and that's a topic for a separate issue), I question whether we should be calling the plugin for response processing of a response it generated itself.

PiotrSikora · 2024-10-18T06:02:57Z

> What should be the fallback for failed proxy_send_local_response (which in Envoy is pretty much exclusively used for generating short error responses) that plugins should implement? This adds extra complexity to all plugins in order to address a broken use case.

For example, in the Rust SDK's example HTTP authorization plugin, which relies on this behavior (like every other authorization plugin), what should the behavior be when this call fails?

The only solution that comes to mind is returning Pause and adding checks to make sure that Pause is also returned in all other callbacks, which basically reimplements the existing logic and the "stuck" behavior on the plugin side, in a much more error-prone way and at a much higher cost.
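To make the cost concrete, the plugin-side fallback described above might look like this. It's a hypothetical sketch: `ToyAuthPlugin` and its callbacks are invented, `Action::Pause` borrows the Rust SDK's naming, and the injected `send_local_response_` function stands in for the real host call.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class WasmResult { Ok, BadArgument };  // assumed subset of results
enum class Action { Continue, Pause };      // modeled on the SDK's Continue/Pause

// Hypothetical authorization plugin that must remember a failed local
// response and keep every later callback paused by hand.
class ToyAuthPlugin {
 public:
  explicit ToyAuthPlugin(std::function<WasmResult()> send_local_response)
      : send_local_response_(std::move(send_local_response)) {}

  Action onRequestHeaders(bool authorized) {
    if (authorized) return Action::Continue;
    if (send_local_response_() != WasmResult::Ok) {
      // No 403 went out, yet the request must not reach upstream either.
      local_response_failed_ = true;
    }
    return Action::Pause;
  }

  // Every other callback has to repeat the same check.
  Action onRequestBody() { return pauseIfFailed(); }
  Action onResponseHeaders() { return pauseIfFailed(); }
  Action onResponseBody() { return pauseIfFailed(); }

  bool localResponseFailed() const { return local_response_failed_; }

 private:
  Action pauseIfFailed() const {
    return local_response_failed_ ? Action::Pause : Action::Continue;
  }

  std::function<WasmResult()> send_local_response_;
  bool local_response_failed_ = false;
};
```

The net effect is the same paused stream as before, only now each plugin has to maintain it itself in every callback.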

What am I missing?

krinkinmu · 2024-10-18T19:13:33Z

@PiotrSikora you're correct that plugins that do this are buggy, and it's definitely not the intent here to create workarounds for them.

That being said, I think Envoy can do better in the presence of a buggy plugin and not leave a stuck connection behind. And a buggy plugin could benefit from a signal telling it that it is doing something wrong.

As for the fallback, my thinking is that plugins shouldn't need one in this case: they should just stop calling proxy_send_local_response when processing a local response. In a way, proxy_send_local_response remains infallible as long as it is used correctly.
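A plugin that follows this rule only needs to remember, per stream, that it has already sent a local response. The sketch below assumes a hypothetical plugin that turns any error response (status >= 400) into a custom local reply; `ToyPluginContext`, `Action`, and the injected send function are illustrative names, not SDK APIs.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class WasmResult { Ok, BadArgument };  // assumed subset of results
enum class Action { Continue, Pause };      // modeled on the SDK's Continue/Pause

// Hypothetical per-stream plugin context that never asks for a second
// local response on the same stream.
class ToyPluginContext {
 public:
  explicit ToyPluginContext(std::function<WasmResult(int)> send_local_response)
      : send_local_response_(std::move(send_local_response)) {}

  Action onRequestHeaders(bool allowed) {
    if (allowed) return Action::Continue;
    return replyLocally(403);
  }

  Action onResponseHeaders(int status) {
    // This may be our own local reply coming back through the filter
    // chain: let it pass instead of replacing it with another one.
    if (sent_local_response_) return Action::Continue;
    if (status >= 400) return replyLocally(502);
    return Action::Continue;
  }

 private:
  Action replyLocally(int code) {
    sent_local_response_ = true;
    // Because of the guard above, this runs at most once per stream.
    (void)send_local_response_(code);
    return Action::Pause;
  }

  std::function<WasmResult(int)> send_local_response_;
  bool sent_local_response_ = false;
};
```

Without the `sent_local_response_` check, the plugin's own 403 would match the `status >= 400` rule on the response path and trigger exactly the double call described in envoyproxy/envoy#28826.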

Rather than returning an error, I can try to find a way to "crash" the plugin (i.e., the Rust SDK would just panic in this case, and I can see if I can do the same for C++). I can also skip returning an error altogether, though I don't think that's the best way forward, because plugins that do this get no clear indication that they are doing something wrong.

I also agree with you that a plugin processing its own local reply is confusing and seems to be causing subtle issues. So before moving forward with the review of this PR, let me see if I can come up with a change on the Envoy side to address that behavior; maybe Envoy folks will be receptive to it.

krinkinmu requested review from PiotrSikora, martijneken and mpwarres as code owners October 16, 2024 17:39

krinkinmu mentioned this pull request Oct 16, 2024 in envoyproxy/envoy#28826 (Open): WASM Calling proxy_send_local_response twice will stuck remote http client(e.g. curl) forever until timeout or interrupted

krinkinmu force-pushed the fix-stuck-envoy branch from 6cfb37d to 173953e October 18, 2024 07:45

krinkinmu force-pushed the fix-stuck-envoy branch from 173953e to 98fe532 October 18, 2024 19:10
