Add support for "Expect: 100-continue" header #679

Closed

Conversation

johan-bjareholt
Contributor

@johan-bjareholt johan-bjareholt commented Nov 7, 2023

Works well, very easy to use by just setting the expect header like this

let huge_string = "abcde ".repeat(500);
let req = ureq::post("http://127.0.0.1:5000")
    .set("Expect", "100-continue");
let res = req.send_string(&huge_string).unwrap();
println!("res: {:?}", res);
let body = res.into_string().unwrap();
println!("body: {}", body);

To-do

  • A timeout to send body even if server does not understand "Expect: 100-continue"
    • If the client times out waiting for the HTTP/1.1 100 Continue, proceed with sending the body anyway.
    • (optional) Make the timeout configurable (1000ms default)
  • Follow the spec and on 417 resend the whole request but without the expect-100 header.
  • (optional) Set "Expect: 100-continue" by default
    • Only set 100-continue if there is a body: when the body has a known size over X MB, or always when the size is not known
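
The last to-do item can be sketched as a small decision function. This is only an illustration under stated assumptions: `BodySize`, `should_expect_continue` and the threshold parameter are hypothetical names, not part of ureq.

```rust
/// Hypothetical description of a request body; not a ureq type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BodySize {
    /// The request has no body at all.
    None,
    /// The body length is known up front (for example a byte slice).
    Known(u64),
    /// The body is streamed and its length is not known.
    Unknown,
}

/// Returns true if an `Expect: 100-continue` header should be added.
fn should_expect_continue(body: BodySize, threshold_bytes: u64) -> bool {
    match body {
        // RFC 9110: no 100-continue expectation without content.
        BodySize::None => false,
        // Known size: only worth it above the configured threshold.
        BodySize::Known(len) => len > threshold_bytes,
        // Unknown size: assume the body may be large.
        BodySize::Unknown => true,
    }
}
```

The threshold is left as a parameter because the thread does not settle on a value for "X MB".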

@johan-bjareholt
Contributor Author

This is a follow up to issue #676

@algesten
Owner

algesten commented Nov 8, 2023

Thanks for looking into this!

Overall looks like good changes. You can use the test.sh script in the root to roughly run the same tests the CI will do.

Owner

@algesten algesten left a comment

Some comments.

I was thinking that a bigger refactor might be clearer.

What if the part that just reads the header returns a new type ResponseHead (not public)? Then there are two methods to "complete" that type into the public Response: without_body (for expect-100) or with_body for the normal case.

It would potentially be cleaner than trying to make Response be slightly different depending...
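
A minimal sketch of what that split could look like, assuming simplified stand-in types. Only the names `ResponseHead`, `without_body` and `with_body` come from the comment above; the fields, the `parse` helper and this `Response` struct are illustrative and do not reflect ureq's actual internals.

```rust
/// Status line and headers only; would be crate-private in the proposal.
#[derive(Debug)]
struct ResponseHead {
    status: u16,
    headers: Vec<(String, String)>,
}

/// Simplified stand-in for the public response type.
#[derive(Debug)]
struct Response {
    status: u16,
    headers: Vec<(String, String)>,
    body: Option<Vec<u8>>,
}

impl ResponseHead {
    /// Parses a head such as "HTTP/1.1 100 Continue\r\n\r\n" (hypothetical helper).
    fn parse(raw: &str) -> Option<ResponseHead> {
        let mut lines = raw.split("\r\n");
        let status_line = lines.next()?;
        // "HTTP/1.1 200 OK": the second whitespace-separated token is the code.
        let status = status_line.split_whitespace().nth(1)?.parse::<u16>().ok()?;
        let headers = lines
            .take_while(|l| !l.is_empty())
            .filter_map(|l| l.split_once(':'))
            .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
            .collect();
        Some(ResponseHead { status, headers })
    }

    /// Completes an interim head (for example 100 Continue) that has no body.
    fn without_body(self) -> Response {
        Response { status: self.status, headers: self.headers, body: None }
    }

    /// Completes a normal response once its body is available.
    fn with_body(self, body: Vec<u8>) -> Response {
        Response { status: self.status, headers: self.headers, body: Some(body) }
    }
}
```

Both completion methods consume the head, so one head cannot be turned into two responses by accident.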

@jsha
Collaborator

jsha commented Nov 8, 2023

Thanks so much for the PR! I'll review it soon.

@jsha
Collaborator

jsha commented Nov 8, 2023

Oops, I posted that before I saw @algesten had already provided a review! But I will still plan to find some time and take a look soon.

@johan-bjareholt
Contributor Author

I removed the "consume" argument from "read_response_head". It only existed because I was testing against a broken HTTP server that, for some strange reason, responded with "100 Continue" twice. It took me embarrassingly long to realise this, and I only found it out in Wireshark.

I have now tested this code against a libsoup2 server and a Python Flask HTTP server; it works fine with both.

@algesten
Owner

algesten commented Nov 9, 2023

I think this looks pretty good. Wonder if @jsha agrees?

Collaborator

@jsha jsha left a comment

This also looks good to me! Thanks so much for writing it.

@jsha
Collaborator

jsha commented Nov 10, 2023

Looks like you've got a clippy error. Mind fixing that and I'll merge?

@johan-bjareholt
Contributor Author

Looks like you've got a clippy error. Mind fixing that and I'll merge?

Done!

@johan-bjareholt
Contributor Author

Also, regarding the broken server: I'm not sure whether this is a Flask bug or something else. To be more exact, it's broken when using "default" Flask (app.run), but it works fine when using Flask with werkzeug "serving.make_server". Might be a bug in Flask?

testclient

fn request() {
    let string = "abcde ".repeat(5);

    let req = ureq::post("http://127.0.0.1:5000/")
        .set("Expect", "100-continue");
    let res = req.send(string.as_bytes()).unwrap();
    println!("res2: {:?}", res);
    assert_eq!(res.status(), 200);
    let body = res.into_string().unwrap();
    assert_eq!("Hello, world!", body);
}

fn main() {
    request();
}

#[test]
fn test_a() {
    for i in 1..1000 {
        println!("\nRun {i}\n");
        request();
    }
}

Flask with werkzeug serving.make_server (works)

#!/usr/bin/env python3

from flask import Flask, request
from werkzeug import serving

app = Flask(__name__)

@app.route('/', methods = ['POST'])
def a():
    return "Hello, world!"


if __name__=='__main__':
    app.use_reloader = False
    server = serving.make_server(
        host="0.0.0.0",
        port=5000,
        app=app,
        ssl_context=None)
    server.serve_forever()

Flask with app.run (broken)

#!/usr/bin/env python3

from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods = ['POST'])
def a():
    return "Hello, world!"


if __name__=='__main__':
    app.run()

Wireshark of broken server

POST / HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: ureq/2.8.0
Accept: */*
Expect: 100-continue
accept-encoding: gzip
Transfer-Encoding: chunked

HTTP/1.1 100 Continue

HTTP/1.1 100 Continue

1e
abcde abcde abcde abcde abcde 
0

HTTP/1.1 200 OK
Server: Werkzeug/2.2.2 Python/3.9.2
Date: Fri, 10 Nov 2023 07:39:39 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 13
Connection: close

What happens is that, because we get two 100 Continue responses, our ureq client consumes the first one as it should and then takes the second 100 Continue as the response. The test client's assertion then fails, because it expects a 200 OK.
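
To make the failure mode concrete, here is a rough sketch of a parser that tolerates repeated interim heads. It assumes the whole response is already in a string, which a real client would not do; `first_final_status` is a hypothetical helper, not ureq code.

```rust
/// Returns the status code of the first non-100 response head in `raw`,
/// skipping any number of leading "100 Continue" heads.
fn first_final_status(raw: &str) -> Option<u16> {
    // Interim (1xx) responses carry no body, so each head ends at a blank line.
    for head in raw.split("\r\n\r\n") {
        let status_line = head.lines().next()?;
        let code: u16 = status_line.split_whitespace().nth(1)?.parse().ok()?;
        if code != 100 {
            return Some(code);
        }
    }
    // Only interim heads were seen; the final response has not arrived.
    None
}
```

Fed the capture above (two `100 Continue` heads followed by the `200 OK` head), this returns 200, whereas a client that skips exactly one interim head would stop at the second 100.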

@algesten
Owner

I might have dreamt this, but I seem to recall that curl waits for a while and then sends the body regardless of whether a 100 response arrived. This is to interoperate with servers that don't understand expect-100.

https://www.rfc-editor.org/rfc/rfc7231.html#section-5.1.1

@algesten
Owner

Ah yes CURLOPT_EXPECT_100_TIMEOUT_MS defaults to 1000ms

@johan-bjareholt
Contributor Author

Sounds reasonable, the only risk I see with this is if the server is just slow with responding with the 100-continue. So it might be a bit racy, seems like it would be important in that case to be able to configure the timeout?

@algesten
Owner

I think there are two things we want to consider.

  1. A timeout to send body anyway.
  2. Follow the spec and on 417 resend the request without the expect-100 header.

@algesten
Owner

Sounds reasonable, the only risk I see with this is if the server is just slow with responding with the 100-continue. So it might be a bit racy

I assume the racy-ness is normal behavior. The server would need to handle the case where the client starts sending the body regardless of the answer. In that respect 417 is only best effort.

seems like it would be important in that case to be able to configure the timeout?

Agree. We should put it as an option in Agent.
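
One way such an option could be shaped, purely as a sketch: `ExpectContinueConfig` and `with_timeout` are made-up names for illustration and are not existing ureq API. The 1000 ms default follows the curl default mentioned earlier in the thread.

```rust
use std::time::Duration;

/// Hypothetical agent-level setting; the name is illustrative, not ureq API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExpectContinueConfig {
    /// How long to wait for "100 Continue" before sending the body anyway.
    timeout: Duration,
}

impl Default for ExpectContinueConfig {
    fn default() -> Self {
        // Mirrors curl's CURLOPT_EXPECT_100_TIMEOUT_MS default of 1000 ms.
        Self { timeout: Duration::from_millis(1000) }
    }
}

impl ExpectContinueConfig {
    /// Builder-style override for servers that answer slowly.
    fn with_timeout(mut self, timeout: Duration) -> Self {
        self.timeout = timeout;
        self
    }
}
```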

@jsha
Collaborator

jsha commented Nov 10, 2023

I think the racy-ness can be resolved like so:

If the client timed out waiting for the HTTP/1.1 100 Continue, then when it tries to read headers again after sending the body, it's possible it may receive an HTTP/1.1 100 Continue at that time, followed by the final headers. If the "reading headers after body" step gets a 100 (vs any other code), it should respond to that by reading headers a second time, and returning those.
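
A sketch of that flow, with the network replaced by closures so the control logic stands on its own. `HeadRead` and `send_with_expect` are hypothetical names; a real implementation would read from the socket with a timeout instead of calling `read_head`.

```rust
/// Outcome of trying to read one response head; hypothetical type.
enum HeadRead {
    Status(u16),
    TimedOut,
}

/// Runs the expect-100 exchange and returns the final status code, if any.
fn send_with_expect(
    mut read_head: impl FnMut() -> HeadRead,
    mut send_body: impl FnMut(),
) -> Option<u16> {
    // Phase 1: wait briefly for an interim response.
    match read_head() {
        // Either the server agreed, or it stayed silent: send the body.
        HeadRead::Status(100) | HeadRead::TimedOut => send_body(),
        // Any other status is final; the body is never sent.
        HeadRead::Status(code) => return Some(code),
    }
    // Phase 2: read the final head, skipping a late "100 Continue".
    loop {
        match read_head() {
            HeadRead::Status(100) => continue,
            HeadRead::Status(code) => return Some(code),
            HeadRead::TimedOut => return None,
        }
    }
}
```

With the event sequence "timed out, 100, 200" the body is sent once and the result is 200, which is the racy case described above.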

Here's the most up-to-date RFC (substantively the same as RFC 7231 on this issue): https://www.rfc-editor.org/rfc/rfc9110#field.expect

A client MUST NOT generate a 100-continue expectation in a request that does not include content.
A client that will wait for a 100 (Continue) response before sending the request content MUST send an Expect header field containing a 100-continue expectation.
A client that sends a 100-continue expectation is not required to wait for any specific length of time; such a client MAY proceed to send the content even if it has not yet received a response. Furthermore, since 100 (Continue) responses cannot be sent through an HTTP/1.0 intermediary, such a client SHOULD NOT wait for an indefinite period before sending the content.
A client that receives a 417 (Expectation Failed) status code in response to a request containing a 100-continue expectation SHOULD repeat that request without a 100-continue expectation, since the 417 response merely indicates that the response chain does not support expectations (e.g., it passes through an HTTP/1.0 server).

I think we're seeing that the "time out if no early response" SHOULD is actually pretty necessary for decent operation, so let's add it. @algesten any thoughts on wanting to include it as part of this PR or as a follow-on?

I think the "retry on 417" SHOULD is good but I think it's not as urgent.

@algesten
Owner

Splitting those apart.

A client MUST NOT generate a 100-continue expectation in a request that does not include content.

We should definitely follow this. No expect unless there's a body.

A client that will wait for a 100 (Continue) response before sending the request content MUST send an Expect header field containing a 100-continue expectation.

✅ Doing that already.

A client that sends a 100-continue expectation is not required to wait for any specific length of time; such a client MAY proceed to send the content even if it has not yet received a response.

I think we should follow curl behavior here. Have a 1000ms default timeout and make it configurable on Agent.

Furthermore, since 100 (Continue) responses cannot be sent through an HTTP/1.0 intermediary, such a client SHOULD NOT wait for an indefinite period before sending the content.

Same as previous. But yeah. We shouldn't wait forever, even if it's not a MUST.

A client that receives a 417 status code in response to a request containing a 100-continue expectation SHOULD repeat that request without a 100-continue expectation, since the 417 response merely indicates that the response chain does not support expectations (e.g., it passes through an HTTP/1.0 server).

I think we should be good citizens and do this too, even if it's not a MUST. However on this point I can be convinced otherwise. Doesn't seem like a big deal though.
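
The 417 rule is small enough to sketch directly. This assumes headers are held as a plain list of name/value pairs; both function names are hypothetical and not taken from ureq.

```rust
/// Returns true when the request should be repeated without `Expect`.
fn should_retry_without_expect(status: u16, sent_expect: bool) -> bool {
    // Only retry if this request actually carried the expectation,
    // otherwise a 417 would cause an endless retry loop.
    status == 417 && sent_expect
}

/// Returns a copy of `headers` with any `Expect` header removed.
/// Header names are compared case-insensitively.
fn without_expect(headers: &[(String, String)]) -> Vec<(String, String)> {
    headers
        .iter()
        .filter(|(name, _)| !name.eq_ignore_ascii_case("expect"))
        .cloned()
        .collect()
}
```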


Whether the work is added on to this PR or a new PR, I don't think matters much. Either way I think we need to do this before shipping a new version with this functionality.

Just throwing out there: Should we do like curl and always send expect-100 headers and wait 1000ms when we have content?

@johan-bjareholt You've already done a lot here. Thanks! It's up to you whether you want to take on these further requirements. Let us know if you intend to continue, or we will work out a plan for the rest of the work. No pressure!

@johan-bjareholt
Contributor Author

Whether the work is added on to this PR or a new PR, I don't think matters much. Either way I think we need to do this before shipping a new version with this functionality.

The pro of not merging it would be that then there wouldn't be a hurry to fix it before the next release, as it doesn't become a blocker. If we don't find it to be complete enough for a release, maybe we shouldn't merge it?

Just throwing out there: Should we do like curl and always send expect-100 headers and wait 1000ms when we have content?

Doing it unconditionally seems to go against the point of "Expect: 100-continue", which is to let the server reject large requests early so that network bandwidth and processing are not wasted. Doing it only if the request payload is bigger than X megabytes would seem reasonable to me, however.

@johan-bjareholt You've already done a lot here. Thanks! It's up to you whether you want to take on these further requirements. Let us know if you intend to continue, or we will work out a plan for the rest of the work. No pressure!

Thanks, I highly appreciate the fast and thorough support!

I would love to help out some more, but the coming week will be a bit busy for me. So if you have some patience, I could continue working on this. For my use-case, the current solution works but the suggested improvements would also help.

@jsha
Collaborator

jsha commented Nov 13, 2023

The pro of not merging it would be that then there wouldn't be a hurry to fix it before the next release, as it doesn't become a blocker. If we don't find it to be complete enough for a release, maybe we shouldn't merge it?

Good point. I agree.

Just throwing out there: Should we do like curl and always send expect-100 headers and wait 1000ms when we have content?

Doing it unconditionally seems to go against the point of "Expect: 100-continue", which is to let the server reject large requests early so that network bandwidth and processing are not wasted. Doing it only if the request payload is bigger than X megabytes would seem reasonable to me, however.

Yeah, I agree here too. Here's what the curl docs say:

curl sends this Expect: header by default if the POST it will do is known or suspected to be larger than just minuscule. curl also does this for PUT requests.

I like this idea. We could do this for the known-length send methods like send_bytes. For send we won't know whether the body will be minuscule or not. We could assume that send will always be non-minuscule, because the point of passing an impl Read is that the data is too big or too dynamic to fit in a &[u8] easily.

I would love to help out some more, but the coming week will be a bit busy for me. So if you have some patience, I could continue working on this. For my use-case, the current solution works but the suggested improvements would also help.

We're happy to wait. Let's say if you come back to it in the next few weeks, great; if not, we'll pick up your PR and run with it. Thanks!

@johan-bjareholt
Contributor Author

I have unfortunately been swamped the past few weeks, so I have not been able to work on this. Nor do I see myself having time to start working on the rest before the middle of February. If someone else has time to look at it, that would be much appreciated.

@johan-bjareholt
Contributor Author

Since this is unfortunately taking so long, would it be an option to merge this and create a new issue for the things missing?
Considering that this feature is only enabled if you explicitly set the header, it shouldn't break anything.

@johan-bjareholt
Contributor Author

I tried to rebase to fix the merge conflict, but tests are failing on main again.

I pushed a fix for main in a separate PR, #742.

There does not seem to be any good reason to take ownership of it. Makes
us able to remove a clone and a TODO comment.
@johan-bjareholt johan-bjareholt force-pushed the expect-100-continue branch 2 times, most recently from b5244bd to 0daee34 Compare March 26, 2024 10:47
@johan-bjareholt
Contributor Author

Added some unit tests, implemented support for handling 417 according to spec and added a TODO in the PR description.

@johan-bjareholt johan-bjareholt force-pushed the expect-100-continue branch 2 times, most recently from 09c739a to 5419ffd Compare April 11, 2024 08:06
match response.status() {
    100 => debug!("Got 100-continue, proceeding with body"),
    200 => {
        // TODO: How should we handle this case?
Contributor Author

Any idea how to best handle this case?

If a server does not understand the "Expect: 100-continue" header, it
will wait for the body indefinitely. To solve this issue, we add a
shorter timeout on reading the response status+headers and if that
timeout is hit we send the body anyway.
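
A rough sketch of the timeout check described in this commit message, written against any `Read` so it can be exercised without a socket. With `std::net::TcpStream::set_read_timeout`, an expired timeout surfaces as `ErrorKind::WouldBlock` on Unix and may surface as `ErrorKind::TimedOut` on Windows, so both are treated as "no answer yet". `Wait`, `wait_for_head` and `SilentServer` are illustrative names, not code from this PR.

```rust
use std::io::{self, Read};

#[derive(Debug, PartialEq, Eq)]
enum Wait {
    /// Some bytes of a response head arrived in time.
    GotBytes(usize),
    /// Nothing arrived before the read timeout; send the body anyway.
    TimedOut,
}

/// Tries one read of the response head and classifies a timeout separately
/// from real I/O errors.
fn wait_for_head<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<Wait> {
    match reader.read(buf) {
        Ok(n) => Ok(Wait::GotBytes(n)),
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Ok(Wait::TimedOut)
        }
        Err(e) => Err(e),
    }
}

/// Test double standing in for a socket whose read timeout always expires.
struct SilentServer;

impl Read for SilentServer {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::TimedOut, "no response head yet"))
    }
}
```

On a real connection the caller would set the short read timeout before this call and restore the normal one afterwards.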
@johan-bjareholt
Contributor Author

@algesten @jsha Feel free to review this again, I think it has all the critical features now.

@johan-bjareholt
Contributor Author

@algesten @jsha Ping 🙂

@algesten
Owner

@johan-bjareholt I haven't forgotten. Just been a lot lately.

@johan-bjareholt
Contributor Author

Sorry for nagging, just wanted to ping this again.
I understand if you still have a lot to do.

@algesten
Owner

I'm not ignoring this. I'm stalling! 🙈

@algesten
Owner

To elaborate: ureq is now quite a popular crate, and I'm largely alone in maintaining it. This PR changes some of the inner workings, and I've become more and more hesitant to do such things (I don't know if you saw the fallout from trying to fix our test cases with hootbin).

Which isn't to say we don't want it, but I have this inertia/emotional block to get over.

@johan-bjareholt
Contributor Author

I understand.
Is there anything I can do to help you overcome it? Any code that these changes might impact that we need more unit tests for?
I've been using this patch continuously for a few months now without any issues, with over 160 million requests and terabytes of data, if that gives any comfort. My use case might be very specific, though.

@algesten
Owner

@johan-bjareholt

The ureq 3.x rewrite, which is now in main, should support expect-100-continue. Sorry for messing you around on this PR.

@algesten algesten closed this Oct 12, 2024
@johan-bjareholt
Contributor Author

It's ok, what's most important is that it's now supported. Thanks!


3 participants