Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul HTTP Accept (mime type) handling in Pyramid #3326

Merged
merged 24 commits into from
Oct 15, 2018

Conversation

mmerickel
Copy link
Member

@mmerickel mmerickel commented Aug 8, 2018

@bertjwregeer does this look right? If so I can probably fix the tests.

fixes #3317

@mmerickel mmerickel added this to the 1.10 milestone Aug 8, 2018
pyramid/predicates.py Outdated Show resolved Hide resolved
@mmerickel mmerickel force-pushed the fix-deprecated-accept-predicate branch from 575586c to c49e135 Compare August 16, 2018 05:27
@mmerickel
Copy link
Member Author

@whiteroses I pushed an updated version that uses just media types as an example. I'm afraid this is a big bw-incompatibility though so that needs to be considered carefully.

@whiteroses
Copy link
Contributor

Thanks @mmerickel. I made three branches to help us see what changed in WebOb and what needs to change in Pyramid:

https://github.com/whiteroses/pyramid/tree/accept-arg-current-behavior:
This branch has some tests that show what currently happens when we use media ranges in the accept argument to add_route and add_view.

https://github.com/whiteroses/pyramid/tree/switch-to-acceptable_offers:
Following on from accept-arg-current-behavior, I added your changes involving the switch to acceptable_offers, but without changing to accept only media types, to show the tests that would fail.

https://github.com/whiteroses/pyramid/tree/before-webob-changes
This is branching off Pyramid's 1.9-branch, so that we can pin WebOb to 1.7.4, before the GSOC changes. Then I added the tests for media ranges from the accept-arg-current-behavior branch, to show that the use of media ranges was already broken for MIMENilAccept. The call to _check_offer() — the reason you got the ValueError here -- was removed in the GSOC changes, which fixed the ValueError and the inconsistency with the behaviour of MIMEAccept and the documentation; but the use of media ranges for the accept argument remained ambiguous, as explained here and can be seen in some of the tests I added (test_header_ruled_out_by_specific_media_type_q0, test_header_ruled_out_by_type_range_q0, and test_header_ruled_out_by_all_range_q0). (As the ValueError would have been raised whenever the Accept header was not present in the request or was empty and the accept argument had an asterisk in it, and no one had raised this issue as far as I know, does that mean that no one was really using media ranges with the accept argument?)

There is more information in some of the commit messages. Please let me know if there is anything I can clarify!

pyramid/predicates.py Outdated Show resolved Hide resolved
@whiteroses
Copy link
Contributor

Re: bw-incompatibility, we could use the old best_match and __contains__ only if we find a '*' in the accept argument (so that we only get the DeprecationWarning from #3317 if the acceptargument has a '*' — which we could catch and translate to a Pyramid DeprecationWarning), but use acceptable_offers otherwise?

@mmerickel
Copy link
Member Author

mmerickel commented Aug 26, 2018

Here is my current plan:

  • deprecate raise on any usage of * in the AcceptPredicate but if it's there then fall back to using __contains__ as before.
  • [ ] keep the old behavior in MultiView where it only supports negotiation if the accept value does not include * (otherwise it punts to __contains__ later in the AcceptPredicate).
  • add config.add_accept_order(media_type, weighs_more_than=..., weighs_less_than=...) which will build up an IAcceptOrder object on the registry with a sorted list of media types.
  • modify MultiView to sort the view offers based on IAcceptOrder before passing the offers into request.accept.acceptable_offers() so that the selected views are sorted based on user preferences (instead of the currently random behavior which causes issues when the client specifies missing/invalid Accept header, or the client is ambiguous such as Accept: */*).

future enhancements:

  • come up with a way for the MultiView to short-circuit the AcceptPredicate if the MultiView already did all the work, such that AcceptPredicate can be a no-op instead of checking it again (this double checking currently happens in Pyramid in this scenario).
  • a future PR based on this work can add support for a list of media types being passed to AcceptPredicate (such as accept=['text/html', 'application/xml+xhtml']) as a simple way to get around using media ranges.
  • add some type of support for predicates to return a 2-tuple of (True, info) which would be set on the request only if the view was selected (something like request.view_predicates[predicate_name] = info).
  • use the request.view_predicates support to tell the view which accept media type was selected.

@mmerickel
Copy link
Member Author

mmerickel commented Aug 27, 2018

@whiteroses As I get further into this I think media ranges are an acceptable thing to use for add_route and am concerned that there's no way to do that anymore in webob. The use case in routes is specifically the __contains__ feature where we simply want to know if the request allows any response satisfying a media type or media range. There is no content negotiation here. How do I do this in webob now without using the deprecated __contains__ method?

I think at this point I firmly believe that __contains__ should be undeprecated or acceptable_offers should be updated to order/match media ranges (even if it always puts them at a lower precedence than media types or something). I can’t find anything in the RFC that would prevent this and regardless we need something to check if a range is considered acceptable by the header.

@whiteroses
Copy link
Contributor

whiteroses commented Aug 27, 2018

What does it mean to "check if a range is considered acceptable by the header", given that the idea does not exist in RFC 7231? The Accept header is used "to specify response media types that are acceptable", not media ranges; media ranges are only used in the header, to specify the range of media types that are acceptable.

Say the accept argument to an add_route is 'text/*'. The Accept header coming in is text/html;q=0, */*: every media type is acceptable except text/html. Should we consider that a match?

This is the same scenario I mentioned in the second part of the comment here, and applies just as much to add_route as to add_view.

The old MIMEAccept._match invented its own algorithm that is not in any of the RFCs — it didn't just accept media ranges, but also just '*'. It did not handle 'q=0's properly: __contains__ used ._parsed_nonzero — so in the scenario I mentioned above, it would respond as if the header were simply '*/*'.

As mentioned before, if the Accept header was not present or was the empty string, a ValueError would be raised — and yet no one had complained. So were there many people actually using '*' in the accept argument to add_route or add_view, and how did they not run into this ValueError?

I don't understand why Pyramid would want to borrow the media range concept, extend it by adding '*' on its own, create its own algorithm so that instead of media ranges being only found in the header as in the RFC, make it so that it could be in both the header and the accept argument to add_route — such that Pyramid would then have to define the meaning of how media ranges on both ends would interact (with the added complication of 'q=0's), and create its own extension to the already complicated and underspecified RFC?

@digitalresistor
Copy link
Member

digitalresistor commented Aug 27, 2018

One of the issues is that add_route can be used for many views, those views may limit the accept to one or more types, but they should be well defined.

Think something like add_route(accept='text/*') which has two views: add_view('viewA', accept='text/html') and add_view('viewB', accept='text/other') in the case of an Accept: text/html;q=0, */* we would still want the route to match, and then let the views decide which one is actually acceptable, in this case only viewB.

I don't think it'll be hard to add support for something/* in acceptable_offers and we can put it behind a flag that has to be true if necessary.


Reading about what you are saying about __contains__ calling _check_offer, I'm going to see if I can write a quick test in the pyramid test suite that we can use to reason through what is expected.

The question I have, does anyone rely on the ability to pass whatever/* to accept as a predicate?

@whiteroses
Copy link
Contributor

@bertjwregeer So if I understand correctly, the '*' in the accept argument to add_route would mean a match if the range in the argument has not been entirely ruled out by the header, e.g. if the header is text/html, text/other;q=0 and the accept argument to add_route is text/*, then that would be a match?

Wouldn't not specifying the accept argument to add_route at all get us the same result?

And if there is a case where this is needed, would it not be something very much Pyramid-specific, and belongs in Pyramid rather than WebOb?

@mmerickel
Copy link
Member Author

mmerickel commented Aug 28, 2018

Ok thanks for your patience @whiteroses, my motivation has been to try and do what makes "sense" while also trying to figure out and maintain compat with older versions of Pyramid.

  • The Pyramid docs clearly accept media ranges, but I tried going all the way back to webob 1.0.8 and I still get the same error ValueError: The application should offer specific types, got 'text/*' in all versions of webob PRIOR TO 1.6.
  • In versions 1.6 and 1.7 media ranges are accepted so long as the client passes a valid accept header, otherwise they will raise.
  • In webob 1.8 they also are accepted and treated as media types instead of media ranges.

That being said, I have found some other odd things going on with webob's accept parsing in both old versions and the new version. For example, behaviors are quite different on "invalid" media types when passing them as offers to invalid/missing/valid accept headers. Some raise a ValueError and some are pass through and this inconsistency is not good if we want people to use this feature in robust apps. If the app offers an invalid media type then it should raise consistently in each scenario. I opened the below issues to get these fixed in webob.

Pylons/webob#365
Pylons/webob#366
Pylons/webob#367
Pylons/webob#368
Pylons/webob#369

From here, I think I now have to balance the fact that the Pyramid docs were always wrong and magically started being correct (sometimes - based on the client's value of the accept header) in webob 1.6/1.7 and 1.8 preserved that behavior of not raising but changed the reason why by treating them as media types instead of ranges (this should raise imo).

My proposal is to get these issues fixed that I've opened in webob and then continue with my above solution with the following tweak:

  • I think we should just start raising in Pyramid at config-time to make things consistent and predictable. Media ranges were not supported prior to webob 1.6 and I think basically everyone seems to agree that they should not be supported. So let's just do that.

With this tweak, we will be resilient to whatever webob decides to do in 1.8+ (right now it treats ranges as media types and doesn't raise, and I propose it gets fixed to raise) - but regardless Pyramid will raise.

For reference, here is the app I was playing around with using the various versions of webob/pyramid with route predicates:

from pyramid.config import Configurator
from waitress import serve

config = Configurator()
config.add_route('home', '', accept='*/*')
config.add_route('fallback_home', '')

def home_view(request):
    print(request.accept)
    return request.matched_route.name
config.add_view(home_view, route_name='home', renderer='string')
config.add_view(home_view, route_name='fallback_home', renderer='string')

app = config.make_wsgi_app()
serve(app, listen='localhost:6543')

and here is the app I was using to test view predicates:

from pyramid.config import Configurator
from waitress import serve

config = Configurator()
config.add_route('home', '')

def make_view(response):
    def view(request):
        print(request.accept)
        return response
    return view
config.add_view(make_view('default'), route_name='home', renderer='string')
config.add_view(make_view('*/*'), route_name='home', renderer='string',
                accept='*/*')
config.add_view(make_view('text/html'), route_name='home', renderer='string',
                accept='text/html')

app = config.make_wsgi_app()
serve(app, listen='localhost:6543')

@mmerickel mmerickel force-pushed the fix-deprecated-accept-predicate branch from c49e135 to 1754187 Compare August 28, 2018 07:24
@mmerickel
Copy link
Member Author

I've pushed an updated impl based on the above discussion.

@mmerickel mmerickel force-pushed the fix-deprecated-accept-predicate branch from 1754187 to 121f45b Compare August 28, 2018 07:36
@mmerickel
Copy link
Member Author

If people could look at the Webob issues I've opened and weigh in it would help me greatly in determining how to integrate this into Pyramid. The invalid headers not causing a fallback to the view with no accept predicate I think is a pressing issue right now to the extent that if Webob doesn't want to fix it then I'll probably have to check manually if it's an invalid header and handle it on my own versus relying on acceptable_offers.

@mmerickel
Copy link
Member Author

After talking to @bertjwregeer I think we're happy with the invalid headers being treated as "accept anything" just like a missing header (which is defined by the RFC to accept anything). This basically means that any view with accept=None will have lowest priority and will only execute if all other accept=foo views fail on other predicates.

With that being said, I think the implementation here is working and correct and I just need to fix up the tests.

Copy link
Member

@stevepiercy stevepiercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the docs look good to me, with a couple of suggestions.

docs/narr/viewconfig.rst Outdated Show resolved Hide resolved
docs/narr/viewconfig.rst Outdated Show resolved Hide resolved
@mmerickel mmerickel force-pushed the fix-deprecated-accept-predicate branch 2 times, most recently from 2118ab3 to 247067f Compare August 29, 2018 14:34
@mmerickel
Copy link
Member Author

mmerickel commented Aug 29, 2018

Well, I just saw #1407 which makes me question again whether this is a good idea to limit ourselves to media types only as the best_match behavior in webob had already been changed once to support media ranges (Pylons/webob#185), and now acceptable offers has gone backward to only support media types. We cannot keep flip flopping on this behavior. This also makes it clear to me that some people are relying on media range support in webob 1.6/1.7 which was deprecated in webob 1.8.

@mmerickel
Copy link
Member Author

ping @ltvolks as well, as the brains behind #1407.

Copy link
Member

@stevepiercy stevepiercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about "XXX".

@wichert
Copy link
Member

wichert commented Oct 12, 2018

I was hoping that this PR would introduce some general logic that we can use for language negotiation as well. Currently WebOb has deprecated that functionality, but nothing has replaced it as far as I can see.

@mmerickel
Copy link
Member Author

mmerickel commented Oct 12, 2018

@wichert I can only imagine you missed the massive number of new features in webob 1.8 done by @whiteroses for GSOC last year that resulted in deprecating the old functionality and enabled me to write this PR. Is there something you were hoping to see in Pyramid specifically?

@stevepiercy
Copy link
Member

Is issue #2807 related to language negotiation?

@mmerickel
Copy link
Member Author

mmerickel commented Oct 12, 2018

@wichert I see, I missed the language part of your comment. It'd be interesting to explore algorithms to combine language, mimetype, charset, etc but that is not this PR. This PR is fixing long-standing complaints with just mimetype negotiation. Webob itself certainly introduced some nicer features around language that Pyramid is not using in any way by default right now. Please open separate issues in Pyramid for extending the content negotiation if you want to see that happen.

@wichert
Copy link
Member

wichert commented Oct 12, 2018

@bertjwregeer The relevant method is best_match(), which WebOb has deprecated without a useful replacement. The documentation is inconsistent with the reality in that it says the method will be deprecated, but if you use it you already do get a deprecation warning in your output.

@mmerickel I haven't looked at the details, but I was expecting it to be possible to have a single matching function which can be used for language, mime type, transfer encoding, etc. which could act as a basis for this work. That might have been naive from me - I certainly haven't looked at the relevant RFCs.

@mmerickel
Copy link
Member Author

@wichert request.accept_language.lookup is documented as the replacement for best_match. If that api isn’t sufficient you should open an issue in webob

With respect to a full blown matching algorithm that combines all the accept headers you should also open an issue in webob but afaik there is not a well defined algorithm to do that.

Regardless, please try to keep this issue and review focused on the work done with the accept header.

@digitalresistor
Copy link
Member

There's no defined ordering/matching for all of the relevant Accept headers... Not sure how that would even work, because language matching has no relation to the mime-type of the returned document.

I am not even sure how Accept-Encoding would play into this, your application really shouldn't care about what transfer encoding is accepted by the remote client (unless you have pre-compressed assets on disk, maybe then).

Open an issue in WebOb if you want, but it will sit there forever, unfixed as there is no way to do what you want it to do.


As for best_match being deprecated, that's the way we deprecate things. We don't just docs deprecate it, we make it warn about deprecation and make a note of it in the documentation. It will be removed entirely in a future version of WebOb. There's two useful (and much better) replacements, I linked to the documentation on both of them.

@digitalresistor
Copy link
Member

WebOb 1.8.3 has been released that contains the fixes necessary for this work to be released :-).

@digitalresistor
Copy link
Member

One of the problems is that the title of this PR doesn't make it clear that this is only for dealing with the Accept header, not all Accept-* headers.

Sadly instead of calling it Accept-Type or something along those lines they just called it Accept.

@digitalresistor digitalresistor changed the title overhaul accept handling Overhaul Accept (mime type) handling in Pyramid Oct 15, 2018
@digitalresistor digitalresistor changed the title Overhaul Accept (mime type) handling in Pyramid Overhaul HTTP Accept (mime type) handling in Pyramid Oct 15, 2018
@mmerickel mmerickel merged commit 433efe0 into Pylons:master Oct 15, 2018
@mmerickel
Copy link
Member Author

@wichert I think all of the accept headers affect different parts of the stack, it's unlikely you want one content negotiation step to rule them all.

  • Accept-Encoding is usually handled by middleware.
  • Accept-Language is usually handled by the template and anything using request.localizer.
  • Accept-Charset is stupid imo and is handled better by just accepting specific media types via Accept: text/html;charset=utf8 etc since it only really applies to specific text-based media types anyway.
  • Accept is tied to which content type to return (as well as Accept-Charset) and thus belongs within the renderer itself, or the in the context of this pull request, used to select which view/renderer to use.

My point is that I think there's just not a situation where we would want to look at multiple accept headers at the same time and certainly it's not clear enough to framework it.

@digitalresistor
Copy link
Member

I know of no UA that sends an Accept-Charset by default. It too could be used for middleware though to translate from one charset to another on the fly so the client gets what they want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DeprecationWarning from WebOb 1.8.2 (Accept parsing uses old method)
6 participants