Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ETag should consider CORS headers #5986

Open
IchordeDionysos opened this issue Sep 25, 2024 · 5 comments
Open

ETag should consider CORS headers #5986

IchordeDionysos opened this issue Sep 25, 2024 · 5 comments
Labels

Comments

@IchordeDionysos
Copy link

TL;DR: I want the ETag to change when the access-control-allow-origin header is different but the response body is the same.


We are facing an issue right now with a couple moving parts working poorly together.

It's a bit difficult to explain but I'll try my best.

image
  1. When sending a request from Chrome to a spec-compliant CDN, it will initially hit the express server.
    1. The express server responds with the resource, CORS & Vary headers, and an ETag derived from the response body.
    2. The CDN will cache the response headers and body.
    3. Chrome will also cache the response headers and body.
  2. When Chrome requests the resource from a different origin, it may take the ETag cached for a different Origin and send it to the CDN (this is spec-compliant).
    1. This may then lead the CDN to respond with a 304 Not Modified, as the CDN may have a resource for the origin and ETag in the cache.
    2. Responding with 304 will only return partial headers (explicitly excluding CORS headers; Some spec says it should NOT return CORS headers here).
  3. This then leads Chrome to take the response from the cache and think that the Origins mismatch → throwing a CORS error.

This is kind of messed up, as all parties do comply with the specs, though when put together, the specs don't work nicely together.

The best way to fix this would be for the server to generate a different ETag when the access-control-allow-origin header differs.

@Sealos
Copy link

Sealos commented Sep 26, 2024

This is not an issue with Express and would happen with any HTTP framework, as the CDN caches the headers & content.

If I understood this correctly, your assumption is that GET requests from different origins should return different headers, if that is true then consider in your implementation:

  • Different routes for different origins (Or query parameters)
  • Cookie-based caching
  • Make CORS requests return Access-Control-Allow-Origin: * instead of origin if the origin is allowed to access the resource
  • No CDN cache on these cases, handle the cache in the application

@IchordeDionysos
Copy link
Author

IchordeDionysos commented Sep 26, 2024

If I understood this correctly, your assumption is that GET requests from different origins should return different headers.

Yes that's the assumption and the solution to handle caching in that case would be to return a Vary: Origin header, which we do but in connection with other standards this doesn't play nicely together.

  • Different routes for different origins (Or query parameters)
    → This is not a very clean solution ... as it will make the usage for consumers more complicated while it could be solved on the server-side with different ETags.

  • Cookie-based caching
    → This won't work as a CDN will heavily speed up the application, separate requests by cookie would defeat the purpose of the cache.

  • Make CORS requests return Access-Control-Allow-Origin: * instead of origin if the origin is allowed to access the resource
    → For security reasons this is not a good solutions, especially if you need to use credentials in requests.
    Returning access-controll-allow-origin: * will be prohibited.

  • No CDN cache on these cases, handle the cache in the application
    → Again this would not be a good solution as CDN cache speeds up the application significantly!

@IchordeDionysos
Copy link
Author

This is not an issue with Express and would happen with any HTTP framework, as the CDN caches the headers & content.

Regarding this comment. It is indeed expected and not the issue that the CDN caches the headers and the content.

It is particularly this combination of the browser using a different ETag from a request cached for a different origin that doesn't play nicely with the feature.

@NewEraCracker
Copy link

I'd say remove the ETag from those responses. Either using middleware or disable for all.

Add The “Vary: Origin” header for security: https://www.google.com/search?q=Vary+cross-origin+header

When a browser makes a CORS request, it automatically adds an “Origin” header to indicate the source of the request. The server must check this header and add an “Access-Control-Allow-Origin” header to indicate which sources can access the requested resource.

If a request may contain a Access-Control-Allow-Origin with different values, then the CDN should always respond with Vary: Origin, even for responses without an Access-Control-Allow-Origin header.

If the header isn't always present, it would be possible to fill the cache with incorrect values.

My two cents.

@AdamNorberg
Copy link

From my own dive into the RFCs, this is a nasty problem with how the 304 response is defined and how headers work in HTTP caches, including caches inside web browsers:

  • RFC 9110: Hypertext Transfer Protocol
    • 8.8.3: ETag
      • Opaque validator for different representations of the resource
      • Headers are not part of the resource -- if the ETag had to change when the headers changed, then the ETag would be useless in letting a cache extend the lifespan of a stale cached response because the new expiration time would force a new ETag, even though there is no theoretical reason to transmit the entire body again
    • 15.4.5: 304 Not Modified
      • Appropriate response for a conditional GET or HEAD when the response would be a 200 (OK), except the request indicated that the client already had "a valid representation" of "the target resource", and the client must use its stored representation as if it were the content of a 200.
      • Lists headers a server MUST generate in a 304 and states that it SHOULD NOT send any others "unless said metadata exists for the purpose of guiding cache updates".
        • Must include Content-Location, Date, ETag, Vary, Cache-Control, and Expires.
        • Does not list anything related to Access-Control, including Access-Control-Allow-Origin or Access-Control-Allow-Certificate. Does not, in any way, refer to headers implicated by the Vary field.'
  • RFC 9111: HTTP Caching
    • 3.1: Storing Header and Trailer Fields
      • Cache all the headers, including the headers the cache doesn't understand or care about, except for a handful of specific headers about the connection itself, for proxy-to-cache negotiation, or those explicitly named in cache directive headers as "exclude from cache".
      • Nothing in particular is said about Access-Control-Allow-Origin, so it must be cached.
    • 4: Constructing Responses from Caches
      • Only use the cached content if the URI matches, the header fields match, the response isn't uncacheable (unless it passes validation), and the stored response is fresh, it validates, or can be served stale.
      • Always use the most recent version available.-
    • 4.1: Calculating Cache Keys with the Vary Header Field
      • Somewhat contradicts the story in 4 -- while 4's "unless it passes validation" is specifically applied to the "don't cache" header and "expired", 4.1 explicitly describes Vary also allowing revalidation.
      • If the cache has a mix of responses for the URI, some of which have Vary headers and some of which don't, it should prefer responses with a "valid" Vary field value, making the responses with no such header functionally useless in the cache.
    • 4.3.1: Sending a Validation Request
      • Synthesize a request by copying the method, URI, and those specific headers identified by Vary, then add preconditions to solicit 304 responses if the answer matches stuff already in the cache
    • 4.3.2: Handling a Received Validation Request
      • If conditions + your data say you can serve a 304, serve a 304
    • 4.3.4: Freshening Stored Responses upon Validation
      • "Update" means "replace the headers in the stored response" -- see RFFC 9111 3.2, which says "add new headers, overwrite existing headers, follow the exceptions in 3.1, if you're patching together different requests or adding/removing compression then update those parts to match"
      • Update those stored responses that have matching validators

The upshot to all this:

  • A strictly standards-compliant server giving a 304 response should omit the Access-Control-Allow-Origin header; thus, an intermediate cache receiving updated Access-Control-Allow-Origin headers during successful revalidation, and subsequently sending a 304 of its own, is expected not to propagate that updated header
  • A strictly standards-compliant cache receiving a 304 response should update its cached headers for every response it stored on that path with matching validators; thus, a cache at any level (including a web browser) receiving updated Access-Control-Allow-Origin headers should rewrite the header stapled to cached prior requests from other origins

Therefore, ACAO with values other than * can only successfully transit caches if every single cache along the way is strategically violating the spec:

  • It has to send updated ACAO headers anyway
  • It must avoid writing "updated" ACAO headers back to cache rows pertaining to Vary: origin requests with different origins, even when the validators match

If it doesn't do both of these things, then at some point a downstream client is going to process a response with an ACAO header that doesn't match the Origin and treat this as a security error. If the ACAO headers get dropped along the way, a request coming in from a new-but-acceptable origin will not see the new ACAO response specifying the origin from the request. If the ACAO headers get propagated, and written back to cached rows as the spec demands, then we get similarly broken results from a sequence of requests: request from origin A, miss cache, receive 200 (cache it), propagate response; request from origin B, revalidate, receive 304 (additional cache row sharing content, update headers), patch cached body onto new ACAO header; request from origin A, hit cache, respond with the cached response and headers -- which now includes the ACAO header from origin B because the cache must use the most recent headers.

It's not hard to imagine a way to do this "right": always send the ACAO header and don't cross-update headers for other cache rows pertaining to the same request path when those requests varied in some relevant dimension (according to the Vary header). This violates the spec but works as intended. If every cache along the way chooses to violate the spec in this way, ACAO works as intended.

Patching a hash of the ACAO header onto the end of the Etag -- or similar -- avoids header cross-contamination by, morally, treating the ACAO header as part of the content, thereby forcing unchanged content to re-transfer as an undesirable side effect of HTTP caching semantics, where caches are encouraged to strip relevant headers and required to cross-update headers based on resource validators that specifically do not cover the header.

Removing the ETag does not help. Per RFC 9111 4.3.1, the cache is allowed to use timestamps instead, even when a Vary header should make everyone involved kind of sus that timetsamps don't tell enough of the story. As far as the spec is concerned, if the server wanted to be clear that the timestamp isn't enough to identify a content revision, it should use Etag...

Patching a CDN to emit ACAO headers on a 304 response does help but if there are more caches along the way (caching proxies, end-user browser caches), those caches are free to exhibit the same behaviors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants