[Update Center] Fix HTTP/404 errors due to broken links in HTML listing pages and missing ?uctest endpoint
#4311
Comments
Comment by @daniel-beck about bandwidth, from a discussion we had together on this topic:
=> Important point, as it means we might have to change the routing pattern. Cloudflare Analytics shows that HTML was far behind in number of requests, but we can't tell the different HTML files apart:
Proposal: Given the context of the new Update Center, let's use absolute URL links.
What are your thoughts on this, @daniel-beck @timja @MarkEWaite?
Absolute URL makes sense to me.
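To illustrate the proposal, here is a minimal sketch of how a browser resolves links on a listing page depending on which host served it. The mirror hostname `mirror.example.org` and the plugin path `plugins/example/1.0/example.hpi` are made up for the example; only the `https://updates.jenkins.io/download/` prefix comes from this thread.

```python
from urllib.parse import urljoin

# Hypothetical listing page, as served by the origin and by a mirror.
ORIGIN_PAGE = "https://updates.jenkins.io/download/plugins/example/"
MIRROR_PAGE = "https://mirror.example.org/download/plugins/example/"  # assumed mirror host

# The same link written as a relative href and as an absolute href.
RELATIVE_HREF = "1.0/example.hpi"
ABSOLUTE_HREF = "https://updates.jenkins.io/download/plugins/example/1.0/example.hpi"


def resolve(page_url: str, href: str) -> str:
    """Resolve an href the way a browser does, against the page that contains it."""
    return urljoin(page_url, href)


# A relative link follows the host that served the page, so on a mirror it
# points at the mirror, which may not have the file (hence the HTTP/404).
print(resolve(ORIGIN_PAGE, RELATIVE_HREF))
print(resolve(MIRROR_PAGE, RELATIVE_HREF))

# An absolute link always points back at the origin, whoever served the page.
print(resolve(MIRROR_PAGE, ABSOLUTE_HREF))
```

With absolute links, the listing page can be served from any mirror without changing where its links lead.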
It's by far the most popular content type? How does that make any sense? Is this just the tool installers via
It doesn't look like we understand what's going on here well enough to base any decisions on it.
We understand the mirroring mechanism, which is why I opened this issue. If we start selecting which files are mirrored and which are not, the architectural complexity will be a pain, as we will need to maintain a list of conditions. It is already nightmarish on get.jenkins.io tbh, hence the question about the pros and cons of switching to absolute URLs, which is not mutually exclusive with analysing usage to understand it better. The costs involved here are huge compared to the optimization, but a finer-grained understanding is mandatory.
Hello @daniel-beck 👋
My apologies, I mistakenly used the word "behind". You are correct, I meant that HTML seems to be, by far, the most popular type of file downloaded, at least as per the Cloudflare dashboard during the 24-hour experiment. Let me check if we see the same result on the current VM (analysing the logs from a few days ago).
I don't know. Let's compare with current behavior.
I ... don't know. We did not even know there was an HTML version of this one. Where should we look (except our access logs)?
Initial check for 09 October 2024 (both HTTP and HTTPS, both updates.jenkins-ci.org and updates.jenkins.io vhosts):
Report (generated with GoAccess from the "combined" access log):
@daniel-beck If we compare with Cloudflare numbers for 24 hours, which are only HTTP/2XX and HTTP/4XX (as the redirects are NOT sent to Cloudflare), it maps:
We need to check the HTML/JSON distribution on the current production, but the high rate of HTTP/4XX clearly explains the ratio change during the brownout. It also adds more weight to using absolute URLs in the generated HTML files to decrease this amount of HTTP/4XX.
Some of the data in the report makes no sense at all. Could you point me to the raw access logs? I want to check a few things.
The problem with this view is that there are different kinds of HTML files on this domain. The ones that this issue is about (those in https://updates.jenkins.io/download/ ) are never used programmatically unless someone's Various
The report was generated from the access logs on the pkg machine. I used the gzipped logs with the name pattern access20241003gz. I got 4 files (unsecured and secured, for both hostnames).
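For anyone who wants to cross-check such a report by hand, below is a rough sketch of tallying requests per path and status code from gzipped "combined" access logs. It is not the method used for the GoAccess report or for the counts quoted in this thread. The function names are hypothetical and the sample log lines are invented purely to show the expected line format.

```python
import gzip
import re
from collections import Counter

# Request line and status inside a "combined" log entry:
#   "METHOD /path HTTP/x.y" STATUS
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+" (\d{3})')


def count_paths(lines):
    """Count (path, status) pairs, ignoring any query string."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[(path, match.group(2))] += 1
    return counts


def count_paths_in_files(filenames):
    """Sum the counts over several gzipped log files."""
    total = Counter()
    for name in filenames:
        with gzip.open(name, "rt", encoding="utf-8", errors="replace") as handle:
            total.update(count_paths(handle))
    return total


# Invented sample lines, only to show the shape of the input.
SAMPLE = [
    '203.0.113.1 - - [09/Oct/2024:00:00:01 +0000] "GET /update-center.json?version=2.0 HTTP/1.1" 302 0 "-" "example-agent"',
    '203.0.113.2 - - [09/Oct/2024:00:00:02 +0000] "GET /update-center.json HTTP/1.1" 302 0 "-" "example-agent"',
    '203.0.113.3 - - [09/Oct/2024:00:00:03 +0000] "GET /download/plugins/example/ HTTP/1.1" 200 512 "-" "example-browser"',
    '203.0.113.3 - - [09/Oct/2024:00:00:04 +0000] "HEAD /download/plugins/example/1.0/example.hpi HTTP/1.1" 404 0 "-" "example-browser"',
]

for (path, status), hits in count_paths(SAMPLE).most_common():
    print(hits, status, path)
```

On real data, `count_paths_in_files` would be called with the four gzipped log files mentioned above.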
Additions:
Yes, but we are losing track of the initial problem: using absolute URLs in the links of these specific HTML files. Because the mirror system architecture ends up with these files served by another domain than
I'm not sure I understand the relationship with the access logs or usage types: we clearly understand the problem for these specific files. What did I miss?
As the log demonstrates, the HTML files discussed in this issue are completely irrelevant for traffic. The most popular URL that this issue is about is accessed just 24 times across the 4 logs:
Compared to:
Methodology (prove me wrong):
Yes, I had the same results before generating the
I wonder whether this is necessary. Seems like mirrors make sense for anything that's actual "content" (the stuff being downloaded), not glorified directory indexes.
This came from #4311 (comment) / #4311 (comment).
Basically the numbers you presented did not align with what I expected usage to look like. Looking at the actual logs shows reality lines up with my expectations :)
Oh I see, thanks for clarifying. We agree then on the result from the current production. Let me compile my thoughts and analysis on the Cloudflare part:
@smerle33 proposed using a non-Cloudflare mirror as a safety net if things go south with CF. It would use a custom web server (or two) that we manage, hosted on DigitalOcean (we have 4-5 TB of bandwidth for free and 15k credits valid until the end of the year), so we can check access logs in detail. Cost is OK for another brownout (assuming 2 to 3 TB of downloads over 24h), but we'll need to be careful if we add it permanently.
I met with @dduportal to move this topic along. Outcome:
Following this summary, I've opened the PR jenkins-infra/update-center2#812 to focus on the second solution. With the use of
Update:
As described in #2649 (comment), the HTML files generated by jenkins-infra/update_center2 use relative links.
This used to be a good technique in the past, when both domains updates.jenkins-ci.org and updates.jenkins.io served files. But it is now an issue in the context of the new Update Center system, which uses HTTP(S) mirrors to serve content to end users to:
mirrors.jenkins-ci.org is missing some necessary metadata files, which prevents it from being added as an apt/yum repo #3636
Examples of pages: