[INFRA-3100] Migrate updates.jenkins.io to another Cloud #2649

Open
Tracked by #3662
jenkins-infra-bot opened this issue Oct 15, 2021 · 165 comments

@jenkins-infra-bot

jenkins-infra-bot commented Oct 15, 2021

Why

Read the EPIC (aws cost decrease: #2646)

What

  • updates.jenkins.io serves the update center JSON, but it is not behind a CDN (the content needs to be updated regularly), which causes a lot of outbound data transfer: 2.4 MB per JSON download, an estimated 300 GB of data sent daily, and an estimated 10 TB per month, which costs some $$$.
  • The VM pkg.origin.jenkins.io manages multiple services, and separating them would be a great help.
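
The estimates above can be cross-checked with a little arithmetic. This is only a rough sketch: it assumes the figures are in megabytes and gigabytes, decimal units (1 GB = 1000 MB, 1 TB = 1000 GB), a 30-day month, and that all outbound traffic is the update center JSON. The function names are illustrative.

```python
# Back-of-the-envelope check of the traffic figures quoted in the issue.
# Assumptions: decimal units, a 30-day month, all traffic is the JSON file.
JSON_SIZE_MB = 2.4       # size of one update center JSON download
DAILY_EGRESS_GB = 300    # estimated outbound data per day


def downloads_per_day(daily_egress_gb, json_size_mb):
    """Number of JSON downloads needed to produce the daily egress."""
    return daily_egress_gb * 1000 / json_size_mb


def monthly_egress_tb(daily_egress_gb, days=30):
    """Monthly outbound volume in TB for a given daily volume in GB."""
    return daily_egress_gb * days / 1000


if __name__ == "__main__":
    per_day = downloads_per_day(DAILY_EGRESS_GB, JSON_SIZE_MB)
    print(f"{per_day:,.0f} JSON downloads per day")
    print(f"{monthly_egress_tb(DAILY_EGRESS_GB):.1f} TB per month")
```

Under these assumptions, 300 GB per day corresponds to about 125,000 JSON downloads per day and about 9 TB per month, which is consistent with the "around 10 TB per month" estimate.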

How

Different paths can be taken here: to be discussed in infra meeting + validated by the board as it's an important service.

  • Migrating the service to Oracle Cloud:
    • the first 10 TB of outbound data is free and, in the worst case, 50 TB of outbound data would cost less than $3k per month - https://www.oracle.com/be/cloud/networking/networking-pricing.html
    • ARM machines: cheaper and better performance for a simple webserver + SSH + data
    • Block storage: close to archives.jenkins.io
    • Risk: the Oracle sponsoring program lasts only 1 year, and payment is not as easily secured as with our other accounts
  • Migrating to Azure
    • Easy to run in AKS: easier management
    • Safer payment process
    • Risk: cost of outbound data to be evaluated

Originally reported by dduportal, imported from: Migrate updates.ci.jenkins.io to another Cloud
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10

@dduportal
Contributor

dduportal commented May 10, 2022

Requires setting up the Oracle Terraform project, like #2682

Todo list:

  • Specify a new infra (VM + data storage of the same size as the current pkg.origin.jenkins.io machine + any other Oracle infra requirements). This implies checking the pricing to get an overall idea
  • Once the VM is created, add it to Puppet management: new node, with the role / profiles associated with "update.jenkins.io"
  • Rsync the data once
  • Update the trusted.ci job associated with jenkins-infra/update-center2 so that it updates both the "pkg.origin.jenkins.io" VM and the new one (⚠️ inline pipeline, don't search for a Jenkinsfile as code)
  • Validate the new instance (including checking with Daniel, Tim and other contributors)
  • Communicate about the upcoming change
  • As proposed by Stephane, use a round-robin DNS record value to split traffic between old and new for some time and check for errors

@dduportal
Contributor

Blocked by #2973

@smerle33
Contributor

Current machine size:
32 GB RAM
8 CPUs
1.2 TB data disk (372 GB free)

About half of the capacity is currently used (checked with the local SAR probe)

@smerle33
Contributor

Infra to specify:

  • 1 VM (similar to archive.jenkins.io)
  • 1 network ("mirrors") + 1 subnet ("20210630-1531")
  • 1 volume of 1.2 TB
  • 1 set of security groups to restrict network access
  • 1 ssh key pair

@smerle33
Contributor

VM specifications:

  • 4 vCPUs / 16 GB RAM: half of the current machine
  • proposal to use ARM like archive.jenkins.io --> 4 OCPUs (VM Type Standard A1 Flex)
  • Image Ubuntu 20.04 (FULL [non minimal]) --> upgrade from the Ubuntu 18.04 of the current machine
    • option for 22.04, but it needs testing for the Puppet agent (+ OpenSSL v3)

@dduportal
Contributor

dduportal commented Oct 29, 2024

Fifth brownout finished! (ref. jenkins-infra/azure-net#309)

Results of the fifth brownout:

  • ✅ The last 2 blocking issues did not come back (both HTTP/404 of different kinds)
  • ✅ Still working for the main usages (JSON files, base HTML pages)
  • Metrics show the same usage on Cloudflare (reminder: we did not have access to detailed logs):
    [Screenshot: Cloudflare aggregated metrics, 2024-10-29]

  • ⚠️ We still have a lot of HTTP/404 responses. The rate decreased a bit, but it is still more than half of the requests received by Cloudflare. => not a blocker but annoying

  • ✅ The successful requests on Cloudflare map to the current Update Center (~3M hits for ~2.5 TB downloaded)

  • ⚠️ We still have user-facing HTTP/404 errors reported on the "directory listing" pages (e.g. requests to a directory without an index.html). => not a blocker but annoying

    • Apache (in the current VM pkg) has directory listing enabled and generates the index for us.
    • In the new system, Apache redirects these requests to the Cloudflare mirrors, which do not provide a directory listing feature, so they end in HTTP/404.
    • See comment below for details
  • ⌛️ The update center2 generation job runs in 5 to 7 min, but is only cron-triggered every 10 min. We should work on this. => not a blocker but annoying

  • ⚠️ The update center2 job should copy the Apache www-redirects content as the last stage. Not important or a blocker, but it avoids the 3-4 min window with HTTP/404 errors when a new update center JSON version is deployed (usually during a new Core release). => not a blocker and low priority
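
To make the cron timing point concrete, here is a minimal sketch of the delay between a change and its publication. It assumes a simple model that the thread does not confirm: a change is only picked up when a cron-triggered run starts, and it becomes visible when that run finishes. The function name is hypothetical.

```python
def publication_delay_minutes(cron_interval, min_run, max_run):
    """Return the (best, worst) delay in minutes between a change and its publication.

    Assumed model: a change is picked up at the start of the next
    cron-triggered run and is visible once that run completes.
    """
    # Best case: the change lands right before a run starts and the run is fast.
    best = min_run
    # Worst case: the change just misses a run start, waits a full interval,
    # and the following run is slow.
    worst = cron_interval + max_run
    return best, worst


if __name__ == "__main__":
    best, worst = publication_delay_minutes(cron_interval=10, min_run=5, max_run=7)
    print(f"best case: {best} min, worst case: {worst} min")
```

With the figures above (10 min interval, 5 to 7 min runs), this model gives a delay between 5 and 17 minutes.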

@dduportal
Contributor

⚠️ We still have user-facing HTTP/404 errors reported on the "directory listing" pages (e.g. requests to a directory without an index.html). => not a blocker but annoying

* Apache (in the current VM `pkg`) has directory listing enabled and generates the index for us.

* In the new system, Apache redirects these requests to the Cloudflare mirrors, which do not provide a directory listing feature, so they end in HTTP/404.

* See comment below for details

From here, we have the following choices:

  • Keep the current behavior (i.e. HTTP/404 for requests targeting directory listing pages) and accept that it is not a feature we want to keep
  • Generate the directory listing on each mirror
  • Provide the files in the existing front Apache
    • Requires changing the "fallback to mirror" rewrite rule. It is doable now: we have fixed the redirection priority problems we had in the past weeks. We need to catch the exact URIs to send to mirrors
    • Would decrease the number of requests (mostly failed requests) sent to Cloudflare
    • Would increase the number of requests served by our Apache. Let's keep in mind this content is not the initial "heavy bandwidth target" though.

@dduportal
Contributor

⚠️ We still have user-facing reported HTTP/404 on the "directory listing" pages (e.g. requests to a directory without an index.html). => not a blocker but annoying
...

Discussed during today's team meeting. We are proceeding with the last solution, following @daniel-beck's recommendation.

In other words, we're updating the architecture so that Apache serves everything except the JSON files. We can do this now that we've solved the "Apache Redirections" problem: it wasn't possible in the past months with the RedirectMatch directives. => it will most probably simplify the publish.sh process in the update center2 job as well

  • PR on the current UC: chore(wrappers - publish.sh) only send JSON files to mirrors update-center2#816 (ping @daniel-beck could I get your brain on this one? I don't mind scheduling a 1:1 meeting with you if it helps)
  • Todo next (if it works as expected on a manual test):
    • Update the crawler publication script to also publish to httpd (to have directory listing on /updates/)
    • Update the mirrors.updates.jenkins.io ingress to stop rewriting directories to index.html
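
The "Apache serves everything except the JSON files" split can be pictured as a small routing decision. The sketch below is illustrative only and is not the actual Apache rewrite rule: it assumes JSON files are recognised by a `.json` suffix, and the function name and sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def backend_for(path):
    """Return which backend would answer a request path in this sketch.

    Assumption: only files whose final suffix is ".json" go to the mirrors;
    directories and every other file are served by the front Apache.
    """
    if path.endswith("/"):
        # Directory request: Apache can generate a directory listing.
        return "apache"
    if PurePosixPath(path).suffix == ".json":
        # Heavy-bandwidth JSON content: redirect to the mirrors.
        return "mirrors"
    return "apache"


if __name__ == "__main__":
    for sample in ("/update-center.json", "/updates/", "/update-center.json.html"):
        print(f"{sample} -> {backend_for(sample)}")
```

In this sketch, a directory request such as `/updates/` stays on Apache, which is what restores the directory listing, while only `.json` files are redirected to the mirrors.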

@dduportal
Contributor

dduportal commented Oct 31, 2024

@dduportal
Contributor

Update:

@smerle33
Contributor

smerle33 commented Nov 5, 2024

Update (preparing the 6th brownout):

@dduportal
Contributor

Update:
