Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(blog) add a new post about the Update Center (6th brownout) #7658

Merged
merged 5 commits into from
Nov 5, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
---
layout: post
title: "Brownout on Update Center (updates.jenkins.io):\n 07 and 08 November 2024"
tags:
- jenkins
- jenkins-infra
- update-center
authors:
- smerle33
opengraph:
image: /images/post-images/2023/01/12/jenkins-newsletter/infrastructure.png
overview: "We'll be using the new Update Center implementation in production for 1 day"
discourse: true
---

== Summary (link:https://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn%27t_read[TL;DR])

Note: this is a follow up of link:/blog/2024/10/24/update-center-brownouts-5/[the 24 and 25 October 2024 24 hours brownout].

The service link:https://updates.jenkins.io will switch its implementation to a new system during 1 day:

- From Thursday 7 November 2024 09:00 am UTC until Friday 8 November 2024 09:00 am UTC

All Jenkins users are impacted but should not see any functional change.

⚠️ Please check that your organization respects the advertised DNS TTL or you might be stuck in the brownout longer than expected.

Under the hood, any HTTP request made to this service will be redirected to a mirror close to the users' location during these brownouts.

== What is the "Update Center"?

Jenkins link:https://updates.jenkins.io[Update Center] is a web server at the core of the Jenkins public infrastructure that distributes the plugins, tool installers, and versions index to all Jenkins servers across the world.

From the installation wizard to regular plugin updates, if you run Jenkins, then you use this service under the hood.

Today, it serves around 50 Tb of data (outbound bandwidth) each month from a single virtual machine on AWS, which costs around $6,000 per month.

The Jenkins infrastructure team has worked relentlessly over the past years to implement a new sustainable implementation for this service in order to sustain and improve it.

The new Update Center implementation features a highly available system that redirects user requests to a download mirror close to their location.
Additional information is available in the link:https://github.com/jenkins-infra/helpdesk/issues/2649[GitHub issue].

== Why this Sixth "link:https://en.wikipedia.org/wiki/Brownout_(electricity)[Brownout]"?

Our functional tests and performance tests are meeting our expectations after the initial work and the five brownouts we ran.

However the previous (fifth) brownout raised 2 issues:

- We miss some directory listings that are present on the current infra and are not available on mirrors

- We've also added log retain from Cloudflare to Datadog that should help if HTTP/404 errors still occur after 8-9 hours or production usage due to network file share (SMB/CIFS) issues with Apache. The service is self-healing but we are trying to fine-tune this: if it fails, then we'll have to update the Apache architecture to stop using a network file share. See https://github.com/jenkins-infra/helpdesk/issues/4312 for details.

As such, a sixth brownout serves to verify the above (minor) problems are solved under a production workload, this should be the last brownout.

We plan a full migration to this new infrastructure for Monday 18 November, more information on this to come.

== How

Both current and new Update Centers are updated at the same time and serve the same index.

Starting 12 weeks ago, the Jenkins infrastructure has been using the new Update Center (link:https://azure.updates.jenkins.io[azure.updates.jenkins.io]) with a client-side DNS override (`updates.jenkins.io` hostname points to this new service).

During this brownout, we'll simply switch the DNS entry `updates.jenkins.io` to this new service and watch for the logs and error rate.
At the end, we'll switch DNS back to the normal service and then analyze metrics and logs to see how the system behaved.

We are confident the new system will perform as expected.

Please refer to the link:https://github.com/jenkins-infra/helpdesk/issues/2649[helpdesk ticket] for more information.