diff --git a/content/blog/2024/11/07/2024-11-07-update-center-brownouts-6.adoc b/content/blog/2024/11/07/2024-11-07-update-center-brownouts-6.adoc new file mode 100644 index 000000000000..78620116f5cc --- /dev/null +++ b/content/blog/2024/11/07/2024-11-07-update-center-brownouts-6.adoc @@ -0,0 +1,68 @@ +--- +layout: post +title: "Brownout on Update Center (updates.jenkins.io):\n 07 and 08 November 2024" +tags: +- jenkins +- jenkins-infra +- update-center +authors: +- smerle33 +opengraph: + image: /images/post-images/2023/01/12/jenkins-newsletter/infrastructure.png +overview: "We'll be using the new Update Center implementation in production for 1 day" +discourse: true +--- + +== Summary (link:https://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn%27t_read[TL;DR]) + +Note: this is a follow up of link:/blog/2024/10/24/update-center-brownouts-5/[the 24 and 25 October 2024 24 hours brownout]. + +The service link:https://updates.jenkins.io will switch its implementation to a new system during 1 day: + +- From Thursday 7 November 2024 09:00 am UTC until Friday 8 November 2024 09:00 am UTC + +All Jenkins users are impacted but should not see any functional change. + +⚠️ Please check that your organization respects the advertised DNS TTL or you might be stuck in the brownout longer than expected. + +Under the hood, any HTTP request made to this service will be redirected to a mirror close to the users' location during these brownouts. + +== What is the "Update Center"? + +Jenkins link:https://updates.jenkins.io[Update Center] is a web server at the core of the Jenkins public infrastructure that distributes the plugins, tool installers, and versions index to all Jenkins servers across the world. + +From the installation wizard to regular plugin updates, if you run Jenkins, then you use this service under the hood. + +Today, it serves around 50 Tb of data (outbound bandwidth) each month from a single virtual machine on AWS, which costs around $6,000 per month. + +The Jenkins infrastructure team has worked relentlessly over the past years to implement a new sustainable implementation for this service in order to sustain and improve it. + +The new Update Center implementation features a highly available system that redirects user requests to a download mirror close to their location. +Additional information is available in the link:https://github.com/jenkins-infra/helpdesk/issues/2649[GitHub issue]. + +== Why this Sixth "link:https://en.wikipedia.org/wiki/Brownout_(electricity)[Brownout]"? + +Our functional tests and performance tests are meeting our expectations after the initial work and the five brownouts we ran. + +However the previous (fifth) brownout raised 2 issues: + +- We miss some directory listings that are present on the current infra and are not available on mirrors + +- We've also added log retain from Cloudflare to Datadog that should help if HTTP/404 errors still occur after 8-9 hours or production usage due to network file share (SMB/CIFS) issues with Apache. The service is self-healing but we are trying to fine-tune this: if it fails, then we'll have to update the Apache architecture to stop using a network file share. See https://github.com/jenkins-infra/helpdesk/issues/4312 for details. + +As such, a sixth brownout serves to verify the above (minor) problems are solved under a production workload, this should be the last brownout. + +We plan a full migration to this new infrastructure for Monday 18 November, more information on this to come. + +== How + +Both current and new Update Centers are updated at the same time and serve the same index. + +Starting 12 weeks ago, the Jenkins infrastructure has been using the new Update Center (link:https://azure.updates.jenkins.io[azure.updates.jenkins.io]) with a client-side DNS override (`updates.jenkins.io` hostname points to this new service). + +During this brownout, we'll simply switch the DNS entry `updates.jenkins.io` to this new service and watch for the logs and error rate. +At the end, we'll switch DNS back to the normal service and then analyze metrics and logs to see how the system behaved. + +We are confident the new system will perform as expected. + +Please refer to the link:https://github.com/jenkins-infra/helpdesk/issues/2649[helpdesk ticket] for more information.