General Retrospective for March/April 2024 Releases #28

Closed
6 of 9 tasks
adamfarley opened this issue Jan 30, 2024 · 55 comments · Fixed by adoptium/temurin-build#3798

@adamfarley
Contributor

adamfarley commented Jan 30, 2024

Summary

A retrospective for all efforts surrounding the titular releases.

All community members are welcome to contribute to the agenda via comments below.

This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.

On the day of the meeting we'll review the agenda and add a list of actions at the end.

Invited: Everyone.

Time, Date, and URL

Time: 3pm BST, 10am EST.
Date: Tuesday the 7th of May, 2024.
URL: https://eclipse.zoom.us/j/82423919203?pwd=jcs9cimNWYIflSqChjnT5U5Aj62sSx.1
Meeting ID: 824 2391 9203
Passcode: 339984

Details

Retrospective Owner Tasks (in order):

  • Post retro URL in #Release around the start of the new release.
  • Wait until most builds are released, with no signs of a respin.
  • Announce the retrospective's date + time on #Release a week in advance.
  • Tell Carmen Delgado so they can add this retrospective to the community calendar.
  • Host the retrospective:
    • Go through the agenda.
    • Create a list of actions.
  • Process each action:
    • Create a "WIP" issue including the source comment.
    • Add the issue to the current iteration.
    • Add an issue link to the action list.
  • Create a new retrospective issue for the next release.
  • Set a calendar reminder so you remember to do step 1 before the next release.
  • Close this issue.

TLDR

Add proposed agenda items as comments below.

@adamfarley adamfarley self-assigned this Jan 30, 2024
@smlambert
Contributor

Scary message when publishing aarch64_mac binaries. I believe it did the right thing (pushing 31 artifacts to the releases repo), but it found 62 and reported UNSTABLE:
https://ci.adoptium.net/job/build-scripts/job/release/job/refactor_openjdk_release_tool/8424/

dryrun did not indicate any issues to be aware of https://ci.adoptium.net/job/build-scripts/job/release/job/refactor_openjdk_release_tool/8423/

@sophia-guo
Contributor

sophia-guo commented Mar 20, 2024

#28 (comment)

It happened to all platforms.

The status is UNSTABLE because it's counting some of the -ea tagged artifacts, which means the check file needs updating. Maybe we should consider whether there is a better, or automated, way to check the numbers?
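
One possible shape for an automated number check is to filter out EA artifacts before comparing against the expected count. This is a minimal sketch only: the function names, the `ea` marker convention, and the sample file names in the usage note are illustrative assumptions, not the release tool's actual code.

```python
import re

# Assumption: EA artifacts carry a standalone "ea" token in their name
# (for example "_ea_" or "-ea-"). Adjust the pattern to the real naming scheme.
EA_MARKER = re.compile(r"(?<![a-z0-9])ea(?![a-z0-9])")


def count_ga_artifacts(names):
    """Count artifacts whose name does not contain a standalone 'ea' token."""
    return sum(1 for name in names if not EA_MARKER.search(name.lower()))


def check_artifact_count(names, expected):
    """Return (ok, actual) so the caller decides whether to mark the job UNSTABLE."""
    actual = count_ga_artifacts(names)
    return actual == expected, actual
```

With a hypothetical listing such as `["a_22_36.tar.gz", "a_ea_22-0-36.tar.gz"]`, only the first name is counted, so a check against an expected count of 1 would pass even though two files are present.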

@jerboaa

jerboaa commented Mar 20, 2024

Something went wrong with publishing source tarballs for the Jan 2024 update. See: adoptium/adoptium-support#1003. We should add some verification to make sure the source tarball is there for every release.

@sophia-guo
Contributor

Also to publish the binary you can try the rerun link in the lower part of the page https://ci.adoptium.net/job/build-scripts/job/release-openjdk22-pipeline/5/, which is enabled for this release.

Dry runs are triggered by the pipeline job itself. If the dry run fails, the rerun link will be "Dry run RELEASE Publish temurin jdk-22+36 mac x64", which triggers a dry run. Otherwise the rerun link will be "RELEASE Publish temurin jdk-22+36 mac x64" and there is no need to do the dry run.
Screenshot 2024-03-20 at 3 44 29 PM

@smlambert
Contributor

smlambert commented Mar 20, 2024

Also to publish the binary you can try the rerun link

That is what I clicked to run first the dry run (8423), then the release run (8424). I expected the release checks that we have in place to work, but I guess they do not take into account the presence of the EA artifacts.

By the way, I very much LOVE having the release links available, as I will never get the regex wrong again! Now we just need to update the checks to be a bit more specific and handle gracefully or remove the EA artifacts.

@smlambert
Contributor

With a new feature release, we need to ensure the aqa-tests JCK configs are updated (and likely shift to using a template that does not require duplicating configs for each new version).

@smlambert
Contributor

With new feature release, check if Version List on website needs updating? adoptium/adoptium.net#2731

@sophia-guo
Contributor

That is what I clicked to run first the dry run (8423), then the release run (8424).

I'm a little confused. The dry run was triggered by the pipeline and succeeded. Is 8423 a double check?

@smlambert
Contributor

smlambert commented Mar 20, 2024

I'm a little confused. The dry run was triggered by the pipeline and succeeded. Is 8423 a double check?

Yes, and also because I did not find the output from the original dry run quickly

@sophia-guo
Contributor

Would it be helpful to add the dry-run build links?

@smlambert
Contributor

Downside of everyone and their dog using Grinders for all kinds of good work is that it is more difficult to spot the ones launched to complete release triage. (no action required on this comment, good bookkeeping can cover it, or we advise folks doing dev work to use Grinder_Dev, or some such).

@smlambert
Contributor

Would it be helpful to add the dry-run build links?

I think the dryruns for publishing were there to help deal with the great potential for human error, which is essentially removed by the addition of the prepopulated 'quick links' found at the bottom of the parent pipeline job.

If the dryruns did some of the verification that happens during the actual publish (checking the right number of artifacts exist), and subsequently fail or report if there was an issue, they would have a purpose. Without the verification checks happening in the dry run, there seems little need to run an automated dry run.

@smlambert
Contributor

The release has been chugging along smoothly enough to allow development work to continue alongside it. The unfreezing of master branches in the Temurin project is also a good thing.

@smlambert
Contributor

Triggering the pipeline ahead of the -ga tag was a good call. The upstream -ga tag did not show up until much later (~full day) than we triggered the pipeline (Wed/20th).

@smlambert
Contributor

adoptium/aqa-tests#5156 (comment) - for follow-up AQAvit actions

@smlambert
Contributor

Release notes not being served up via API (so not showing up on the website). Slack msg, am I missing a step that I do not find instructions for?

@smlambert
Contributor

Why is EclipseMirror job on the TC Jenkins server?
Think we can hook that checklist action into a post build job, along with many other tasks. (related: adoptium/ci-jenkins-pipelines#610 (comment))

@smlambert
Contributor

If we are clearing out / deleting jobs on Jenkins, let's do it via Jenkins API (which then keeps the JobID history), not by deleting workspace directly on the server (where JobID history not kept and ID count starts again, causing repeated/duplicated IDs). This then leads to a problem with seeing the new runs on TRSS that uses those jobIDs as indices in the DB.

@sxa
Member

sxa commented Apr 3, 2024

Why is EclipseMirror job on the TC Jenkins server? Think we can hook that checklist action into a post build job, along with many other tasks. (related: adoptium/ci-jenkins-pipelines#610 (comment))

Because it contains a secret credential that we were only allowed to have on the eclipse-managed server.

@sxa
Member

sxa commented Apr 3, 2024

If we are clearing out / deleting jobs on Jenkins, let's do it via Jenkins API (which then keeps the JobID history), not by deleting workspace directly on the server (where JobID history not kept and ID count starts again, causing repeated/duplicated IDs). This then leads to a problem with seeing the new runs on TRSS that uses those jobIDs as indices in the DB.

See comment on adoptium/aqa-test-tools#860 (comment). I don't believe that deleting the job definition via any means would have met the requirements here and retained those identifiers.

@sxa
Member

sxa commented Apr 5, 2024

[Update Made]

The guide to creating new mirror releases should include archiving the non-u release when the u mirror is created.

@sxa
Member

sxa commented Apr 5, 2024

[Update made]

The steps in https://github.com/adoptium/temurin-build/wiki/Creating-new-jdkNNu-(updates)-repro-mirror-from-the-jdkNN-release-mirror#how for manually doing the initial population of the mirror should not be required based on this slack thread, so the docs should reflect that (and possibly put the manual steps in a <summary> section). Noting also that after the mirror is created it can take up to three hours for the permissions to start working if you need to do it manually.
Also, as per the thread, you may see this error the first time you run a new mirror job to populate the repository, which will cause the job to fail. It seems to go through OK on a second attempt to run the mirror job:

+ git rebase skara/master master
fatal: no such branch/commit 'master'
Build step 'Execute shell' marked build as failure

@sxa
Member

sxa commented Apr 5, 2024

[UPDATE MADE]

Also the doc on creating the generator should explicitly state that while the title of the job should have the u as appropriate, the JAVA_VERSION should NOT have the u as it gets added later and will result in this:
hudson.remoting.ProxyException: java.nio.file.NoSuchFileException: /home/jenkins/workspace/build-scripts/utils/evaluation-pipeline_jobs_generator_jdk22u/pipelines/jobs/configurations/jdk22uu_pipeline_config.groovy

ALSO: Memo to self: release-pipeline-generator kicks off regen of the top level release-openjdkXX-pipeline jobs before initiating each of the versioned ones underneath it (sequentially) so they don't need to be done separately
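
The `jdk22uu` failure above comes from the `u` being appended to a value that already ends in `u`. A guard along the following lines could catch it before the generator runs. This is a hypothetical sketch: the function name and the `is_update_repo` flag are assumptions, and only the `_pipeline_config.groovy` file name pattern is taken from the error message.

```python
def config_file_name(java_version, is_update_repo):
    """Build the pipeline config file name from a JAVA_VERSION such as 'jdk22'."""
    # Reject a value that already carries the suffix, otherwise the result
    # would be e.g. 'jdk22uu_pipeline_config.groovy'.
    if java_version.endswith("u"):
        raise ValueError(
            f"JAVA_VERSION must not end in 'u' (got {java_version!r}); "
            "the suffix is added later"
        )
    suffix = "u" if is_update_repo else ""
    return f"{java_version}{suffix}_pipeline_config.groovy"
```

Failing early with a clear message is arguably easier to diagnose than the `NoSuchFileException` raised later in the job.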

@sxa
Member

sxa commented Apr 5, 2024

[PR link]

Add a note to the checklist or RELEASING.md to indicate that we generally use the same AQA branch for March+April and for Sept+Oct (subject to updating the JDKxx_BRANCH name for the "new" March/Sept release).

@sxa
Member

sxa commented Apr 7, 2024

Noting that on my initial attempt to set the dryrun tags I made it jdk-17.0.11+7-dryrun-ga (i.e. including the +7 build identifier). This will cause problems, and such a tag will need to be deleted prior to running the pipelines or you'll get something like this from openjdk_pipeline.groovy:

[INFO] Resolved jdk-17.0.11-dryrun-ga to upstream build tag jdk-17.0.11+6jdk-17.0.11+6-dryrun-ga
[Pipeline] echo
[ERROR] scmReference does not match with any JDK branch in testenv.properties in aqa-tests release branch. Please update aqa-tests v1.0.1-release release branch. Set the current build result to FAILURE!

Also noting that if the pipeline does fail after being triggered, the workspace/tracking file on the jenkins worker node will need to be manually updated or you won't be able to re-trigger as the job uses that for its status and will not re-trigger the same underlying tag twice unless it's manually fixed.

EDIT: I'm not sure why but the jdk-17.0.11+6-dryrun-ga seemed to re-appear and caused the same issue when I did a second dry-run. Same happened for other releases I'd done it for. EDIT: It was because the jenkins workspace machine still had a cache of the old tag so we were pushing it on every update
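
A small check before pushing a dryrun tag could catch the stray build identifier described above. This is a sketch under the assumption that dryrun tags follow the `jdk-<version>-dryrun-ga` shape seen in this comment; it does not cover other tag styles, and the function name is made up for illustration.

```python
import re

# Assumption: a valid dryrun tag is 'jdk-' + dotted version + '-dryrun-ga',
# with no '+NN' build identifier anywhere in it.
DRYRUN_TAG = re.compile(r"jdk-\d+(?:\.\d+)*-dryrun-ga")


def is_valid_dryrun_tag(tag):
    """Return True only for tags like 'jdk-17.0.11-dryrun-ga'."""
    return DRYRUN_TAG.fullmatch(tag) is not None
```

Under this pattern `jdk-17.0.11-dryrun-ga` is accepted, while `jdk-17.0.11+7-dryrun-ga` is rejected because of the `+7`.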

@sxa
Member

sxa commented Apr 8, 2024

The new u release (jdk22u) does not have any tags which can be used as an equivalent of the -dryrun-ga release, therefore we need to put in a fix to allow the dryrun process to run on jdk22u (and the same for subsequent STS releases). At the moment I'm going to use 22.0.0+0 because that will never be used, but we should consider what to do for future versions, since it will not be as simple to insert something similar between 22.0.1.x and 22.0.2.y.
Ref: adoptium/mirror-scripts#50

EDIT: Noting that we also require a corresponding jdk-22.0.0+0_adopt tag to be created, but NOT on the same commit, otherwise you hit the issue from the previous comment. If you have to retag because you put it on the same commit, be sure that the mirror jobs do not have the cached version of the old tag, as it will cause a failure.

EDIT: We got a failure in the create_installer_windows job which said SOURCE Dir not found / failed (longer snippet below). From @andrew-m-leonard: "I believe that will be because the tag “jdk-22.0.0+0” does not meet the expected format for jdk-22.0.1. We are building jdk22u HEAD, which is 22.0.1, which is what the version string will be and is what the installer build expects…"

looking for .\SourceDir\OpenJDK-Latest\hotspot\x64\jdk-22.0.1+null
SOURCE Dir not found / failed
Listing directory :
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK22
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK22\hotspot
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK22\hotspot\x64
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK22\hotspot\x64\jdk-22.0.0+0
F:\workspace\workspace\build-scripts\release\create_installer_windows\wix\SourceDir\OpenJDK22\hotspot\x64\jdk-22.0.0+0\bin

@sxa
Member

sxa commented Apr 8, 2024

It seems that the code behind the *-openjdk22-pipeline groovy scripts prefers to pick up a non-u configuration. I added 22u in the configurations dir and hadn't removed 22, so the dryruns triggered the jdk22- jobs using the jdk22 repository instead of the right one.
Fixed by removing the jdk22 files from the configurations dir in this PR. We should consider whether preferring the non-u version is correct (it's hard to envision a scenario where it would be, IMHO), and we should definitely cover this in the wiki page to ensure that the new u releases are done with a rename instead of creating new ones alongside the non-u versions.
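
If preferring the u configuration turns out to be the desired behaviour, the lookup order could be sketched roughly as follows. This is not the pipeline's actual Groovy logic. The function name is hypothetical and the file names simply follow the `jdkNN[u]_pipeline_config.groovy` pattern seen elsewhere in this thread.

```python
def pick_config(available, version):
    """Return the config file to use for a version, trying the 'u' variant first."""
    candidates = (
        f"jdk{version}u_pipeline_config.groovy",  # preferred once a u repo exists
        f"jdk{version}_pipeline_config.groovy",   # fallback for a brand-new release
    )
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None  # no config for this version at all
```

With both `jdk22` and `jdk22u` files present, this ordering would select the `jdk22u` one, which avoids the situation described above without relying on the old files having been removed.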

@sxa
Member

sxa commented Apr 8, 2024

[PR link - code freeze won't work with +NN so .NN is the correct one to use]

Should we use vYYYY.MM.NN or vYYYY.MM+NN for the build branches? The Releasing guide is ambiguous (which resulted in me ending up with both at one point):
image

@sxa
Member

sxa commented Apr 8, 2024

Mirror scripts, if left to their own devices to populate a new u repository, do not include the README.JAVASE marker (and potentially not any other patches) in the dev/release branches.

EDIT: Fixed by adoptium/mirror-scripts#51

@smlambert
Contributor

smlambert commented Apr 21, 2024

Is there a naming convention that one should follow for build branches for release?

Screenshot 2024-04-21 at 5 52 43 PM

Also, are we freezing the release branches or still freezing master?

@sxa
Member

sxa commented Apr 25, 2024

@smlambert Naming is covered in a previous comment for discussion as the doc is currently ambiguous

@sxa
Member

sxa commented Apr 25, 2024

Notes on blog post production:

  • Should we be creating an issue or PR for the "next" release? (Checklist says PR, but we created an issue for this one)
  • We should have a template for the blog post (maybe a script to set all the version numbers in the right place?) which includes information on how to generate the information for the vulnerability section (automatable?) and templates for how to display CA Certs updates too

EDIT: Slack message from Shelley indicates that it's a manual lift from the appropriate page under the upstream advisories page for now so that should be used for the guides

@sxa
Member

sxa commented Apr 25, 2024

Ensure we define processes for the installers for:

@smlambert
Contributor

Should we be creating an issue or PR for the "next" release? (Checklist says PR, but we created an issue for this one)

I originally put PR, but think we should change it to issue, as it does what we need it to do (be a placeholder for 'new and noteworthy' notes between release periods) and also does not tie the originator of the PR into being the next blog post author (pen named PMC).

@sxa
Member

sxa commented Apr 25, 2024

I originally put PR, but think we should change it to issue

Noting that I'm planning to avoid creating either just now until we do the retrospective and have the discussion on this (I'm writing this to remind me that I need to do it when we're going through these comments ;-) )

@sxa
Member

sxa commented Apr 25, 2024

"Full update" on the website had to be forced to pick up the new release notes (slack thread). Do we know why? Anything we can fix / document for future understanding.

@sxa
Member

sxa commented Apr 25, 2024

Release notes seem to repeatedly have problems staging. We should attempt to understand why and resolve it.

@sxa
Member

sxa commented Apr 30, 2024

download verification test gets confused if there aren't any .zip files in the release e.g. 22.0.1.1 for s390x: https://ci.adoptium.net/job/build-scripts/job/release/job/download_and_sbom_validation/27

[EDIT: Fixed in https://github.com/adoptium/temurin-build/pull/3798]
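
For illustration only, one way a verification step can tolerate a release with no .zip files is to group files by archive type and skip any type with no matches. This is a sketch under assumed names and is not the change made in the linked PR.

```python
from fnmatch import fnmatch


def archives_to_verify(file_names, patterns=("*.tar.gz", "*.zip")):
    """Group release files by archive pattern, leaving out patterns with no files."""
    groups = {}
    for pattern in patterns:
        matches = [name for name in file_names if fnmatch(name, pattern)]
        if matches:  # a platform such as s390x may ship no .zip files at all
            groups[pattern] = matches
    return groups
```

A caller that iterates over the returned groups never sees an empty .zip list, so there is nothing for it to trip over.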

@sxa
Member

sxa commented May 1, 2024

Noting that choosing to run the initial pipelines without win32 meant that the 32-bit build+AQA pipelines completed in a maximum of 25 hours. This contrasts with the initial tests with both variants with all five releases, which were taking up to 1d16h to complete. On this basis I would propose that we kick off the 32-bit pipelines approximately 24 hours after the win64 ones where practical, to avoid contention, although if all 64-bit pipelines are already in progress, then a short delay of 2 hours may be adequate. (Bear in mind the goal is to avoid the 32-bit tests running first. The builds take between 26 minutes (JDK8) and 95 minutes (JDK22), so a 2 hour buffer after the last 64-bit build will generally be adequate, although the 24 hour proposal means that there is less chance of machine contention for any 64-bit re-runs that are required.)
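
The proposal above can be written down as a small calculation. This is a sketch only: the function and parameter names are made up, and it assumes the delay is measured from the time the last 64-bit pipeline was triggered.

```python
from datetime import datetime, timedelta


def win32_start_time(win64_trigger_times, all_win64_in_progress,
                     short_delay_hours=2, long_delay_hours=24):
    """Suggest when to trigger the 32-bit pipelines.

    Uses the short delay when every 64-bit pipeline is already running,
    otherwise the long delay, both counted from the last 64-bit trigger.
    """
    last_trigger = max(win64_trigger_times)
    hours = short_delay_hours if all_win64_in_progress else long_delay_hours
    return last_trigger + timedelta(hours=hours)
```

The delays are parameters so that a different agreed value can be substituted without changing the logic.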

@sxa
Member

sxa commented May 1, 2024

New feature: Improve download test capability so that it works with ea builds so we can pre-empt check issues during the release: adoptium/temurin-build#3784

Potentially look at adding this to the main pipelines (Do we have a separate issue for that?)

@jiekang

jiekang commented May 1, 2024

To discuss: adoptium/ci-jenkins-pipelines#994

Priority: needs to be done before July release

@adamfarley
Contributor Author

adamfarley commented May 7, 2024

Actions

#28 (comment)
Add to download verification process - raise issue
adoptium/temurin-build#3801

#28 (comment)
Raise issue for sxa to update release checklist
#42

#28 (comment)
Raise issue to review queue times in the next release.
adoptium/aqa-tests#5294

#28 (comment)
Create issue to discuss dryrun-launching job. Should we have one? Discuss.
adoptium/ci-jenkins-pipelines#1028

#28 (comment)
Discuss in cc.

#28 (comment)

#28 (comment)
Needs moving to same section as smoke tests (so it gets executed before the main test).
Issue raised: adoptium/temurin-build#3875

#28 (comment)
Create issue.
Raised: adoptium/ci-jenkins-pipelines#1070

#28 (comment)
Create infra issue to create machine.
Issue raised: adoptium/infrastructure#3652

#28 (comment)
Raise a tck issue to cover that side of this. Remove burstable nodes altogether (and remove burst node tag requirement?).
Raised here: https://github.com/temurin-compliance/temurin-compliance/issues/526
Stewart will take an action to change release doc - 48hr delay on win32 builds during releases.

Notes (any actions already handled as indicated)

#28 (comment)
Shelley has offered to raise an issue.

#28 (comment)
Shelley to raise an aqa tests tools issue for this.

#28 (comment)
Stewart to do doc change.

#28 (comment)
Stewart to do this. No additional issue needed.

#28 (comment)
Stewart to do this, no issue needed. Andrew to raise PR.

#28 (comment)
Stewart to do, no issue needed.

#28 (comment)
Issue already open, no issue needed.

#28 (comment)
Yes. Solved as per Stewart's comment.

#28 (comment)
Shelley will raise issue

#28 (comment)
Already an issue for this. adoptium/installer#829

#28 (comment)
Stewart will make a note.
Jie will make a discussion issue.

#28 (comment)
Will be discussed separately.
