build.getDependency is not using versioned artifacts #3593

Closed
sxa opened this issue Dec 20, 2023 · 13 comments
Labels: jenkins, testing

Comments

@sxa
Member

sxa commented Dec 20, 2023

https://ci.adoptium.net/view/all/job/build.getDependency squashes the version number in the downloaded artifacts, making it impossible to recreate a download from there when running a reproducible build test.
For example, the CycloneDX update in #3558 introduced a new artifact with the same name but a different SHA. That means you can't easily re-run an old version of the build scripts and expect the SHA checks to pass, because we always pull the latest version, which may have changed. This prevents a customer from fully running a reproducibility test of our last GA with an SBoM, and therefore breaks our great story around reproducibility.

Related: adoptium/ci-jenkins-pipelines#863
Most recent slack thread: https://adoptium.slack.com/archives/C09NW3L2J/p1703013340753489?thread_ts=1703010237.478249&cid=C09NW3L2J

@github-actions bot added the jenkins and testing labels Dec 20, 2023
@adamfarley self-assigned this Mar 12, 2024
@andrew-m-leonard
Contributor

The artifacts may need a SHA256.txt, and possibly a GPG .sig.
Maybe the filename needs a version in it?

@adamfarley
Contributor

adamfarley commented Mar 14, 2024

Tasks:

  • Add code to generate a sha256 file for each jar.
  • Test that.
  • Add code to generate a version file for each jar.
  • Test that.
  • Add code to the build.xml file so we're downloading the sha rather than using duplicate hard-coded values.
  • Test that.
  • Add code to add the version and sha for each jar to the sbom
  • Test that.
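
As a rough illustration of the first task in the list above, a checksum file could be written next to each jar along these lines. This is only a sketch: the `.sha256.txt` suffix and the function names `sha256_of` and `write_sha_files` are assumptions for illustration, not what the job actually does.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha_files(directory: Path) -> list[Path]:
    """Write '<jar name>.sha256.txt' (hypothetical suffix) beside every jar."""
    written = []
    for jar in sorted(directory.glob("*.jar")):
        sidecar = jar.with_name(jar.name + ".sha256.txt")
        # Same 'digest  filename' layout that the sha256sum tool emits.
        sidecar.write_text(f"{sha256_of(jar)}  {jar.name}\n")
        written.append(sidecar)
    return written
```

Note that a checksum generated after the download only describes what was downloaded; it does not prove the download matched upstream.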

@sxa
Member Author

sxa commented Mar 14, 2024

Looking at the above list I have a couple of questions ...

  1. Why can't we retain the version number from the download? Deliberately removing it then holding a separate version file seems oddly complex.
  2. How are the SHAs being generated? If we're just creating them after downloading then that doesn't protect us against anything other than the transfer between jenkins and the build machines. Hard coded values in the build scripts seem like a much better idea to me unless I've misunderstood what is being attempted here.

@adamfarley
Contributor

> 1. Why can't we retain the version number from the download? Deliberately removing it then holding a separate version file seems oddly complex.

Because it seemed simpler to me than attempting to parse the file name, especially since the getDependencies script already holds the versions separate.

And I think the version strings need to be separate because other version strings in the SBOM are already separate, e.g.:

    "tools" : [
      {
        "name" : "GLIBC",
        "version" : "2.17"
      },

> 2. How are the SHAs being generated? If we're just creating them after downloading then that doesn't protect us against anything other than the transfer between jenkins and the build machines. Hard coded values in the build scripts seem like a much better idea to me unless I've misunderstood what is being attempted here.

The SHAs are hard-coded in the getDependencies script for security, and are used to verify that each download is intact.
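
For readers unfamiliar with the pattern, checking a download against a hard-coded SHA might look roughly like the sketch below. The table `EXPECTED_SHA256`, the artifact name, and `verify_download` are all hypothetical; the real getDependencies script is Groovy and may be structured quite differently.

```python
import hashlib

# Hypothetical table of artifact name -> expected SHA-256.
# In a real script each value would be a literal hex string pinned in source;
# it is computed here only so that the example is self-contained.
EXPECTED_SHA256 = {
    "example-dependency.jar": hashlib.sha256(b"example contents").hexdigest(),
}


def verify_download(name: str, data: bytes) -> None:
    """Raise if the downloaded bytes do not match the pinned digest."""
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no pinned SHA-256 for {name}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError(f"{name}: expected {expected}, got {actual}")
```

Because the expected value lives in source control rather than beside the artifact, a changed upstream file with the same name fails the check instead of being silently accepted.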

We will be generating signatures as part of getDependencies, to remove the risk of a man-in-the-middle attack after the getDependencies download, when the jars are downloaded from Jenkins during SBOM creation preparation.

We are doing this to ensure security while, at the same time, avoiding having to copy and paste the SHAs in three different places (getDependencies, download during build, and SHA documentation in the SBOM).

Another option is to have the SHAs in a shared location in the source repo, which has its own risks. Every time the SHAs are copied, we are exposed to a MITM risk, which is at least mitigated if the first time we download the files we generate secure Adoptium sig files.

@sxa
Member Author

sxa commented Mar 14, 2024

> Because it seemed simpler to me than attempting to parse the file name, especially since the getDependencies script already holds the versions separate.

I still feel that renaming the file is more likely to cause confusion, but I won't block based on it. However, I do think we've got plenty of precedent for parsing and obtaining version numbers locally on the machine (especially with the strace output), and I'd personally feel more comfortable with pulling it out on the live system if we can.
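
To make the parsing option concrete, here is a minimal sketch of pulling the version out of a jar file name. It assumes Maven-style names of the form `<artifact>-<version>.jar` where the version starts with a digit; the function name `split_jar_name` and the sample file name are hypothetical, and the pattern is a heuristic rather than a full version grammar.

```python
import re

# Assumption: '<artifact>-<version>.jar', with the version starting with a digit.
_JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(filename: str) -> tuple[str, str]:
    """Split a versioned jar file name into (artifact name, version)."""
    match = _JAR_NAME.match(filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    return match.group("name"), match.group("version")


print(split_jar_name("example-lib-1.2.3.jar"))
```

A name with no version component, such as `example.jar`, raises `ValueError`, which is the situation the current download job creates by squashing the version.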

> Another option is to have the SHAs in a shared location in the source repo, which has its own risks. Every time the SHAs are copied, we are exposed to a MITM risk, which is at least mitigated if the first time we download the files we generate secure Adoptium sig files.

I'm not sure that the extra complexity of invoking GPG signing here (which I assume is what the "adoptium sig files" refers to) is preferable to just holding those SHAs in the build scripts as well as in this job. IMHO, ideally a consumer of our scripts should be able to use our processes pulling directly from the upstream resources instead of having to rely on our jenkins CI. Signing would make it harder for them to point at the upstream URL if desired, as that won't have the signatures that we'd be checking against.

@sxa
Member Author

sxa commented Mar 14, 2024

Also do we have a solution for being able to pull an old version when required (for example when doing a reproducible build of an older release which may need an older version of one of the SBoMs to produce comparable output)?

@adamfarley
Contributor

> Also do we have a solution for being able to pull an old version when required (for example when doing a reproducible build of an older release which may need an older version of one of the SBoMs to produce comparable output)?

Not that I'm aware of, no. I think that should be a different issue.

@adamfarley
Contributor

> I'd personally feel more comfortable with pulling it out on the live system if we can.

Either one works for me. Not fussed about adding parsing in. Will add that.

> > Another option is to have the SHAs in a shared location in the source repo, which has its own risks. Every time the SHAs are copied, we are exposed to a MITM risk, which is at least mitigated if the first time we download the files we generate secure Adoptium sig files.

> ...which I assume is what the "adoptium sig files" refers to

Yup.

> ...is preferable to just holding those SHAs in the build scripts as well as in this job.

Either way is fine.

> ...as that won't have the signatures that we'd be checking against.

Fair. Will centralise the SHAs in the build repo then.

@sxa
Member Author

sxa commented Mar 14, 2024

> > Also do we have a solution for being able to pull an old version when required (for example when doing a reproducible build of an older release which may need an older version of one of the SBoMs to produce comparable output)?

> Not that I'm aware of, no. I think that should be a different issue.

OK - feel free to split it out if desired, although that was part of the intended scope of this one as per the example in the description of this issue ;-)

Thanks for taking on the other tweaks.

@adamfarley
Contributor

adamfarley commented Mar 15, 2024

Update: Currently testing the set of code changes relating to sbom generation (documentation updates pending).

We now keep the cyclonedx dependency SHAs and version strings in a single location, making it easy for users to set their own SHAs and versions. These version strings will be included in the sbom automatically.
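
As a hedged sketch of how centrally stored version strings could flow into the sbom: the file format is not shown in this thread, so the `name=version` line format, the sample names, and the functions `parse_versions` and `to_sbom_tools` are assumptions made purely for illustration. The output shape mirrors the `"tools"` entries quoted earlier in the thread.

```python
def parse_versions(text: str) -> dict[str, str]:
    """Parse 'name=version' lines, skipping blank lines and '#' comments."""
    versions = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        versions[name.strip()] = version.strip()
    return versions


def to_sbom_tools(versions: dict[str, str]) -> list[dict[str, str]]:
    """Build name/version entries in the same shape as the sbom 'tools' list."""
    return [{"name": n, "version": v} for n, v in sorted(versions.items())]


sample = "# hypothetical contents\nexample-lib=1.2.3\nother-lib = 4.5\n"
print(to_sbom_tools(parse_versions(sample)))
```

Keeping one source of truth means a version bump touches a single file, and the sbom entries follow automatically.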

Users will also be able to download dependencies from their chosen source by modifying "sbom_dependency_default_location" in the cyclonedx-lib/build.xml file.

The getDependencies groovy script will also be improved to allow users to set their own preferred location for dependency storage. - done

Monday (2024-03-18) update: The improvements to the ant build.xml file that fetches the jars have been fixed. I've also removed a typo in the build.sh file section that gathers the version strings and stores them in the sbom. Testing again.

Ok, that passed. Added documentation and an exclusion for the sbom dependency that we generate at runtime, as we don't have a version string for that. Final test run.

@adamfarley
Contributor

adamfarley commented Mar 18, 2024

TLDR:

The first step here is to centralise the SBOM dependency SHAs and version numbers in specific files. PR here.

This makes it easy for users to specify new versions and SHAs.

The second step (pending) will be to put the upstream location (with version wildcards) in similar "specific files", and to give both the build.xml and build.getDependencies the ability to use them (in the former case: only when the version file doesn't match the one in adoptium/temurin-build).
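
One possible reading of "upstream location with version wildcards" is a URL template with a placeholder that is filled in from the version file. The sketch below assumes a `{version}` placeholder syntax; that syntax, the function `resolve_upstream`, and the example URL are all hypothetical, since step 2 was still pending at this point in the thread.

```python
def resolve_upstream(template: str, version: str) -> str:
    """Fill every '{version}' placeholder in an upstream URL template."""
    if "{version}" not in template:
        raise ValueError("template has no {version} placeholder")
    return template.replace("{version}", version)


# Placeholder host, not a real upstream location.
url = resolve_upstream(
    "https://example.invalid/example-lib/{version}/example-lib-{version}.jar",
    "1.2.3",
)
print(url)
```

With this shape, pulling an older dependency for a reproducible build only requires the older version string, not a different URL.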

User POV: To change a dependency version in my build, I simply need to change the version number in "temurin-build/cyclonedx-lib/dependency_data/versions".

@sxa - What do you think? Will version files be enough, or do I need to add the ability to set the version via a script argument?

Update: Will add a script argument. Step 2 will be actioned after I'm done with https://github.com/temurin-compliance/temurin-compliance/issues/474

@adamfarley
Contributor

Note: The fix for the bugged dependency SHAs in the sbom has been separated out into a new PR for the sake of the March 2024 release (expedited review).

Master branch PR: #3713
Release branch PR: #3714

@adamfarley
Contributor

As the sbom currently contains a link to the exact version of the temurin-build source code that generated a build, I don't think we need to specify the versions of the sbom dependencies if we're trying to reproduce a build (as the temurin-build repo already has that information).

This can be reopened if anyone thinks of another reason the sbom creation dependencies could need to be specified via command-line argument (as opposed to the source files).
