Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[validation] Invalid externalReferences URLs may be present in the SBOM #1107

Closed
marob opened this issue May 23, 2024 · 38 comments
Closed

[validation] Invalid externalReferences URLs may be present in the SBOM #1107

marob opened this issue May 23, 2024 · 38 comments
Assignees
Labels
enhancement New feature or request needs contributor

Comments

@marob
Copy link
Contributor

marob commented May 23, 2024

With DependencyTrack 4.11 validating SBOM with schema validation at upload, I've discovered that some SBOM may have invalid externalReferences.
For example:

"externalReferences": [
  {
    "type": "vcs",
    "url": "[email protected]:behat-chrome/chrome-mink-driver.git"
  }
],

coming from composer dmore/chrome-mink-driver package.

Indeed, an externalReference should be a iri-reference or a #/definitions/bomLink according to the JsonSchema.

The problem is that [email protected]:behat-chrome/chrome-mink-driver.git is neither a #/definitions/bomLink nor an iri-reference (that should be of the form scheme://... according to the RFC).

If we can "convert" [email protected]:behat-chrome/chrome-mink-driver.git to https://gitlab.com/behat-chrome/chrome-mink-driver.git it would be nice, but if not, we'd better drop the reference than write an invalid one.

@fabian-zeindl-oebb
Copy link

This is not the only issues.
I have seen BOMs that contain whitespace around the URL inside the reference, like

"externalReferences": [
  {
    "type": "vcs",
    "url": "     [email protected]:behat-chrome/chrome-mink-driver.git      "
  }
]

Or failed replacements like

"externalReferences": [
  {
    "type": "vcs",
    "url": "${repository.url}"
  }
]

@prabhu prabhu added enhancement New feature or request needs contributor labels May 28, 2024
@prabhu
Copy link
Collaborator

prabhu commented May 28, 2024

@marob, I am in favor of dropping the invalid external references and trimming the value. My personal opinion is that CycloneDX spec should be unopinionated and should not have such validations for external references.

Pull requests are welcome. If not, we can collect all such tickets and raise funds to hire an engineer.

@prabhu prabhu changed the title Invalid externalReferences URLs may be present in the SBOM [validation] Invalid externalReferences URLs may be present in the SBOM May 28, 2024
@prabhu
Copy link
Collaborator

prabhu commented May 28, 2024

Add some validations and trimming here, here, and here.

Enable strict validation here

@timmyteo timmyteo self-assigned this May 29, 2024
@setchy
Copy link
Member

setchy commented May 29, 2024

I think we may need to trim the url, and then ideally use the same regex DTrack is using to validate [IRI RFC 3987](https://datatracker.ietf.org/doc/html/rfc3987

We've had instances like the following being rejected

{
            "publisher": "UNKNOWN",
            "group": "",
            "name": "yum-metadata-parser",
            "version": "1.1.4",
            "description": "A fast YUM meta-data parser",
            "purl": "pkg:pypi/[email protected]",
            "externalReferences": [
                {
                    "type": "website",
                    "url": "UNKNOWN"
                }
            ],
   }
   
   {
   "author": "Ruben Verborgh <[email protected]> (https://ruben.verborgh.org/)",
            "group": "",
            "name": "follow-redirects",
            "version": "1.15.6",
            "description": "HTTP and HTTPS modules that follow redirects.",
            "licenses": [
                {
                    "license": {
                        "id": "MIT",
                        "url": "https://opensource.org/licenses/MIT"
                    }
                }
            ],
            "purl": "pkg:npm/[email protected]",
            "externalReferences": [
                {
                    "type": "vcs",
                    "url": "https://github.com/follow-redirects/follow-redirects"
                },
                {
                    "type": "vcs",
                    "url": "[email protected]:follow-redirects/follow-redirects.git"
                }
            ],
}

@setchy
Copy link
Member

setchy commented May 29, 2024

Started a discussion DependencyTrack/dependency-track#3775 to confirm.

@prabhu
Copy link
Collaborator

prabhu commented May 29, 2024

@setchy and all, @timmyteo has kindly volunteered to look into this. Please work with him to get this tested and released.

@setchy
Copy link
Member

setchy commented May 30, 2024

DependencyTrack/dependency-track#3775 (reply in thread)

Use https://www.npmjs.com/package/hosted-git-info to get the git urls fixed.
This worked for other JavaScript-based CycloneDX generators.

@timmyteo
Copy link
Collaborator

@setchy In your comment here, you mention that you saw the following in a generated BOM from cdxgen? [email protected]:follow-redirects/follow-redirects.git

I ran cdxgen against the codebase for follow-redirects but did not see that value in the output BOM. Can you please let me know what you were running cdxgen against so I can use the same in testing?

@marob
Copy link
Contributor Author

marob commented May 30, 2024

I've started to work on it by enabling strict JSON Schema validation on this PR: #1128

@prabhu
Copy link
Collaborator

prabhu commented Jun 5, 2024

Please test with the latest master branch

@Kanti
Copy link

Kanti commented Jun 6, 2024

@prabhu, thanks a lot for the fix. It works perfectly.

@prabhu
Copy link
Collaborator

prabhu commented Jun 6, 2024

@Kanti could you help test the master branch a bit more and let me know how it performs? If you have java applications, please try the new dependency tree mode using the environment variable PREFER_MAVEN_DEPS_TREE=true

@Kanti
Copy link

Kanti commented Jun 6, 2024

@prabhu sorry no java over here.

@malice00
Copy link
Contributor

malice00 commented Jun 6, 2024

I had this issue a well and had it in Java -- unfortunately not Maven but Gradle. Seems to work perfectly though!

@prabhu
Copy link
Collaborator

prabhu commented Jun 6, 2024

Thanks for confirming everyone!

@prabhu prabhu closed this as completed Jun 6, 2024
@marob
Copy link
Contributor Author

marob commented Jun 7, 2024

@prabhu It looks better, but I still have at least one project with an error when uploading to DependencyTrack because of python bb package.
Indeed the PKG-INFO file in bb python dependency has Home-page: http:// that translates in the BOM to:

      "purl": "pkg:pypi/[email protected]",
      "externalReferences": [
        {
          "type": "website",
          "url": "http://"
        } 
      ],

That seems to be valid according to validate-iri but KO for DependencyTrack.
I don't know which one is right though 🤷‍♂️.

@prabhu
Copy link
Collaborator

prabhu commented Jun 8, 2024

@marob, interesting find. http:// without a domain name is definitely not a useful external reference. Perhaps in addition validate-iri, we parse with node url and check if there is at least a domain present?

@marob
Copy link
Contributor Author

marob commented Jun 20, 2024

I've created an issue on json-schema-validator that is used by DependencyTrack.
In my opinion, "http://" should be considered as valid as it is allowed by the RFC (if I understand it well), event though it's not a very useful value.
So the fix should be on DependencyTrack/json-schema-validator side and not on cdxgen/validate-iri.

@marob
Copy link
Contributor Author

marob commented Jun 24, 2024

It's fixed in [email protected] that is used in [email protected] that is used in [email protected]

@marob
Copy link
Contributor Author

marob commented Jun 24, 2024

I'm re-closing this issue

@marob marob closed this as completed Jun 24, 2024
@phoenix-aditya
Copy link

Hey getting the following error:

IRI failed validation [email protected]:amznlabs/ion-java.git

due to which cdxgen is getting stuck and failing, any insights on the following?

@marob
Copy link
Contributor Author

marob commented Jul 18, 2024

Hi @phoenix-aditya. This log comes from this change.
It logs the "error" (it's more a warning) and then excludes the corresponding reference (that contains the invalid IRI) from the generated SBOM, to prevent it to be invalid (according to the CycloneDX format).

It shouldn't make cdxgen get stuck. If cdxgen is stuck, I think the source of the problem is elsewhere.

@phoenix-aditya
Copy link

Hey mainly what is happening is when the PREFER_MAVEN_DEPS_TREE is set to true, due to some reason its taking much longer for cdxgen to run
is there a way to disable iri validation?

@marob
Copy link
Contributor Author

marob commented Jul 18, 2024

According to https://github.com/comunica/validate-iri.js?tab=readme-ov-file#performance, validating 1.8M values takes 12s. Are you sure the performance issue comes from IRI validation?

@phoenix-aditya
Copy link

Hey,

i enabled CDXGEN_DEBUG_MODE even then the last command i see is only
IRI failed validation [email protected]:jnr/jnr-constants.git IRI failed validation [email protected]:jnr/jnr-enxio.git IRI failed validation [email protected]:jnr/jnr-posix.git IRI failed validation [email protected]:javaslang/javaslang.git IRI failed validation [email protected]:javaslang/javaslang.git IRI failed validation [email protected]:bennidi/mbassador.git
so not sure if the validation is taking time or something else is failing, mainly the process gets stuck for a long time and i am experiencing this on multiple projects

Note: I am only facing this issue post setting PREFER_MAVEN_DEPS_TREE as true i am not sure if the two are corelated so wanted to run cdxgen without IRI validation

@phoenix-aditya
Copy link

I am not sure why its failing but i feel i can test out if its IRI validation that is causing the problem in my environment if there is a way to disable it

@marob
Copy link
Contributor Author

marob commented Jul 18, 2024

There's no relation between PREFER_MAVEN_DEPS_TREE and IRI validation.

IRI validation cannot be disabled.
Before introducing validation, cdxgen was copying invalid IRI in the SBOM, which resulted in invalid SBOMs. Those SBOMs were not consumable by any tool that would validate the SBOM before ingesting them.
Now, IRI validation detects invalid IRIs, logs them and removes them from resulting SBOM, making them (hopefully) valid (at least concerning IRIs). Impact on performance should not be perceptible.

@prabhu
Copy link
Collaborator

prabhu commented Jul 18, 2024

@phoenix-aditya sounds like a separate bug, possibly an infinite loop. Can you run the maven command shown on the output when invoked with the PREFER_MAVEN_DEPS_TREE environment variable directly? Maybe there is a cyclic dependency or some such there.

Best to share your observation as a separate discussion thread.

@guyscher2
Copy link
Contributor

Hey,
From version =>10.6.0 I get this:
image

any idea how to resolve?

@marob
Copy link
Contributor Author

marob commented Jul 21, 2024

cdxgen generates SBOMs by analyzing files.
During that analysis, it extracts information from files such as package.json, pom.xml, composer.json, ...
In those information, there is an external reference URLs corresponding to the URL of the corresponding dependency.
CycloneDX format (the specification for SBOM) forces those external references to be an "iri-reference" (to simplify, an URL).
The problem is that the external reference extracted from package.json, pom.xml, composer.json, ... is sometimes NOT an iri-reference (such as git@...).
Before this PR, the generated SBOM was sometimes invalid according to the CycloneDX specification. Now, it logs a warning message for every external reference URL that is not a valid iri-reference and excludes that external reference from the generated SBOM to prevent it to be invalid.

So, I see 3 "solutions":

  1. fix the reference in the corresponding package: here, change [email protected]:micrometer-metrics/micrometer.git to something like https://github.com/micrometer-metrics/micrometer, but it requires some contribution to the micrometer-metrics/micrometer source code
  2. add a feature in cdxgen to "convert" invalid reference to a valid one: it's "easy" for [email protected]:micrometer-metrics/micrometer.git, but how would you convert ${repository.url} or any other incorrect value?
  3. ignore those warnings

@guyscher2
Copy link
Contributor

guyscher2 commented Jul 21, 2024

@marob thanks for the detailed explanation, it is really helpful!

I have no problem ignoring this warning as it does not cause an invalid sbom file, at least for me.

The issue is that after it prints the warning the cli gets stuck.

@phoenix-aditya
Copy link

Hey this seems like the exact same issue i am facing,
Unfortunately i wont be able to share the project i am running it on due to compliance reasons but i have seen this happen on multiple projects in my case with the newer version

@prabhu
Copy link
Collaborator

prabhu commented Jul 21, 2024

Could you guys share the command line arguments used? Also share the full output after setting the CDXGEN_DEBUG_MODE=debug environment variable. @guyscher2 do you also use the new PREFER_MAVEN_DEPS_TREE feature?

@prabhu
Copy link
Collaborator

prabhu commented Jul 21, 2024

@phoenix-aditya One difference between the cyclonedx maven plugin and the new PREFER_MAVEN_DEPS_TREE is the automatic use of settings.xml file. If there are any settings.xml file with an invalid or non-existent proxy, the maven command can be expected to freeze and eventually fail. Maybe setting the environment variable CDXGEN_TIMEOUT_MS to a smaller value to make it fail sooner?

@guyscher2
Copy link
Contributor

@prabhu
I run:
cdxgen -o sbom.json --exclude "/.git/**"

Following env vars:
export FETCH_LICENSE=true export CDXGEN_DEBUG_MODE=debug

plus tried PREFER_MAVEN_DEPS_TREE=true/false

logs:
image

gets stuck on IRI failed validation message.
I wish it will be a warning and continue.

@prabhu
Copy link
Collaborator

prabhu commented Jul 23, 2024

@guyscher2 can you try with the latest version?

@guyscher2
Copy link
Contributor

Look good, thanks! @prabhu

@phoenix-aditya
Copy link

Hey in my case as well it is no longer getting stuck
Thanks @prabhu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs contributor
Projects
None yet
Development

No branches or pull requests

9 participants