Can you control the internal format used by Syft when scanning a directory? #1938

Closed
tomasr opened this issue Jun 12, 2024 · 10 comments

@tomasr

tomasr commented Jun 12, 2024

What would you like to be added:

This is probably a weird question, but when running grype dir:<somedir>, I understand grype is essentially running syft under the hood to produce the source SBOM (or similar). Can you control what format is used for this intermediate representation?

Why is this needed:

My reason for asking is this: I have some dependencies in a folder.

If I run:

grype dir:.

I get:

 ✔ Vulnerability DB                [no update available]
 ✔ Indexed file system                                                                                                                                               .
 ✔ Cataloged contents                                                                                 cdb4ee2aea69cc6a83331bbe96dc2caa9a299d21329efb0336fc02a82e1839a8
   ├── ✔ Packages                        [18 packages]
   └── ✔ Executables                     [0 executables]
 ✔ Scanned for vulnerabilities     [0 vulnerability matches]
   ├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
   └── by status:   0 fixed, 0 not-fixed, 0 ignored

If I first generate an SBOM using syft in cyclonedx-json format, then ingest it with grype sbom:.\sbom.json I get the exact same result.

However, if I first generate an SBOM using syft in SPDX format, then ingest it with grype I get:

 ✔ Vulnerability DB                [no update available]
 ✔ Scanned for vulnerabilities     [1 vulnerability matches]
   ├── by severity: 1 critical, 0 high, 0 medium, 0 low, 0 negligible
   └── by status:   0 fixed, 1 not-fixed, 0 ignored

NAME  INSTALLED  FIXED-IN  TYPE            VULNERABILITY   SEVERITY
zlib  1.2.13               UnknownPackage  CVE-2023-45853  Critical

So the source SBOM format (or whatever internal representation Syft derives from it) evidently affects whether I get usable results?

Additional context:

@tomasr tomasr added the enhancement New feature or request label Jun 12, 2024
@kzantow
Contributor

kzantow commented Jun 12, 2024

Hi @tomasr -- Grype is always going to use the internal Syft representation. If you ingest an SBOM, this gets converted to that representation anyway.

What versions of Syft and Grype are you using?

@tomasr
Author

tomasr commented Jun 12, 2024

Grype 0.78.0
Syft 1.6.0

@tomasr
Author

tomasr commented Jun 13, 2024

Might help if I offer a repro. Here's an easy one:

  • Download this package from nuget.
  • Rename the .nupkg file to .zip and extract it into a folder, let's say c:\temp\librdkafka
  • Now run grype dir:C:\temp\librdkafka\runtimes

Output here looks like this:

[screenshot: grype output for the directory scan]

Now first run syft to generate an SPDX SBOM of the exact same files, and scan it with grype:

[screenshot: grype output for the SPDX SBOM scan]

Output is clearly different.

@tomasr
Author

tomasr commented Aug 13, 2024

This continues to be an issue in the latest release.
Also, the scan seems to miss known vulnerabilities for other libraries in the same file (like CVE-2023-5363 for libssl/libcrypto), even when going through syft/spdx.

I suspect the handling of Windows binaries here isn't quite right?

@kzantow
Contributor

kzantow commented Aug 13, 2024

Hi @tomasr, sorry for the delay here. I can tell you at least part of what's happening: when you run grype directly, or if you use the Syft JSON format, the packages have the dotnet type, as you can see here:

% syft <dir>
NAME                          VERSION        TYPE                    
Microsoft® C Runtime Library  14.29.30040.0  dotnet  (+3 duplicates)  
The OpenSSL Toolkit           3.0.8          dotnet  (+3 duplicates)  
The curl library              8.4.0-DEV      dotnet  (+1 duplicate)   
Zstandard                     1.5.5          dotnet  (+1 duplicate)   
zlib                          1.3            dotnet  (+1 duplicate)

However, by default dotnet does not match using CPEs, but only using the GitHub Security Advisory Database. You can verify your configuration using the grype config command:

% grype config
...
match:
  ...
  dotnet:
    # use CPE matching to find vulnerabilities (env: GRYPE_MATCH_DOTNET_USING_CPES)
    using-cpes: false   

If you enable CPE matching, you will see a grype scan of the directory gives results:

% GRYPE_MATCH_DOTNET_USING_CPES=true grype <dir>

NAME  INSTALLED  FIXED-IN  TYPE    VULNERABILITY   SEVERITY 
zlib  1.3                  dotnet  CVE-2023-45853  Critical

When you output spdx-json, this type information is being lost (note the UnknownPackage type), and grype defaults to CPE matching enabled, which is why you see results when you try this.
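The effect described above can be pictured with a toy model. This is only a sketch of the behaviour reported in this thread, not Grype's actual code: the `DEFAULT_USING_CPES` table and the `uses_cpe_matching` helper are hypothetical names, and only the two package types discussed here are modelled.

```python
# Toy model of the behaviour described in this thread; NOT Grype's real code.
# Assumption (from the thread): CPE matching is off by default for dotnet
# packages and on for packages whose type was lost ("UnknownPackage").
DEFAULT_USING_CPES = {
    "dotnet": False,
    "UnknownPackage": True,
}


def uses_cpe_matching(package_type, overrides=None):
    """Return True if a package of this type would be matched by CPE."""
    settings = dict(DEFAULT_USING_CPES)
    # An override plays the role of GRYPE_MATCH_DOTNET_USING_CPES=true.
    settings.update(overrides or {})
    return settings[package_type]


if __name__ == "__main__":
    # Directory scan: the package keeps its dotnet type.
    print("dotnet:", uses_cpe_matching("dotnet"))
    # SPDX round trip: the type is lost.
    print("UnknownPackage:", uses_cpe_matching("UnknownPackage"))
    # Directory scan with CPE matching switched on for dotnet.
    print("dotnet + override:", uses_cpe_matching("dotnet", {"dotnet": True}))
```

In this model the same zlib package is skipped by CPE matching when it is typed as dotnet and picked up once the type is lost or the override is set, which mirrors the difference between the two scans.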

The GHSA-xw78-pcr6-wrg8 link you provided also may show why the github matching does not happen: it is in an unreviewed state, and there is no package information. So this entry would not be present in the Grype database.

Whether Syft is generating the correct CPEs and PURLs could still be in question, this is what I see for zlib:

      "cpes": [
        {
          "cpe": "cpe:2.3:a:zlib:zlib:1.3:*:*:*:*:*:*:*",
          "source": "syft-generated"
        }
      ],
      "purl": "pkg:nuget/zlib@1.3",

... so by default Grype would need an entry in GHSA matching the pkg:nuget/zlib identifier. Should someone try to get the aforementioned unreviewed entry updated in GitHub?
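To make the two identifiers concrete, here is a minimal sketch that splits a CPE 2.3 string and a simple PURL into their parts. Both helpers are hypothetical and deliberately simplified: the CPE parser assumes no escaped colons, and the PURL parser handles only the `pkg:<type>/<name>@<version>` shape (no namespace, qualifiers, or subpath). The `1.3` version in the PURL is assumed from the CPE shown above.

```python
# Field names of the CPE 2.3 formatted string, in order.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe):
    """Split a CPE 2.3 string into named fields (assumes no escaped colons)."""
    pieces = cpe.split(":")
    if len(pieces) != 13 or pieces[:2] != ["cpe", "2.3"]:
        raise ValueError(f"not a CPE 2.3 string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, pieces[2:]))


def parse_simple_purl(purl):
    """Parse 'pkg:<type>/<name>@<version>' only; a simplification of PURL."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")
    pkg_type, _, name_version = rest.partition("/")
    name, _, version = name_version.partition("@")
    return {"type": pkg_type, "name": name, "version": version}


if __name__ == "__main__":
    cpe = parse_cpe23("cpe:2.3:a:zlib:zlib:1.3:*:*:*:*:*:*:*")
    purl = parse_simple_purl("pkg:nuget/zlib@1.3")
    # CPE matching keys on vendor/product/version ...
    print(cpe["vendor"], cpe["product"], cpe["version"])
    # ... while the PURL carries the package ecosystem and name.
    print(purl["type"], purl["name"], purl["version"])
```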

@tomasr
Author

tomasr commented Aug 13, 2024

Thanks, @kzantow, that's extremely useful. I didn't know about GRYPE_MATCH_DOTNET_USING_CPES.

It also helped me figure out why the OpenSSL CVEs didn't get reported: syft just generates the wrong CPEs for the libraries.

@kzantow
Contributor

kzantow commented Aug 13, 2024

Another note: extracting accurate information from the dotnet portable executable format is pretty challenging -- often the binaries don't contain the exact information that would match a CPE vendor or product, and the usage is inconsistent between different vendors, and even among products and/or releases over time. We're all ears for any ideas on how to make the identification better across the board, if there's a way to do it without using an external data source to look things up. (Here are some of the fields being used, if you happen to be interested in improving this)

@willmurphyscode
Contributor

Another note, @tomasr: if you want fine control over what Syft is doing during a Grype run, you can do:

syft -c my-syft-config.yaml <dir> -o syft-json | grype

And grype will read the syft SBOM from stdin.

There is also some work to make Grype respect Syft configs but that is in very early days.

@tomasr
Author

tomasr commented Oct 9, 2024

> Another note: extracting accurate information from the dotnet portable executable format is pretty challenging -- often the binaries don't contain the exact information that would match a CPE vendor or product, and the usage is inconsistent between different vendors, and even among products and/or releases over time. We're all ears for any ideas on how to make the identification better across the board, if there's a way to do it without using an external data source to look things up. (Here are some of the fields being used, if you happen to be interested in improving this)

Thanks, I agree this is a challenging problem on Windows in general. I don't really have a great solution, other than probably building a list of regexes or library names that map to known CPEs. You could also probably do it based on the reported exports for known libraries, I guess, but neither one is really a great option.
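As a rough illustration of the "list of regexes mapped to known CPEs" idea, here is a minimal sketch. Nothing here is part of Syft or Grype: `NAME_RULES` and `guess_cpe` are hypothetical, the vendor/product pairs are illustrative and would need checking against the NVD CPE dictionary, and the version is inserted without CPE escaping.

```python
import re

# Hypothetical, hand-maintained rules: (pattern on the reported name, vendor, product).
# The vendor/product pairs are illustrative; verify them against NVD before use.
NAME_RULES = [
    (re.compile(r"openssl", re.IGNORECASE), "openssl", "openssl"),
    (re.compile(r"\bzlib\b", re.IGNORECASE), "zlib", "zlib"),
    (re.compile(r"\bcurl\b", re.IGNORECASE), "haxx", "curl"),
]


def guess_cpe(name, version):
    """Return a CPE 2.3 string for the first matching rule, or None.

    Simplification: the version is used as-is, without CPE escaping.
    """
    for pattern, vendor, product in NAME_RULES:
        if pattern.search(name):
            return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"
    return None


if __name__ == "__main__":
    # Names as reported in the syft table earlier in this thread.
    for name, version in [
        ("The OpenSSL Toolkit", "3.0.8"),
        ("zlib", "1.3"),
        ("Zstandard", "1.5.5"),
    ]:
        print(name, "->", guess_cpe(name, version))
```

Names with no rule return `None`, so anything unrecognized keeps whatever identifiers the scanner generated rather than getting a guessed CPE.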

@willmurphyscode Thanks for the tip. That's sort of what I am doing for now; just running syft to produce an SBOM, then manually replacing known values in it (since I know what third party stuff we have in our repo) and then feeding that into grype. Not great, but at least I can make it work.
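The manual replacement step could be scripted along these lines. This is a sketch under assumptions, not a supported workflow: it assumes the SBOM is a Syft JSON document whose packages live in a top-level `artifacts` list with `name`, `version`, and `cpes` keys (the `cpes` shape matches the snippet quoted earlier), and the `override_cpes` helper and the `manual-override` source label are hypothetical.

```python
import copy


def override_cpes(sbom, overrides):
    """Return a copy of a Syft-JSON-like document with CPEs replaced by name.

    `overrides` maps a package name to a CPE template containing "{version}".
    Packages without an override are left untouched.
    """
    patched = copy.deepcopy(sbom)
    for artifact in patched.get("artifacts", []):
        template = overrides.get(artifact.get("name"))
        if template is None:
            continue
        cpe = template.format(version=artifact.get("version", "*"))
        # "manual-override" is a made-up label, not a value Syft emits.
        artifact["cpes"] = [{"cpe": cpe, "source": "manual-override"}]
    return patched


if __name__ == "__main__":
    # Hand-written stand-in for a loaded SBOM; real documents have more keys.
    sbom = {
        "artifacts": [
            {"name": "The OpenSSL Toolkit", "version": "3.0.8", "cpes": []},
            {"name": "Zstandard", "version": "1.5.5", "cpes": []},
        ]
    }
    overrides = {
        "The OpenSSL Toolkit": "cpe:2.3:a:openssl:openssl:{version}:*:*:*:*:*:*:*",
    }
    for artifact in override_cpes(sbom, overrides)["artifacts"]:
        print(artifact["name"], artifact["cpes"])
```

With a real file you would load it with `json.load`, apply the overrides, write it back with `json.dump`, and feed the result to grype as before.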

@willmurphyscode
Contributor

Hi @tomasr! I think it makes more sense to open particular issues for dotnet package cataloging or matching errors. To summarize:

  1. Writing an SPDX SBOM to disk and then having Grype scan it produces different results because the round trip sometimes erases package type or distro information. Without that information, Grype matches on CPEs, which are broad and prone to false positives; that is why CPE matching is off by default for most package types.
  2. Many dotnet packages have inconsistently formatted metadata, which makes matching them challenging.
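One way to check whether a round trip erased package types is to tally the types in both documents. The sketch below assumes both SBOMs have been loaded as Syft JSON (a top-level `artifacts` list with a `type` key per package, which is an assumption about the schema and about your workflow); `type_counts` and `lost_types` are hypothetical helpers.

```python
from collections import Counter


def type_counts(sbom):
    """Count packages per type in a Syft-JSON-like document."""
    return Counter(
        artifact.get("type", "UnknownPackage")
        for artifact in sbom.get("artifacts", [])
    )


def lost_types(before, after):
    """Return {type: how many packages of that type disappeared}."""
    counts_before = type_counts(before)
    counts_after = type_counts(after)
    return {
        pkg_type: counts_before[pkg_type] - counts_after[pkg_type]
        for pkg_type in counts_before
        if counts_before[pkg_type] > counts_after[pkg_type]
    }


if __name__ == "__main__":
    # Hand-written stand-ins for the two documents.
    direct = {"artifacts": [{"name": "zlib", "type": "dotnet"}]}
    round_tripped = {"artifacts": [{"name": "zlib", "type": "UnknownPackage"}]}
    print(lost_types(direct, round_tripped))
```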

I see you've already opened an issue for a particular case of item 2. I think we can close this issue and track that work there.

@willmurphyscode willmurphyscode closed this as not planned Oct 14, 2024