Support file ownership when using file source #3345

Open
adammcclenaghan opened this issue Oct 17, 2024 · 0 comments
Labels
enhancement New feature or request needs-discussion

Comments

@adammcclenaghan
Contributor

What would you like to be added:
Today, some catalogers support the concept of 'file ownership', specifically those that implement the FileOwner interface.

For example, if I scan my DPKG directory using a directory source, the artifact metadata lists the files owned by each package in my DPKG installation. Take curl as an example:

syft -o syft-json dir:/var/lib/dpkg | jq '.artifacts[] | select(.name == "curl") | .metadata.files'

[
  {
    "path": "/usr/bin/curl",
    "digest": {
      "algorithm": "md5",
      "value": "fb9a88e8023f2fb2a0f475d1c85d8dcb"
    },
    "isConfigFile": false
  },
  {
    "path": "/usr/share/doc/curl/copyright",
    "digest": {
      "algorithm": "md5",
      "value": "39782ccc3532fee98360f19e317c6707"
    },
    "isConfigFile": false
  },
  {
    "path": "/usr/share/man/man1/curl.1.gz",
    "digest": {
      "algorithm": "md5",
      "value": "1326b53b4e64bf16ed6558a94496a0e8"
    },
    "isConfigFile": false
  },
  {
    "path": "/usr/share/zsh/vendor-completions/_curl",
    "digest": {
      "algorithm": "md5",
      "value": "1fe4ab18bfb8fe595c42534a37ab27a3"
    },
    "isConfigFile": false
  }
]

However, when scanning with a file source, we see no file metadata associated with the DPKG installation:

syft -o syft-json file:/var/lib/dpkg/status | jq '.artifacts[] | select(.name == "curl")'

{
  "id": "768c7f6773e9852e",
  "name": "curl",
  "version": "7.81.0-1ubuntu1.18",
  "type": "deb",
  "foundBy": "dpkg-db-cataloger",
  "locations": [
    {
      "path": "/status",
      "accessPath": "/status",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:curl:curl:7.81.0-1ubuntu1.18:*:*:*:*:*:*:*",
      "source": "syft-generated"
    }
  ],
  "purl": "",
  "metadataType": "dpkg-db-entry",
  "metadata": {
    "package": "curl",
    "source": "",
    "version": "7.81.0-1ubuntu1.18",
    "sourceVersion": "",
    "architecture": "amd64",
    "maintainer": "Ubuntu Developers <[email protected]>",
    "installedSize": 444,
    "depends": [
      "libc6 (>= 2.34)",
      "libcurl4 (= 7.81.0-1ubuntu1.18)",
      "zlib1g (>= 1:1.1.4)"
    ],
    "files": []
  }
}

This makes sense: with a file source, the file resolver indexes only the target file and its containing directory. When the DPKG cataloger then tries to resolve the 'Infos' directory after parsing the DPKG DB, the index contains no entries for it, so the cataloger fails to resolve the file ownership metadata.
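To make the indexing behaviour concrete, here is a minimal toy model of it. This is not syft's resolver: the `index` type and the `newFileSourceIndex`, `newDirSourceIndex`, and `filesUnder` names are hypothetical, and absolute paths are used for simplicity. It only illustrates why a lookup under the info directory comes back empty when just the status file and its parent directory are indexed.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// index is a toy stand-in for a file resolver's index: the set of known paths.
// It is a hypothetical type for illustration, not syft's implementation.
type index map[string]struct{}

// newFileSourceIndex mimics the behaviour described in the issue: only the
// target file and its containing directory are indexed.
func newFileSourceIndex(target string) index {
	return index{
		target:           {},
		path.Dir(target): {},
	}
}

// newDirSourceIndex indexes every path handed to it (a stand-in for a
// directory walk).
func newDirSourceIndex(paths ...string) index {
	idx := index{}
	for _, p := range paths {
		idx[p] = struct{}{}
	}
	return idx
}

// filesUnder returns the indexed paths located below dir, sorted.
func (i index) filesUnder(dir string) []string {
	prefix := strings.TrimSuffix(dir, "/") + "/"
	var out []string
	for p := range i {
		if strings.HasPrefix(p, prefix) {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	infoDir := "/var/lib/dpkg/info"

	fileIdx := newFileSourceIndex("/var/lib/dpkg/status")
	dirIdx := newDirSourceIndex(
		"/var/lib/dpkg/status",
		"/var/lib/dpkg/info/curl.list",
		"/var/lib/dpkg/info/curl.md5sums",
	)

	fmt.Println("file source:", fileIdx.filesUnder(infoDir))
	fmt.Println("dir source: ", dirIdx.filesUnder(infoDir))
}
```

In this model the file-source index has nothing under the info directory, which matches the empty "files" array in the output above, while the directory-source index returns both curl entries.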

However, as a user, I do not know that I have missing metadata here unless I go and read the cataloger implementation and understand that it requires more than the scanned file to correctly populate its results.

I would like to start a discussion here about how feasible it would be to make catalogers 'aware' of the fact that they require more than one file to successfully perform all of their work.

In the case of DPKG, for example, if the cataloger knows that we're scanning with a file source, it could perform a 'second pass' and attempt to index the Infos or status.d directories used to determine file ownership. The resolver passed to findDpkgInfoFiles could then find owned files despite the file source.
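One possible shape for this, purely as a discussion aid: a cataloger could optionally declare which extra paths it needs, and the file source could add them to the index on a second pass. Everything in this sketch is hypothetical and not part of syft: the `supportingPathProvider` interface, the `dpkgCataloger` stand-in, and `pathsToIndex` are invented names, and the directory names returned (`info` and `status.d` next to the status file) are only illustrative.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// supportingPathProvider is a HYPOTHETICAL optional interface (not in syft).
// A cataloger implementing it declares extra paths it needs in addition to
// the single file being scanned.
type supportingPathProvider interface {
	SupportingPaths(scanned string) []string
}

// dpkgCataloger is a stand-in for the real DPKG cataloger.
type dpkgCataloger struct{}

// SupportingPaths asks for directories next to the status file that could
// hold file ownership data. The exact names are an assumption.
func (dpkgCataloger) SupportingPaths(scanned string) []string {
	if path.Base(scanned) != "status" {
		return nil
	}
	dir := path.Dir(scanned)
	return []string{path.Join(dir, "info"), path.Join(dir, "status.d")}
}

// pathsToIndex returns what a file source would index: the scanned file and
// its directory (first pass), plus anything catalogers request (second pass).
func pathsToIndex(scanned string, catalogers ...interface{}) []string {
	seen := map[string]struct{}{
		scanned:           {},
		path.Dir(scanned): {},
	}
	for _, c := range catalogers {
		provider, ok := c.(supportingPathProvider)
		if !ok {
			continue // this cataloger only needs the scanned file
		}
		for _, extra := range provider.SupportingPaths(scanned) {
			seen[extra] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	for _, p := range pathsToIndex("/var/lib/dpkg/status", dpkgCataloger{}) {
		fmt.Println(p)
	}
}
```

An optional interface keeps catalogers that only need the scanned file unchanged. The sketch leaves open whether a file source should be allowed to read outside the single file the user pointed it at, which is probably the main design question to settle here.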

Why is this needed:
When I scan with file source, I'd like the catalogers to provide me with complete results even when a suitable cataloger requires more than one file to perform its work.

Additional context:

@adammcclenaghan adammcclenaghan added the enhancement New feature or request label Oct 17, 2024
@anchore anchore deleted a comment from BlowMeMike Oct 24, 2024