Proposal: Stream sbom to disk (avoiding large memory footprint and OOMs) #3263
👋 thanks for the issue @HairyMike --> We're trying to understand where the stress points are that bring up this suggestion. Are you dealing with SBOMs with large numbers of packages? Do you have an image we could use to start tinkering with what something like this could look like? There are a lot of complexities we could assume when trying to organize this solution, and some of them might not actually solve the problem that was encountered.
I don't see a way to stream the format agnostically as we find packages/relationships. But I imagine we could spool results to a sqlite file on disk and have an SBOM object backed by this sqlite DB to drive how we format the final SBOM. This, however, could run into the same OOM issue depending on the nature of the image.
Thanks for the replies @spiffcs and @wagoodman. This is related to scans of large disks that contain a lot of files. For example, scanning disks attached to a CI machine like Jenkins (where scans can take many hours) can lead to OOMs if the machine doing the scanning doesn't have enough memory.

A few figures from a scan we attempted: on a 250 GB CI node we found 1.5M packages and consumed ~14 GB of memory. Scan time was ~15 hours on an m7i-flex.xlarge.

One way around it is to break the scans into smaller chunks and produce multiple SBOMs, but I think streaming directly to disk would avoid the need for manual chunking or high scanner memory. I don't currently have an image that can be used to reproduce this, although I may be able to provide one later.

My hope is that there's a way to incrementally create the SBOM during the scan; that would mean the size of the scan target would be limited by disk size (cheap) as opposed to memory (expensive).
That's a lot of packages! Would you be willing to post a pprof profile for us to take a look at? This can be produced with:
We're aware of a few memory adjustments to make based on anchore/stereoscope#233, but I'm interested in your specific profile to see if we have other findings here.
The team chatted about this one today and came to a few conclusions:
The changes we think we'd make to the system are:
What would you like to be added:
Currently, Syft builds the SBOM report in memory before writing it to disk. I propose that instead of building it in memory, we stream it directly to disk.
Why is this needed:
To avoid OOMs
Additional context:
SBOM generation:
https://github.com/anchore/syft/blob/main/cmd/syft/internal/commands/scan.go#L199
https://github.com/anchore/syft/blob/main/internal/task/package_task_factory.go#L116
https://github.com/anchore/syft/blob/main/syft/create_sbom.go#L66
Report generation:
https://github.com/anchore/syft/blob/main/cmd/syft/internal/commands/scan.go#L208