feat: CBOM support #933
base: main
Conversation
Hi @san-zrl Thank you for the Pull Request! I believe this will be a great feature. It seems the test pipeline action is currently failing: https://github.com/DependencyTrack/hyades-apiserver/actions/runs/11067343603/job/30750563096?pr=933
san-zrl force-pushed from 4a7bd10 to febbbe6. Commits included:
- fix: added CryptoAssetsResource (Signed-off-by: san-zrl <[email protected]>)
- added getAllCryptoAssets() per project and globally (Signed-off-by: san-zrl <[email protected]>)
Hi @VinodAnandan - I looked into the tests and fixed the main problems, which are related to the migration to CycloneDX 1.6. Not sure if the entire test set works because I never managed to run it successfully on my local system. I'm on a Mac M1 and getting Testcontainers to run was a challenge. Issues:
Can you elaborate on what was problematic about Testcontainers? The team so far has been working predominantly with M1 Macs, so that should not be a problem.
Can you share the errors you were getting here? Really all the refresh is doing is reloading the object from the database, so it's not much different from doing a
Same as above, can you share the errors you're getting?
Also, have you tried whether launching the API server with Dev Services works for you? https://dependencytrack.github.io/hyades/0.6.0-SNAPSHOT/development/testing/#api-server
I ran the above tests with
I can reproduce the failures of the component property tests in this PR, but not on `main`. The problem seems to be the modified filter logic:

```java
if (cid.getOid() != null) {
    filterParts.add("(cryptoAssetProperties != null && cryptoAssetProperties.oid == :oid)");
    params.put("oid", cid.getOid());
} else {
    filterParts.add("cryptoAssetProperties != null && cryptoAssetProperties.oid == null");
}
```

But it should be this instead:

```java
if (cid.getOid() != null) {
    filterParts.add("(cryptoAssetProperties != null && cryptoAssetProperties.oid == :oid)");
    params.put("oid", cid.getOid());
} else {
    filterParts.add("(cryptoAssetProperties == null || cryptoAssetProperties.oid == null)");
}
```

The `else` branch must also match components that have no `cryptoAssetProperties` at all; the original version excludes them, which is why the component property tests fail.
@nscuro: Good catch, thanks! I'm using Rancher.
@nscuro - The
Will have a look at the failing tests. Regarding Testcontainers, could this be relevant? https://docs.rancherdesktop.io/how-to-guides/using-testcontainers/#prerequisites
I've seen this. Rancher uses admin rights. Kubernetes is disabled, VM type is set to QEMU. My env settings are:
Not entirely sure, but maybe Ryuk not being there could be a problem. Can you try enabling it?
TESTCONTAINERS_RYUK_DISABLED=false makes no difference for
Hi @nscuro, I pushed some code to bump the test coverage of the PR above the required threshold. The corresponding pipeline actions have been hanging since yesterday morning (https://github.com/DependencyTrack/hyades-apiserver/actions/runs/11177278805) with a message "This workflow is awaiting approval from a maintainer in #933". Could you check what's going on?
@san-zrl PRs from first-time contributors need explicit approval for workflows to run. Approved it now.
@VinodAnandan, thanks for letting me know. Diff test coverage is not quite on target. Do you want me to further increase this number?
@san-zrl Don't sweat about the coverage too much for now. I haven't yet had the chance to thoroughly review the PR, and I am taking a few days "off" at the moment so there's limited time I'm spending on GitHub. Assigning this to me, I'll get back to you ASAP.
Some initial feedback with mostly questions from my side. Apologies for letting this sit for so long.
<column name="BOM_REF" type="VARCHAR(64)"/> | ||
<column name="LOCATION" type="VARCHAR(255)"/> | ||
<column name="LINE" type="INT"/> | ||
<column name="OFFSET" type="INT"/> | ||
<column name="SYMBOL" type="INT"/> | ||
<column name="ADDITIONAL_CONTEXT" type="VARCHAR(255)"/> |
The character limits on `BOM_REF`, `LOCATION` and `ADDITIONAL_CONTEXT` seem a bit optimistic to me. If `LOCATION` ends up being a file path, I can see the limit of 255 being broken rather trivially. Similar story for `ADDITIONAL_CONTEXT`, which seems to be a field of arbitrary content?
Hi @nscuro, thanks a lot for your review.
The code above comes from the definition of the OCCURRENCES table. The PR currently adds all crypto-related data fields that our scanner (https://github.com/IBM/cbomkit) generates. We can drop OCCURRENCES because the location of crypto assets in source is probably not important in DT.
Nevertheless, your point regarding the size of the varchar columns is still valid. There is no proper reasoning behind the current numbers. We started with guesstimates based on the standard (https://cyclonedx.org/docs/1.6/json/) and examples (https://github.com/CycloneDX/bom-examples/tree/master/CBOM) and simply doubled them when the initial numbers weren't sufficient. I'd gladly use other numbers with a better foundation. If data size is an issue, we should go through all varchars one by one and decide (a) whether the field has to be persisted and (b) if so, what size is needed.
> We can drop OCCURRENCES because the location of crypto assets in source is probably not important in DT.

Occurrence will likely come to DT in the future to cater to other use cases, but if it's not crucial for the crypto functionality it's good to drop here IMO.

> I'd gladly use other numbers with better foundation. If data size is an issue we should go through all varchars one-by-one and decide (a) does it have to be persisted and (b) if so, what size is needed.

In PostgreSQL it's all `TEXT` (unlimited length) behind the scenes, with an implicit `CHECK` constraint for the length. So other than preventing users from storing GBs of useless data, size constraints don't matter. With this in mind, I think choosing an initial length of 1024 should be fine.
> Occurrence will likely come to DT in the future to cater to other use cases, but if it's not crucial for the crypto functionality it's good to drop here IMO.
> ...
> With this in mind, I think choosing an initial length of 1024 should be fine.

Okay, thanks.
- I will remove Occurrence with the next commit.
- VARCHAR length will default to 1024 unless we know better.
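To illustrate the agreed default, a column annotation in the JDO model would then look roughly like this; the field and column names below are hypothetical, merely mirroring the existing annotations in this PR:

```java
@Persistent
@Column(name = "SOME_TEXT_FIELD", jdbcType = "VARCHAR", length = 1024)
private String someTextField;
```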
```xml
<addColumn tableName="COMPONENT">
    <column name="CRYPTO_PROPERTIES_ID" type="BIGINT"/>
</addColumn>

<createTable tableName="CRYPTO_PROPERTIES">
    <column autoIncrement="true" name="ID" type="BIGINT">
        <constraints nullable="false" primaryKey="true" primaryKeyName="CRYPTO_PROPERTIES_PK"/>
    </column>
```
Wouldn't it make more sense to have a `COMPONENT_ID` column on the `CRYPTO_PROPERTIES` table?

Given the FK constraint with `onDelete="CASCADE"`, the current setup would cause the `COMPONENT` record to automatically be deleted when `CRYPTO_PROPERTIES` gets deleted. It should be the other way around, i.e. `CRYPTO_PROPERTIES` being deleted automatically when `COMPONENT` gets deleted.
Yes, true. This would be a better idea but I didn't manage to make this work.
I tried to specify a COMPONENT_ID in CRYPTO_PROPERTIES that refers upwards to the enclosing component, along with an FK constraint that would cause the right order of cascading deletes. This led to a problem when trying to persist the modified crypto asset properties object. I reproduced the error message with stack trace and appended it below for reference. It turns out that `CryptoAssetProperties@5af49362` is not the instance from `ModelConverter` (in which the component id was properly set) but another instance created somewhere deep down in the transaction code.
```
2024-10-21 15:22:19,384 ERROR [Persist] Insert of object "org.dependencytrack.model.CryptoAssetProperties@5af49362" using statement "INSERT INTO "CRYPTO_PROPERTIES" ("ALGORITHM_PROPERTIES_ID","ASSET_TYPE","CERTIFICATE_PROPERTIES_ID","COMPONENT_ID","OID","PROTOCOL_PROPERTIES_ID","RELATED_MATERIAL_PROPERTIES_ID") VALUES (?,?,?,?,?,?,?)" failed : ERROR: null value in column "COMPONENT_ID" of relation "CRYPTO_PROPERTIES" violates not-null constraint
  Detail: Failing row contains (3, null, ALGORITHM, 3, null, null, null, 2.16.840.1.101.3.4.1.6). [bomSerialNumber=e8c355aa-2142-4084-a8c7-6d42c8610ba2, bomFormat=CycloneDX, bomUploadToken=6cb46db7-e009-4146-8615-122179457b63, projectName=x, bomSpecVersion=1.6, projectUuid=416f54cc-8621-492f-838f-b87ef3de9dad, projectVersion=null, bomVersion=1]
javax.jdo.JDODataStoreException: Insert of object "org.dependencytrack.model.CryptoAssetProperties@5af49362" using statement "INSERT INTO "CRYPTO_PROPERTIES" ("ALGORITHM_PROPERTIES_ID","ASSET_TYPE","CERTIFICATE_PROPERTIES_ID","COMPONENT_ID","OID","PROTOCOL_PROPERTIES_ID","RELATED_MATERIAL_PROPERTIES_ID") VALUES (?,?,?,?,?,?,?)" failed : ERROR: null value in column "COMPONENT_ID" of relation "CRYPTO_PROPERTIES" violates not-null constraint
  Detail: Failing row contains (3, null, ALGORITHM, 3, null, null, null, 2.16.840.1.101.3.4.1.6).
	at org.datanucleus.api.jdo.JDOAdapter.getJDOExceptionForNucleusException(JDOAdapter.java:608)
	at org.datanucleus.api.jdo.JDOPersistenceManager.flush(JDOPersistenceManager.java:2057)
	at org.dependencytrack.tasks.BomUploadProcessingTask.processComponents(BomUploadProcessingTask.java:529)
	...
```
Just to be sure, is this how you modeled the relationship in this case? https://www.datanucleus.org/products/accessplatform_6_0/jdo/mapping.html#one_many_fk_bi
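For reference, a minimal sketch of that mapping adapted to the 1:1 case here, with the FK living on the `CRYPTO_PROPERTIES` side; class and field names are illustrative only, not the PR's actual code:

```java
import javax.jdo.annotations.Column;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

// Owning side: the FK column lives on CRYPTO_PROPERTIES and points at COMPONENT.
@PersistenceCapable(table = "CRYPTO_PROPERTIES")
class CryptoAssetProperties {

    @Persistent
    @Column(name = "COMPONENT_ID", allowsNull = "false")
    private Component component;
}

// Non-owning side: mapped by the "component" field above; dependent = "true"
// makes DataNucleus delete the CRYPTO_PROPERTIES row when the COMPONENT row goes away.
@PersistenceCapable(table = "COMPONENT")
class Component {

    @Persistent(mappedBy = "component", dependent = "true")
    private CryptoAssetProperties cryptoAssetProperties;
}
```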
I wanted the 'inner' dependent object to be fetched when the 'outer' object is queried, and I wanted the cascading delete to work in both ways. The code does that, so I think we can leave it as is.

The fetching works since the 'inner' objects (e.g., `CryptoAssetProperties`) are modelled as persistent members of the 'outer' (e.g., `Component`) object. Inner objects cannot exist without their 'outer' container. Thus, cascading delete must work in both ways:

- Deleting an 'outer' object must delete all 'inner' objects. This is guaranteed by the `dependent = "true"` flag in the `Persistent` annotation. Here is an example (hyades-apiserver/src/main/java/org/dependencytrack/model/Component.java, lines 394 to 397 in 279450c):

  ```java
  @Persistent(defaultFetchGroup = "true", dependent = "true")
  @Index(name = "COMPONENT_CRYPTO_PROPERTIES_ID_IDX")
  @Column(name = "CRYPTO_PROPERTIES_ID", allowsNull = "true")
  private CryptoAssetProperties cryptoAssetProperties;
  ```

- Deleting an 'inner' object renders the outer objects invalid and they must be deleted. This is guaranteed by the foreign key constraints. Here is an example (hyades-apiserver/src/main/resources/migration/changelog-v5.6.0.xml, lines 269 to 272 in 279450c):

  ```xml
  <addForeignKeyConstraint baseTableName="COMPONENT" baseColumnNames="CRYPTO_PROPERTIES_ID"
      constraintName="COMPONENT_CRYPTO_PROPERTIES_FK" deferrable="true" initiallyDeferred="true"
      referencedTableName="CRYPTO_PROPERTIES" referencedColumnNames="ID"
      onDelete="CASCADE" onUpdate="NO ACTION" validate="true"/>
  ```
Can we add a few tests with crypto BOMs here that verify correct ingestion of data?

From what I'm seeing we're currently relying on the ORM to implicitly persist the object graph, but it may very well be that a more manual approach is necessary (see `BomUploadProcessingTask#processComponents`).
I will add some tests. Your observation is right, we rely on the ORM magic.
```java
@Persistent
@Column(name = "SIGNATURE_ALGORITHM_REF", jdbcType = "VARCHAR", length = 64)
private String signatureAlgorithmRef;

@Persistent
@Column(name = "SUBJECT_PUBLIC_KEY_REF", jdbcType = "VARCHAR", length = 64)
private String subjectPublicKeyRef;
```
From what I understand from the schema, these are BOM refs. I'm wondering if they should be resolved prior to persisting them, such that these would be foreign keys instead? Is there a reason to leave them plain like this?
You're right, they should be resolved. I haven't done this yet because the issue appears in many places where the standard has xxxRef data fields. In most cases, these are bomrefs that could be resolved to components. Most prominently, the bomref resolution question arises in the dependencies. Here DT currently stores a JSON structure which imho cannot be the final solution. Hence I thought that bomref resolution is still an open issue and left the refs plain for the time being.
> Here DT currently stores a JSON structure which imho cannot be the final solution
Very much agreed. An overhaul is planned but needs more fleshing out: DependencyTrack/dependency-track#3452
Would the resolved references here be part of the dependency graph though, or would it be a separate, graph-like structure? Seeing as the CDX standard still lacks "type" properties on dependency graph nodes...
In principle it's the same tree structure. The bomrefs in the ref fields must obviously point to other components in the BOM, otherwise they would be dangling. In a sense, the standard is "over-specified": you can have ref fields that point to components, and you can have `dependency` objects that also state that one component depends on another. It is not clear whether the specification of a `signatureAlgorithmRef` requires the specification of a dependency object.

I think our final goal should be to model `signatureAlgorithmRef` as

```java
private Component subjectPublicKeyRef;
```

that is set as the result of the bomref resolution. Maybe we can make these fields transient because they get re-created on the fly when the BOM is loaded.
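A minimal sketch of how such a bomref resolution could look during ingestion; the helper class and the index map are assumptions for illustration, not existing DT code:

```java
import java.util.Map;

import org.dependencytrack.model.Component;

final class BomRefResolver {

    /**
     * Hypothetical helper: resolves a CycloneDX bom-ref to the already-ingested
     * Component it points to. The componentsByBomRef index would be built while
     * processing the BOM's component list.
     */
    static Component resolve(final Map<String, Component> componentsByBomRef, final String bomRef) {
        if (bomRef == null) {
            return null; // nothing to resolve
        }
        final Component resolved = componentsByBomRef.get(bomRef);
        if (resolved == null) {
            // Dangling reference: the BOM points at a component that is not part of
            // the upload. Whether to log, reject, or ignore this is a policy decision.
        }
        return resolved;
    }
}
```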
<createTable tableName="CRYPTO_FUNCTIONS"> | ||
<column autoIncrement="true" name="ID" type="BIGINT"> | ||
<constraints nullable="false" primaryKey="true" primaryKeyName="CRYPTO_FUNCTIONS_PK"/> | ||
</column> | ||
<column name="ALGORITHM_PROPERTY_ID" type="BIGINT"/> | ||
<column name="CRYPTO_FUNCTION" type="VARCHAR(32)"/> | ||
</createTable> |
Could this just be an array column in the `ALGORITHM_PROPERTY` table?
Yes, absolutely. I avoided array column types since they are not supported by all DB systems. But if we stick to Postgres, using array types would greatly simplify the table structure. There are a couple of similar structures in CBOM 1.6 where we could migrate to arrays.
<createTable tableName="COMPONENT_OCCURRENCES"> | ||
<column name="COMPONENT_ID" type="BIGINT"/> | ||
<column name="OCCURRENCE_ID" type="BIGINT"/> | ||
</createTable> |
I'd replace this join table with a join column (i.e. have a `COMPONENT_ID` column in the `OCCURRENCES` table). Since the component<->occurrence relationship is 1:N, the overhead of a join table is not necessary.
The PR currently adds crypto-related data fields that our scanner (https://github.com/IBM/cbomkit) generates. We can drop OCCURRENCES because the location of crypto assets in source is probably not important in DT.
General question: Is all this new data required to make crypto integration useful? One thing I'd like to avoid is persisting a lot of data without having a need for it.
The less data we store, the fewer pains we'll have in the future as the CycloneDX specification evolves and shifts things around. Due to the tight coupling of the model to the CycloneDX spec, that is definitely something to be wary of.
As I said, we could drop Occurrences. Given that each crypto component can have many occurrences this would already save a lot. Other than Occurrences, the PR adds the entire cryptoProperties realm from https://cyclonedx.org/docs/1.6/json/ to persistence. There are some things that may be of less importance in the context of DT (e.g., the implementation platform and execution env in algorithm properties). These are only enum values and the saving potential is limited. We could have a telco and go through the data fields one by one. Perhaps there's too much detail in the standard for DT's purposes.
```diff
@@ -135,7 +135,7 @@ public List<Component> getAllComponents() {
     @SuppressWarnings("unchecked")
     public List<Component> getAllComponents(Project project) {
         final Query<Component> query = pm.newQuery(Component.class, "project == :project");
-        query.getFetchPlan().setMaxFetchDepth(2);
+        query.getFetchPlan().setMaxFetchDepth(3);
```
What data are we trying to capture with this increase in fetch depth? I wonder if `FetchGroup`s would be a better tool to avoid over-fetching of data that is not strictly needed.
The frontend PR contains code for rendering crypto properties. They form an additional layer in the persistence model which must be fetched for rendering (fetch depth 2 was not enough). To keep things simple I just raised the fetch depth to make this work. We could define a dedicated crypto component fetch group and use this only when crypto assets are queried.
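A rough sketch of what such a dedicated fetch group could look like; the group name and class bodies below are made up for illustration and are not code from the PR:

```java
import javax.jdo.annotations.Column;
import javax.jdo.annotations.FetchGroup;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

@PersistenceCapable
class CryptoAssetProperties { /* stub for illustration */ }

// Component declares a named fetch group that includes the crypto properties,
// so they are only loaded when the group is explicitly activated.
@PersistenceCapable
@FetchGroup(name = "CRYPTO_ASSETS", members = {@Persistent(name = "cryptoAssetProperties")})
class Component {

    @Persistent(dependent = "true")
    @Column(name = "CRYPTO_PROPERTIES_ID", allowsNull = "true")
    private CryptoAssetProperties cryptoAssetProperties;
}

// At query time, activate the group only when crypto assets are requested,
// instead of raising the global fetch depth:
//   final Query<Component> query = pm.newQuery(Component.class, "project == :project");
//   query.getFetchPlan().addGroup("CRYPTO_ASSETS");
```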
OK, I'd propose we deal with this once everything else is in place, as it's just an optimization and DataNucleus' fetch behavior can be quite fiddly to tune.
```java
@Persistent
@Column(name = "BOM_REF", jdbcType = "VARCHAR", length = 64)
private String bomRef;
```
Do we need to store the BOM ref?
No, we could drop this. The PR currently adds all crypto-related data fields that our scanner (https://github.com/IBM/cbomkit) generates. We can drop OCCURRENCES because the location of crypto assets in source is probably not important in DT.
… join tables with array columns — Signed-off-by: san-zrl <[email protected]>
Hi @nscuro - I just pushed commit 850eab9 that addresses some of the issues we discussed:
Issues not addressed yet:
There is also a corresponding commit for hyades-frontend: DependencyTrack/hyades-frontend@62be90f
Description
Enhances DT to read, persist, serve and export CBOM 1.6 data. See the Additional Details section for more information on what has been changed in particular. Note that there is a corresponding PR for hyades-frontend that enhances the UI to render CBOM data.
Addressed Issue
Issue #1538
Additional Details
- New model classes in `org.dependencytrack.model`
- `org.dependencytrack.persistence.v1.CryptoAssetsResource` with endpoints that serve the UI
- `Classifier.CRYPTOGRAPHIC_ASSET` with CryptoProperties and Occurrence attributes
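For illustration, the endpoints mentioned above could look roughly like this; the paths, method names, and the `jakarta.ws.rs` namespace are assumptions rather than the PR's actual code:

```java
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import java.util.List;

// Hypothetical sketch of a resource serving crypto assets globally and per project.
@Path("/v1/cryptoasset")
public class CryptoAssetsResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response getAllCryptoAssets() {
        // Real implementation: query components with classifier CRYPTOGRAPHIC_ASSET
        // across all projects the caller is allowed to see.
        return Response.ok(List.of()).build();
    }

    @GET
    @Path("/project/{uuid}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getAllCryptoAssets(@PathParam("uuid") final String projectUuid) {
        // Same query, scoped to a single project.
        return Response.ok(List.of()).build();
    }
}
```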