Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fetch Maven metadata from specified repositories #1286

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

cuixq
Copy link
Contributor

@cuixq cuixq commented Sep 30, 2024

#1045

There are repositories defined in a Maven pom.xml. When looking for an artifact, these repositories are searched one by one until the artifact is found. Maven Central is the default registry to try at the last.

To support this behaviour, this PR:

  • makes MavenRegistryAPIClient host a list of registries besides the default registry
  • adds UpdateRegistries to DependencyClient to update the registries
  • adds a new flag to specify the default maven registry for fix
  • add new experimental options to scan to align with what we have for fix

TODO:

  • still need to update documentation for new options/flags
  • update deps.dev Maven resolver for mutil-registry resolution
  • record not found requests to optimize performance

@codecov-commenter
Copy link

codecov-commenter commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 66.48936% with 63 lines in your changes missing coverage. Please review.

Project coverage is 68.88%. Comparing base (1d94b4a) to head (9be580e).

Files with missing lines Patch % Lines
internal/manifest/maven.go 40.00% 11 Missing and 4 partials ⚠️
internal/resolution/datasource/maven_registry.go 85.07% 7 Missing and 3 partials ⚠️
internal/utility/maven/maven.go 47.05% 6 Missing and 3 partials ⚠️
...nternal/resolution/client/maven_registry_client.go 53.33% 5 Missing and 2 partials ⚠️
cmd/osv-scanner/fix/noninteractive.go 57.14% 4 Missing and 2 partials ⚠️
internal/resolution/manifest/maven.go 45.45% 4 Missing and 2 partials ⚠️
cmd/osv-scanner/scan/main.go 66.66% 2 Missing and 1 partial ⚠️
pkg/osvscanner/osvscanner.go 84.21% 2 Missing and 1 partial ⚠️
cmd/osv-scanner/fix/main.go 50.00% 1 Missing ⚠️
internal/resolution/client/npm_registry_client.go 0.00% 1 Missing ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1286      +/-   ##
==========================================
+ Coverage   68.22%   68.88%   +0.65%     
==========================================
  Files         178      178              
  Lines       17235    17346     +111     
==========================================
+ Hits        11759    11948     +189     
+ Misses       4835     4728     -107     
- Partials      641      670      +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cuixq cuixq marked this pull request as ready for review September 30, 2024 03:35
@cuixq cuixq requested review from michaelkedar, another-rex and oliverchang and removed request for michaelkedar September 30, 2024 03:35
Copy link
Member

@michaelkedar michaelkedar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments

Comment on lines 33 to 34
// UpdateRegistries updates the registries to fetch data.
UpdateRegistries(registries []string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about using []string here: npm for example would want a map for {"@scope": "url"} (if we were to implement it), and both would eventually need authentication information per url.

Could this take in a Manifest? (but we'd also need to work with lockfiles...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to make this function take a slice of struct instead of a slice of string - there is also Maven specific information I want to add to this struct e.g. if it is allowed to download snapshots.

internal/resolution/datasource/maven_registry.go Outdated Show resolved Hide resolved
internal/resolution/client/maven_registry_client.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@oliverchang oliverchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good with some nits/questions.

cmd/osv-scanner/fix/noninteractive.go Show resolved Hide resolved
cmd/osv-scanner/fixtures/maven-transitive/parent.xml Outdated Show resolved Hide resolved
internal/manifest/maven.go Outdated Show resolved Hide resolved
@@ -151,6 +151,16 @@ func (c *MavenRegistryClient) MatchingVersions(ctx context.Context, vk resolve.V
return resolve.MatchRequirement(vk, versions), nil
}

func (c *MavenRegistryClient) UpdateRegistries(registries []Registry) error {
for _, reg := range registries {
if err := c.api.AddRegistry(reg.URL); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to clear any existing registries? Update implies we will overwrite all existing ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to clear the existing registries - maybe this should be renamed to AddRegistries and have another SetRegistries to clear and set the registries (though not sure if this has any use case)

internal/resolution/datasource/maven_registry.go Outdated Show resolved Hide resolved
@cuixq cuixq marked this pull request as draft October 11, 2024 05:46
@cuixq
Copy link
Contributor Author

cuixq commented Oct 11, 2024

I mark this PR to draft - the registries should not be added when importing dependencies, but that is now done by MergeParents for all cases.

@cuixq cuixq marked this pull request as ready for review October 14, 2024 23:22
Copy link
Member

@michaelkedar michaelkedar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - just a couple of small things

}
for i, e := range g.Edges {
e.Type = dep.Type{}
g.Edges[i] = e
}

if err := overrideClient.WriteCache(f.Path()); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe don't write the cache file at the moment.
It gets saved as pom.xml.resolve.deps / pom.xml.resolve.maven next to the pom.xml file, and we probably don't want users accidentally committing this (large) file to their repos - especially since the file is undocumented and transitive scanning isn't marked as experimental.

Comment on lines +52 to +53
// CopyWithoutRegistries copies MavenRegistryAPIClient including its cache but not registries.
func (m *MavenRegistryAPIClient) CopyWithoutRegistries() *MavenRegistryAPIClient {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe call this just WithoutRegistries - it could be a bit confusing since the cache ends up shared between 'copies'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants