Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Includeresource duplicate strategy to append to existing file when unrolling jar #6326

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

chrisrueger
Copy link
Contributor

@chrisrueger chrisrueger commented Oct 13, 2024

Closes #6325

Adds a duplicate strategy to -includeresource (similar to :flatten or :rename)

dup_merge:=*

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;dup_merge:=*

@pkriens This is a first draft based on our discussion last week.

NOTHING,
OVERWRITE,
APPEND

Signed-off-by: Christoph Rueger <[email protected]>

use SequenceInputStream to concatenate...

resources to avoid intermediate Strings

Signed-off-by: Christoph Rueger <[email protected]>

handle duplicates for :literal

useful for testcases
Signed-off-by: Christoph Rueger <[email protected]>
Signed-off-by: Christoph Rueger <[email protected]>
@chrisrueger chrisrueger force-pushed the includeresource-duplicate-strategy branch from 20a2822 to 7c6bb03 Compare October 14, 2024 04:25
@chrisrueger
Copy link
Contributor Author

@pkriens one thing I am thinking about and would like to discuss is, wether or not we shoud add a way to enforce a line-break between the appended files.
In my specific example

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;:duplicates:=APPEND

a line break is not required, since the first file ends with one.
But in case it is not, a line break would be required so that the content is correctly appended and does not result in an invalid file.

One option would be another strategy in addition to APPEND.

  • APPEND_ENFORCE_LINEBREAK (maybe shorter, but just to get my point across). This would add a linebreak before the duplicate file is appended (maybe if there is none already??)
  • APPEND (current append-logic... just append... nothing in between)

@pkriens
Copy link
Member

pkriens commented Oct 14, 2024

You are becoming the Elon Musk of bnd :-) Nice productivity.

Some comments.

  • This not a function of JAR. It is the include resource code in Builder that has the policies. It can access the properties and the plugins. Jar is just a holder, it should do what it is told. It should not be changed for this.
  • I think the actual merging should go to a plugin. This plugin should get a name and two Resource objects and return a single one or null if it can't merge. We can then make a plugin that can merge files in the services directory and just append with an extra line between because it knows the format. This can be a basic plugin that is always present in a Builder.
    package aQute.bnd.service.merge;

    public interface MergeFiles {
         Optional<Resource> merge( String path, Resource a, Resource b);
   }
  • I think you should be able to specify the duplicate strategy with a dup_overwrite:, dup_merge: (default), dup_error:, dup_warning:, dup_skip:. The value of these directives is a list of globs on the paths in the resource. Priority is probably merge (if plugin exists), overwrite, skip. Error/Warning should always be executed even if it matches the other ones.
-includeresource @jar1.jar, @jar2.jar;dup_overwrite:=*, @jar3.jar;dup_skip:="META-INF/services/*,META-INF/MANIFEST.MF"

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Oct 14, 2024

Your approach sounds good and more flexible.

  • This not a function of JAR. It is the include resource code in Builder that has the policies. It can access the properties and the plugins. Jar is just a holder, it should do what it is told. It should not be changed for this.

I recognize a pattern here. I am always one level too deep 🤣

  • Jar is just a holder, it should do what it is told. It should not be changed for this.

What about the existing method with the boolean overwrite parameter and duplicate detection?
public boolean putResource(String path, Resource resource, boolean overwrite)

This was the main reason why I have put it here. But since it is public, we have to keep it anyway.

  • I think the actual merging should go to a plugin.

Are you talking plugin as in
public class MetaInfServiceParser implements AnalyzerPlugin

?

Ok I will try to digest your ideas and see what I can do.

@pkriens
Copy link
Member

pkriens commented Oct 14, 2024

I recognize a pattern here. I am always one level too deep 🤣

No, sometimes multiple levels 😎

When I was younger you had "structured design", DeMarco, Michael Jackson, Yourdan, etc. They were talking about coupling and cohesion. Did not get the cohesion until a few years ago but I think it is paramount. I try to write software based on reusable components. So each component must do one thing and one thing only otherwise reuse becomes hard. So I now strife to create classes and methods that do one thing. Very useful in there is not not do control (if/then/switch/case) and actual work in the same method.

Anyway, in bnd it is relatively straightforward. Whenever there is choice, it should be in a processor. Things like JAR, Resource, Parameters, etc. are reusable and should just do their work and not take decisions.

@chrisrueger
Copy link
Contributor Author

chrisrueger commented Oct 15, 2024

public interface MergeFiles {
Optional merge( String path, Resource a, Resource b);
}

Few questions @pkriens :

  1. What intention has the String path variable? Should the plugin check the path if it can merge with it? (e.g. path.startsWith('META-INF/services').

  2. How many plugins do you envision? Should there be multiple mergePlugins (similar to Analyzer Plugins, which are put in a list and processed all together (as in doPlugins(...)) e.g. one MetainfServicesMergeFiles.java and one MetainfServicesMergeFiles.java ?
    Or more like one "FileMerger" which checks the path to apply different merging mechanisms based on the path?

I ask because I have things setup so far (not pushed), but not sure how simple or complex the "plugin"-mechanism should be.

to handle -includeresource expressions like this, as suggested by @pkriens :

-includeresource @jar1.jar, @jar2.jar;dup_overwrite:=*, @jar3.jar;dup_skip:="META-INF/services/*,META-INF/MANIFEST.MF"

The value of these directives is a list of globs on the paths in the resource. Priority is probably merge (if plugin exists), overwrite, skip. Error/Warning should always be executed even if it matches the other ones.

Signed-off-by: Christoph Rueger <[email protected]>

rework to MergeFiles plugin
@chrisrueger chrisrueger force-pushed the includeresource-duplicate-strategy branch from 226d669 to 5df7631 Compare October 15, 2024 19:19
should avoid re-creating multiple Globs (Pattern.compile()) for each file. Should speed up things e.g. for unrolling jars with lots of files

Signed-off-by: Christoph Rueger <[email protected]>
@chrisrueger
Copy link
Contributor Author

  • I think you should be able to specify the duplicate strategy with a dup_overwrite:, dup_merge: (default), dup_error:, dup_warning:, dup_skip:. The value of these directives is a list of globs on the paths in the resource. Priority is probably merge (if plugin exists), overwrite, skip. Error/Warning should always be executed even if it matches the other ones.
-includeresource @jar1.jar, @jar2.jar;dup_overwrite:=*, @jar3.jar;dup_skip:="META-INF/services/*,META-INF/MANIFEST.MF"

@pkriens I pushed the first prototype of this approach.

Example: dup_merge AND dup_warning combined:

@${repo;org.apache.xmlgraphics:fop-core;latest}!/META-INF/services/*,\
@${repo;org.apache.xmlgraphics:xmlgraphics-commons;latest}!/META-INF/services/*;dup_warning:=*;dup_merge:=*,\

Warnings are displayed (TODO for me: the message is not correct. the word "overwritten" should not appear when same file is also affected by dup_merge):

image

dup_merge in effect: line break added in META-INF/services files

image

but other files outside META-INF/services will be just appended (without line break).

This part is what I would like to discuss: I have created two MergeFiles plugins and iterate over them via

return proc.getPlugins(MergeFiles.class)
					.stream()
					.map(mf -> mf.merge(path, existing, resource))
					.filter(Optional::isPresent)
					.findFirst()
					.orElse(Optional.of(resource));

But that does not feel right, because the order of the plugins is important.
Maybe I misunderstood you but in the beginning two fixed plugins might be better and more explicit, e.g.:

return Stream.of(metaInfServiceMerger, defaultResourceMerger)
					.map(mf -> mf.merge(path, existing, resource))
					.filter(Optional::isPresent)
					.findFirst()
					.orElse(Optional.of(resource));
  • metaInfServiceMerger can merge with additional line break IF path.startsWith(METAINF/services)
  • defaultResourceMerger just appends without anything in between and without any checking of the path.
  • if metaInfServiceMerger can merge, then this result is used. Otherwise defaultResourceMerger is used

- instead of iterating of plugins we just check directly which of the two plugins to use, whether we have a META-INF/services file or not (we can get more dynamic in the future if we want, since all is still private)
- and add testcases

add test for mixed files in META-INF

to test that META-INF/services files are merged differently than files in META-INF although they are captured by the same dub_merge glob

Signed-off-by: Christoph Rueger <[email protected]>
@chrisrueger chrisrueger marked this pull request as ready for review October 23, 2024 13:30
Signed-off-by: Christoph Rueger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants