Add an option to instruct equinox to clean the resolver state #448

Closed
laeubi opened this issue Dec 18, 2023 · 19 comments

@laeubi
Member

laeubi commented Dec 18, 2023

Currently if one performs an install of new software it can happen that:

  1. it takes very very long to start but after that it works
  2. it takes very very long and one gets very obscure resolve errors
  3. sometimes Eclipse even crashes completely (e.g. when a bundle with native code is updated)

Many of these cases (unless the installs really are incompatible) can be solved by starting the framework with the -clean option, but this cleans not only the resolver state but any persisted state, and it is a manual action. Instead, P2 should be able to instruct the framework to clear the current resolved state and start fresh.

The least intrusive way seems to be a marker file stored in configuration/org.eclipse.osgi, e.g. .refreshstate: whenever Equinox finds that file, it should delete it and throw away its previously cached state.

@tjwatson do you think this is suitable, and what would be the best place to start with such an implementation?

@HannesWell
Member

This reminds me of eclipse-pde/eclipse.pde#775 (comment).

@tjwatson
Contributor

This reminds me of eclipse-pde/eclipse.pde#775 (comment).

Right, with commit 907ec00 I see no need for this mechanism. P2 can already force a re-resolve of all bundles and should with that commit. Any issues not fixed by that approach will not be fixed by this new flag to tell the framework itself to re-resolve all installed bundles on start.

@laeubi
Member Author

laeubi commented Dec 18, 2023

Any issues not fixed by that approach will not be fixed by this new flag to tell the framework itself to re-resolve all installed bundles on start.

I'm not sure this assertion is true. For example, this only happens once the simple configurator has started and then refreshes everything, but the "bad" things happen before that. There is still a big difference between an empty configuration/org.eclipse.osgi directory (-clean) and starting with an initialized one, so I really want this flag to behave as if configuration/org.eclipse.osgi is an empty directory right from the start: it should not re-resolve but start fresh as if there is no resolved state at all, while retaining, for example, bundle data areas and bundle ids.

@tjwatson
Contributor

so I really want this flag to behave as if configuration/org.eclipse.osgi is an empty directory right from the start, so it should not re-resolve but start from fresh as if there is no resolved state at all but for example retain bundle data areas and bundle ids

If we retain the bundle data areas and the bundle ids then that means we are not really starting with an empty org.eclipse.osgi directory. The framework will be launched and it will have the existing set of bundles already installed with their IDs and data areas already assigned. All we can do at that point is throw away the resolution wirings and re-resolve everything, which is what p2 should be trying to do.

@laeubi
Member Author

laeubi commented Dec 19, 2023

If we retain the bundle data areas and the bundle ids then that means we are not really starting with an empty org.eclipse.osgi directory.

That's for sure; only the resolver/state recovery should behave as if it were empty. So at some point I assume there will be something like

if (hasCachedResolverState()) {
  // use it
} else {
  // resolve fresh
}

that should become

if (hasCachedResolverState() && !markerFile.delete()) {
  // use it
} else {
  // resolve fresh
}
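
A minimal, self-contained sketch of that marker-file check, for illustration only. The class and method names are hypothetical and not part of Equinox; the marker name `.refreshstate` is taken from the proposal above. One detail worth noting: with `hasCachedResolverState() && !markerFile.delete()`, the short-circuit `&&` leaves the marker in place when no cached state exists, so this sketch consumes the marker first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RefreshMarkerSketch {
    /** Marker file name from the proposal; everything else here is hypothetical. */
    static final String MARKER_NAME = ".refreshstate";

    /**
     * Deletes the marker if it exists and reports whether it was present,
     * so a refresh request applies to a single start only.
     */
    static boolean consumeRefreshMarker(Path storageDir) throws IOException {
        return Files.deleteIfExists(storageDir.resolve(MARKER_NAME));
    }

    /** Decides whether a previously cached resolver state may be reused. */
    static boolean useCachedState(Path storageDir, boolean hasCachedState) throws IOException {
        // Consume the marker unconditionally, so it is removed even when
        // there is no cached state to discard.
        boolean refreshRequested = consumeRefreshMarker(storageDir);
        return hasCachedState && !refreshRequested;
    }
}
```

With this shape, the first start after the marker is written resolves fresh, and the next start reuses the cache again.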

which is what p2 should be trying to do.

Maybe that's the intention and maybe it works in many cases but I'm observing different behavior.

I'm currently analyzing a larger wiring problem (using Eclipse 2023-03 where the fix should be included) and what I see is the following:

  1. I start a fresh install of Eclipse, then install the problematic features, which results in about 15 new bundles being installed
  2. After a restart (which takes very long) most of them are in INSTALLED state and the log reports wiring problems of the form "multiple providers for the same package"
  3. Now I close Eclipse and start it with the -clean option. It still takes long, but now the new bundles are all resolved while three of the "old" ones no longer resolve, and I see a package import that can't be resolved

So basically I end up with a completely different state with completely different errors depending on whether there is a cached state or not. The same can be reproduced when I use the director on the command line: installing into a freshly extracted Eclipse results in state (3); if I have ever started Eclipse before, it is state (2).

@tjwatson
Contributor

So basically I end up with a completely different state with completely different errors depending on whether there is a cached state or not. The same can be reproduced when I use the director on the command line: installing into a freshly extracted Eclipse results in state (3); if I have ever started Eclipse before, it is state (2).

I don't believe only re-resolving all bundles by discarding the wiring state will solve that problem because that is what p2 is doing already. Doing a clean start causes all bundles to be re-installed by p2 and they end up in a different order from your state (2). My observation is that the bundles are installed in alphabetical BSN (bundle symbolic name) order. Unfortunately the OSGi specification puts some priority on install order with respect to what providers are preferred. I suspect that order difference is what causes it to "work" for you on a clean start.

With that in mind, I don't think discarding the wiring state is going to make this work because we will still have the same problematic install order as before.
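
To illustrate why install order can matter, here is a simplified model of provider preference. It assumes the ordering described in older OSGi core specifications (already-resolved providers first, then highest version, then lowest bundle ID); it is not Equinox's actual comparator, and all names and numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ProviderOrderSketch {
    /** Stand-in for a bundle exporting a package; all fields are illustrative. */
    record Provider(String name, boolean resolved, int version, long bundleId) {}

    /** Assumed preference: resolved first, then higher version, then lower bundle ID. */
    static final Comparator<Provider> PREFERENCE = (a, b) -> {
        if (a.resolved() != b.resolved()) {
            return a.resolved() ? -1 : 1;
        }
        if (a.version() != b.version()) {
            return Integer.compare(b.version(), a.version());
        }
        return Long.compare(a.bundleId(), b.bundleId());
    };

    /** Returns the provider this model would try first. */
    static Provider preferred(List<Provider> candidates) {
        List<Provider> sorted = new ArrayList<>(candidates);
        sorted.sort(PREFERENCE);
        return sorted.get(0);
    }
}
```

In this model, two otherwise equal providers are ordered purely by bundle ID, so an incremental install and a clean start, which assign different IDs, can present the resolver with different first choices.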

@laeubi
Member Author

laeubi commented Dec 19, 2023

If order matters, it's probably not a good idea for the simple configurator to order them alphabetically (I assume the order in bundles.info applies here).

I just don't have a clue, then, how one should analyze this; strangely enough, a clean start often gives a better result than an incremental install. Maybe installing new bundles should then not only refresh the state but also reassign bundle ids and simply move the bundle data to the new id if it has changed.

@tjwatson
Contributor

Maybe installing new bundles should then not only refresh the state but also reassign bundle ids and simply move the bundle data to the new id if it has changed.

That will break OSGi compliance since bundle IDs are supposed to be constant.

@laeubi
Member Author

laeubi commented Dec 19, 2023

That will break OSGi compliance since bundle IDs are supposed to be constant.

Also between restarts? Obviously a clean start does not keep the ids constant...

@tjwatson
Contributor

Also between restarts? Obviously a clean start does not keep the ids constant...

Yes, between restarts IDs are persistent. Clean follows the specified behavior for https://docs.osgi.org/javadoc/osgi.core/8.0.0/org/osgi/framework/Constants.html#FRAMEWORK_STORAGE_CLEAN_ONFIRSTINIT

Where first init means the first call to Framework.init for the JVM instance for a particular instance of the framework.
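
As a small illustration of the property behind that link, the sketch below builds the launch configuration that requests a clean storage area. The two string values are the literals of `Constants.FRAMEWORK_STORAGE_CLEAN` and `Constants.FRAMEWORK_STORAGE_CLEAN_ONFIRSTINIT`; they are repeated here so the snippet compiles without the OSGi API on the classpath. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class CleanStartConfigSketch {
    // Literal values of the org.osgi.framework.Constants fields of the same name.
    static final String FRAMEWORK_STORAGE_CLEAN = "org.osgi.framework.storage.clean";
    static final String FRAMEWORK_STORAGE_CLEAN_ONFIRSTINIT = "onFirstInit";

    /** Builds a framework launch configuration, optionally requesting a clean start. */
    static Map<String, String> launchConfig(boolean clean) {
        Map<String, String> config = new HashMap<>();
        if (clean) {
            // The storage area is wiped on the first Framework.init only,
            // which is why bundle IDs are reassigned after a clean start.
            config.put(FRAMEWORK_STORAGE_CLEAN, FRAMEWORK_STORAGE_CLEAN_ONFIRSTINIT);
        }
        return config;
    }
}
```

With the OSGi API available, such a map would typically be handed to `FrameworkFactory.newFramework(Map)`.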

@laeubi
Member Author

laeubi commented Dec 22, 2023

@tjwatson thanks for the explanation and the hint about bundle IDs; I think that is really what is causing the issue. Sadly I don't have a "simple small reproducer" I can share, but I will try to explain the problem at a high level in the hope it might still be useful.

  • I have a quite large bundle set (~700 bundles)
  • I have some application bundles with a quite low ID (< 100) that import a package with a narrow version range in this case (&(osgi.wiring.bundle=com.google.gson)(&(bundle-version>=2.9.1)(!(bundle-version>=2.10.0))))
  • Now I have another bundle with a quite high ID (> 600) (org.eclipse.sprotty.server) importing the package with a wider version range (&(osgi.wiring.package=com.google.gson)(&(version>=2.8.0)(!(version>=3.0.0))))
  • Furthermore there are two versions of gson: 2.10.1 and 2.9.1
  • Now it happens that the application bundles that require org.eclipse.sprotty.server fail to resolve with

Bundle was not resolved because of a uses constraint violation ... because it is exposed to package 'com.google.gson.internal' from resources com.google.gson [osgi.identity; osgi.identity="com.google.gson"; type="osgi.bundle"; version:Version="2.9.1.v20220915-1632"] and com.google.gson [osgi.identity; osgi.identity="com.google.gson"; type="osgi.bundle"; version:Version="2.10.1.v20230109-0753"] via two dependency chains.

  • It shows that org.eclipse.sprotty.server has no wires it provides to any other bundle!
  • If I now refresh org.eclipse.sprotty.server everything gets finally resolved!

So what seems to happen here is that the batch size is actually too small, but resolving already takes very long, so I'm not convinced it's a good idea to raise it. Instead I'm wondering: if the resolver fails with a uses-constraint violation, would it be feasible for the resolver to make one last attempt by refreshing the wiring for the conflicting providers?
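
To make the two version ranges from the list above concrete, here is a small self-contained sketch that checks both gson versions against them. It assumes, purely for illustration, that the package version equals the bundle version and it ignores qualifiers; the types are hypothetical stand-ins, not the OSGi `Version` or `VersionRange` classes.

```java
import java.util.List;

final class GsonRangeSketch {
    /** Minimal major.minor.micro version; qualifiers are ignored in this sketch. */
    record Version(int major, int minor, int micro) implements Comparable<Version> {
        static Version parse(String s) {
            String[] p = s.split("\\.");
            return new Version(Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2]));
        }

        @Override
        public int compareTo(Version o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(micro, o.micro);
        }
    }

    /** Half-open range [floor, ceiling), the form both filters encode. */
    record Range(Version floor, Version ceiling) {
        boolean includes(Version v) {
            return v.compareTo(floor) >= 0 && v.compareTo(ceiling) < 0;
        }
    }

    // (>=2.9.1)(!(>=2.10.0)) and (>=2.8.0)(!(>=3.0.0)) from the filters above.
    static final Range NARROW = new Range(Version.parse("2.9.1"), Version.parse("2.10.0"));
    static final Range WIDE = new Range(Version.parse("2.8.0"), Version.parse("3.0.0"));

    public static void main(String[] args) {
        for (String s : List.of("2.9.1", "2.10.1")) {
            Version v = Version.parse(s);
            System.out.println(s + " narrow=" + NARROW.includes(v) + " wide=" + WIDE.includes(v));
        }
    }
}
```

Only 2.9.1 satisfies both ranges, while the wide range also accepts 2.10.1. If the bundle with the wide range is wired to 2.10.1, the application bundles can see gson through two different providers, which matches the uses-constraint error quoted above.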

@laeubi
Member Author

laeubi commented Jan 2, 2024

I ran further experiments to analyze the problem, and the problem really seems to be that the resolver does not "partition" the bundles correctly.

I have now written a "fix wires" command that does the following:

  1. It performs a resolve and collects the ResolutionReport
  2. Then it collects a set of Requirements by examining entries of Type.MISSING_CAPABILITY and Type.USES_CONSTRAINT_VIOLATION
  3. Then it looks for any providers of those requirements and builds a new set of bundles
  4. Then this reduced set of bundles is refreshed

This finally leads to a resolved state!
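
The steps above can be sketched roughly as follows. This is not the actual command: the report and provider lookup are modeled with plain stand-in types so the example is self-contained, and every name in it is hypothetical. In a real framework, the report would come from the resolver and the final refresh would go through something like `FrameworkWiring.refreshBundles`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FixWiresSketch {
    /** Stand-in for the report entry types named in step 2. */
    enum Type { MISSING_CAPABILITY, USES_CONSTRAINT_VIOLATION, OTHER }

    /** Stand-in report entry: the entry type plus the requirement it concerns. */
    record Entry(Type type, String requirement) {}

    /**
     * @param report    report entries per unresolved bundle (step 1)
     * @param providers requirement -> bundles able to satisfy it
     * @return the reduced set of bundles to refresh (input to step 4)
     */
    static Set<String> bundlesToRefresh(Map<String, List<Entry>> report,
                                        Map<String, List<String>> providers) {
        // Step 2: collect requirements that were missing or in conflict.
        Set<String> requirements = new LinkedHashSet<>();
        for (List<Entry> entries : report.values()) {
            for (Entry e : entries) {
                if (e.type() == Type.MISSING_CAPABILITY
                        || e.type() == Type.USES_CONSTRAINT_VIOLATION) {
                    requirements.add(e.requirement());
                }
            }
        }
        // Step 3: gather every bundle that provides one of those requirements.
        Set<String> refresh = new LinkedHashSet<>();
        for (String requirement : requirements) {
            refresh.addAll(providers.getOrDefault(requirement, List.of()));
        }
        return refresh;
    }
}
```

The point of the reduced set is that only the providers involved in a failure are refreshed, rather than every installed bundle.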

@tjwatson I probably can share the application with you that reproduces the problem if you are interested in an analysis, just let me know.

@laeubi
Member Author

laeubi commented Jan 3, 2024

I now created a PR here to mitigate the issue:

this solves the problem in my tests when I see unresolved/conflicting requirements that are actually resolvable.

@tjwatson
Contributor

tjwatson commented Jan 3, 2024

I now created a PR here to mitigate the issue:

this solves the problem in my tests when I see unresolved/conflicting requirements that are actually resolvable.

This is great analysis. I think this is a workable "workaround". I do wish we could have a testcase added to exercise the approach though. Is that at all feasible to add? If we had that it may help others analyze why the resolver isn't partitioning the bundles correctly. Perhaps the batch size approach isn't working well in this scenario?

@laeubi
Member Author

laeubi commented Jan 3, 2024

@tjwatson if it is fine with you, I can contact you directly and explain how to share the test case. It is quite large and contains some custom bundles, so I'm not 100% sure how best to make it a standalone test case.

@laeubi
Member Author

laeubi commented Jan 8, 2024

I have now started creating a local test case in org.eclipse.osgi.tests.container.TestModuleContainer by feeding in the manifests of my test installation and then performing the resolve, but this fails almost instantly, and the reports seem to indicate that something is wrong with the system bundle I provide, e.g. I see

Unresolved requirement: Import-Package: org.eclipse.osgi.util
    -> Export-Package: org.eclipse.osgi.util; bundle-symbolic-name="org.eclipse.osgi"; bundle-version="3.18.500.v20230801-1826"; version="1.1.0"
  Unresolved requirement: Import-Package: org.eclipse.swt.layout
    -> Export-Package: org.eclipse.swt.layout; bundle-symbolic-name="org.eclipse.swt"; bundle-version="3.124.100.v20230825-1346"; version="0.0.0"
       org.eclipse.swt [609]
         No resolution report for the bundle.  Unresolved requirement: Import-Package: org.eclipse.ui
    -> Export-Package: org.eclipse.ui; bundle-symbolic-name="org.eclipse.ui.ide"; bundle-version="3.21.100.v20230825-1346"; version="0.0.0"
       org.eclipse.ui.ide [681]
         Unresolved requirement: Import-Package: javax.annotation; version="[1.3.0,2.0.0)"
           -> Export-Package: javax.annotation; bundle-symbolic-name="javax.annotation"; bundle-version="1.3.5.v20221203-1659"; version="1.3.5"
              javax.annotation [126]
                Unresolved requirement: Require-Bundle: system.bundle

@laeubi
Member Author

laeubi commented Jan 8, 2024

@tjwatson I was able to solve this by removing all mentions of Require-Bundle: system.bundle (even though it seems to be a valid requirement), but now I have the problem that the resolver runs endlessly, so it seems the test container is missing some setup that a regular Equinox has...

EDIT: Found the necessary steps in testUsesTimeout ...

@tjwatson
Contributor

tjwatson commented Jan 8, 2024

You could look at org.eclipse.osgi.tests.container.TestModuleContainer.setupModuleDatabase(). This isn't used by any enabled tests at the moment but you can comment back in the test org.eclipse.osgi.tests.container.TestModuleContainer.testLoadPerformance() to see it work. This basically takes all bundles installed from the running environment and uses their manifests to create a module database. There is some extra work to make the system bundle get setup correctly there.

github-actions bot commented Jul 7, 2024

This issue has been inactive for 180 days and is therefore labeled as stale.
If this issue became irrelevant in the meantime please close it as completed. If it is still relevant and you think it should be fixed some possibilities are listed below.
Please read https://github.com/eclipse-equinox/.github/blob/main/CONTRIBUTING.md#contributing-to-eclipse-equinox for ways to influence development.

@github-actions github-actions bot added the stale label Jul 7, 2024
@github-actions github-actions bot closed this as not planned Jul 14, 2024