Submodule support #89

ch1bo · 2023-11-03T17:53:59Z

Fixes #88 by using the idea of #80.

🍂 Adds a test fixture which uses this remote github repository as source: https://github.com/cardano-scaling/foliage-test-with-submodule
🍂 Changes PrepareSource action to checkout a working copy into _cache/git/ if needed, fetches and prepares individual package directories in _cache/<package>/<version>/ for GitHubSource

ch1bo · 2023-11-03T17:54:40Z

Ping @andreabedini and @michaelpj as you created the related issues and I can't assign you as reviewers.

ch1bo · 2023-11-03T18:15:08Z

@andreabedini I would need your feedback on how we should structure this test differently / run it differently as it is obviously not a "pure test" which can run on the Hydra CI.

andreabedini

Thank you for taking the lead on this @ch1bo!

I have always considered this tricking to get right so I have many worries and only few suggestions :D

One worry is that it could be fragile, can we recover from a bad clone? what if the directory is there but empty?

Another one is performance, we end up fetching the whole history while we only need a single commit. Perhaps we can copy what GitHub's actions/cache does https://github.com/input-output-hk/foliage/actions/runs/6774254718/job/18410918251#step:2:26.

We need to take into account concurrent accesses to the same repository. Shake has a concept of Resource to limit concurrent access but I am not sure whether it fits our purpose since our "resources" are dynamic (newResource returns Rules Resource not Action Resource, and I am not sure whether it is safe to use liftIO newResourceIO).

The test failure seems to be genuine and not depend on nix (althought the terrible reporting does depend on nix, and I will have to fix it). I could reproduce locally

  git submodules:           Cloning into '_cache/git/cardano-scaling/foliage-test-with-submodule'...
Submodule 'cardanonical' (https://github.com/CardanoSolutions/cardanonical) registered for path 'cardanonical'
Cloning into '/home/andrea/work/foliage/tests/fixtures/git-submodule.QEFqZ7/_cache/git/cardano-scaling/foliage-test-with-submodule/cardanonical'...
Note: switching to 'db5874494ee5bac3fa8fee07d5806fcec27a2f4e'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at db58744 Add cardanonical as submodule
fatal: destination path '_cache/git/cardano-scaling/foliage-test-with-submodule' already exists and is not an empty directory.
foliage: Error when running Shake build system:
  at want, called at app/Foliage/CmdBuild.hs:48:7 in main:Foliage.CmdBuild
* Depends on: buildAction
  at apply1, called at app/Foliage/PrepareSource.hs:39:31 in main:Foliage.PrepareSource
* Depends on: prepareSource foliage-test-with-submodule-1.1.0 PackageVersionSpec {packageVersionTimestamp = Just 2023-11-03 15:53:59 UTC, packageVersionSource = GitHubSource {githubRepo = cardano-scaling/foliage-test-with-submodule, githubRev = 1.1.0, subdir = Nothing}, packageVersionRevisions = [], packageVersionDeprecations = [], packageVersionForce = False}
  at command_, called at app/Foliage/PrepareSource.hs:127:10 in main:Foliage.PrepareSource
* Raised the exception:
Development.Shake.command_, system command failed
Command line: git clone --recursive https://github.com/cardano-scaling/foliage-test-with-submodule.git _cache/git/cardano-scaling/foliage-test-with-submodule
Exit code: 128
Stderr:
fatal: destination path '_cache/git/cardano-scaling/foliage-test-with-submodule' already exists and is not an empty directory.


                            FAIL (2.18s)
    Building repository (2.18s)
      readCreateProcess: foliage build (exit 1): failed

    Use -p '/git submodules/' to rerun this test only.
  accepts --no-signatures:  OK (0.03s)
    Building repository (0.03s)
    Running checks
  accepts --write-metadata: OK (0.04s)
    Building repository (0.03s)
    Running checks

1 out of 4 tests failed (2.29s)
Test suite foliage-test-suite: FAIL
Test suite logged to:
/home/andrea/work/foliage/dist-newstyle/build/x86_64-linux/ghc-9.4.7/foliage-0.6.0.0/t/foliage-test-suite/test/foliage-0.6.0.0-foliage-test-suite.log
0 of 1 test suites (0 of 1 test cases) passed.
Error: cabal: Tests failed for test:foliage-test-suite from foliage-0.6.0.0.

Most likely this is due doesDirectoryExist. Shake makes the code look imperative but nothing gets run right away, instead it creates a build graph and execute only the parts than need to be updated. Long story short, the documentation says to not call doesFileExist and doesDirectoryExist on files and directory created by the build system.

Perhaps it would be ok to hide this machinery behind IO so that shake does not see it. I am not sure to be honest.

app/Foliage/PrepareSource.hs

app/Foliage/Meta.hs

michaelpj · 2023-11-08T11:18:31Z

I think this potentially saves us a lot of time downloading tarballs, also, since we can just clone a repository once instead of downloading a tarball for every package version ever released from that repository.

100% agree about the concurrency issue. We run foliage with lots of concurrency usually, which is very helpful, we need to cope with the git repo being stateful. Here's a suggestion:

Have a rule to produce a clone. It probably has to always rerun to update the clone?
Have a rule to produce source-at-a-commit. I don't know if there's a slick way to do this, but maybe we could just use git worktree?

That way there is only ever one rule that touches the main repository, and it's safe to run it many times. We can run all the "checkout the source for a commit" in parallel, I think? and then after that we're safe since the sources are all independent.

michaelpj · 2023-11-08T17:36:15Z

Another one is performance, we end up fetching the whole history while we only need a single commit.

Just to be clear: I think this is actually a feature. foliage's use of repositories is much closer to "a few repositories, used many times" than "many repositories, used once" so actually downloading the whole repository may well make things much faster, as we can just checkout the various commits we need instead of re-downloading something each time.

I am not sure whether it is safe to use liftIO newResourceIO

FWIW the definition of newResource is exactly that, so I think it's probably safe.

ch1bo · 2023-11-13T07:35:02Z

Here's a suggestion:
* Have a rule to produce a clone. It probably has to always rerun to update the clone?

* Have a rule to produce source-at-a-commit. I don't know if there's a slick way to do this, but maybe we could just use `git worktree`?
That way there is only ever one rule that touches the main repository, and it's safe to run it many times. We can run all the "checkout the source for a commit" in parallel, I think? and then after that we're safe since the sources are all independent.

I had the same idea but only knew about git archive to produce tarballs for specific commits (which do not contain the submodules). I'll have a look at git worktree and see if that makes things better.

ch1bo · 2023-11-13T15:02:24Z

This works for me now.

@michaelpj The git worktree commands work nicely to easily prepare a separate working copy on which we can run git submodule update. In the end I need to use git worktree prune to make sure the the worktree information in the main git repository stays clean.

@andreabedini I am not very familiar with how shake rules work. It seems though that, if called like this, the rule is re-executed even if the directory exists. Hence the call to doesDirectoryExist and git fetch to make sure we have latest information. I switched to using the "untracked" version of this command which should be in-line to what is documented.

This test fails now because the tarball fetching of sources does not include files from the submodule (but the fixture references a file as a data-file)

Working copies are kept in the _cache/git/ directory.

This adds another version of the same package fetched via git. The same working copy from _cache/git/<owner>/<repo> is re-used in both PrepareSource steps (which hopefully do not conflict) and copied onto individual _cache/<package>/<version> directories. The second package version in the git-submodule fixture _sources also uses a tag name as rev to highlight this is possible as well.

This avoids some boiler plate and works just the same.

This re-uses the git repository in the _cache/git/<user>/<repo> directory, but uses a temporary directory to get the worktree for a given rev to prepare the per-package directory in _cache/<package>.

andreabedini · 2023-11-15T04:50:55Z

@ch1bo Thank you for working on this. I am been reworking the way Foliage is structured to make better use of shake (which I feel I only just understood, finally 😂).

I will take care of rebasing your PR on top of my rework.

andreabedini

@andreabedini I am not very familiar with how shake rules work. It seems though that, if called like this, the rule is re-executed even if the directory exists. Hence the call to doesDirectoryExist and git fetch to make sure we have latest information. I switched to using the "untracked" version of this command which should be in-line to what is documented.

The run Action is re-executed every time the rule is invoked, but with the extra information on whether its dependencies have changed. The thing is that doesDirectoryExist triggers a built-in rule; which the GitClone rule will depend upon. This means that GitClone is changing the output of another rule, which is ... not a good thing.

E.g.

First run:

doesDirectoryExist is invoked, directory does not exist, it flags RunDependenciesChanged and returns false.
GitClone is called with RunDependenciesChanged and it creates the directory.

Second run:

doesDirectoryExist is invoked, directory now does exist, it flags RunDependenciesChanged and returns true.
GitClone is called with RunDependenciesChanged and it re-creates the directory.

Whether or not things ever stabilise depends on the details (e.g. from the code it looks like a third invocation of doesDirectoryExist would say that nothing changed).

Shake's linting feature checks that build outputs (i.e. rule results) do not change during the build. It would have caught this I think.

andreabedini · 2023-11-15T05:03:26Z

app/Foliage/GitClone.hs

+  run GitClone{repo} _old _mode = do
+    let path = cacheDir </> "git" </> gitHubRepoToString repo
+
+    alreadyCloned <- liftIO $ doesDirectoryExist path


Here you should use that _old parameter to store (inside shake itself) information from the rule previous run. From the docs:

As an example, a typical BuiltinRun will look like:

run key oldStore mode = do ... pure $ RunResult change newStore newValue

Where you have:

key, how to identify individual artifacts, e.g. with file names.

oldStore, the value stored in the database previously, e.g. the file modification time.

mode, either RunDependenciesSame (none of your dependencies changed, you can probably not rebuild) or RunDependenciesChanged (your dependencies changed, probably rebuild).

change, usually one of either ChangedNothing (no work was required) or ChangedRecomputeDiff (I reran the rule and it should be considered different).

newStore, the new value to store in the database, which will be passed in next time as oldStore.

newValue, the result that apply will return when asked for the given key.

The argument oldStore is Maybe ByteString and newStore has to be ByteString, oldStore is Nothing if it is the first time the rule runs.

E.g. you could use that 1) to detect whether you checkout out the repo before 2) perhaps remember what commit you had checkedout (not sure if it's worth it but you can do that).

Yeah I thought about using the last fetched revision in the store. It felt like an optimization though and not sure if needed. You think we should do that?

ch1bo · 2023-11-15T07:12:55Z

The run Action is re-executed every time the rule is invoked, but with the extra information on whether its dependencies have changed. The thing is that doesDirectoryExist triggers a built-in rule; which the GitClone rule will depend upon. This means that GitClone is changing the output of another rule, which is ... not a good thing.
[...]
Whether or not things ever stabilise depends on the details (e.g. from the code it looks like a third invocation of doesDirectoryExist would say that nothing changed).

Shake's linting feature checks that build outputs (i.e. rule results) do not change during the build. It would have caught this I think.

I think I understood and this is the case when using the shake version of doesDirectoryExist. However, as documented on shake's version of doesFileExist, one should use the normal version from directory if needed.

I can't see a problem with the current use of directory exists and fetching if already present. As you request a change, I would like to now.. what exactly do you think I should change?

andreabedini · 2023-11-15T08:02:31Z

I can't see a problem with the current use of directory exists and fetching if already present. As you request a change, I would like to now.. what exactly do you think I should change?

I meant using the shake store to decide if to re-run the fetch but that's ok if we don't do it now. Do you know why the CI is failing?

ch1bo · 2024-01-02T10:43:34Z

@andreabedini What do you want to do with the CI failures here? (It's obviously an impure test)

andreabedini · 2024-01-16T10:07:28Z

@andreabedini What do you want to do with the CI failures here? (It's obviously an impure test)

@ch1bo Ideally we should be able to have a submodule pointing to a local repository but we would need to support general git repositories first. I think the most pragmatic option is to run the tests with nix run. I just made changes to the flake so that you can do

- run: nix run .#foliage:test:foliage-test-suite

but foliage has to be added to the PATH.

ch1bo marked this pull request as draft November 3, 2023 18:13

ch1bo marked this pull request as ready for review November 6, 2023 17:22

andreabedini reviewed Nov 7, 2023

View reviewed changes

app/Foliage/PrepareSource.hs Outdated Show resolved Hide resolved

app/Foliage/PrepareSource.hs Outdated Show resolved Hide resolved

app/Foliage/Meta.hs Outdated Show resolved Hide resolved

andreabedini assigned andreabedini and unassigned andreabedini Nov 7, 2023

ch1bo force-pushed the submodule-support branch from 565989e to 6225a51 Compare November 13, 2023 14:57

ch1bo added 7 commits November 13, 2023 16:03

Write a test case using a test fixture repository

2943666

This test fails now because the tarball fetching of sources does not include files from the submodule (but the fixture references a file as a data-file)

Add a GitClone rule which fetches sources using git

ba799ae

Working copies are kept in the _cache/git/ directory.

Remove unused githubRepoTarballUrl helper

77d2454

Make gitCheckout just an Action, no additional rule

cae3ce4

This avoids some boiler plate and works just the same.

Re-add custom rule for GitClone and use git worktree to prepare source

ac429cc

This re-uses the git repository in the _cache/git/<user>/<repo> directory, but uses a temporary directory to get the worktree for a given rev to prepare the per-package directory in _cache/<package>.

Use untracked version of doesDirectoryExist in GitClone

745c03c

ch1bo force-pushed the submodule-support branch from 46b371d to 745c03c Compare November 13, 2023 15:03

Not use Show instance for GitHubRepo and Rev

ca51e1b

andreabedini requested changes Nov 15, 2023

View reviewed changes

andreabedini approved these changes Nov 15, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Submodule support #89

Submodule support #89

ch1bo commented Nov 3, 2023 •

edited

Loading

ch1bo commented Nov 3, 2023

ch1bo commented Nov 3, 2023

andreabedini left a comment

michaelpj commented Nov 8, 2023

michaelpj commented Nov 8, 2023

ch1bo commented Nov 13, 2023

ch1bo commented Nov 13, 2023

andreabedini commented Nov 15, 2023

andreabedini left a comment •

edited

Loading

andreabedini Nov 15, 2023

ch1bo Nov 15, 2023

ch1bo commented Nov 15, 2023 •

edited

Loading

andreabedini commented Nov 15, 2023

ch1bo commented Jan 2, 2024

andreabedini commented Jan 16, 2024

Submodule support #89

Are you sure you want to change the base?

Submodule support #89

Conversation

ch1bo commented Nov 3, 2023 • edited Loading

ch1bo commented Nov 3, 2023

ch1bo commented Nov 3, 2023

andreabedini left a comment

Choose a reason for hiding this comment

michaelpj commented Nov 8, 2023

michaelpj commented Nov 8, 2023

ch1bo commented Nov 13, 2023

ch1bo commented Nov 13, 2023

andreabedini commented Nov 15, 2023

andreabedini left a comment • edited Loading

Choose a reason for hiding this comment

andreabedini Nov 15, 2023

Choose a reason for hiding this comment

ch1bo Nov 15, 2023

Choose a reason for hiding this comment

ch1bo commented Nov 15, 2023 • edited Loading

andreabedini commented Nov 15, 2023

ch1bo commented Jan 2, 2024

andreabedini commented Jan 16, 2024

ch1bo commented Nov 3, 2023 •

edited

Loading

andreabedini left a comment •

edited

Loading

ch1bo commented Nov 15, 2023 •

edited

Loading