-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submodule support #89
base: main
Are you sure you want to change the base?
Conversation
Ping @andreabedini and @michaelpj as you created the related issues and I can't assign you as reviewers. |
@andreabedini I would need your feedback on how we should structure this test differently / run it differently as it is obviously not a "pure test" which can run on the Hydra CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking the lead on this @ch1bo!
I have always considered this tricking to get right so I have many worries and only few suggestions :D
One worry is that it could be fragile, can we recover from a bad clone? what if the directory is there but empty?
Another one is performance, we end up fetching the whole history while we only need a single commit. Perhaps we can copy what GitHub's actions/cache
does https://github.com/input-output-hk/foliage/actions/runs/6774254718/job/18410918251#step:2:26.
We need to take into account concurrent accesses to the same repository. Shake has a concept of Resource
to limit concurrent access but I am not sure whether it fits our purpose since our "resources" are dynamic (newResource
returns Rules Resource
not Action Resource
, and I am not sure whether it is safe to use liftIO newResourceIO
).
The test failure seems to be genuine and not depend on nix (althought the terrible reporting does depend on nix, and I will have to fix it). I could reproduce locally
git submodules: Cloning into '_cache/git/cardano-scaling/foliage-test-with-submodule'...
Submodule 'cardanonical' (https://github.com/CardanoSolutions/cardanonical) registered for path 'cardanonical'
Cloning into '/home/andrea/work/foliage/tests/fixtures/git-submodule.QEFqZ7/_cache/git/cardano-scaling/foliage-test-with-submodule/cardanonical'...
Note: switching to 'db5874494ee5bac3fa8fee07d5806fcec27a2f4e'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at db58744 Add cardanonical as submodule
fatal: destination path '_cache/git/cardano-scaling/foliage-test-with-submodule' already exists and is not an empty directory.
foliage: Error when running Shake build system:
at want, called at app/Foliage/CmdBuild.hs:48:7 in main:Foliage.CmdBuild
* Depends on: buildAction
at apply1, called at app/Foliage/PrepareSource.hs:39:31 in main:Foliage.PrepareSource
* Depends on: prepareSource foliage-test-with-submodule-1.1.0 PackageVersionSpec {packageVersionTimestamp = Just 2023-11-03 15:53:59 UTC, packageVersionSource = GitHubSource {githubRepo = cardano-scaling/foliage-test-with-submodule, githubRev = 1.1.0, subdir = Nothing}, packageVersionRevisions = [], packageVersionDeprecations = [], packageVersionForce = False}
at command_, called at app/Foliage/PrepareSource.hs:127:10 in main:Foliage.PrepareSource
* Raised the exception:
Development.Shake.command_, system command failed
Command line: git clone --recursive https://github.com/cardano-scaling/foliage-test-with-submodule.git _cache/git/cardano-scaling/foliage-test-with-submodule
Exit code: 128
Stderr:
fatal: destination path '_cache/git/cardano-scaling/foliage-test-with-submodule' already exists and is not an empty directory.
FAIL (2.18s)
Building repository (2.18s)
readCreateProcess: foliage build (exit 1): failed
Use -p '/git submodules/' to rerun this test only.
accepts --no-signatures: OK (0.03s)
Building repository (0.03s)
Running checks
accepts --write-metadata: OK (0.04s)
Building repository (0.03s)
Running checks
1 out of 4 tests failed (2.29s)
Test suite foliage-test-suite: FAIL
Test suite logged to:
/home/andrea/work/foliage/dist-newstyle/build/x86_64-linux/ghc-9.4.7/foliage-0.6.0.0/t/foliage-test-suite/test/foliage-0.6.0.0-foliage-test-suite.log
0 of 1 test suites (0 of 1 test cases) passed.
Error: cabal: Tests failed for test:foliage-test-suite from foliage-0.6.0.0.
Most likely this is due doesDirectoryExist
. Shake makes the code look imperative but nothing gets run right away, instead it creates a build graph and execute only the parts than need to be updated. Long story short, the documentation says to not call doesFileExist
and doesDirectoryExist
on files and directory created by the build system.
Perhaps it would be ok to hide this machinery behind IO so that shake does not see it. I am not sure to be honest.
I think this potentially saves us a lot of time downloading tarballs, also, since we can just clone a repository once instead of downloading a tarball for every package version ever released from that repository. 100% agree about the concurrency issue. We run foliage with lots of concurrency usually, which is very helpful, we need to cope with the git repo being stateful. Here's a suggestion:
That way there is only ever one rule that touches the main repository, and it's safe to run it many times. We can run all the "checkout the source for a commit" in parallel, I think? and then after that we're safe since the sources are all independent. |
Just to be clear: I think this is actually a feature.
FWIW the definition of |
I had the same idea but only knew about |
565989e
to
6225a51
Compare
This works for me now. @michaelpj The @andreabedini I am not very familiar with how shake rules work. It seems though that, if called like this, the rule is re-executed even if the directory exists. Hence the call to |
This test fails now because the tarball fetching of sources does not include files from the submodule (but the fixture references a file as a data-file)
Working copies are kept in the _cache/git/ directory.
This adds another version of the same package fetched via git. The same working copy from _cache/git/<owner>/<repo> is re-used in both PrepareSource steps (which hopefully do not conflict) and copied onto individual _cache/<package>/<version> directories. The second package version in the git-submodule fixture _sources also uses a tag name as rev to highlight this is possible as well.
This avoids some boiler plate and works just the same.
This re-uses the git repository in the _cache/git/<user>/<repo> directory, but uses a temporary directory to get the worktree for a given rev to prepare the per-package directory in _cache/<package>.
46b371d
to
745c03c
Compare
@ch1bo Thank you for working on this. I am been reworking the way Foliage is structured to make better use of shake (which I feel I only just understood, finally 😂). I will take care of rebasing your PR on top of my rework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreabedini I am not very familiar with how shake rules work. It seems though that, if called like this, the rule is re-executed even if the directory exists. Hence the call to
doesDirectoryExist
andgit fetch
to make sure we have latest information. I switched to using the "untracked" version of this command which should be in-line to what is documented.
The run
Action
is re-executed every time the rule is invoked, but with the extra information on whether its dependencies have changed. The thing is that doesDirectoryExist
triggers a built-in rule; which the GitClone
rule will depend upon. This means that GitClone
is changing the output of another rule, which is ... not a good thing.
E.g.
First run:
doesDirectoryExist
is invoked, directory does not exist, it flagsRunDependenciesChanged
and returns false.GitClone
is called withRunDependenciesChanged
and it creates the directory.
Second run:
doesDirectoryExist
is invoked, directory now does exist, it flagsRunDependenciesChanged
and returns true.GitClone
is called withRunDependenciesChanged
and it re-creates the directory.
Whether or not things ever stabilise depends on the details (e.g. from the code it looks like a third invocation of doesDirectoryExist
would say that nothing changed).
Shake's linting feature checks that build outputs (i.e. rule results) do not change during the build. It would have caught this I think.
run GitClone{repo} _old _mode = do | ||
let path = cacheDir </> "git" </> gitHubRepoToString repo | ||
|
||
alreadyCloned <- liftIO $ doesDirectoryExist path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you should use that _old
parameter to store (inside shake itself) information from the rule previous run. From the docs:
As an example, a typical BuiltinRun will look like:
run key oldStore mode = do ... pure $ RunResult change newStore newValue
Where you have:
- key, how to identify individual artifacts, e.g. with file names.
- oldStore, the value stored in the database previously, e.g. the file modification time.
- mode, either RunDependenciesSame (none of your dependencies changed, you can probably not rebuild) or RunDependenciesChanged (your dependencies changed, probably rebuild).
- change, usually one of either ChangedNothing (no work was required) or ChangedRecomputeDiff (I reran the rule and it should be considered different).
- newStore, the new value to store in the database, which will be passed in next time as oldStore.
- newValue, the result that apply will return when asked for the given key.
The argument oldStore
is Maybe ByteString
and newStore
has to be ByteString
, oldStore
is Nothing
if it is the first time the rule runs.
E.g. you could use that 1) to detect whether you checkout out the repo before 2) perhaps remember what commit you had checkedout (not sure if it's worth it but you can do that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about using the last fetched revision in the store. It felt like an optimization though and not sure if needed. You think we should do that?
I think I understood and this is the case when using the shake version of I can't see a problem with the current use of directory exists and fetching if already present. As you request a change, I would like to now.. what exactly do you think I should change? |
I meant using the shake store to decide if to re-run the fetch but that's ok if we don't do it now. Do you know why the CI is failing? |
@andreabedini What do you want to do with the CI failures here? (It's obviously an impure test) |
@ch1bo Ideally we should be able to have a submodule pointing to a local repository but we would need to support general git repositories first. I think the most pragmatic option is to run the tests with
but foliage has to be added to the PATH. |
Fixes #88 by using the idea of #80.
🍂 Adds a test fixture which uses this remote github repository as source: https://github.com/cardano-scaling/foliage-test-with-submodule
🍂 Changes
PrepareSource
action to checkout a working copy into_cache/git/
if needed, fetches and prepares individual package directories in_cache/<package>/<version>/
forGitHubSource