git-annex: ensure exported archives include git-annex data #42
Comments
The status quo seems to be that Gitea includes the symlinks in the tar.gz output, but not the annexed files they point to, while the zip output does not even contain the symlinks (which AFAIK would be supported though, but The latter regarding symlinks in zip files is true for GitHub as well. I didn't look into how GitHub or Gitea handle git-lfs standalone though; the git-annex repo I checked on GitHub uses git-lfs as a special remote. The In the case of tar we could just add the whole annex to the archive in a second step, but that would include files that are not present in the exported branch/tree-ish, and the resulting archive provides a suboptimal experience in my archive viewer, since I needed to unpack the archive first before the symlinks became usable. This would also not work for the current zip output, because of the missing symlinks. Therefore, the most straightforward approach I can think of is creating the zip or tar archive using An alternative would be the approach taken by DataLad's With this I still have two open questions:
What do you think about this? I could try implementing the "create archive with |
Oh hi! That's very cool. I would love to have some help on this "neurogitea" project!
That's very interesting. I haven't examined DataLad's approach. But I am leaning towards using fewer components and less code if possible. If I had to ask people to add But I also am not even sure how gitea handles exports. I haven't even checked yet if it uses
When you do this, you need to remember that there are two kinds of annexed files: symlinks and pointer files. Pointer files are the default in repos made with git-annex v8. I have code that handles both cases in (though it could probably be tightened)
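To make the two cases concrete, here is a minimal Python sketch of how an archive exporter could tell them apart. It is not the code referred to above and not what Gitea does; the function names are hypothetical. It assumes the layout git-annex documents: a locked file is a symlink whose target ends in `.git/annex/objects/.../KEY/KEY`, and an unlocked (pointer) file is a regular blob whose first line is `/annex/objects/KEY`.

```python
import posixpath
import stat

# Prefix that git-annex writes at the start of a pointer file.
POINTER_PREFIX = b"/annex/objects/"


def key_from_symlink(target: bytes):
    """Return the annex key if a symlink target points into the annex, else None."""
    text = target.decode("utf-8", "replace")
    # Prepend "/" so a target that starts directly with ".git/..." also matches.
    if "/.git/annex/objects/" not in "/" + text:
        return None
    # The last path component of the target is the key itself.
    return posixpath.basename(text) or None


def key_from_pointer(blob: bytes):
    """Return the annex key if a blob looks like a pointer file, else None."""
    # Only the first line matters; anything after the newline is ignored.
    first_line = blob.split(b"\n", 1)[0]
    if not first_line.startswith(POINTER_PREFIX):
        return None
    key = first_line[len(POINTER_PREFIX):].decode("utf-8", "replace")
    return key or None


def annex_key(mode: int, blob: bytes):
    """Dispatch on the git tree mode: 0o120000 is a symlink, anything else a file."""
    if stat.S_ISLNK(mode):
        # For a symlink entry, the blob content is the link target.
        return key_from_symlink(blob)
    return key_from_pointer(blob)
```

An exporter would call `annex_key` on each tree entry and, when it returns a key, substitute the annexed content for the symlink or pointer before writing the archive entry.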
I'm also pretty new to Go! Working on gitea has been the most experience I've had with it. But it's not meant to be a difficult language to pick up so I think you'll probably be okay. So yes, please :) |
Hi there as well! My experience with using this project has been pretty great so far, so thanks for starting it.
Looks like it is done in https://github.com/neuropoly/gitea/blob/5149ad0fb20167a89b217e2e94fe9cc8da908fb9/modules/git/repo_archive.go#L52-L75, which does use In the case of .bundle files I think it happens in https://github.com/neuropoly/gitea/blob/5149ad0fb20167a89b217e2e94fe9cc8da908fb9/modules/git/repo.go#L271-L309. This looks a bit more complicated and we would need to investigate if there even is a sensible way to include annexed content there.
Thanks for the heads-up, I wasn't aware of that. Funnily enough this explains a recent confusion I had when trying to get a file out of git-annex. I tried unlocking and committing the unlocked file, but the resulting non-symlink file was still shown as part of the annex. Turns out that made it into a pointer file and I'll try to come up with a PR and report back, it might take me a while though. |
Interesting! I'll be curious to compare notes. By the way I have some deployment scripts, in ansible, but I haven't published them to galaxy.ansible.com yet. I just need a nudge from knowing I have users to kick me into gear and actually put it out there.
Wonderful! Good luck and let me know if you need any help! |
Looks like this is not true, it's just that the GNOME archive manager I was looking at didn't show them. Unzipping on the CLI with
We are running a publicly reachable instance of this project (https://atris.fz-juelich.de/), as well as an internal one, at my place of work as part of an initiative to establish DataLad in our institute and area of research. That is very much early work in progress though. I am also maintaining a "fork" of this repo at https://jugit.fz-juelich.de/m.risse/gitea which contains a number of FZJ-specific changes (mainly things like theming and configuration for external renderers for netCDF and grib files, which are common file types for us). I am using docker-compose to manage these sites, so I have no immediate need for your ansible deployment scripts. |
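On the earlier point about symlinks in zip files: a Unix symlink is stored as an ordinary zip entry whose body is the link target and whose Unix mode, kept in the upper 16 bits of the entry's external attributes, marks it as a link. The following is a minimal sketch with Python's standard `zipfile` module for checking whether an archive carries such entries; `add_symlink` and `list_symlinks` are illustrative names, not part of any tool discussed here.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Store a symlink entry: the Unix mode lives in the upper 16 bits of external_attr."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so readers interpret the mode bits
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    # The entry body is the link target, not file content.
    zf.writestr(info, target)


def list_symlinks(zf: zipfile.ZipFile) -> dict:
    """Map each symlink entry name to its target."""
    return {
        info.filename: zf.read(info).decode("utf-8", "replace")
        for info in zf.infolist()
        if stat.S_ISLNK(info.external_attr >> 16)
    }


# Build a small archive in memory and inspect it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_symlink(zf, "data/scan.nii.gz", "../.git/annex/objects/Xx/Yy/KEY/KEY")
    zf.writestr("README.md", "plain file\n")

with zipfile.ZipFile(buf) as zf:
    links = list_symlinks(zf)
```

Running `list_symlinks` against a downloaded archive is a quick way to confirm whether the links are present, independent of what a graphical viewer displays.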
The "Releases" page lets you download an archive (from `git-archive`?). It would be handy if this worked with git-annex datasets too, since then we could share a link like https://data.neuropoly.org/neuropoly/some-project/archive/1.0.0.zip with people and they could get a versioned copy of the data without having to install or learn git-annex (with gitea, `GET /neuropoly/some-project/archive/1.0.0.zip` downloads as `some-project-1.0.0.zip`, preserving the version number, so long as the receiver uses a standard browser or `curl -LJO`).
It's good if people do want to learn and use git-annex, but some applications -- imagine deploying training sets to clusters, simple exploratory work, etc. -- don't need the extra headache.
I'll note that we currently can't do this with e.g. https://github.com/spine-generic/data-multi-subject/releases/tag/r20230223; there, the download links give a small <1MB .zip containing mostly annex pointers.
Plan