
Consider using hardlink instead of cow for preparing ~/.lima/<INSTANCE>/basedisk from the downloader cache #2818

Open
AkihiroSuda opened this issue Oct 28, 2024 · 9 comments

Comments

@AkihiroSuda
Member

As the basedisk remains immutable, it could just be hardlinked from the downloader cache (when the cache exists on the same filesystem).

This may make the situation complicated though, if we want to support squashing the diffdisk to basedisk in future.
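For context, a minimal sketch of the same-filesystem check that would gate such a hardlink. This is not Lima code; sameFilesystem and the example are made up for illustration, and it assumes a Unix platform:

```go
package basedisk

import "syscall"

// sameFilesystem (hypothetical) reports whether two paths live on the same
// filesystem, i.e. whether a hardlink between them could succeed at all.
func sameFilesystem(a, b string) (bool, error) {
	var sa, sb syscall.Stat_t
	if err := syscall.Stat(a, &sa); err != nil {
		return false, err
	}
	if err := syscall.Stat(b, &sb); err != nil {
		return false, err
	}
	// st_dev identifies the containing filesystem; hardlinks cannot cross
	// devices (link(2) fails with EXDEV).
	return sa.Dev == sb.Dev, nil
}
```

Lima could compare the download cache directory with ~/.lima/&lt;INSTANCE&gt; this way before choosing between a hardlink and a copy.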

@afbjorklund
Member

You could use reflinks, where available.

@AkihiroSuda
Member Author

AkihiroSuda commented Oct 29, 2024

You could use reflinks, where available.

We have already been using reflinks when available, but hardlinks would be preferable when the filesystem does not support reflinks.

return fs.CopyFile(dstPath, srcPath)
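For illustration, a hedged sketch of the hardlink-first fallback being discussed here. prepareBasedisk is a hypothetical name; fs.CopyFile is continuity's copy helper from the line quoted above, which already uses reflinks/clonefile where the filesystem supports them:

```go
package basedisk

import (
	"os"

	"github.com/containerd/continuity/fs"
)

// prepareBasedisk (hypothetical) places the immutable basedisk into the
// instance directory, preferring a hardlink to the download cache.
func prepareBasedisk(dstPath, srcPath string) error {
	// A hardlink shares the data with the cache at zero cost, but it only
	// works on the same filesystem and assumes the basedisk is never
	// modified through either name.
	if err := os.Link(srcPath, dstPath); err == nil {
		return nil
	}
	// Fall back to the current behaviour: copy, letting continuity use
	// copy-on-write cloning (reflink/clonefile) where available.
	return fs.CopyFile(dstPath, srcPath)
}
```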

@jandubois
Member

COW feels slightly safer because it guarantees that the source is immutable. Hardlinks rely on there being no bugs or future changes that could modify the source.

@afbjorklund
Member

afbjorklund commented Nov 2, 2024

I think we would need to introduce CAS* (content-addressed storage), since currently the images are just indexed by their URL?

- location: "https://cloud-images.ubuntu.com/releases/24.10/release/ubuntu-24.10-server-cloudimg-amd64.img"

Currently I think that it will work with "prune", since the cache file will be deleted (before being recreated).

But that means that the instance will end up "owning" the old file, which might not be ideal either?

What I meant was that it should be possible for instances to continue to exist after a cache cleanup.

But ultimately it would need some kind of "garbage collect" to also clean up those entries, if unused.


* With a CAS solution, there would be a secondary layer for the data, like "by-content-sha256".

download/by-url-sha256/3b6b67faf5fd451e96832cbcaf6f5e04704d2ff7c47e749663508fc2a636130f/data ->
download/by-content-sha256/fad101d50b06b26590cf30542349f9e9d3041ad7929e3bc3531c81ec27f2c788.data

Probably overengineered here, but could help with mirrors and other distribution systems like IPFS?

i.e. the main reason we introduced this in another project was to be able to keep the metadata in memory.

By keeping the big files in separate storage ("rados"), all of the small files could be indexed ("redis")...

Like so: https://juicefs.com/docs/community/architecture

Where RADOS would be Ceph, and REDIS would be Valkey.
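For concreteness, a rough sketch of the by-content-sha256 layout described above. This is purely illustrative, not an existing Lima API: storeByContent is a made-up helper that moves a cache entry's data file into a content-addressed store and leaves a relative symlink behind, so that several URLs (mirrors) can share one blob:

```go
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"path/filepath"
)

// storeByContent (hypothetical) rewrites
//   <downloadDir>/by-url-sha256/<urlSHA>/data
// into a symlink pointing at
//   <downloadDir>/by-content-sha256/<contentSHA>.data
func storeByContent(downloadDir, urlSHA string) error {
	dataPath := filepath.Join(downloadDir, "by-url-sha256", urlSHA, "data")

	// Hash the cached file to obtain its content digest.
	f, err := os.Open(dataPath)
	if err != nil {
		return err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	digest := hex.EncodeToString(h.Sum(nil))

	// Move the blob into the content-addressed store.
	blobDir := filepath.Join(downloadDir, "by-content-sha256")
	if err := os.MkdirAll(blobDir, 0o755); err != nil {
		return err
	}
	if err := os.Rename(dataPath, filepath.Join(blobDir, digest+".data")); err != nil {
		return err
	}
	// Leave a relative symlink behind, so the whole download directory
	// can still be relocated as a unit.
	return os.Symlink(filepath.Join("..", "..", "by-content-sha256", digest+".data"), dataPath)
}
```

Garbage collection would then mean removing any by-content-sha256 blob that no by-url-sha256 entry (and no instance) references anymore, which matches the "garbage collect" step mentioned above.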

@afbjorklund
Member

if we want to support squashing the diffdisk to basedisk in future

Why do we want to avoid that? I thought the clone took care of backing?

@afbjorklund
Member

I am assuming that the basic idea here is to keep each instance directory smaller, which sounds like a good idea.

And to avoid the time it takes to copy from the cache to the instance (assuming that it is in the right format...)?

@norio-nomura
Contributor

I think using clonefile(2) would be a good approach for macOS specifically.
Reference article: https://eclecticlight.co/2024/03/20/apfs-files-and-clones/

@AkihiroSuda
Member Author

Yes, we have already been using clonefile(2) via continuity:
https://github.com/containerd/continuity/blob/v0.4.5/fs/copy_darwin.go#L27
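For reference, a darwin-only sketch of what that boils down to when calling clonefile(2) directly through golang.org/x/sys/unix (the wrapper name is made up; Lima actually goes through continuity as linked above):

```go
//go:build darwin

package basedisk

import "golang.org/x/sys/unix"

// cloneBasedisk (hypothetical) creates dst as an APFS clone of src: the data
// blocks are shared copy-on-write, so the clone is effectively instantaneous
// and takes no extra space until one of the files is modified.
func cloneBasedisk(src, dst string) error {
	return unix.Clonefile(src, dst, 0)
}
```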

@norio-nomura
Contributor

I am currently investigating whether clonefile(2) can be used for suspend/resume in #2900.
