Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compression_format = inherit #5586

Open
cgwalters opened this issue Jun 11, 2024 · 12 comments
Open

Add compression_format = inherit #5586

cgwalters opened this issue Jun 11, 2024 · 12 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. stale-issue

Comments

@cgwalters
Copy link

Related to containers/common#2048

How about adding compression_format = inherit? The semantics would be that we default compression for new layers to previous base image layer by default. IOW, we default to e.g. zstd:chunked only if we detect at least one usage of it in a parent layer.

That would greatly reduce the "flag day" effect of this and allow individual image maintainers to opt in as they desire.

@mtrmac
Copy link
Contributor

mtrmac commented Jun 12, 2024

That’s not really in the c/image “copy” model — if the layers inputs to a copy are compressed, the copy (unless instructed otherwise) preserves the format of each layer; if the inputs are uncompressed, we make a choice, and that’s what the “compression format” option chooses.

And, in particular, all buildah outputs written to c/storage are inherently uncompressed. By the time a “push” happens, the distinction between original and new layers is ~lost, so this would need to be a Buildah feature (recording the format used in (one specific layer of?) the FROM image during a build and specifying it as the format to use during a later during a push; and consider buildah build+podman push).

[We do try to preserve the original manifest during pulls, so for a pulled image, we typically know what the original compression was. AFAICS that information is lost during a build. We might plausibly try to preserve the MIME types on this path … but I think adding a new field to the image object would be more direct and reliable.]


I see the argument that “if we are building of a zstd:chunked base image, we can ~safely assume that consumers of derived images can also consume zstd:chunked”… but I struggle a bit with who would actually turn this new logic on.

  • Early adopters who know that their target environment can use zstd:chunked want to turn on zstd:chunked for all built images, to benefit as much as possible
  • Producers of a widely-used image might want to explicitly say gzip even if a distribution chooses to default to a newer one
  • Users with no opinion should probably keep their distributions’ hard-coded default. In particular, “inherit” with non-distribution-tied base images exposes users to surprising format changes in unexpected times. Yes, distributions would probably time the switch along with a major release upgrade, but for non-distributions, it’s just a feature that can be switched at ~any time, e.g. when the build system of the non-distribution base image is updated. (And historically we’ve seen Docker just flip output formats of registry features on, at, from the outside, unpredictable times.)

It seems to me that “inherit” would help us keep containers-common the same across Fedora versions, but the cost of actually implementing “inherit” would be quite a bit larger than the effort saved.

@cgwalters
Copy link
Author

but I struggle a bit with who would actually turn this new logic on.

I think this would in practice replace the other values of compression_format - it'd become the new upstream default. The use case for turning on a specific compression format explicitly seems notably weaker after it exists since most container builds are derived; anyone who really wants it can opt in today with podman push --compression-format etc.

It seems to me that “inherit” would help us keep containers-common the same across Fedora versions, but the cost of actually implementing “inherit” would be quite a bit larger than the effort saved.

We already hit a problem where someone built a container on a Fedora 40 system that temporarily had zstd:chunked on by default, and pushed it to OCP where the ostree-container doesn't yet handle this (which we'll probably back port the fix, but still). Come Fedora 41 time those types of issues will still reoccur, though it's more likely that fixes have landed by then.

So the cost is things like that - also maybe old registry implementations?

But yes, it sounds like the code wouldn't be trivial, so...dunno, we can just keep this idea in our back pocket I guess and pull it out if we find wider fallout than just older ostree-container versions.

@mtrmac
Copy link
Contributor

mtrmac commented Jun 12, 2024

I’ll transfer it to buildah, where we have the connection between the base image and the build product.

@mtrmac mtrmac transferred this issue from containers/image Jun 12, 2024
@mtrmac mtrmac added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 12, 2024
Copy link

A friendly reminder that this issue had no activity for 30 days.

@cgwalters
Copy link
Author

Referenced this again in context of https://fedoraproject.org/wiki/Changes/zstd:chunked where I think this would still help a lot versus us trying to roll out a global flag to turn it on for all image pushing.

@nalind
Copy link
Member

nalind commented Jul 31, 2024

With "inherit", the compression used for an image that did not reuse layers from a base image would still need to be decided somehow.

At push-time, given that we write manifests for new images using uncompressed MediaType values (unless buildah's been told to compress it, something which has historically been strongly discouraged), the manifest can't be used as a source for that information.

At commit-time, buildah could examine the CompressionType that's recorded with the base image's layers to try to figure out how they were originally compressed, but since it might also have built that image, it would have to keep walking up the layers list until it found a value other than "uncompressed" to find a good answer. But then, this logic could be (and would probably need to be) deferred to push-time.

@nalind
Copy link
Member

nalind commented Jul 31, 2024

Wait, is the logic for compression_format only implemented in podman?

@cgwalters
Copy link
Author

With "inherit", the compression used for an image that did not reuse layers from a base image would still need to be decided somehow.

Yes, no argument there. But the set of folks building from-scratch images is relatively small and I think can be assumed to have some sophistication.

At push-time, given that we write manifests for new images using uncompressed MediaType values (unless buildah's been told to compress it, something which has historically been strongly discouraged), the manifest can't be used as a source for that information.

Right, but we could store this data in some lookaside location during the build.

There's also a somewhat related topic in that if we did that, we could also make this a build time property and not a push time property, which might align honestly with how a lot of people tend to think of these things. Yes, the actual generation of compressed layers may continue to happen on push...but this is an important property of an image, and one would tend to expend to find "important image properties" around the logic for where it's "built", not where it's pushed.

@mtrmac
Copy link
Contributor

mtrmac commented Jul 31, 2024

Thinking about the UI … there’s also the “publish variants with two compression algorithms” case, where there is a single c/storage image, a single on-registry destination name, but multiple destination formats. I suppose if we had a build-time “and compress this way when publishing” property, this should be also included (?!).

Copy link

A friendly reminder that this issue had no activity for 30 days.

Copy link

github-actions bot commented Oct 4, 2024

A friendly reminder that this issue had no activity for 30 days.

Copy link

A friendly reminder that this issue had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. stale-issue
Projects
None yet
Development

No branches or pull requests

4 participants