zstd:chunked issues #509
Comments
Ugh. Fun...thanks for finding and debugging this. |
It's actually really embarrassing that this wasn't caught by our CI, needs fixing |
Actually wait this is the |
And yes, we need to add bootc test gating to containers-common and skopeo pretty soon. |
hmm interesting, earlier in the week this was happening regardless of which base image I used. Just went to verify that and now this bug only happens with the |
The last build of containers-common on the podman-next copr was an automatic rebuild of the rawhide sources from sometime back. I disabled this automatic rebuild after we got rawhide to a sane-enough state. Let me know if you need an update to the fedora or copr rpm. I can do a one-off build. We're currently working on a packit workflow from upstream c/common to downstream containers-common rpm, like we have for podman and the rest, with automatic builds going to podman-next right after every upstream commit to main. I'm hoping that change will land early next week. |
so this works now using any base image. I'm not sure what changed. I guess something in the base images or in quay.io? |
Hi @cgwalters. We encountered this issue in QE CI environment many times in two days.
|
EDIT: No, I was wrong, And yes, the fact that there is no gating CI in any of
That covers the ostree-container path let this all sail right through. |
OH!!!. Since yesterday (Saturday), I can't run container inside |
Hmm, at this very moment So...hum, this must somehow relate to the host environment version. Ah yes, if we look at the logs from that test run I can see that inside the fedora cloud AMI we have 'Installed: containers-common-5:0.59.1-1.fc40.noarch'.
EDIT: See above, I'm no longer confident the relevant change here was in containers-common. |
I'm trying to reproduce this locally initially by hacking up my podman-machine environment, but no luck yet. Another thing that actually changed pretty recently too is there's a new podman: https://bodhi.fedoraproject.org/updates/FEDORA-2024-ab42dd0ffb |
Sure. I'll run that tomorrow. |
@cgwalters I re-ran the test with an old Fedora 40 runner and the tests passed. I checked the log and found the difference is the latest |
I've added comments to https://bodhi.fedoraproject.org/updates/FEDORA-2024-ab42dd0ffb and I think the root cause is that image builds started defaulting to zstd:chunked. I still need to dig in and see if that's what's causing the "remote error: expected 69427364 bytes in blob, got 72333312" but I'd bet so. |
I think this is a two-fold issue, but the end-user impact is only seen if you have a btrfs containers-storage. My testing was on a DigitalOcean F39 system, which uses btrfs.
Furthermore, the simple act of pulling a personally built bootc image on F39 (or F40 or rawhide) to a system that uses btrfs as the containers-storage will cause the machine to wedge/freeze up when it has small resources (1c/1g). Adding swap to the system prevented the freezing, but didn't produce a more reliable / predictable 'podman pull' behavior: if you run the podman pull in a loop 10 times, it keeps attempting to re-sync data. Making the suggested change to /usr/share/containers/storage.conf of enable_partial_images = "false" allowed both a predictable 'podman pull' and a bootc install to-existing-root to succeed when graphDriverName is btrfs. Once bootc is running, the underlying containers-storage reverts to overlay. |
@hanulec Is your input image in |
The image I built had the newest items from my Containerfile added in zstd format; I needed to use skopeo inspect to see this. The image was built with a default config on a fresh rawhide image (version: 41.20240530.0) root@bootc: |
And the more I look / re-test, it's the podman push action that is changing the MIMEType from "application/vnd.oci.image.layer.v1.tar" to either "+gzip" or "+zstd" |
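To make that check repeatable, here is a rough Python sketch that classifies each layer of an image manifest by its media type suffix. It assumes the manifest JSON was obtained separately (for example from skopeo inspect --raw) and that zstd:chunked layers can be recognised by annotation keys starting with io.github.containers.zstd-chunked.; that prefix and the sample manifest are assumptions for illustration, not something confirmed in this thread.

```python
import json

# Assumed annotation prefix used to tell zstd:chunked apart from plain zstd.
ZSTD_CHUNKED_PREFIX = "io.github.containers.zstd-chunked."


def layer_compression(manifest: dict) -> list[str]:
    """Return one compression label per layer, in manifest order."""
    labels = []
    for layer in manifest.get("layers", []):
        media_type = layer.get("mediaType", "")
        annotations = layer.get("annotations") or {}
        if media_type.endswith("+zstd"):
            chunked = any(k.startswith(ZSTD_CHUNKED_PREFIX) for k in annotations)
            labels.append("zstd:chunked" if chunked else "zstd")
        elif media_type.endswith("+gzip") or media_type.endswith("tar.gzip"):
            labels.append("gzip")
        elif media_type.endswith(".tar"):
            labels.append("uncompressed")
        else:
            labels.append("unknown")
    return labels


# Hypothetical manifest with placeholder layers (digests and sizes omitted).
SAMPLE_MANIFEST = json.loads("""
{
  "layers": [
    {"mediaType": "application/vnd.oci.image.layer.v1.tar"},
    {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip"},
    {"mediaType": "application/vnd.oci.image.layer.v1.tar+zstd",
     "annotations": {"io.github.containers.zstd-chunked.manifest-checksum": "sha256:placeholder"}}
  ]
}
""")

if __name__ == "__main__":
    for index, label in enumerate(layer_compression(SAMPLE_MANIFEST)):
        print(index, label)
```

Running this against the manifest before and after a push should show whether the push step is what changes the layer compression.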
@cgwalters centos-bootc c10s |
I only recently realized on this issue why this may be happening. When I was testing ostreedev/ostree-rs-ext#622 I did it via a registry. But this bug is about "bootc install" where we're pulling from containers-storage: (unpacked representation) and as part of that we ask it to regenerate a tarball from the unpacked files, and by design today that tarball must be bit-for-bit compatible with the descriptor. It would not surprise me at all if there were corner cases where that breaks today. Inherently this "copy from c/storage" model is going through a different codepath than what is used by podman or skopeo today where it drives the copying. The whole "synthesize a temporary tarball" is really lame of course; what we want instead is containers/storage#1849 |
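To illustrate what "bit-for-bit compatible with the descriptor" means in practice, here is a small Python sketch of the kind of check a consumer can make on a regenerated blob: its length must equal the descriptor's size and its hash must equal the descriptor's digest. This is not the actual proxy or c/image code; the function name is hypothetical, and the error text only imitates the message quoted earlier in the thread.

```python
import hashlib


def verify_blob(descriptor: dict, blob: bytes) -> None:
    """Raise ValueError if blob does not match the descriptor's size and digest."""
    expected_size = descriptor["size"]
    if len(blob) != expected_size:
        raise ValueError(
            f"expected {expected_size} bytes in blob, got {len(blob)}"
        )
    # Digests are written as "<algorithm>:<hex>", e.g. "sha256:ab12...".
    algorithm, _, expected_hex = descriptor["digest"].partition(":")
    actual_hex = hashlib.new(algorithm, blob).hexdigest()
    if actual_hex != expected_hex:
        raise ValueError(
            f"digest mismatch: expected {algorithm}:{expected_hex}, "
            f"got {algorithm}:{actual_hex}"
        )


if __name__ == "__main__":
    original = b"layer tarball as it was pushed"
    descriptor = {
        "digest": "sha256:" + hashlib.sha256(original).hexdigest(),
        "size": len(original),
    }
    verify_blob(descriptor, original)  # identical bytes pass silently

    # A tarball regenerated from unpacked files may differ, e.g. in padding.
    regenerated = original + b"\0" * 512
    try:
        verify_blob(descriptor, regenerated)
    except ValueError as err:
        print("rejected:", err)
```

Any difference in how the tarball is re-serialized (padding, header ordering, compression framing) is enough to trip a check like this, which is why the regenerate-from-storage path is fragile.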
I probably hit containers/podman#22813 and I've modified the runtime container.conf as a workaround. |
Based on some recent discussion it sounds like this one should gain some more priority. I think a thing we need to do here is add an integration test that covers a zstd:chunked image (and for good measure we should probably also include LBIs built with zstd:chunked). |
I'll add LBIs testing with zstd:chunked soon, meanwhile, we need to fix the error described in this issue, because the test will fail when I revert gzip to the default zstd:chunked on c10s https://artifacts.osci.redhat.com/testing-farm/8d024c92-874a-4228-bf35-be69080b6fde/ |
Right for
There are a few very recent fixes for zstd:chunked issues in c/storage and c/image; one that I think might be related is containers/storage#2130 |
xref commit 1ad44cb "rpm/update-config-files: zstd:chunked not enabled in Fedora yet" Basically it doesn't make sense to keep this enabled in RHEL10 but not in Fedora, that *seriously* undermines the testing story. My immediate practical issue is that zstd:chunked in RHEL10 as of right now still breaks the bootc path, xref: containers/bootc#509 (comment) Signed-off-by: Colin Walters <[email protected]>
containers/common#2213 will back out the zstd:chunked default for rhel10, and it was just backed out in all fedora versions. That said, we still want to do background work to test with it, both:
|
This took a while to track down. I'm going to continue investigating but I wanted to document what I've found so far.
The failure happens when attempting a bootc install to-disk using an image built from a base image with at least one extra layer, e.g.
If the image is built locally, bootc install to-disk works correctly. The failure happens when pushing the image to a repo (only tested with quay.io), clearing out the image from local storage via podman system prune --all, then running bootc install to-disk. Here's example output of the failure:
So, the OpenImage call to the skopeo proxy is failing.
The latest version of containers-common found in Fedora 39/40 repos sets pull_options.enable_partial_images=true in /usr/share/containers/storage.conf. This is the change that started causing this error. Toggling enable_partial_images to false resolves the error. I'm not familiar enough with this stack to know the root cause of this yet. I'll continue digging but I'm sure someone else would be able to track this down a lot quicker if you think it's urgent.