Root on ZFS: warning for GRUB incompatibility with bpool snapshots #464

Closed
wants to merge 3 commits into from

@ghost ghost commented Oct 23, 2023

@ghost
Author

ghost commented Oct 23, 2023

Also added a warning against encrypted send/recv. Seems to be quite serious, see for example this spreadsheet by maintainers of ZFSBootMenu.

Yǔchēn Guō 郭宇琛 added 3 commits October 23, 2023 22:34
Root on ZFS: warning against encrypted send/recv, due to crashes

Signed-off-by: Yǔchēn Guō 郭宇琛 <[email protected]>
Signed-off-by: Yǔchēn Guō 郭宇琛 <[email protected]>
@ghost
Author

ghost commented Oct 24, 2023

@rincebrain You may not care about Root on ZFS at all, but I still would like to ask what should we, the Root on ZFS users, do, re: your comment at this link.

Recently, due to your comment and this GRUB issue, I have come to regard every ZFS feature as suspect: something that should not be enabled unless there is a real need and no alternative. Does this viewpoint make any sense?
Should we take note of this when writing tutorials?

I'm currently writing guides to replace ZFS native encryption with LUKS. LUKS seems to be more stable and better maintained.

@ghost
Author

ghost commented Oct 24, 2023

In other words: is defaulting to zpool create -o compatibility=legacy on every pool a good idea?

@rincebrain
Contributor

rincebrain commented Oct 24, 2023

Depends how much you worry about your data being intact.

The maintainers do their best, but fundamentally, every new change has some risk of having bugs introduced - the more invasive, the more likely, for some definition of each. If you have backups, or aren't keeping any data that's that bad to lose, then sure, run the bleeding edge and shrug. The more you want to be careful about not losing your data, the longer I would suggest you test things in an environment where the data isn't that dangerous to lose before upgrading or enabling a new setup.

I usually recommend that people set up with the defaults and then vary from them only as needed for their environment. That includes enabling only the features you deliberately need, rather than using zpool upgrade to turn on everything that has been added since you made the pool (or, with compatibility=, everything up to that limit). For bootable setups, you would usually create the bpool with something like compatibility=grub2, since GRUB doesn't really support newer features, to avoid breaking things there; a root pool that sits alongside a separate boot pool doesn't need that restriction.
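As a rough illustration of the split described above, the sketch below only assembles `zpool create` argument lists and prints them; it never runs them. The pool names bpool and rpool follow the guide's convention, while the device paths and the helper name `zpool_create_args` are hypothetical placeholders.

```python
import shlex


def zpool_create_args(pool, vdevs, compatibility=None):
    """Build (but do not run) a `zpool create` command line."""
    args = ["zpool", "create"]
    if compatibility is not None:
        # compatibility= limits which feature flags get enabled on the pool.
        args += ["-o", f"compatibility={compatibility}"]
    args.append(pool)
    args += list(vdevs)
    return args


# Boot pool read by GRUB: restrict features to what GRUB understands.
# The device paths are made-up placeholders.
bpool = zpool_create_args("bpool", ["/dev/disk/by-id/EXAMPLE-part2"], "grub2")
# Root pool: leave the defaults alone.
rpool = zpool_create_args("rpool", ["/dev/disk/by-id/EXAMPLE-part3"])

print(shlex.join(bpool))
print(shlex.join(rpool))
```

A real installation would add further options (ashift, mount points, and so on); they are left out here to keep the focus on the compatibility property.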

All that to say, I don't think I'd suggest compatibility=legacy as a default. Maybe the current OpenZFS major release, if you like, but I don't think advising people to pick a specific feature set (or "none") as a baseline is going to help matters in any specific way. Encryption is still a buggy mess, but I don't think pinning things to 0.7 would be a particularly great outcome to avoid that, for example.

You could try to manually synthesize a list of features that you thought were stable enough and keep it updated, but that seems fraught.
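For anyone who does want to experiment with a hand-maintained list, here is a minimal sketch of how such lists combine. It assumes the compatibility file format described in the OpenZFS man pages: feature names separated by whitespace or commas, `#` starting a comment, and only features present in every named file being enabled when several files are given. The file contents and function names are invented for illustration.

```python
def parse_compat_file(text):
    """Return the set of feature names listed in one compatibility file."""
    features = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        features.update(line.replace(",", " ").split())
    return features


def effective_features(*file_texts):
    """Features enabled when several files are given: those present in all."""
    sets = [parse_compat_file(t) for t in file_texts]
    return set.intersection(*sets) if sets else set()


# Hypothetical file contents, for illustration only.
site_stable = """
# features this site has decided to trust
async_destroy
empty_bpobj
lz4_compress
zstd_compress
"""
distro_baseline = "async_destroy\nempty_bpobj\nlz4_compress  # widely supported\n"

print(sorted(effective_features(site_stable, distro_baseline)))
```

The intersection rule is what makes a hand-kept list awkward to maintain: a feature missing from any one file silently drops out of the result.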

(Oh, and that spreadsheet is mine.)

@ghost
Author

ghost commented Oct 24, 2023 via email

@rincebrain
Contributor

I'm familiar with formal verification, thank you.

I would be surprised if a PR that recommended turning off every feature flag was accepted, if I'm honest. "Don't use any features" isn't really a workable approach, and since most of the testing is going to be on pools with those features enabled, you also incur a nontrivial risk the further you deviate from what's been tested well. So the "everything off" option isn't a global optimum for minimizing the risk of data loss; it's a tradeoff of risk profiles, like everything else.

I personally don't think anything but native encryption has been fraught enough that I would discourage its use by default, except hole_birth, but that's basically a noop unless someone drops the tunable turning it off again.

If you want to recommend compatibility=legacy with GRUB, be my guest, the feature list for a pool holding /boot is pretty academic. But I wouldn't really encourage just turning off all ZFS feature flags "for safety" - it's not strictly safer, it's just trading one risk for another.

Everything is going to be a sliding scale of risks in various dimensions. Sometimes people report bugs against a weird edge case that came up in RHEL because the particular permutation of cherrypicked kernel features hadn't been tested, particularly on older RHEL releases where the delta from mainline Linux can grow quite large. Sometimes people discover that a particular edge case, such as specific memory or IO pressure conditions (openzfs/openzfs#15439, for example), didn't come up in their testing but breaks quite easily for some other people's workloads.

If you disable very user-visible features like zstd support or encryption or BRT by default, that's going to cause an increase in complaints from people who find those features not working and don't know why, as well as a risk of strange edge cases where one thing is on and another is not. I would personally suggest that running with the default set of feature flags enabled, and just waiting a bit before each update to see whether anything in the common cases that wasn't found in testing gets reported, is probably a better tradeoff than recommending people pick a less well tested path.
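One way to see where a pool stands before deciding on an upgrade is to list the feature flags that are still disabled, since those are the ones a blanket zpool upgrade would turn on. The sketch below parses text in the tab-separated shape that `zpool get -H -o property,value all POOL` is expected to produce; the sample output is made up rather than captured from a real pool, and `feature_states` is a hypothetical helper name.

```python
def feature_states(zpool_get_output):
    """Map feature name -> state from tab-separated `zpool get` output."""
    states = {}
    for line in zpool_get_output.splitlines():
        if not line.strip():
            continue
        prop, _, value = line.partition("\t")
        # Feature flags appear as properties named feature@<name>.
        if prop.startswith("feature@"):
            states[prop[len("feature@"):]] = value.strip()
    return states


# Hypothetical sample of the tab-separated output, not from a real pool.
sample = (
    "size\t928G\n"
    "feature@async_destroy\tenabled\n"
    "feature@lz4_compress\tactive\n"
    "feature@encryption\tdisabled\n"
    "feature@zstd_compress\tdisabled\n"
)

states = feature_states(sample)
not_enabled = sorted(name for name, state in states.items() if state == "disabled")
print(not_enabled)
```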

@ghost
Author

ghost commented Oct 24, 2023 via email

@ghost
Author

ghost commented Oct 24, 2023 via email

@ghost ghost marked this pull request as draft October 24, 2023 13:40
@rincebrain
Contributor

zvols are a kind of dataset - it'd be nice if we had a term for just "filesystem+volume" because the term also covers snapshots and bookmarks, but here we are.

If grub breaks with a snapshot on the root, that should probably be a straightforward fix, since I don't think it used to, though I don't know the codebase.

As far as I know, zvols are actively used by a number of entities of varying sizes - I don't do this for a day job at this point, and personally, I don't have much use for zvols, but they've worked pretty well when I've used them, and the PRs adding things like blk-mq support and improving the behavior interactions with the "quota" of volsize seem to corroborate people using them actively.

@ahesford

  1. The spreadsheet linked here doesn't come from ZFSBootMenu, but we are aware of it.
  2. The Internet is littered with people who have been bitten by GRUB stupidity with ZFS. I won't advocate strongly for ZFSBootMenu in these guides---we have our own guides for several distributions if people want to go that route---but I would advocate against trying to shoehorn a ZFS installation into GRUB compatibility. If you aren't going to use a bootloader that will just work with your pool as you want it, you're better off just leaving /boot off of ZFS entirely. Make an ext4 filesystem or put /boot on your EFI system partition.

@ghost
Author

ghost commented Oct 24, 2023 via email

@ahesford

  1. I don't know what "a custom-built kernel, distributed over the internet", refers to. ZFSBootMenu doesn't customize any kernel. When built locally from a user's system, it takes the user's kernel and ZFS modules. When people download pre-built release assets, which we provide as a convenience, they use a stock Void Linux kernel and its stock ZFS kernel module built via DKMS.

  2. "Initrd integrity" isn't really all that important, because an initramfs image can be trivially rebuilt. In fact, your distribution does it all the time after a kernel upgrade.

    To be clear, I wholly advocate putting /boot on ZFS---but done the right way, on the same pool (in fact, the same filesystem) that hosts your operating system to begin with. Most of the troubles with GRUB and ZFS come from one of two massive failures: lack of support for modern ZFS features, or a loss of synchronization between whatever is in /boot and whatever lives on the main system (in particular, /lib/modules). The first problem is manifest by the army of people on r/zfs and other online forums who have screwed themselves by inadvertently upgrading a pool. The second is manifest by the massive failure that was zsys, which attempted (poorly) to solve the inherent synchronization problem.

    However, if you don't want to put your faith in a bootloader that just works with your pool, there are other ways to get high availability that are less error-prone than the fragile approach demanded by GRUB's limited ZFS support.

  3. Your assertion that "without multi-disk, many of ZFS's integrity protections amount[] to nothing" is nonsense. ZFS offers incredible value with respect to data integrity and availability even on a single disk.

  4. None of your multi-disk bootloader approach has anything to do with ZFS or, for that matter, GRUB. It amounts to "mirror your EFI system partition and, when necessary, replicate the boot sector on multiple disks". This can be applied to any bootloader and doesn't mitigate GRUB's problems with ZFS and the fragile configurations built around it. In fact, if you're using EFI booting, high availability of ZFSBootMenu is even easier, because you can dump it on any EFI system partition, including on any number of cheap USB drives that you can keep in your pocket for emergencies.
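The "mirror your EFI system partition" idea in point 4 is independent of the bootloader, and can be sketched as a plain file copy. The example below is only a sketch: it uses temporary directories to stand in for mounted ESPs, the function name `replicate_esp` and the file layout are illustrative assumptions, and it has not been tried against a real FAT filesystem.

```python
import shutil
import tempfile
from pathlib import Path


def replicate_esp(primary, replicas):
    """Copy everything under the primary ESP mount point to each replica."""
    for replica in replicas:
        # dirs_exist_ok lets the copy write into an already-populated replica.
        shutil.copytree(primary, replica, dirs_exist_ok=True)


# Demonstration on temporary directories standing in for mounted ESPs.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp, "efi")
    replica = Path(tmp, "efi2")
    (primary / "EFI" / "BOOT").mkdir(parents=True)
    (primary / "EFI" / "BOOT" / "BOOTX64.EFI").write_bytes(b"placeholder")
    replica.mkdir()
    replicate_esp(primary, [replica])
    copied = (replica / "EFI" / "BOOT" / "BOOTX64.EFI").read_bytes()
    print(copied == b"placeholder")
```

Note that this copy never deletes stale files on the replicas, so a real setup would need some cleanup step or a sync tool that removes them.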

@ghost
Author

ghost commented Oct 25, 2023 via email

@ahesford

I have no interest in maintaining extra guides on OpenZFS sites. We maintain guides at docs.zfsbootmenu.org that describe what we consider best practices for ZFS on root, with the purpose of facilitating deployment of ZFSBootMenu. We maintain editorial control of those guides.

If people want to amend the community-contributed OpenZFS guides to refer to our instructions, that is between those proposing the change and those with the authority to approve it. We don't care either way.

@ghost
Author

ghost commented Oct 26, 2023 via email

@ghost
Author

ghost commented Oct 26, 2023

I'll close this and reopen as separate PRs.

@ghost ghost closed this Oct 26, 2023
@ghost ghost deleted the dev branch October 26, 2023 08:17
This pull request was closed.