Systemd boot order #5755

Closed
benj-n opened this issue Sep 30, 2024 · 8 comments
Labels: bug (Something isn't working correctly), new (An issue that still needs triage)

@benj-n

benj-n commented Sep 30, 2024

Bug report

Hi,

While building an Arch Linux package based on cloud-init 24.3.1, I found that cloud-init-main.service was never started at boot. Trying to understand what was happening, I found this in the service template:

https://github.com/canonical/cloud-init/blob/24.3.1/systemd/cloud-init-main.service.tmpl#L22-L30

First, there is a duplicate definition of:

Before=sysinit.target
Conflicts=shutdown.target

These lines appear both inside an if statement and outside it.

Second, on some OSes (I was working on Arch, but it may not be limited to it), Before=sysinit.target conflicts with Wants=network-pre.target a few lines above (sysinit.target happens before network-pre.target).

The more I look at it, the more I think the Before=sysinit.target and Conflicts=shutdown.target lines outside the if statement were not meant to be merged. It would be nice to have @holmanb's thoughts on this.
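One way to confirm the duplication on a rendered unit is to count how often a directive appears. The sketch below is only an illustration: `count_directive` is a hypothetical helper, not part of cloud-init, and it assumes the template has already been rendered to a plain file with one directive per line.

```shell
# Hypothetical helper: count exact occurrences of a directive line in a unit file.
# Usage: count_directive <unit-file> <directive>
count_directive() {
  local unit_file=$1 directive=$2
  # -x: match whole lines only, -F: treat the directive as a fixed string,
  # -c: print the number of matching lines. grep exits 1 when the count is 0,
  # so "|| true" keeps the helper usable under "set -e".
  grep -cxF -- "$directive" "$unit_file" || true
}
```

Called as `count_directive cloud-init-main.service 'Before=sysinit.target'`, a result greater than 1 would indicate that both the conditional and the unconditional copy ended up in the rendered file.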

Environment details

  • Cloud-init version: >=24.3.
  • Operating System Distribution: Discovered on Arch Linux but shouldn't be relevant
  • Cloud provider, platform or installer type: Discovered on CloudStack but shouldn't be relevant
benj-n added the bug and new labels on Sep 30, 2024
@holmanb
Member

holmanb commented Sep 30, 2024

> Second, on some OS (I was working on Arch, but it may not be limited to it), the Before=sysinit.target conflicts with Wants=network-pre.target a few lines above (sysinit.target happens before network-pre.target).

I don't think that this explains the issue. Wants= does not affect service order.
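To see the distinction in a unit file, it can help to list only the directives that express ordering. This is a minimal sketch, assuming a plain-text unit file; `ordering_directives` is a hypothetical helper name. `Wants=` and `Requires=` are left out on purpose, since they declare which units get pulled in, while `After=` and `Before=` decide the order.

```shell
# Hypothetical helper: print only the explicit ordering directives of a unit file.
ordering_directives() {
  local unit_file=$1
  # After= and Before= are the directives that define start order;
  # Wants=/Requires= only declare which units get pulled in.
  # "|| true" avoids a non-zero exit status when nothing matches.
  grep -E '^(After|Before)=' -- "$unit_file" || true
}
```

Implicit default dependencies do not appear in the file, so they will not show up in this output. On a running system, `systemctl show -p After -p Before cloud-init-main.service` reports the effective lists.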

Can you please include output from the following commands on a system where this service didn't start?

$ systemctl list-jobs --after
$ systemctl --failed
$ dmesg -T | grep -i -e warning -e error -e fatal -e exception
$ journalctl -xe

@benj-n
Author

benj-n commented Oct 2, 2024

Hi @holmanb

As requested:

$ systemctl list-jobs --after
No jobs running.
$ systemctl --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION

0 loaded units listed.
$ dmesg -T | grep -i -e warning -e error -e fatal -e exception
[Wed Oct  2 21:37:31 2024] RAS: Correctable Errors collector initialized.
[Wed Oct  2 21:37:31 2024] GPT: Use GNU Parted to correct GPT errors.
[Wed Oct  2 21:37:35 2024] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2

The journalctl output has been filtered to show the ordering cycle and the subsequent failures of the other services (which occur because no sockets are present):

$ journalctl -xe

Oct 02 21:37:33 template systemd[1]: sysinit.target: Found ordering cycle on cloud-init-main.service/start
Oct 02 21:37:33 template systemd[1]: sysinit.target: Found dependency on basic.target/start
Oct 02 21:37:33 template systemd[1]: sysinit.target: Found dependency on sockets.target/start
Oct 02 21:37:33 template systemd[1]: sysinit.target: Found dependency on [email protected]/start
Oct 02 21:37:33 template systemd[1]: sysinit.target: Found dependency on sysinit.target/start
Oct 02 21:37:33 template systemd[1]: sysinit.target: Job cloud-init-main.service/start deleted to break ordering cycle starting with sysinit.target/start

Oct 02 21:37:35 template sh[375]: netcat: /run/cloud-init/share/local-return.sock: No such file or directory
Oct 02 21:37:35 template systemd[1]: Finished Cloud-init: Local Stage (pre-network).

Oct 02 21:37:36 template sh[397]: netcat: /run/cloud-init/share/network-return.sock: No such file or directory
Oct 02 21:37:36 template systemd[1]: Finished Cloud-init: Network Stage.

Oct 02 21:37:36 template sh[404]: netcat: /run/cloud-init/share/config-return.sock: No such file or directory
Oct 02 21:37:36 template systemd[1]: Finished Cloud-init: Config Stage.

Oct 02 21:37:36 template sh[409]: netcat: /run/cloud-init/share/final-return.sock: No such file or directory
Oct 02 21:37:36 template systemd[1]: Finished Cloud-init: Final Stage.
$ systemd-analyze verify cloud-init-main.service
cloud-init-main.service: Found ordering cycle on sysinit.target/start
cloud-init-main.service: Found dependency on cloud-init-main.service/start
cloud-init-main.service: Unable to break cycle starting with cloud-init-main.service/start
Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.
cloud-init-main.service: Failed to create cloud-init-main.service/start: Transaction order is cyclic. See system logs for details.
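The cycle is easier to follow when the unit names are pulled out of the journal messages in order. The following sketch assumes the message format shown above ("Found ordering cycle on" and "Found dependency on", each ending in "/start"); `cycle_units` is a hypothetical helper, not a systemd tool.

```shell
# Hypothetical helper: read journal lines on stdin and print the units that
# systemd reported as part of an ordering cycle, one per line, in log order.
cycle_units() {
  # Keep only the "Found ordering cycle on X/start" and "Found dependency on
  # X/start" messages, then strip everything except the unit name X.
  sed -n -E 's#^.*Found (ordering cycle|dependency) on ([^ ]+)/start$#\2#p'
}
```

For example, `journalctl -b | cycle_units` would print the chain for the current boot. For the log above, the list starts with cloud-init-main.service and ends back at sysinit.target, which is the loop that systemd breaks by deleting the cloud-init-main.service start job.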

Please note that cloud-init-main.service starts as expected as soon as I remove Before=sysinit.target from /usr/lib/systemd/system/cloud-init-main.service:

# cat /usr/lib/systemd/system/cloud-init-main.service
# systemd ordering resources
# ==========================
# https://systemd.io/NETWORK_ONLINE/
# https://docs.cloud-init.io/en/latest/explanation/boot.html
# https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
# https://www.freedesktop.org/software/systemd/man/latest/systemd.special.html
# https://www.freedesktop.org/software/systemd/man/latest/systemd-remount-fs.service.html
[Unit]
Description=Cloud-init: Single Process
Wants=network-pre.target

After=systemd-remount-fs.service
Before=sysinit.target
Before=cloud-init-local.service
Conflicts=shutdown.target
RequiresMountsFor=/var/lib/cloud
ConditionPathExists=!/etc/cloud/cloud-init.disabled
ConditionKernelCommandLine=!cloud-init=disabled
ConditionEnvironment=!KERNEL_CMDLINE=cloud-init=disabled

[Service]
Type=notify
ExecStart=/usr/bin/cloud-init --all-stages
KillMode=process
TasksMax=infinity
TimeoutStartSec=infinity

# Output needs to appear in instance console output
StandardOutput=journal+console

[Install]
WantedBy=cloud-init.target

I am wondering whether it is intentional that Before=sysinit.target appears both inside and outside the if condition here: https://github.com/canonical/cloud-init/blob/24.3.1/systemd/cloud-init-main.service.tmpl#L22-L30
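As an alternative to editing the packaged unit in place, a drop-in could turn off the default dependencies, since systemd adds After=sysinit.target to service units by default. This is an untested sketch and not the upstream fix: the `write_dropin` helper and the file name `10-no-default-deps.conf` are assumptions. Note that `DefaultDependencies=no` also drops the other implicit dependencies the unit would normally get.

```shell
# Hypothetical helper: write a drop-in that turns off default dependencies for
# cloud-init-main.service. The first argument is an optional root prefix
# (empty for the live system), which makes it easy to try in a scratch directory.
write_dropin() {
  local root=${1:-}
  local dir="$root/etc/systemd/system/cloud-init-main.service.d"
  mkdir -p "$dir"
  # Assumption: removing the implicit After=sysinit.target is acceptable here,
  # because the unit already declares its ordering explicitly.
  cat > "$dir/10-no-default-deps.conf" <<'EOF'
[Unit]
DefaultDependencies=no
EOF
}
```

On a live system this would be followed by `systemctl daemon-reload` and `systemd-analyze verify cloud-init-main.service` to check whether the cycle is gone.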

@holmanb
Member

holmanb commented Oct 3, 2024

Thanks @benj-n for reporting and diagnosing. Yes, it seems that the recent single-process change probably let Before=sysinit.target into the general scope. That ordering also requires DefaultDependencies=no; otherwise a cycle is introduced, since After=sysinit.target is a default dependency.
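The condition described here can be checked mechanically on a rendered unit file. A minimal sketch, assuming a plain file with one unit per `Before=` line (as in the unit shown in this thread); `has_sysinit_cycle_risk` is a hypothetical name, and the check looks only at the file itself, not at drop-ins.

```shell
# Hypothetical check: succeed (exit 0) when a unit orders itself before
# sysinit.target while still keeping systemd's default dependencies,
# which include After=sysinit.target for service units.
has_sysinit_cycle_risk() {
  local unit_file=$1
  # No Before=sysinit.target line: this particular cycle cannot occur.
  grep -qxF -- 'Before=sysinit.target' "$unit_file" || return 1
  # Default dependencies explicitly disabled: no implicit After=sysinit.target.
  if grep -qxF -- 'DefaultDependencies=no' "$unit_file"; then
    return 1
  fi
  return 0
}
```

For instance, `has_sysinit_cycle_risk /usr/lib/systemd/system/cloud-init-main.service && echo "possible ordering cycle"` would flag the unit shown earlier in this thread.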

Could you please share your build process? I'd like to reproduce this and investigate to validate the fix. A reproducer would be appreciated.

Long term, we should strive to reduce or eliminate this template complexity and seek an ordering that satisfies the needs of all or most distros. In theory, systemd provides the mechanics required for a unified upstream service ordering (i.e. ordering against non-existent services is a no-op). There may be some edge cases, but I think that we can simplify, and possibly even eliminate, the templating for some, if not all, of the services. In the meantime, since this change appears to have been unintentional, we should probably remove it.

@benj-n
Author

benj-n commented Oct 4, 2024

Hi @holmanb

The build process:

Arch Linux was freshly installed from the 2024.10 ISO.
archinstall --skip-version-check was run with all default options (including best-effort auto-partitioning with ext4). Only a custom root password was set.

The post-install configuration in the chroot included:

  • pacman -Syu dhclient openssh
  • systemctl enable sshd
  • basic tweaking in /etc/ssh/sshd_config for convenient login

After rebooting from disk, the package itself was built with the following process:

  • install cloud-init package dependencies and convenient tools:
pacman --noconfirm -Syu vi git netplan python-build python-installer python-setuptools python-wheel python-httpretty python-passlib python-pytest python-pytest-mock python-responses openbsd-netcat dhclient python-configobj python-jinja python-jsonpatch python-jsonschema python-oauthlib python-pyserial python-typing_extensions
  • create the PKGBUILD:
cat > PKGBUILD <<'EOF'
pkgname=cloud-init
pkgver=24.3.1
pkgrel=0
pkgdesc="Cloud instance initialization"
arch=(any)
url="https://cloud-init.io"
license=('GPL-3.0-only OR Apache-2.0')
depends=(
  bash
  dhclient
  openbsd-netcat
  python
  python-configobj
  python-jinja
  python-jsonpatch
  python-jsonschema
  python-netifaces
  python-oauthlib
  python-pyserial
  python-pyyaml
  python-requests
  python-typing_extensions
  sudo
  systemd
)
makedepends=(
  netplan
  python-build
  python-installer
  python-setuptools
  python-wheel
)
checkdepends=(
  procps-ng
  python-httpretty
  python-passlib
  python-pytest
  python-pytest-mock
  python-responses
)
optdepends=(
  'cloud-guest-utils: for growpart'
  'netplan: for configuring network using netplan'
  'python-passlib: for Azure and BSD support'
  'python-urllib3: for LXD and Scaleway data sources'
)
backup=(
  etc/cloud/cloud.cfg
  etc/cloud/cloud.cfg.d/05_logging.cfg
)
source=(
  https://github.com/canonical/cloud-init/archive/$pkgver/$pkgname-$pkgver.tar.gz
  https://patch-diff.githubusercontent.com/raw/canonical/cloud-init/pull/5696.patch
)
sha512sums=('01b798d67328ecd66229568233fb674f45c055ac469adb31a55a909b6b2c8fd1901a833accb66423923b8945210aa4dc6a0d61945787aabe414c01b501b1416d'
'3dacf5ebc122ffc48b5a4a695ff49737dfe7e680955f260ffea83349eb451d7ee9b5d1f458790246f7c42fe9467151da2c4e522441f01cf3e2ef43b904e31755')
b2sums=('7e4cb8bd65d34d08b4b4e5ea2370ac952e05b3a210b91a9b29d8e4b633246a9520c2d9259aedfe8edded0d7d761808b86b6b19d98309633c981b2eb0e7cf1f93'
'0a993d1f0910541f88b28eb5abd9dd8147997da8a7e7226a700696ffe5208fe7d5310dc632d1e7c92e5d1ef7946ceddc997606e2233a04179dda61987c1457e0')

prepare() {
  patch -Np1 -d $pkgname-$pkgver -i ../5696.patch
}

build() {
  cd $pkgname-$pkgver
  python -m build --wheel --no-isolation -C--distro=arch -C--prefix=/usr
}

check() {
  local pytest_options=(
    -vv
    # we don't ship /etc/ca-certificates.conf
    --deselect tests/unittests/config/test_cc_ca_certs.py::TestRemoveDefaultCaCerts::test_commands
    --deselect tests/unittests/test_ds_identify.py::TestWSL::test_empty_cloudinitdir
    --deselect tests/unittests/test_ds_identify.py::TestWSL::test_found_via_userdata
    --deselect tests/unittests/config/test_schema.py::TestNetplanValidateNetworkSchema::test_network_config_schema_validation_false_when_skipped
    --deselect 'tests/unittests/config/test_schema.py::TestNetworkSchema::test_network_schema[net_v2_complex_example]'
    --deselect 'tests/unittests/config/test_schema.py::TestNetworkSchema::test_network_schema[net_v2_invalid_config]'
    --deselect 'tests/unittests/config/test_schema.py::TestNetworkSchema::test_network_schema[net_v2_skipped]'
  )
  cd $pkgname-$pkgver
  pytest "${pytest_options[@]}"
}

package() {
  local _file
  local site_packages=$(python -c "import site; print(site.getsitepackages()[0])")

  cd $pkgname-$pkgver
  python -m installer --destdir="$pkgdir" dist/*.whl
  # NOTE: due to limitations with PEP517, files are installed to site-packages,
  # not to the correct global locations (e.g. /etc and /usr), so we remove them
  # and install them manually below.
  rm -frv "$pkgdir/$site_packages/"{usr,etc}

  # configuration and hooks
  install -vDm 644 config/cloud.cfg.d/* -t "$pkgdir/etc/cloud/cloud.cfg.d/"
  ./tools/render-template --variant arch ./config/cloud.cfg.tmpl "$pkgdir/etc/cloud/cloud.cfg"
  install -vDm 644 templates/*.tmpl -t "$pkgdir/etc/cloud/templates/"
  install -vDm 755 tools/{ds-identify,hook-hotplug,uncloud-init,write-ssh-key-fingerprints} -t "$pkgdir/usr/lib/$pkgname/"
  # documentation, man pages and shell completion
  install -vDm 644 doc/*.txt -t "$pkgdir/usr/share/doc/$pkgname/"
  install -vDm 644 doc/examples/*.txt -t "$pkgdir/usr/share/doc/$pkgname/examples/"
  install -vDm 644 doc/examples/seed/* -t "$pkgdir/usr/share/doc/$pkgname/examples/seed/"
  install -vDm 644 doc/man/*.1 -t "$pkgdir/usr/share/man/man1/"
  install -vDm 655 bash_completion/$pkgname -t "$pkgdir/usr/share/bash-completion/completions/"
  # udev rules
  install -vDm 644 udev/*.rules -t "$pkgdir/usr/lib/udev/rules.d/"
  # systemd integration
  install -vdm 755 "$pkgdir/usr/lib/systemd/system"{,-generators}
  for _file in cloud-{config,final,init-local,init-main,init-network}.service; do
    ./tools/render-template --variant arch ./systemd/$_file.tmpl "$pkgdir/usr/lib/systemd/system/$_file"
  done
  install -vDm 644 systemd/*.{service,socket,target} "$pkgdir/usr/lib/systemd/system/"
  ./tools/render-template --variant arch ./systemd/cloud-init-generator.tmpl "$pkgdir/usr/lib/systemd/system-generators/cloud-init-generator"
  chmod 755 "$pkgdir/usr/lib/systemd/system-generators/cloud-init-generator"
  install -vDm 644 systemd/disable-sshd-keygen-if-cloud-init-active.conf -t "$pkgdir/usr/lib/systemd/system/[email protected]/"
}
EOF
  • build package
runuser -unobody makepkg
  • install package
pacman -U /tmp/cloud-init-24.3.1-0-any.pkg.tar.zst
  • check the failure:
$ systemd-analyze verify cloud-init-main.service
sysinit.target: Found ordering cycle on cloud-init-main.service/start
sysinit.target: Found dependency on sysinit.target/start
sysinit.target: Unable to break cycle starting with sysinit.target/start
Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.
cloud-init-main.service: Failed to create cloud-init-main.service/start: Transaction order is cyclic. See system logs for details.

@min-xu-et

Thank you so much for the debugging. I ran into this today after upgrading and using the new image on a fresh AWS instance. I ran into the nc.openbsd dependency issue as well.

@holmanb
Member

holmanb commented Oct 12, 2024

Thanks @benj-n for identifying and debugging the issue, and @min-xu-et for reporting. I just submitted #5819 to resolve this issue.

@min-xu-et

Thank you @holmanb! Will the nc.openbsd issue be addressed as well, or has it been already?

@holmanb
Member

holmanb commented Oct 12, 2024

> Thank you @holmanb! Will the nc.openbsd issue be addressed as well, or has it been already?

@min-xu-et see #5696, I think that this has already been addressed.

holmanb added a commit to holmanb/cloud-init that referenced this issue Oct 15, 2024
Since After=sysinit.target is a default dependency, a cycle is
introduced when Before=sysinit.target and not DefaultDependencies=no.

Fixes canonicalGH-5755
holmanb added a commit to holmanb/cloud-init that referenced this issue Oct 22, 2024
Since After=sysinit.target is a default dependency, a cycle is
introduced when Before=sysinit.target and not DefaultDependencies=no.

Fixes canonicalGH-5755