feat: Single process optimization (canonical#5489)

Python interpreter initialization and module import time contribute a
significant amount of wall clock time to cloud-init's runtime (and
therefore to total boot time).

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the redundant
work of starting an interpreter and loading libraries, this changes
cloud-init to run as a single process. Systemd service ordering is
retained by using the existing cloud-init services as shims which use
a synchronization protocol to start each cloud-init stage and to
communicate to the init system that each stage is complete. Since
multiple cloud-init processes sit in the critical chain of starting
the system, this reduces boot time (including time to ssh login and
time to cloud-init completion).

Currently only systemd is supported, but the synchronization protocol
should be capable of supporting other init systems as well with minor
changes.

Note: This enables many additional follow-on improvements that
eliminate redundant work. However, these potential improvements are
not pursued in this commit. This commit has been structured to
minimize the changes required to capture the majority of primary
performance savings while preserving correctness and the ability to
maintain backwards compatibility.

Since this changes the semantics of the existing cloud-init unit
files, this change takes the opportunity to rename one of its systemd
units which causes frequent user confusion. The unit named
cloud-init.service is often mistaken by users for being the only
cloud-init service, when it is simply one of four stages. This stage
is documented as the "network" stage, so this service will be renamed
to "cloud-init-network.service". A new notify service is added as part
of this implementation which contains the cloud-init process. This
unit is named "cloud-init-main.service".

Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket

File changes
============

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol

cloud-init.service -> cloud-init-network.service (renamed)
----------------------------------------------------------

- renamed to cloud-init-network.service

cloud-{init-local,init-network,config,final}.services
-----------------------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
   - send a start message
   - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat

cloud-init-main.service (new)
-----------------------------

- set service type to 'notify'
- invoke cloud-init in single process mode
- adopt systemd ordering requirements from cloud-init-local.service
- adopt KillMode from cloud-final.service

main.py
-------

- Add command line flag to indicate "all stages" mode
- In this mode run each stage followed by an IPC synchronization
  protocol step

cloud-final.service
-------------------

- drop KillMode

cloud-init-local.service
------------------------

- drop dependencies made redundant by ordering after
  cloud-init-main.service

Performance Impact
==================

On Ubuntu 24.04, Python's wall clock start up time as measured with
`time python3 -c 'import cloudinit.cmd.main'` on a few cloud types:

lxc container: 0.256s
QEMU machine:  0.300s
gce instance:  0.367s
ec2 instance:  0.491s

This change eliminates 1x this start up time from time to ssh.
This change eliminates 3x this start up time from cloud-init's total
completion.

Total benefit varies based on the platform that the instance is hosted
by, but all platforms will measurably benefit from this change.

BREAKING_CHANGE: Run all four cloud-init services as a single systemd
    service.
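
The synchronization protocol described above can be illustrated with a
minimal sketch. This is not the code added in socket.py: the function
names (run_all_stages, shim, demo), the ".sock" file names, the use of
stream sockets in a temporary directory, and the on_ready callback that
stands in for sd_notify() are all assumptions made for illustration.
The shim function plays the role of the netcat ExecStart command in the
per-stage units.

```python
import os
import socket
import tempfile
import threading

STAGES = ("local", "network", "config", "final")


def run_all_stages(sock_dir, run_stage, on_ready):
    """Main-process side: one Unix socket per stage, stages run in order."""
    listeners = {}
    for stage in STAGES:
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(os.path.join(sock_dir, stage + ".sock"))
        srv.listen(1)
        listeners[stage] = srv
    # Hypothetical stand-in for sd_notify(): signal that sockets exist.
    on_ready()
    for stage in STAGES:
        conn, _ = listeners[stage].accept()
        with conn:
            if conn.recv(16).strip() != b"start":
                raise ValueError("unexpected message for stage " + stage)
            run_stage(stage)
            conn.sendall(b"done")  # tell the shim the stage is complete
        listeners[stage].close()


def shim(sock_dir, stage):
    """Shim side: send "start", then block until the stage replies."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(os.path.join(sock_dir, stage + ".sock"))
        cli.sendall(b"start")
        return cli.recv(16).decode()


def demo():
    """Run the main process in a thread and drive it with four shims."""
    ran = []
    ready = threading.Event()
    with tempfile.TemporaryDirectory() as sock_dir:
        main = threading.Thread(
            target=run_all_stages, args=(sock_dir, ran.append, ready.set)
        )
        main.start()
        ready.wait()
        replies = [shim(sock_dir, stage) for stage in STAGES]
        main.join()
    return ran, replies
```

In this sketch a single long-lived process pays interpreter start up
once, while each shim only blocks until its stage reports completion,
which is how the per-stage units can keep their ordering in the boot
sequence.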