Fix dailies for Ubuntu/focal #5587

Merged
merged 69 commits into from
Aug 7, 2024

Conversation

TheRealFalcon
Member

New upstream snapshot, plus fixes to the quilt patches for test_main.py and for dropping the env condition from service files.

leavelet and others added 30 commits July 3, 2024 14:10
The /etc/hostname.* files should have the mtu on
a separate line otherwise it gives error:

  ifconfig: mtu: bad value

The lines are executed in order by ifconfig and
mtu should be on its own line.

Fixes: canonicalGH-5413
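
A minimal sketch of the layout this fix describes, where each ifconfig directive gets its own line and `mtu` comes last. The helper name `render_hostname_if` and the sample address line are hypothetical, not taken from the cloud-init renderer:

```python
def render_hostname_if(address_lines, mtu=None):
    """Build /etc/hostname.* content with one ifconfig directive per line.

    Hypothetical helper for illustration; not cloud-init's actual renderer.
    """
    lines = list(address_lines)
    if mtu is not None:
        # mtu goes on a separate line rather than being appended to an
        # address line, which is what triggered "ifconfig: mtu: bad value".
        lines.append(f"mtu {mtu}")
    return "\n".join(lines) + "\n"


# Illustrative address line; the real content depends on the network config.
content = render_hostname_if(["inet 192.0.2.10 255.255.255.0"], mtu=1500)
print(content, end="")
```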
)

When no IPv6 address is given in the customization configuration,
we set the IPv6 type to dhcp6 so that the customized Linux network is
explicitly configured to use DHCP for IPv6.
* Remove `unittest` constructs and remove base classes.
* Replace tests that don't test things with tests that do
* Add fstab and mounts combinations test
No call sites request that the environment vars not be decoded.
This change always decodes them, simplifying typing and logic.
Instead of using a broad try/except, properly check for conditions that
invalidate a mount location.
Support for jumbo frames requires that the underlying physical interfaces
and the parent bond interface all have the larger MTU configured, not just
the physical interfaces.
…canonical#5501)

At least one of (or both) 'baseurl' or 'metalink' should be provided for yum
repository specification. Add schema changes to enforce it. Without this,
with just 'metalink' property set, one would get the schema validator error

\---
Error: Cloud config schema errors: yum_repos.epel-release: 'baseurl' is a required property
\---

Signed-off-by: Ani Sinha <[email protected]>
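
The "at least one of" rule can be sketched roughly as below. This is an illustration only: `YUM_REPO_RULE` and `check_yum_repo` are hypothetical names, the rule uses an `anyOf`/`required` shape as an assumption about how such a constraint is usually written, and the tiny evaluator stands in for a real schema validator.

```python
# Hypothetical rule: a repo entry is valid if it has 'baseurl', 'metalink', or both.
YUM_REPO_RULE = {
    "anyOf": [
        {"required": ["baseurl"]},
        {"required": ["metalink"]},
    ]
}


def check_yum_repo(repo_id, repo, rule=YUM_REPO_RULE):
    """Return a list of error strings for one yum_repos entry."""
    for alternative in rule["anyOf"]:
        if all(key in repo for key in alternative["required"]):
            return []  # one satisfied alternative is enough
    return [f"yum_repos.{repo_id}: one of 'baseurl' or 'metalink' is required"]


# A metalink-only repo no longer produces an error in this sketch.
print(check_yum_repo("epel-release", {"metalink": "https://example.invalid/metalink"}))
print(check_yum_repo("broken", {"name": "no source given"}))
```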
On systemd, services are started by PID 1. When this doesn't happen, cloud-init
is in an unknown run state and should warn the user.

Reorder pid log to be able to reuse Distro information.

Add docstring deprecating util.is_Linux().
…5209)

Implement verify_clean_boot() to ignore certain expected logs
in a platform-specific way.
…anonical#5209)

Ensure ignore_warnings=True or ignore_errors=True is honored and
not overridden by supplemental warning texts appended.
…l#5209)

Avoid using warning level messages as there may be some
use-cases in the wild that need to invoke cloud-init boot
stages after boot for some reason unknown to upstream.

Provide a detailed warning message informing admins to file
issues against cloud-init to better represent those feature
needs before dropping this feature altogether.
…onical#5504)

Commit 604d80e introduced assertions expecting exit 2 from the
CLI when calling cloud-init init --local. Revert this test assertion
as only cloud-init status command exits (2) on deprecations/warnings.

Invoking cloud-init's boot stages on the command line will only exit
1 if critical errors are encountered, to avoid degrading overall
systemd health as seen from cloud-init systemd units. When cloud-init
boot stages encounter recoverable_errors of any type, there is no
need to exit non-zero, as those deprecation logs are not critical to
the health of the system as a whole.
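
The exit-code policy described above can be summarized in a short sketch. Both function names are hypothetical, and the assumption that `cloud-init status` exits 1 on critical errors is mine; the commit message only states that status exits 2 on deprecations/warnings.

```python
def boot_stage_exit_code(critical_errors, recoverable_errors):
    """Exit code for a boot stage run from the CLI (hypothetical helper)."""
    # Recoverable errors (deprecations, warnings) never fail the unit.
    return 1 if critical_errors else 0


def status_exit_code(critical_errors, recoverable_errors):
    """Exit code for the status command (hypothetical helper)."""
    if critical_errors:
        return 1  # assumption, not stated in the commit message
    return 2 if recoverable_errors else 0


# A deprecation warning alone: the boot stage exits 0, status exits 2.
print(boot_stage_exit_code(False, True), status_exit_code(False, True))
```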
It is pretty consistently failing due to canonical#5373 with no fix in
sight.
Ensure DNS server addresses are parsed from the proper location
of network_data.json

Fixes canonical#5386

Co-authored-by: Alberto Contreras <[email protected]>
If DNS information is added to a NetworkManager managed interface where
the given protocol family is disabled, NetworkManager will be unable to
activate the interface.

canonical#5387
'mirrorlist' config can be specified instead or along with 'baseurl' in the yum
repository config. Add support for specifying mirrorlist instead of 'baseurl'.

Fixes canonicalGH-5520
Signed-off-by: Ani Sinha <[email protected]>
9929a00 added the ability to use a cached datasource when none is
found. This was supposed to be per-datasource, but the lack of cache
cleaning got applied universally. This commit makes it so cache will be
cleaned as it was before if fallback isn't implemented in datasource.

Fixes canonicalGH-5486
This is useful for logs we want hidden by default but can be turned
on via configuration.
Switch to pathlib where appropriate and call consistently
…5515)

With this change, the following config in cloud.cfg.d/ will select NoCloud in
network stage.

```
datasource_list: [ GCE, NoCloud, None ]
datasource:
  NoCloud:
    seedfrom: http://0.0.0.0:8000/
```

Previously, two or fewer datasources in the datasource_list were required to
get this behavior, which was undocumented and not intuitive.

The ds-identify already allowed inline user-data and meta-data to
trigger detection.

Add ds-identify unittests for seedfrom and inline user-data.
Add DataSourceNoCloud.ds_detect() unittests for seedfrom and inline
user-data.
The nocloud datasource logs messages that are sometimes confused by users
for errors. Clarify them.

Also, remove redundant information from the logs:

- simplify log wording
- only include seed and dsmode information in nocloud string when
  non-default values are used
aciba90 and others added 21 commits July 25, 2024 11:21
…#5521)

Formally document providing runtime configuration in system configuration.
Introduce names to identify previously unnamed NoCloud concepts.
Add more structure - discrete sections for:
- runtime configuration types
- discovery configuration
- configuration sources
Add group of pages for drop-in custom modules and
restructure existing docs under it.

Add doc for custom datasources and config modules.
    
SC-1836
Fixes canonicalGH-4649
…nonical#5568)

Unspecified base match in labeler assumes 'any' for each match
clause. When specifying base-branch and --any-glob-to-any-file either
one of these cases would result in a successful match which would label
all PRs against main as documentation. We need to explicitly specify
'all:' in our labeler match config to ensure BOTH:

 * matching file paths related to documentation
         -AND-
 * targeting a merge against 'main' branch
…cal#5570)

Also drop undesirable former doc-autolabel.yml workflow
When instance id hasn't changed and datasource hasn't changed, don't
forcibly reload the configuration.
This enables support for network config v2 and v1 in NoCloud
when used with http / ftp / etc.

BREAKING_CHANGE: Adds an additional network request to NoCloud.
String output changed in 7703634.
Instance-id doesn't change on LXD / Focal.
The handle function of cc_mounts was hard to grok and had one of the
highest cyclomatic complexity scores in the codebase. Functionally,
the code should be unchanged.
Python interpreter initialization and module import time 
contributes a significant amount of wall clock time to
cloud-init's runtime (and therefore to total boot time).

Cloud-init has four stages. Each stage starts its own Python
interpreter and loads the same libraries. To eliminate the
redundant work of starting an interpreter and loading libraries,
this changes cloud-init to run as a single process. Systemd
service ordering is retained by using the existing cloud-init
services as shims which use a synchronization protocol to start
each cloud-init stage and to communicate that each stage is
complete to the init system. Since multiple cloud-init processes
sit in the critical chain of starting the system, this reduces
boot time (including time to ssh login and time to cloud-init
completion).

Currently only systemd is supported, but the synchronization
protocol should be capable of supporting other init systems
as well with minor changes.

Note: This enables many additional follow-on improvements that
eliminate redundant work. However, these potential improvements
are temporarily ignored. This commit has been structured to
minimize the changes required to capture the majority of primary
performance savings while preserving correctness and the ability
to preserve backwards compatibility.

Since this changes the semantics of the existing cloud-init unit
files, this change takes the opportunity to rename one of its
systemd units which causes frequent user confusion. The unit named
cloud-init.service is often mistaken by users for being the only
cloud-init service, when it is simply one of four stages. This
stage is documented as the "network" stage, so this service will
be renamed to "cloud-init-network.service". A new notify service
is added as part of this implementation which contains the
cloud-init process. This unit is named "cloud-init-main.service".
 
Synchronization protocol
========================

- create one Unix socket for each systemd service stage
- send sd_notify()
- For each of the four stages (local, network, config, final):
   - when init system sends "start" to the Unix socket, start the
     stage
   - when running stage is complete, send "done" to Unix socket
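
A rough sketch of the handshake listed above, written as a thread-based demo so it can run anywhere. Everything here is illustrative: the function names, the use of stream sockets in a temporary directory, and the thread standing in for the per-stage systemd units are assumptions, and the sd_notify() call is only marked with a comment.

```python
import os
import socket
import tempfile
import threading

STAGES = ("local", "network", "config", "final")


def run_all_stages(sock_dir, run_stage, ready):
    """Single long-lived process: one Unix socket per stage, served in order."""
    listeners = {}
    for stage in STAGES:
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(os.path.join(sock_dir, f"{stage}.sock"))
        srv.listen(1)
        listeners[stage] = srv
    # A real notify-type service would send sd_notify readiness here.
    ready.set()
    for stage in STAGES:
        conn, _ = listeners[stage].accept()
        with conn:
            if conn.recv(16) == b"start":
                run_stage(stage)
                conn.sendall(b"done")
        listeners[stage].close()


def shim(sock_dir, stage):
    """Stand-in for a per-stage unit: send "start", block until "done"."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(os.path.join(sock_dir, f"{stage}.sock"))
        cli.sendall(b"start")
        return cli.recv(16)


def demo():
    ran = []
    with tempfile.TemporaryDirectory() as sock_dir:
        ready = threading.Event()
        main = threading.Thread(
            target=run_all_stages, args=(sock_dir, ran.append, ready)
        )
        main.start()
        ready.wait()  # do not connect before the sockets exist
        replies = [shim(sock_dir, stage) for stage in STAGES]
        main.join()
    return ran, replies


if __name__ == "__main__":
    print(demo())
```

The `shim` function also hints at what a pure Python replacement for the netcat call could look like for downstreams that need one, with the same caveat that the actual message format and socket type may differ.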

File changes
============

socket.py (new)
---------------

- define a systemd-notify helper function
- define a context manager which implements a multi-socket
  synchronization protocol

cloud-init.service -> cloud-init-network.service (renamed)
----------------------------------------------------------

- renamed to cloud-init-network.service

cloud-{init-local,init-network,config,final}.services
-------------------------------------------

- change ExecStart to use netcat to connect to Unix socket and:
  - send a start message
  - wait for completion response
- note: a pure Python equivalent is possible for any downstreams
  which do not package openbsd's netcat

cloud-init-main.service (new)
-----------------------------

 - set service type to 'notify'
 - invoke cloud-init in single process mode
 - adopt systemd ordering requirements from cloud-init-local.service
 - adopt KillMode from cloud-final.service

main.py
-------

 - Add command line flag to indicate "all stages" mode
 - In this mode run each stage followed by an IPC
   synchronization protocol step

cloud-final.services
--------------------

- drop KillMode

cloud-init-local.services
-------------------------

- drop dependencies made redundant by ordering after
  cloud-init-main.service

Performance Impact
==================

On Ubuntu 24.04, Python's wall clock start up time as measured with
`time python3 -c 'import cloudinit.cmd.main'` on a few cloud types:

lxc container: 0.256s
QEMU machine:  0.300s
gce instance:  0.367s
ec2 instance:  0.491s

This change removes one such startup (1x) from time to ssh and three
such startups (3x) from cloud-init's total completion time. The total
benefit varies with the platform hosting the instance, but all
platforms will measurably benefit from this change.
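
As a back-of-the-envelope illustration of those multipliers, the measurements above can be plugged in directly. The function name is hypothetical and the results are simple arithmetic on the quoted numbers, not new measurements.

```python
# Interpreter start-up times quoted in the commit message (seconds).
STARTUP_SECONDS = {
    "lxc container": 0.256,
    "QEMU machine": 0.300,
    "gce instance": 0.367,
    "ec2 instance": 0.491,
}


def estimated_savings(startup):
    """Apply the 1x (time to ssh) and 3x (total completion) multipliers."""
    return {
        "time_to_ssh": round(1 * startup, 3),
        "total_completion": round(3 * startup, 3),
    }


for platform, seconds in STARTUP_SECONDS.items():
    print(platform, estimated_savings(seconds))
```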

BREAKING_CHANGE: Run all four cloud-init services as a single systemd service.
Commit 5322dca introduced an assumption to read_seeded that
network-config must always be present for NoCloud datasource.
Since it is still considered an optional supplemental configuration,
allow the read_seeded calls to succeed in the absence of network-config.

Avoids failures seen in tests/integration-tests/datasources/test_nocloud.py::
  test_nocloud_seedfrom_vendordata
This was previously unnecessary because:

a. LXD automatically appends the user.meta-data key to default meta-data.
b. In the presence of duplicate keys, PyYAML uses the last key.

This change is the cloud-init part of a set of changes that will enable cloud-init
to avoid depending on undefined behavior. In the future LXD may stop
appending user-defined meta-data to its default meta-data. This change
makes cloud-init forward compatible with LXD for when that change is
implemented.

canonicalGH-5575
@TheRealFalcon TheRealFalcon changed the title Ubuntu/focal Fix dailies for Ubuntu/focal Aug 6, 2024
@@ -8,7 +8,6 @@ Description=cloud-init hotplug hook sock
After=cloud-config.target
--- a/systemd/cloud-init-main.service.tmpl
+++ b/systemd/cloud-init-main.service.tmpl
Member

I don't think that we want to be modifying this service file on focal. I plan on submitting a follow-up PR which will patch it out completely.

Member Author

Updated

@holmanb holmanb self-assigned this Aug 6, 2024
[Service]
Type=oneshot
--- a/systemd/cloud-init.service.tmpl
+++ b/systemd/cloud-init.service.tmpl
Member

@TheRealFalcon This actually drops the patch from the re-named cloud-init.service.tmpl (now named cloud-init-network.service.tmpl). I can add it back in in my followup PR if you want - otherwise we should rename it and I'll rename it back. I'm fine with doing that either way - let me know.

Member Author

Yeah, we're kind of in a weird "post single process PR" but "pre patch it out" state. I think I'm fine leaving it as-is as you'll need to make changes in your patch either way.

Member Author

Are we good to merge this PR then?

Member

Go for it!

@TheRealFalcon TheRealFalcon merged commit 3ebfb85 into canonical:ubuntu/focal Aug 7, 2024
20 checks passed