Joining additional controller node with dynamic config enabled wipes existing clusterconfig #4702
Comments
Does 1.29.6 count as "latest"? I really have no idea if I should have that box checked here.
It's important to understand that not everything is synced when using dynamic config. You still have to handle certain node-specific parts of the k0s config manually. Additional user-managed stacks in the manifest folders are expected to be synchronized manually, as well.
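For context, a minimal sketch of what stays node-local under dynamic config (the field split follows the k0s docs, which describe `spec.api` and `spec.storage` as node-specific; the address is only illustrative, borrowed from the k0sctl example below):

```sh
# A sketch, assuming the split described in the k0s docs: fields like
# spec.api and spec.storage are read from the local k0s.yaml on each
# controller and are NOT distributed via the dynamic ClusterConfig --
# they must be kept consistent across controllers by hand.
sudo tee /etc/k0s/k0s.yaml <<'EOF' >/dev/null
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  api:
    address: 10.128.0.104   # this controller's own address (node-specific)
  storage:
    type: etcd              # storage settings are also per-node
EOF
```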
That's... surprising. How did you verify? AFAIK there's no code in k0s which would delete ClusterConfig resources. What could have happened is that the second controller has overwritten the ClusterConfig of the first one? 🤔
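One quick way to verify, as a sketch (the object name and namespace are the ones mentioned later in this issue):

```sh
# With dynamic config enabled there should be a ClusterConfig named "k0s"
# in kube-system; if this returns NotFound, the resource really is gone.
sudo k0s kubectl -n kube-system get clusterconfig k0s -o yaml
```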
How were the Helm charts installed? Via manifests in …?
Did you check the logs of the new controller? Are there some files in …?
Running …
Via a k0s config in …
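For gathering that information, something along these lines should work (the unit name and paths assume a default `k0s install controller`):

```sh
# Controller logs: k0s registers itself as the k0scontroller systemd unit
sudo journalctl -u k0scontroller --no-pager | tail -n 200

# Stack manifests k0s applies on this node, including helm extension output
sudo ls -R /var/lib/k0s/manifests
```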
I can spin up a new pair of instances and get any logs you'd like me to; this has reproduced consistently for me so far.
Attached: join-files.tar.gz. join-files has the …; main-files has the same logs, the … Please let me know if there's any more info you'd like here!
A couple of observations/thoughts:
I didn't have any issues provisioning a cluster on Rocky 9.4 using this k0sctl.yaml:

```yaml
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  k0s:
    version: 1.29.6+k0s.0
    dynamicConfig: true
    config:
      spec:
        network:
          provider: calico
          podCIDR: 10.10.0.0/16
          serviceCIDR: 10.11.0.0/16
          nodeLocalLoadBalancing:
            enabled: true
        extensions:
          helm:
            charts:
              - chartname: okgolove/goldpinger
                name: goldpinger
                namespace: goldpinger
                version: 6.1.2
            repositories:
              - name: okgolove
                url: https://okgolove.github.io/helm-charts/
  hosts:
    - role: controller+worker
      installFlags:
        - --debug
      uploadBinary: true
      ssh:
        address: 10.128.0.104
        keyPath: ssh-private-key.pem
        port: 22
        user: k0s
    - role: controller+worker
      installFlags:
        - --debug
      uploadBinary: true
      ssh:
        address: 10.128.0.105
        keyPath: ssh-private-key.pem
        port: 22
        user: k0s
```

Can you maybe try to recreate the cluster using the same k0s.yaml, based on the … above? An obligatory note: never try to use an etcd cluster with two members. Always use an odd number (1, 3, ...) 🙃
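On the etcd note: quorum needs a majority of members, so a two-member cluster loses quorum as soon as either member fails. Membership can be inspected with k0s's built-in command:

```sh
# Run on a controller node; lists the current etcd members
sudo k0s etcd member-list
```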
Agreed on NLLB (node-local load balancing); I believe we use that normally and I just forgot to enable it here.
What is the appropriate method to specify this when joining a new controller node? Join, edit config file, start?
I didn't restart both controllers, but I can try
Gotcha, thanks!
I'll give it a try.
Obviously, but getting to three from one requires going through two 😄
Yes, copy over the k0s.yaml to the second controller, then join it using the token and the config file. In theory you should only need …
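As a sketch of that procedure (flags copied from this issue; file names are placeholders):

```sh
# On the existing controller: create a controller join token
sudo k0s token create --role=controller > token.txt

# Copy token.txt and the same k0s.yaml to the new controller, then:
sudo k0s install controller \
  --token-file token.txt \
  -c k0s.yaml \
  --enable-worker --no-taints \
  --enable-dynamic-config
sudo k0s start
```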
The issue is marked as stale since no activity has been recorded in 30 days.
Before creating an issue, make sure you've checked the following:
Platform
Version
v1.29.6+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
I created a cluster with calico networking, custom CIDRs, and a minimal helm chart extension to demonstrate the problem (goldpinger).
k0s was installed with:

```sh
sudo /usr/local/bin/k0s install controller -c k0s.yaml --enable-dynamic-config --enable-worker --no-taints
```

After starting and waiting for things to stabilize, a clusterconfig `k0s` existed in kube-system and the following pods were running: … I then created a controller token and joined an additional node with:

```sh
sudo /usr/local/bin/k0s install controller --token-file token.txt --enable-worker --no-taints --enable-dynamic-config
```

After the additional controller joined, there is no longer a clusterconfig present, and only the following pods are running: …
Goldpinger, coredns, and metrics-server are all no longer present.
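A quick way to confirm what survived the join (simple checks using the kubectl bundled with k0s):

```sh
# goldpinger, coredns, and metrics-server should all show up here if the
# cluster config survived the second controller joining
sudo k0s kubectl get pods -A
sudo k0s kubectl -n kube-system get clusterconfig
```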
Steps to reproduce
Expected behavior
The node joined as an additional controller and the cluster config was unchanged.
Actual behavior
The node joined as an additional controller, but the clusterconfig was removed / dynamic config was disabled, existing helm charts were removed, and metrics-server/coredns are no longer running.
Screenshots and logs
No response
Additional context
`k0s config`
The SANs on the certificate used by the joining node are also wrong (demonstrated with the command below), but that will be a different issue.

```sh
openssl s_client -connect 10.11.0.1:443 </dev/null 2>/dev/null | openssl x509 -inform pem -text | grep -A1 "Subject Alternative Name"
```

I have also tested this with 1.30.2, and it has not reproduced there in my testing. I also believe this to be OS-dependent, as I was able to reproduce with Rocky 9 but not Ubuntu.