bug: only skip bootstrap when we have certs #289

Merged (1 commit) on Oct 8, 2024

@evans915 commented Sep 30, 2024

Name of feature: Fixing bootstrapper skip logic

Pain or issue this feature alleviates:

user@NODE:~$ kubectl logs -n namespace service-b577877c9-mfl7v autocert-renewer
error reading certificate chain: : no such file or directory

user@NODE:~$ kubectl logs -n namespace service-b577877c9-mfl7v autocert-bootstrapper
Found existing /var/run/autocert.step.sm/root.crt, skipping bootstrap

Why is this important to the project (if not answered above):

After a node running Kubernetes deployments is drained and then schedules pods again, the root cert appears to exist, but the leaf certificate and key do not. The bootstrapper therefore skips bootstrap, the renewer cannot read the certificate chain, and the pods end up stuck in CrashLoopBackOff.

Is there documentation on how to use this feature? If so, where?

N/A

In what environments or workflows is this feature supported?

Kubernetes clusters

In what environments or workflows is this feature explicitly NOT supported (if any)?

N/A

Supporting links/other PRs/issues:

In response to this PR:
#174

💔Thank you!

@github-actions bot added the "needs triage" label (Waiting for discussion / prioritization by team) Sep 30, 2024
@evans915 marked this pull request as ready for review September 30, 2024 15:40
@bdelvecchio

We've been trying to reproduce this case for a while. Our current best guess is that step certificate fails to obtain a cert for some reason (maybe autocert is unavailable, or there is a transient DNS error), but the script does not exit and still fetches and writes root.crt. So we need to change the "already completed" test from looking for root.crt (which does not prove success at all) to looking for site.crt.
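A minimal sketch of what such an "already completed" test could look like, for illustration only. This is not the merged change: the function name should_skip_bootstrap, the variable names STEP_ROOT, CRT and KEY, and the site.key file name are assumptions. Only the root.crt path and the site.crt name come from this thread.

```shell
#!/bin/sh
# Sketch only: decide whether bootstrap can be skipped.
# STEP_ROOT, CRT and KEY are assumed variable names; site.key is an assumed file name.

should_skip_bootstrap() {
    # Require the leaf certificate and key in addition to the root.
    # A lone root.crt does not prove that a previous bootstrap succeeded.
    # -s is true only for files that exist and are non-empty.
    [ -s "$1" ] && [ -s "$2" ] && [ -s "$3" ]
}

STEP_ROOT="${STEP_ROOT:-/var/run/autocert.step.sm/root.crt}"
CRT="${CRT:-/var/run/autocert.step.sm/site.crt}"
KEY="${KEY:-/var/run/autocert.step.sm/site.key}"

if should_skip_bootstrap "$STEP_ROOT" "$CRT" "$KEY"; then
    echo "Found existing $CRT, skipping bootstrap"
else
    echo "Certificate material incomplete, running bootstrap"
fi
```

Checking with -s rather than -f also treats a zero-length file left by an interrupted write as missing, which seems the safer default here.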

The part I personally don't understand is: how is the init container being run a second time without the emptyDir being cleared?

We're seeing this in k8s 1.27.12.

@hslatman assigned hslatman and maraino and unassigned hslatman Oct 8, 2024
@maraino self-requested a review October 8, 2024 18:10

@maraino left a comment

I'm not sure why this would happen, but the fix makes sense.

@maraino merged commit 2e5e593 into smallstep:master Oct 8, 2024
15 checks passed

maraino commented Oct 8, 2024

Thanks @evans915, the CI is building new images with the tag v0.19.7.

@bdelvecchio

I've been updating Kubernetes 1.27 nodes and am seeing this error in almost all DaemonSet pods that are not drained before the reboot. Restarting these pods restores autocert to working order.
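One possible way to do that restart, as a hedged sketch: kubectl rollout restart replaces the pods of a DaemonSet, so the new pods run their init containers again. The helper name restart_daemonset, the KUBECTL override, and the namespace and DaemonSet names in the example are hypothetical. Deleting the affected pods individually is another option.

```shell
#!/bin/sh
# Sketch only: restart the pods of one DaemonSet so its init containers run again.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command
# instead of executing it.

restart_daemonset() {
    namespace="$1"
    name="$2"
    "${KUBECTL:-kubectl}" rollout restart "daemonset/$name" -n "$namespace"
}

# Example with hypothetical names:
#   restart_daemonset my-namespace my-daemonset
```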
