
feat(health-check): health checks added for each container #117

Closed

Conversation

@bilalcaliskan commented Dec 10, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds liveness/readiness probes for each possible container in a pod. It will help us ensure that our deployments are healthy at all times.

Which issue(s) this PR fixes:
Fixes #57

Special notes for your reviewer:
I had to add a tcpSocket health probe for the signaller port (8090), because the application normally opens only a single port when running without the varnish-exporter:

root@kube-httpcache-0:/# netstat -lntpu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 :::8090                 :::*                    LISTEN      1/kube-httpcache
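
For reference, a minimal sketch of how such a tcpSocket probe could look on the kube-httpcache container; the port is the signaller port (8090) observed above, and the timing values are only illustrative, not necessarily the ones used in this PR:

livenessProbe:
  tcpSocket:
    port: 8090              # signaller port, the only port open without varnish-exporter
  initialDelaySeconds: 20   # example timings, tune for your environment
  periodSeconds: 10
  timeoutSeconds: 3
readinessProbe:
  tcpSocket:
    port: 8090
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 3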

Also, I have no idea how to publish a new Helm chart to the https://helm.mittwald.de Helm repository. I could not examine this deeply, but I hope the current GitHub Actions already handle that.

@bilalcaliskan (Author) commented Dec 12, 2021

@martin-helmich I do not know how to interact with the bot to assign someone for review (if there is one), so could you assist me with the review process?

Regards.

@Shanuson

We are currently implementing a custom version of this in our environment.
Using a readiness probe resulted in a problem when we only have one pod running, or during the initial installation.
Initially, the readiness probe prevents the varnish pod from being listed by the Service as an endpoint; the Go controller therefore finds no endpoints, so varnish does not start and the readiness probe never turns green.

If you already have a cluster running, a rolling update works, because other pods are listed in the endpoint list.

To make this work, the Go controller needs to be updated as well, so that it always includes itself (its own pod IP) as an endpoint.
That way there will always be an endpoint in the list, varnish will start, and the readiness probe will be fine.
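
For illustration, a hedged sketch of how the pod could learn its own IP via the Kubernetes downward API; the POD_IP variable name is hypothetical, and the controller code would still have to be changed to add this address to its endpoint list:

env:
  - name: POD_IP                  # hypothetical variable name, not part of the current chart
    valueFrom:
      fieldRef:
        fieldPath: status.podIP   # downward API field containing the pod's own IP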

Review comment on the following quoted lines:

initialDelaySeconds: 20
periodSeconds: 10
timeoutSeconds: 3
readinessProbe:
@Shanuson commented Dec 20, 2021

See my comment above about the effect of a readiness probe, but I guess these values should work, since they do not probe Varnish itself?

@bilalcaliskan (Author) commented Dec 20, 2021

@Shanuson let me clarify one thing. A readiness or liveness probe is not related to objects like a Service or its Endpoints. The kubelet performs the readiness or liveness probe against the IP of the pod itself. Am I wrong?

"To make this work, the Go controller needs to be updated as well, so that it always includes itself (its own pod IP) as an endpoint. That way there will always be an endpoint in the list, varnish will start, and the readiness probe will be fine."

This solution does not seem relevant to me because, as I mentioned and as far as I know, the kubelet performs the readiness/liveness probe against the pod's IP; it does not care about the Service IP or the endpoints behind the Service.

@bilalcaliskan (Author)

@Shanuson as a counter-argument, I did not have any problems with these readiness/liveness probes.

@mittwald-machine (Collaborator)

There has not been any activity to this pull request in the last 14 days. It will automatically be closed after 7 more days. Remove the stale label to prevent this.

@martin-helmich (Member)

Sorry for the late reaction.

Regarding the "readiness vs. endpoint list" discussion: We've had that before, most recently here #49 (comment).

TL;DR: If you're using clustering (multiple Varnish servers, using the signaller component to dispatch merge requests), a readiness probe might break service discovery (which relies on a service selecting all kube-httpcache pods) and might break startup entirely, because the readiness of kube-httpcache depends on selecting at least one endpoint from its own cluster (meaning that the controller won't become "ready" until at least one controller is ready -- with the obvious conclusion that it'll wait infinitely for itself).

To mitigate this, one might try using a Service resource with the .spec.publishNotReadyAddresses property.
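
For illustration, a sketch of such a Service, assuming a headless service is used for endpoint discovery; the name, selector, and port are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: kube-httpcache-headless     # placeholder name
spec:
  clusterIP: None                   # headless service, illustrative
  publishNotReadyAddresses: true    # list pods as endpoints even before they report ready
  selector:
    app: kube-httpcache             # placeholder selector
  ports:
    - name: signaller
      port: 8090
      targetPort: 8090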

@bilalcaliskan (Author)

Hello @martin-helmich, I could not test the PR with multiple instances, so I did not see the problem. Now I understand the real problem, so thank you for the clear description. I will continue to work on the PR for a while.

Cheers!

@mittwald-machine (Collaborator)

There has not been any activity to this pull request in the last 14 days. It will automatically be closed after 7 more days. Remove the stale label to prevent this.

Successfully merging this pull request may close these issues.

Extend Helm chart with default liveness & readiness probes