
feat(health-check): health checks added for each container #117

Closed

Conversation

@bilalcaliskan commented Dec 10, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds liveness/readiness probes for each possible container in a pod. It will help us ensure that our deployments are healthy at all times.

Which issue(s) this PR fixes:
Fixes #57

Special notes for your reviewer:
I had to add a tcpSocket health probe for the signaller port (8090), because the application normally opens only a single port when running without the varnish-exporter:

root@kube-httpcache-0:/# netstat -lntpu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 :::8090                 :::*                    LISTEN      1/kube-httpcache
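
For reference, a minimal sketch of how such a tcpSocket probe could look on the kube-httpcache container; the port is the signaller port (8090) observed above, and the timing values are only illustrative, not necessarily the ones used in this PR:

livenessProbe:
  tcpSocket:
    port: 8090              # signaller port, the only port open without varnish-exporter
  initialDelaySeconds: 20   # example timings, tune for your environment
  periodSeconds: 10
  timeoutSeconds: 3
readinessProbe:
  tcpSocket:
    port: 8090
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 3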

Also, I have no idea how to publish a new Helm chart to the https://helm.mittwald.de Helm repository. I could not examine this deeply, but I hope the current GitHub Actions already handle that.

@bilalcaliskan (Author) commented Dec 12, 2021

@martin-helmich I do not know how to interact with the bot to assign someone for review (if there is one), so could you assist me with the review process?

Regards.

@Shanuson

We are currently implementing a custom version of this in our environment.
Using a readiness probe resulted in a problem when we only have one pod running, or during the initial installation.
Initially, the readiness probe prevents the varnish pod from being listed by the Service as an endpoint; the Go controller therefore finds no endpoints, so varnish does not start and the readiness probe never turns green.

If you already have a cluster running, a rolling update works, because other pods are listed in the endpoint list.

To make this work, the Go controller needs to be updated as well, so that it always includes itself (its own pod IP) as an endpoint.
That way there will always be an endpoint in the list, varnish will start, and the readiness probe will be fine.
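
For illustration, a hedged sketch of how the pod could learn its own IP via the Kubernetes downward API; the POD_IP variable name is hypothetical, and the controller code would still have to be changed to add this address to its endpoint list:

env:
  - name: POD_IP                  # hypothetical variable name, not part of the current chart
    valueFrom:
      fieldRef:
        fieldPath: status.podIP   # downward API field containing the pod's own IP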

Review comment on the following quoted lines:

initialDelaySeconds: 20
periodSeconds: 10
timeoutSeconds: 3
readinessProbe:
@Shanuson commented Dec 20, 2021

See my comment above about the effect of a readiness probe, but I guess these values should work, since they do not probe Varnish itself?

@bilalcaliskan (Author) commented Dec 20, 2021

@Shanuson let me clarify one thing. A readiness or liveness probe is not related to objects like a Service or its Endpoints. The kubelet performs the readiness or liveness probe against the IP of the pod itself. Am I wrong?

"To make this work, the Go controller needs to be updated as well, so that it always includes itself (its own pod IP) as an endpoint. That way there will always be an endpoint in the list, varnish will start, and the readiness probe will be fine."

This solution does not seem relevant to me because, as I mentioned and as far as I know, the kubelet performs the readiness/liveness probe against the pod's IP; it does not care about the Service IP or the endpoints behind the Service.

@bilalcaliskan (Author)

@Shanuson as a counter-argument, I did not have any problems with these readiness/liveness probes.

@mittwald-machine (Collaborator)

There has not been any activity to this pull request in the last 14 days. It will automatically be closed after 7 more days. Remove the stale label to prevent this.

@martin-helmich (Member)

Sorry for the late reaction.

Regarding the "readiness vs. endpoint list" discussion: We've had that before, most recently here #49 (comment).

TL;DR: If you're using clustering (multiple Varnish servers, using the signaller component to dispatch merge requests), a readiness probe might break service discovery (which relies on a service selecting all kube-httpcache pods) and might break startup entirely, because the readiness of kube-httpcache depends on selecting at least one endpoint from its own cluster (meaning that the controller won't become "ready" until at least one controller is ready -- with the obvious conclusion that it'll wait infinitely for itself).

To mitigate this, one might try using a Service resource with the .spec.publishNotReadyAddresses property.
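
For illustration, a sketch of such a Service, assuming a headless service is used for endpoint discovery; the name, selector, and port are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: kube-httpcache-headless     # placeholder name
spec:
  clusterIP: None                   # headless service, illustrative
  publishNotReadyAddresses: true    # list pods as endpoints even before they report ready
  selector:
    app: kube-httpcache             # placeholder selector
  ports:
    - name: signaller
      port: 8090
      targetPort: 8090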

@bilalcaliskan (Author)

Hello @martin-helmich, I could not test the PR with multiple instances, so I did not see the problem. Now I understand the real problem, so thank you for the clear description. I will continue to work on the PR for a while.

Cheers!

@mittwald-machine (Collaborator)

There has not been any activity to this pull request in the last 14 days. It will automatically be closed after 7 more days. Remove the stale label to prevent this.

Successfully merging this pull request may close these issues.

Extend Helm chart with default liveness & readiness probes