Bump kube-prometheus-stack to 45.5.0 #4017
Conversation
Hello gdemonet,

My role is to assist you with the merge of this pull request. Status report is not available.

Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
Force-pushed from eec4175 to 4e89c73
Force-pushed from 4e89c73 to e7a6ac9
Really, really hard to review.
Just by looking at the changes on the Salt side, it looks good to me (except for the upgrade handling).
```diff
-    repository: '__image__(alertmanager)'
+    registry: '__var__(repo.registry_endpoint)'
+    repository: '__image_no_reg__(alertmanager)'
```
Sad, but OK, yes it's needed. But can't we set this registry only once, in `global`?
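For reference, recent kube-prometheus-stack versions accept a chart-wide registry via `global.imageRegistry`; a sketch of what that could look like in our values (the key paths below are assumptions, and exact support varies per subchart, as discussed further down):

```yaml
# Hypothetical values sketch: registry set once under `global`,
# per-component repositories given without the registry prefix.
global:
  imageRegistry: '__var__(repo.registry_endpoint)'
alertmanager:
  alertmanagerSpec:
    image:
      repository: '__image_no_reg__(alertmanager)'
```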
Right, will do; found out about this one too late 😇
Argh, no, I can't do that, because the kube-state-metrics subchart doesn't use this value correctly:

```
image: "{{ .Values.global.imageRegistry | default .Values.image.repository }}:{{ .Values.image.tag | ...
```

😭

I'll submit a PR over there to have it fixed.
This comment was marked as resolved.
Some charts now expect the image registry to be defined separately from the repository, and enforce that these values are joined with a slash. This causes issues with our `build_image_name` macro, which builds the whole path. We add an option to this macro to omit the registry endpoint, and make this value available to rendered charts via the charts/render.py script header.
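The macro behavior described above can be sketched in Python (the real macro is Salt/Jinja; the function name follows the commit message, but the signature and the registry value here are illustrative assumptions):

```python
def build_image_name(name: str, tag: str,
                     registry: str = "metalk8s-registry.invalid:5000",
                     include_registry: bool = True) -> str:
    """Build an image path, optionally omitting the registry endpoint.

    Some charts join the registry and repository with a slash themselves,
    so the macro must be able to return only the repository part.
    """
    repository = f"{name}:{tag}"
    if include_registry:
        return f"{registry}/{repository}"
    return repository

# Full path, for charts that take a single image string:
full = build_image_name("alertmanager", "v0.25.0")
# Repository only, for charts that prepend the registry themselves:
no_reg = build_image_name("alertmanager", "v0.25.0", include_registry=False)
```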
Force-pushed from e7a6ac9 to 5b44514
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
Force-pushed from 5b44514 to 9680238
```yaml
# Drop cgroup metrics with no pod.
- sourceLabels: [id, pod]
  action: drop
  regex: '.+;'
```
This causes the current prometheus-adapter `resourceRules.(cpu|memory).nodeQuery` to not hit anything, because we dropped the metrics for nodes (`pod=""`, `id="/"`).

It appears the goal is to now rely on more efficient node-exporter metrics (see kubernetes-sigs/prometheus-adapter#516 and prometheus-community/helm-charts#2827), but we're now hitting an issue with labels: our node-exporter metrics don't have a "node" label, and that's a problem when trying to map to `/metrics.k8s.io/v1beta1/nodes`.

We will need to fix this label issue (TBH, it would make these metrics much simpler to explore and query, e.g. from our UI), but I'm not sure how involved this will be. For now, I'm considering a temporary workaround: keeping these metrics around, with a follow-up ticket.
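For context, the node-level queries in prometheus-adapter's resource rules select exactly the root-cgroup series that the drop rule removes; the shape is roughly the following (paraphrased from memory, not verbatim from any config, so treat key names and the query body as approximations):

```yaml
# Approximate shape of prometheus-adapter resourceRules (illustrative only):
# the node query matches root-cgroup container metrics (id="/"), which the
# relabel rule above drops because they carry no pod label.
resourceRules:
  cpu:
    nodeQuery: |
      sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[3m])) by (<<.GroupBy>>)
```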
Here's the follow-up: #4018
And the workaround: 97ce7c6
/approve
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following options are set: approve
Force-pushed from 734433c to b52b002
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following reviewers are expecting changes from the author, or must review again:
The following options are set: approve
Update the charts with:

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
rm -rf charts/kube-prometheus-stack
helm fetch -d charts --untar prometheus-community/kube-prometheus-stack
```

Also bump a number of components:

- Prometheus to 2.42.0
- Thanos to 0.30.2
- Grafana to 9.3.8
- kiwigrid/k8s-sidecar to 1.22.3
- kube-state-metrics to 2.8.0
- node-exporter to 1.5.0
- prometheus-operator to 0.63.0

The chart.sls was re-rendered with:

```
./doit.sh codegen:chart_kube-prometheus-stack
```

Since we bumped Thanos, we also re-render its own chart (without updating, since it did not change since the last update) with:

```
./doit.sh codegen:chart_thanos
```

Important note: Alertmanager configuration was updated by hand, and a note was added to try to remind maintainers to do it in the future. This should help make the "InfoInhibitor" alert more useful.
This changes the default configuration from kube-prometheus-stack, since we still use these metrics in prometheus-adapter. Ideally, we would let prometheus-adapter consume node-exporter metrics instead, but this requires #4018 to be fixed first.
Had a flaky run on this (it failed on single-node, but multi-node succeeded); let's wait a bit longer.
Force-pushed from b52b002 to b6e5ba9
Build failed
The build for commit did not succeed in branch improvement/bump-kube-prometheus-and-thanos.
The following options are set: approve
In the queue
The changeset has received all authorizations and has been added to the queue.
The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once the merge is complete.
IMPORTANT: Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a maintainer.
The following options are set: approve
I have successfully merged the changeset of this pull request.
The following branches have NOT changed:
Please check the status of the associated issue: None.

Goodbye gdemonet.