Bump kube-prometheus-stack to 45.5.0 #4017
Conversation
Hello gdemonet,

My role is to assist you with the merge of this pull request. Status report is not available.

Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
Force-pushed from eec4175 to 4e89c73
Force-pushed from 4e89c73 to e7a6ac9
Really, really hard to review.
Just by looking at the changes on the Salt side, it looks good to me (except for the upgrade handling).
```diff
-    repository: '__image__(alertmanager)'
+    registry: '__var__(repo.registry_endpoint)'
+    repository: '__image_no_reg__(alertmanager)'
```
Sad, but OK, yes it's needed. But can't we set this registry only once, in `global`?
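For reference, recent kube-prometheus-stack versions accept a chart-wide registry via `global.imageRegistry`; a sketch of what that could look like in our values (the key paths below are assumptions, and exact support varies per subchart, as discussed further down):

```yaml
# Hypothetical values sketch: registry set once under `global`,
# per-component repositories given without the registry prefix.
global:
  imageRegistry: '__var__(repo.registry_endpoint)'
alertmanager:
  alertmanagerSpec:
    image:
      repository: '__image_no_reg__(alertmanager)'
```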
Right, will do; found out about this one too late 😇
Argh, no, I can't do that, because the kube-state-metrics subchart doesn't use this value correctly:

```
image: "{{ .Values.global.imageRegistry | default .Values.image.repository }}:{{ .Values.image.tag | ...
```

😭

I'll submit a PR over there to have it fixed.
This comment was marked as resolved.
Some charts now expect the image registry to be defined separately from the repository, and enforce that these values are joined with a slash. This causes issues with our `build_image_name` macro, which builds the whole path. We add an option to this macro to omit the registry endpoint, and make this value available to rendered charts via the charts/render.py script header.
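The macro behavior described above can be sketched in Python (the real macro is Salt/Jinja; the function name follows the commit message, but the signature and the registry value here are illustrative assumptions):

```python
def build_image_name(name: str, tag: str,
                     registry: str = "metalk8s-registry.invalid:5000",
                     include_registry: bool = True) -> str:
    """Build an image path, optionally omitting the registry endpoint.

    Some charts join the registry and repository with a slash themselves,
    so the macro must be able to return only the repository part.
    """
    repository = f"{name}:{tag}"
    if include_registry:
        return f"{registry}/{repository}"
    return repository

# Full path, for charts that take a single image string:
full = build_image_name("alertmanager", "v0.25.0")
# Repository only, for charts that prepend the registry themselves:
no_reg = build_image_name("alertmanager", "v0.25.0", include_registry=False)
```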
Force-pushed from e7a6ac9 to 5b44514
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
Force-pushed from 5b44514 to 9680238
```yaml
# Drop cgroup metrics with no pod.
- sourceLabels: [id, pod]
  action: drop
  regex: '.+;'
```
This causes the current prometheus-adapter `resourceRules.(cpu|memory).nodeQuery` to not hit anything, because we dropped the metrics for nodes (`pod=""`, `id="/"`).

It appears the goal is to now rely on more efficient node-exporter metrics (see kubernetes-sigs/prometheus-adapter#516 and prometheus-community/helm-charts#2827), but we're now hitting an issue with labels: our node-exporter metrics don't have a "node" label, and that's a problem when trying to map to `/metrics.k8s.io/v1beta1/nodes`.

We will need to fix this label issue (TBH, it would make these metrics much simpler to explore and query, e.g. from our UI), but I'm not sure how involved this will be. For now, I'm considering a temporary workaround: keeping these metrics around, with a follow-up ticket.
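For context, the node-level queries in prometheus-adapter's resource rules select exactly the root-cgroup series that the drop rule removes; the shape is roughly the following (paraphrased from memory, not verbatim from any config, so treat key names and the query body as approximations):

```yaml
# Approximate shape of prometheus-adapter resourceRules (illustrative only):
# the node query matches root-cgroup container metrics (id="/"), which the
# relabel rule above drops because they carry no pod label.
resourceRules:
  cpu:
    nodeQuery: |
      sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[3m])) by (<<.GroupBy>>)
```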
Here's the follow-up: #4018
And the workaround: 97ce7c6
/approve
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following options are set: approve
Force-pushed from 734433c to b52b002
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following reviewers are expecting changes from the author, or must review again:
The following options are set: approve
Update the charts with:

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
rm -rf charts/kube-prometheus-stack
helm fetch -d charts --untar prometheus-community/kube-prometheus-stack
```

Also bump a number of components:

- Prometheus to 2.42.0
- Thanos to 0.30.2
- Grafana to 9.3.8
- kiwigrid/k8s-sidecar to 1.22.3
- kube-state-metrics to 2.8.0
- node-exporter to 1.5.0
- prometheus-operator to 0.63.0

The chart.sls was re-rendered with:

```
./doit.sh codegen:chart_kube-prometheus-stack
```

Since we bumped Thanos, we also re-render its own chart (without updating, since it did not change since the last update) with:

```
./doit.sh codegen:chart_thanos
```

Important note: Alertmanager configuration was updated by hand, and a note was added to try to remind maintainers to do it in the future. This should help make the "InfoInhibitor" alert more useful.
This changes the default configuration from kube-prometheus-stack, since we still use these metrics in prometheus-adapter. Ideally, we would let prometheus-adapter consume node-exporter metrics instead, but this requires #4018 to be fixed first.
Had a flaky run on this (it failed on single-node, but multi-node succeeded); let's wait a bit longer.
Force-pushed from b52b002 to b6e5ba9
Build failed
The build for commit did not succeed in branch improvement/bump-kube-prometheus-and-thanos.
The following options are set: approve
In the queue
The changeset has received all authorizations and has been added to the queue.
The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once the merge is complete.
IMPORTANT: Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a maintainer.
The following options are set: approve
I have successfully merged the changeset of this pull request.
The following branches have NOT changed:
Please check the status of the associated issue: None.

Goodbye gdemonet.