Fix Warning & Operation metrics #14323
base: main
Do warnings always have a node?
They do not. This is marked as Draft while we decide how to handle nodeless warnings; I am thinking of just including them in every node's metrics.
Yeah, or just on the leader?
I think this makes even more sense. Since Prometheus is supposed to scrape every node, counting these warnings on each node would redundantly count them many times when the measurements are aggregated.
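The double counting described above can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the LXD code base: the function names, and the assumption that an aggregation simply sums the per-member values that Prometheus scraped.

```go
package main

// scrapedTotal sums the warning counts reported by each cluster member,
// which is what a sum-style aggregation over all scraped members does.
func scrapedTotal(perMember []int) int {
	total := 0
	for _, n := range perMember {
		total += n
	}
	return total
}

// countOnEveryMember models the first idea: each member reports its own
// warnings plus all nodeless warnings.
func countOnEveryMember(local []int, nodeless int) []int {
	out := make([]int, len(local))
	for i, n := range local {
		out[i] = n + nodeless
	}
	return out
}

// countOnLeaderOnly models the alternative: only the member at index
// leader adds the nodeless warnings to its own count.
func countOnLeaderOnly(local []int, nodeless int, leader int) []int {
	out := make([]int, len(local))
	copy(out, local)
	out[leader] += nodeless
	return out
}
```

With three members holding 2, 0 and 1 member-specific warnings and 3 nodeless warnings, summing the first variant gives 12, while the leader-only variant gives 6, which matches the real number of warnings.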
471cc6e to 880764a
@tomponline @markylaing I am thinking on waiting for #14261 before proceeding here since I intend to use the |
880764a to 7b088a0
@hamistao please can we get this one fixed ready for LXD 6.2 release |
7b088a0 to bd71ccc
bd71ccc to 2a24a74
@@ -378,10 +379,40 @@ func getFilteredMetrics(s *state.State, r *http.Request, compress bool, metricSe
return response.SyncResponsePlain(true, compress, metricSet.String())
}

func internalMetrics(ctx context.Context, daemonStartTime time.Time, tx *db.ClusterTx) *metrics.MetricSet {
The change to this function's signature should be a separate commit, as it is not related to the commit message.
I must include the state as a parameter on internalMetrics for this commit since clusterMemberWarnings requires it to call s.LeaderInfo(), so it is a change necessary for "Filtering query for Warnings appropriately" (as per the commit message).
That said, I can split this in two commits if you think it would be simpler.
Personally I would have plumbed in the replacement of the daemonStartTime argument with *state.State as a single commit first, including passing s.StartTime instead of daemonStartTime to out.AddSamples().
Then, as another commit, I would have introduced the clusterMemberWarnings function and replaced the call to dbCluster.GetWarnings in internalMetrics.
As it is now, the changes are munged together and harder to review.
Noted! I find it useful to understand how you think about this so I can try to make your life easier :)
I will be pushing these changes along with the tests
Signed-off-by: hamistao <[email protected]>
Signed-off-by: hamistao <[email protected]> (cherry picked from commit 24f150cf451eef796d879f1a191004ef0504b301) Signed-off-by: hamistao <[email protected]>
Signed-off-by: hamistao <[email protected]> (cherry picked from commit decddf5adfbd0c34c1f189c4eff16c19b3527abd) Signed-off-by: hamistao <[email protected]>
Filters query for Warnings on the metrics handler by Node. Since some Warnings do not have a node, nodeless Warnings are only being included if querying the metrics from the leader node. Signed-off-by: hamistao <[email protected]>
2a24a74 to e24204d
Is it possible to add some tests for this behaviour by faking some warnings and checking they only appear in the leader endpoint?
@tomponline Yes, it is. This is precisely how I have been testing: making dummy warnings with |
lxd_operations_total and lxd_warnings_total are currently computed cluster-wide instead of covering only the entities related to the node responding to the metrics request. This goes against the overall design of the metrics, which are supposed to be per node and queried on each node of a cluster.
To fix this, this PR filters the queries for those entities based on the node.
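As a rough illustration of the filtering rule stated in the commit message (warnings are filtered by node, and nodeless warnings are included only when the leader is queried), here is a minimal in-memory sketch. It is not the PR's implementation: the Warning type, its fields and the memberWarnings function are hypothetical, and the real clusterMemberWarnings works against the cluster database and s.LeaderInfo() rather than a slice.

```go
package main

// Warning is a hypothetical, simplified stand-in for a cluster warning.
// An empty Node means the warning is not tied to any cluster member.
type Warning struct {
	Node string
	Type string
}

// memberWarnings returns the warnings that the given member should report
// in its metrics: its own warnings, plus the nodeless ones when the member
// is the cluster leader.
func memberWarnings(all []Warning, member string, isLeader bool) []Warning {
	out := make([]Warning, 0, len(all))
	for _, w := range all {
		switch {
		case w.Node == member:
			// Warning belongs to the member answering the request.
			out = append(out, w)
		case w.Node == "" && isLeader:
			// Nodeless warning: reported once, by the leader only.
			out = append(out, w)
		}
	}
	return out
}
```

Under this rule each warning is reported by exactly one member, so summing lxd_warnings_total over all scraped members yields the number of warnings in the cluster, provided exactly one member is the leader at scrape time.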