
Fix Warning & Operation metrics #14323

Open
wants to merge 4 commits into main from fix_warning_operation_metrics
Conversation

hamistao
Contributor

lxd_operations_total and lxd_warnings_total are currently computed cluster-wide, instead of covering only the entities related to the node responding to the metrics request. This goes against the overall design of the metrics, which are meant to be per node and scraped from each node in a cluster.
To fix this, this PR filters the queries for those entities by node.
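For illustration, a minimal sketch of the intended filtering (the table and column names below are assumptions for the sketch, not LXD's actual schema; the same idea applies to the operations query):

```go
package metricsdemo

import (
	"context"
	"database/sql"
)

// countWarningsForNode restricts the warnings count to the cluster member
// answering the /1.0/metrics request instead of counting cluster-wide.
func countWarningsForNode(ctx context.Context, db *sql.DB, nodeID int64) (map[string]int64, error) {
	rows, err := db.QueryContext(ctx,
		"SELECT status, COUNT(*) FROM warnings WHERE node_id = ? GROUP BY status", nodeID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	counts := make(map[string]int64)
	for rows.Next() {
		var status string
		var n int64
		if err := rows.Scan(&status, &n); err != nil {
			return nil, err
		}
		counts[status] = n
	}

	return counts, rows.Err()
}
```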

Member

@tomponline tomponline left a comment


Do warnings always have a node?

@hamistao
Contributor Author

Do warnings always have a node?

They do not. This is marked as Draft while we decide how to handle nodeless warnings; I am thinking of just including them in every node's metrics.

@tomponline
Member

Do warnings always have a node?

They do not. This is marked as Draft while we decide how to handle nodeless warnings; I am thinking of just including them in every node's metrics.

Yeah, or just on the leader?

@hamistao
Contributor Author

Yeah, or just on the leader?

I think this makes even more sense. Since Prometheus is supposed to scrape every node, counting them for each node would redundantly count these warnings many times when aggregating the measurements.
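For illustration, a minimal sketch of how that could look (the Warning type and helper below are stand-ins, not LXD's actual code):

```go
package metricsdemo

// Warning is a stand-in for this sketch; in LXD the warnings come from the
// cluster database.
type Warning struct {
	Node   string // empty for warnings not attached to any cluster member
	Status string
}

// filterWarnings sketches the leader-only handling discussed above: each
// member reports its own warnings, and nodeless warnings are reported only
// by the cluster leader so they are counted exactly once across the cluster.
// isLeader would come from something like the LeaderInfo helper mentioned
// later in this thread.
func filterWarnings(all []Warning, localNode string, isLeader bool) []Warning {
	var out []Warning
	for _, w := range all {
		if w.Node == localNode || (w.Node == "" && isLeader) {
			out = append(out, w)
		}
	}
	return out
}
```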

lxd/api_metrics.go: outdated review thread (resolved)
@hamistao hamistao force-pushed the fix_warning_operation_metrics branch from 471cc6e to 880764a on October 23, 2024 10:01
@hamistao
Contributor Author

@tomponline @markylaing I am thinking of waiting for #14261 before proceeding here, since I intend to use the LeaderInfo function defined there. I have cherry-picked some commits to test and it all works fine.

@hamistao hamistao force-pushed the fix_warning_operation_metrics branch from 880764a to 7b088a0 on November 1, 2024 13:38
@tomponline
Member

@hamistao please can we get this one fixed and ready for the LXD 6.2 release?

@hamistao hamistao force-pushed the fix_warning_operation_metrics branch from 7b088a0 to bd71ccc on November 6, 2024 11:20
@tomponline tomponline marked this pull request as ready for review on November 6, 2024 11:30
@hamistao hamistao force-pushed the fix_warning_operation_metrics branch from bd71ccc to 2a24a74 on November 6, 2024 19:57
lxd/api_metrics.go: outdated review thread (resolved)
@@ -378,10 +379,40 @@ func getFilteredMetrics(s *state.State, r *http.Request, compress bool, metricSe
return response.SyncResponsePlain(true, compress, metricSet.String())
}

func internalMetrics(ctx context.Context, daemonStartTime time.Time, tx *db.ClusterTx) *metrics.MetricSet {
Member


the change to this function's signature should be a separate commit, as it is not related to the commit message

Contributor Author

@hamistao hamistao Nov 12, 2024


I must include the state as a parameter of internalMetrics in this commit, since clusterMemberWarnings requires it to call s.LeaderInfo(), so the change is necessary for "Filtering query for Warnings appropriately" (as per the commit message).
That said, I can split this into two commits if you think it would be simpler.

Member


Personally I would have plumbed in the replacement of the daemonStartTime argument with *state.State as a single commit first, including passing s.StartTime instead of daemonStartTime to out.AddSamples().

Then, as another commit, I would have introduced the clusterMemberWarnings function and replaced the call to dbCluster.GetWarnings in internalMetrics.

As it is now, the changes are munged together and harder to review.
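For illustration, a rough sketch of the suggested two-commit split, using stand-in types so it is self-contained (s.StartTime, clusterMemberWarnings and dbCluster.GetWarnings are taken from the comments above; everything else here is assumed, not the real LXD code):

```go
package metricsdemo

import (
	"context"
	"time"
)

// Stand-ins so this sketch compiles on its own; the real code uses
// *state.State, *db.ClusterTx and *metrics.MetricSet from the LXD tree.
type State struct{ StartTime time.Time }
type ClusterTx struct{}
type MetricSet struct{ UptimeSeconds float64 }

// Commit 1 of the suggested split: drop the daemonStartTime argument, pass
// the state instead, and derive the uptime sample from s.StartTime.
// Commit 2: have the warnings come from a dedicated clusterMemberWarnings
// helper rather than an inline dbCluster.GetWarnings call.
func internalMetrics(ctx context.Context, s *State, tx *ClusterTx) *MetricSet {
	set := &MetricSet{UptimeSeconds: time.Since(s.StartTime).Seconds()}
	// warnings would be added here via clusterMemberWarnings(ctx, s, tx)
	// in the second commit.
	return set
}
```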

Contributor Author


Noted! I find it useful to understand how you think about this so I can try to make your life easier :)

I will be pushing these changes along with the tests

Signed-off-by: hamistao <[email protected]>
(cherry picked from commit 24f150cf451eef796d879f1a191004ef0504b301)
Signed-off-by: hamistao <[email protected]>
Signed-off-by: hamistao <[email protected]>
(cherry picked from commit decddf5adfbd0c34c1f189c4eff16c19b3527abd)
Signed-off-by: hamistao <[email protected]>
Filters the query for Warnings in the metrics handler by node. Since some
warnings do not have a node, nodeless warnings are only included when
querying the metrics from the leader node.

Signed-off-by: hamistao <[email protected]>
@hamistao hamistao force-pushed the fix_warning_operation_metrics branch from 2a24a74 to e24204d on November 12, 2024 17:02
Member

@tomponline tomponline left a comment


Is it possible to add some tests for this behaviour by faking some warnings and checking they only appear in the leader endpoint?

@hamistao
Contributor Author

Is it possible to add some tests for this behaviour by faking some warnings and checking they only appear in the leader endpoint?

@tomponline Yes, it is. This is precisely how I have been testing: making dummy warnings with lxd sql.
