Proxy sli #342

Merged
merged 32 commits into from
Sep 29, 2023

Conversation

ranakan19
Contributor

@ranakan19 ranakan19 commented Sep 1, 2023

This PR introduces two metrics for the Registration-Service proxy:

  1. RegServProxyApiHistogramVec - measures the time taken by the proxy before forwarding the request
  2. RegServWorkspaceHistogramVec - measures the response time for either a response or an error from the proxy when there is no routing

Since both are of type HistogramVec, the metrics can be partitioned on labels.
RegServWorkspaceHistogramVec uses the labels status_code and kube_verb to identify the type of request and whether the request was successful.
RegServProxyRouteHistogramVec uses the labels status_code and route_to to store the host URL of the target cluster and whether the request was acceptable. If the request wasn't acceptable, route_to is set to Rejected.

Here is a sample of the output of metrics on dev-cluster:

# HELP sandbox_proxy_api_http_request_time time taken by proxy to route to a target cluster
# TYPE sandbox_proxy_api_http_request_time histogram
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="0.05"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="0.1"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="0.25"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="0.5"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="1"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="5"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="10"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="Rejected",status_code="406",le="+Inf"} 1
sandbox_proxy_api_http_request_time_sum{route_to="Rejected",status_code="406"} 0.000216156
sandbox_proxy_api_http_request_time_count{route_to="Rejected",status_code="406"} 1
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="0.05"} 12
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="0.1"} 12
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="0.25"} 12
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="0.5"} 12
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="1"} 12
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="5"} 13
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="10"} 13
sandbox_proxy_api_http_request_time_bucket{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202",le="+Inf"} 13
sandbox_proxy_api_http_request_time_sum{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202"} 2.5381906469999995
sandbox_proxy_api_http_request_time_count{route_to="api.krana-sep2723.devcluster.openshift.com:6443",status_code="202"} 13
# HELP sandbox_proxy_workspace_http_request_time time for response of a request to proxy 
# TYPE sandbox_proxy_workspace_http_request_time histogram
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="0.05"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="0.1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="0.25"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="0.5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="10"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="200",le="+Inf"} 1
sandbox_proxy_workspace_http_request_time_sum{kube_verb="Get",status_code="200"} 0.000774987
sandbox_proxy_workspace_http_request_time_count{kube_verb="Get",status_code="200"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="0.05"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="0.1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="0.25"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="0.5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="10"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="Get",status_code="404",le="+Inf"} 1
sandbox_proxy_workspace_http_request_time_sum{kube_verb="Get",status_code="404"} 0.00072671
sandbox_proxy_workspace_http_request_time_count{kube_verb="Get",status_code="404"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="0.05"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="0.1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="0.25"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="0.5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="10"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="200",le="+Inf"} 1
sandbox_proxy_workspace_http_request_time_sum{kube_verb="List",status_code="200"} 0.002846301
sandbox_proxy_workspace_http_request_time_count{kube_verb="List",status_code="200"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="0.05"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="0.1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="0.25"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="0.5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="1"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="5"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="10"} 1
sandbox_proxy_workspace_http_request_time_bucket{kube_verb="List",status_code="500",le="+Inf"} 1
sandbox_proxy_workspace_http_request_time_sum{kube_verb="List",status_code="500"} 0.000815715
sandbox_proxy_workspace_http_request_time_count{kube_verb="List",status_code="500"} 1
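
For readers less familiar with the exposition format: the _bucket values are cumulative, so each le bucket counts every observation at or below that bound. The sketch below uses only the standard library to turn the cumulative counts of the route_to="api.krana-sep2723.devcluster.openshift.com:6443" series above into per-bucket counts and to derive the mean from _sum and _count. The function names are made up for this illustration and are not part of the PR.

```go
package main

import "fmt"

// perBucketCounts converts cumulative bucket counts (as exposed by
// Prometheus) into the number of observations that fell into each bucket.
func perBucketCounts(cumulative []uint64) []uint64 {
	out := make([]uint64, len(cumulative))
	var prev uint64
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

// meanSeconds derives the average observation from the _sum and _count samples.
func meanSeconds(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	bounds := []string{"0.05", "0.1", "0.25", "0.5", "1", "5", "10", "+Inf"}
	// Cumulative counts copied from the status_code="202" series above.
	cumulative := []uint64{12, 12, 12, 12, 12, 13, 13, 13}

	for i, n := range perBucketCounts(cumulative) {
		fmt.Printf("le=%s: %d\n", bounds[i], n)
	}
	fmt.Printf("mean: %.3fs\n", meanSeconds(2.5381906469999995, 13))
}
```

Read this way, 12 of the 13 forwarded requests completed within 0.05s and one fell between 1s and 5s.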

@openshift-ci

openshift-ci bot commented Sep 1, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ranakan19
Contributor Author

/retest

pkg/proxy/proxy.go (outdated, resolved)
Contributor

@MatousJobanek MatousJobanek left a comment

Overall looks good 👍 I added a few suggestions, but nothing critical.
Could you please also update the unit tests?

pkg/context/keys.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
pkg/proxy/handlers/spacelister.go (outdated, resolved)
pkg/proxy/proxy.go (outdated, resolved)
pkg/proxy/proxy.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
@codecov

codecov bot commented Sep 14, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Files Coverage Δ
pkg/proxy/handlers/metrics.go 100.00% <100.00%> (ø)
pkg/proxy/handlers/spacelister.go 83.05% <100.00%> (+0.59%) ⬆️
pkg/server/routes.go 77.77% <ø> (-0.49%) ⬇️
pkg/proxy/proxy.go 85.26% <93.75%> (+0.77%) ⬆️
pkg/metrics/metrics.go 88.00% <88.00%> (ø)


Contributor

@rajivnathan rajivnathan left a comment

Looks good overall! Just a few questions in the comments.

pkg/metrics/metrics_test.go (outdated, resolved)
pkg/metrics/metrics_test.go (outdated, resolved)
pkg/metrics/metrics_test.go (outdated, resolved)
pkg/metrics/metrics_test.go (outdated, resolved)
pkg/proxy/handlers/spacelister.go (outdated, resolved)
@ranakan19
Contributor Author

/retest

Contributor

@MatousJobanek MatousJobanek left a comment

Just a few reminders

pkg/context/keys.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
pkg/proxy/proxy.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
Comment on lines 19 to 22
// RegServProxyRouteHistogramVec measures the time taken by proxy before forwarding the request
RegServProxyRouteHistogramVec *prometheus.HistogramVec
// RegServProxyResponseHistogramVec measures the response time for either response or error from proxy when there is no routing
RegServProxyResponseHistogramVec *prometheus.HistogramVec
Contributor

You could make the variable names shorter by dropping the RegServ prefix:

Suggested change
- // RegServProxyRouteHistogramVec measures the time taken by proxy before forwarding the request
- RegServProxyRouteHistogramVec *prometheus.HistogramVec
- // RegServProxyResponseHistogramVec measures the response time for either response or error from proxy when there is no routing
- RegServProxyResponseHistogramVec *prometheus.HistogramVec
+ // ProxyRouteHistogramVec measures the time taken by proxy before forwarding the request
+ ProxyRouteHistogramVec *prometheus.HistogramVec
+ // ProxyResponseHistogramVec measures the response time from proxy when there is no routing
+ ProxyResponseHistogramVec *prometheus.HistogramVec

pkg/proxy/proxy.go (outdated, resolved)
pkg/server/routes.go (outdated, resolved)
pkg/server/routes.go (outdated, resolved)
@sonarcloud

sonarcloud bot commented Sep 27, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 12.0%

Contributor

@xcoulon xcoulon left a comment

great work @ranakan19 ! 🙌
and thanks for addressing my comments 😄

Contributor

@MatousJobanek MatousJobanek left a comment

Nice 👍 Good job @ranakan19 🥇 🚀

Contributor

@rajivnathan rajivnathan left a comment

Great work! 🙌

m.WithLabelValues("500", "list").Observe((1 * time.Millisecond).Seconds())

//then
assert.Equal(t, 4, promtestutil.CollectAndCount(m, "sandbox_test_histogram_vec"))
Contributor

Just a question, why is it 4? Maybe worth adding a comment

Contributor Author

Because HistogramVecs are partitioned on labels, and we have 4 label combinations in the test. Sure, let me add a label

@openshift-ci

openshift-ci bot commented Sep 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MatousJobanek, rajivnathan, ranakan19, xcoulon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [MatousJobanek,xcoulon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MatousJobanek MatousJobanek merged commit e1dbff6 into codeready-toolchain:master Sep 29, 2023
10 checks passed

6 participants