Add virtualServers and clientSslProfiles labels to certain Telemetry Streaming metrics #257

Open
barakbd opened this issue Jun 14, 2023 · 7 comments
Labels
enhancement New feature or request untriaged Issue needs to be reviewed for validity

barakbd commented Jun 14, 2023

Is your feature request related to a problem? Please describe.

  1. We want to show the relationship between virtual servers and client SSL profiles.
  2. We want to show the HTTP request rate (2xx, 3xx, 4xx) by client SSL profile.

Describe the solution you'd like

  1. Metrics:
  • f5_currentNativeConnections
  • f5_totNativeConns

Add label:
virtualServers (these metrics currently have clientSslProfiles)

  2. Metrics:
  • f5_numberReqs
  • f5_2xxResp
  • f5_3xxResp
  • f5_4xxResp
  • f5_5xxResp

Add labels:
clientSslProfiles and virtualServers

Describe alternatives you've considered

Email from Matt Stovall:
By defining a Telemetry Streaming custom endpoint for /mgmt/tm/ltm/virtual/profiles/stats, you can get the equivalent metrics per virtual server. The virtual server name is not exposed as a Prometheus label; instead, it is embedded in the metric name.

Example output from TS Pull consumer endpoint:

f5_vsProfileStats__Common_{{virtual server name}}_stats__Common_{{virtual server name}}_profiles_stats__Common_{{virtual server name}}_profiles_Common_{{clientSSL profile name}}_stats_common_activeHandshakeRejected
f5_vsProfileStats__Common_{{virtual server name}}_stats__Common_{{virtual server name}}_profiles_stats__Common_{{virtual server name}}_profiles_Common_{{clientSSL profile name}}_stats_common_curNativeConns
f5_vsProfileStats__Common_{{virtual server name}}_stats__Common_{{virtual server name}}_profiles_stats__Common_{{virtual server name}}_profiles_Common_{{clientSSL profile name}}_stats_common_currentActiveHandshakes

Example for virtual server name https_multi_cert and clientSSL profile default_sni:

f5_vsProfileStats__Common_https_multi_cert_stats__Common_https_multi_cert_profiles_stats__Common_https_multi_cert_profiles_Common_default_sni_stats_common_activeHandshakeRejected 0

To get these to show up in TS output, you just need to define another custom endpoint in your telemetry streaming declaration. You already have a few custom endpoints defined:

        "Custom_Endpoints": {
            "class": "Telemetry_Endpoints",
            "items": {
                "vsProfileStats": {
                    "name": "vsProfileStats",
                    "path": "/mgmt/tm/ltm/virtual/profiles/stats"
                }
            }
        }

If you want to show the VS names as a label instead of in the metric name, that would require a new Telemetry Streaming GitHub request. We can submit GitHub requests here: https://github.com/F5Networks/f5-telemetry-streaming/issues
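Because the virtual server name appears three times in these flattened names, a regular expression with a backreference can recover it even when the name itself contains underscores. The following Python sketch is a hypothetical client-side workaround, not a Telemetry Streaming feature; `split_metric` and `PATTERN` are illustrative names, and it assumes every metric follows exactly the layout of the https_multi_cert / default_sni example above. The extracted names are in sanitized form, so any `/` or `-` in the original names has already become `_`.

```python
import re

# Assumed name layout, taken from the https_multi_cert / default_sni example:
#   f5_vsProfileStats__Common_<vs>_stats__Common_<vs>_profiles_stats
#   __Common_<vs>_profiles_Common_<profile>_stats_common_<stat>
# The backreference (?P=vs) forces all three copies of the virtual server
# name to be identical, which pins down names that contain underscores.
PATTERN = re.compile(
    r"^f5_vsProfileStats__Common_(?P<vs>\w+?)_stats"
    r"__Common_(?P=vs)_profiles_stats"
    r"__Common_(?P=vs)_profiles_Common_(?P<profile>\w+?)"
    r"_stats_common_(?P<stat>[A-Za-z0-9]+)$"
)


def split_metric(line: str):
    """Split one 'name value' line into (stat, labels, value); None if it does not match."""
    name, value = line.rsplit(" ", 1)
    m = PATTERN.match(name)
    if m is None:
        return None
    labels = {"virtualServers": m["vs"], "clientSslProfiles": m["profile"]}
    return m["stat"], labels, float(value)
```

Applied to the https_multi_cert line, it returns the stat `activeHandshakeRejected` with `virtualServers` set to `https_multi_cert` and `clientSslProfiles` set to `default_sni`. The backreference is the key design choice: a lazy match on its own could not tell where an underscore-containing name ends.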

barakbd added the enhancement and untriaged labels Jun 14, 2023
@Nachtfalkeaw

Hello,

I think this is not a feature request; it is a bug. If you use the default declaration, the metrics are formatted correctly. If you collect the same metrics using a custom endpoint, the formatting is garbage.

Here is the garbage metric format from a custom endpoint:
f5_customEndpoint_counters_metric

This is the same value (bitsIn/bitsOut) from the default declaration. If you do not configure anything and just enable the OpenTelemetry API for Prometheus pull, it looks like this:
f5_default_counters_metric

And both use this source:
f5_Path_counters_metric

v1.33.0 and v1.34.0 of the OpenTelemetryPlugin

@megamattzilla

Hi @barakbd and @Nachtfalkeaw,

I'm the F5 solutions engineer working with Barak on their telemetry streaming initiatives.

It appears there are two different requests in this github issue.

1.) First Request

When you define a custom endpoint that provides statistics per virtual server, such as /mgmt/tm/ltm/virtual/profiles/stats, add the virtual server name to those metrics as a label. This seems reasonable to me: all the data for this is already in the control plane.

For example, that custom endpoint provides bits in/out per virtual server instead of only global bits in/out, which is very useful for identifying which virtual servers have the highest throughput. It produces a metric like this:

# HELP f5_vsProfileStats__Common_asm_demo_http_stats_clientside_bitsOut vsProfileStats_/Common/asm-demo-http/stats_clientside.bitsOut
# TYPE f5_vsProfileStats__Common_asm_demo_http_stats_clientside_bitsOut gauge
f5_vsProfileStats__Common_asm_demo_http_stats_clientside_bitsOut 1028544

The metric output is formatted in a way that is difficult to parse. The virtual server name is there (sanitized to _Common_asm_demo_http), but it is difficult to extract it and then graph this metric as the bitsOut for virtual server /Common/asm-demo-http. Ideally, the name could be improved and a Prometheus-friendly label added, so that the metric looks like this instead:

# HELP f5_vsProfileStats__clientside_bitsOut vsProfileStats_/Common/asm-demo-http/stats_clientside.bitsOut
# TYPE f5_vsProfileStats__clientside_bitsOut gauge
f5_vsProfileStats__clientside_bitsOut{virtualServers="/Common/asm-demo-http"} 1028544

That way, the metric could be graphed natively in Prometheus/Grafana as belonging to virtual server /Common/asm-demo-http. You could then graph bits per second by virtual server, instead of having only global bitsOut without knowing which virtual servers contribute to it.
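Until something like this exists in Telemetry Streaming, one way to approximate the proposed output is to post-process the scrape. The HELP text still carries the unsanitized path (`/Common/asm-demo-http`), so this Python sketch reads the virtual server name from HELP rather than from the metric name. It is a hypothetical illustration: `relabel`, `HELP_RE`, and `sanitize` are made-up names, and the regex assumes the `<endpoint>_/<partition>/<name>/stats_<stat>` HELP layout seen in this one example.

```python
import re

# HELP layout assumed from the single example in this thread:
#   <endpoint>_/<partition>/<virtual server>/stats_<stat>
HELP_RE = re.compile(r"^(?P<endpoint>\w+?)_(?P<vs>/[^/]+/[^/]+)/stats_(?P<stat>.+)$")


def sanitize(s: str) -> str:
    """Replace characters that are not valid in a Prometheus metric name with '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", s)


def relabel(exposition: str) -> str:
    """Move the virtual server name from the metric name into a virtualServers label."""
    renames = {}  # old metric name -> (new metric name, virtual server)
    out = []
    for line in exposition.splitlines():
        if line.startswith("# HELP "):
            parts = line.split(" ", 3)
            m = HELP_RE.match(parts[3]) if len(parts) == 4 else None
            if m:
                new = f"f5_{m['endpoint']}__{sanitize(m['stat'])}"
                renames[parts[2]] = (new, m["vs"])
                out.append(f"# HELP {new} {parts[3]}")
                continue
        elif line.startswith("# TYPE "):
            parts = line.split(" ", 3)
            if len(parts) == 4 and parts[2] in renames:
                out.append(f"# TYPE {renames[parts[2]][0]} {parts[3]}")
                continue
        elif line and not line.startswith("#") and " " in line:
            name, value = line.rsplit(" ", 1)
            if name in renames:
                new, vs = renames[name]
                out.append(f'{new}{{virtualServers="{vs}"}} {value}')
                continue
        out.append(line)  # anything unrecognized passes through unchanged
    return "\n".join(out)
```

Feeding it the three original asm-demo-http lines produces exactly the three proposed lines. One caveat: with many virtual servers, this naive version repeats the HELP and TYPE lines once per virtual server, whereas the Prometheus text format expects one HELP/TYPE pair per metric family with its samples grouped together, so a real implementation would need to deduplicate and group them.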

2.) Second Request

Add labels for clientSslProfiles and virtualServers names to various metrics produced by clientSSL and HTTP profiles. To my knowledge, TMOS has no data that Telemetry Streaming could query to map these together.

I suggest we focus on the first request as that seems within the scope of TS and immediately useful.

Thanks!

@Nachtfalkeaw

The metric names should be the same whether I query a metric through a custom endpoint or through the default configuration. The reason is simple: if I query all metrics every 5 seconds, the CPUs are overloaded. However, for a very limited set of values, such as CPU and memory, a 5-second polling interval is useful.

For other values, such as overall throughput, polling every 15 seconds is sufficient, and for still others every 60 seconds.

Software and hardware versions are only relevant, say, every 6 hours.

So the idea of multiple Pull_Consumers is very good. However, to use them, the Custom_Endpoints must generate the same metric output as the default poller, so that metrics from different intervals can be matched correctly; more than that, they should be the same metric. If every poller generates different metrics, the result is duplicate metrics in Prometheus: the CPU metrics from the default poller and the CPU metrics from the custom endpoint.

However, if it is not possible to generate the same metric names, then the different metrics should share the same label sets, so that they can be merged on those labels, and the labels should uniquely identify which metrics are the same.

B0go commented Mar 13, 2024

I can confirm this problem is also affecting me! Once I enable custom endpoints to filter out the results of the scrape (so I can avoid the CPU overload), the metrics get reported in a different pattern:

# HELP f5_detailedCPU_sys_host_info_0_sys_hostInfo_0_cpuInfo_sys_hostInfo_0_cpuInfo_1_oneMinAvgUser detailedCPU_sys/host-info/0_sys/hostInfo/0/cpuInfo_sys/hostInfo/0/cpuInfo/1_oneMinAvgUser
# TYPE f5_detailedCPU_sys_host_info_0_sys_hostInfo_0_cpuInfo_sys_hostInfo_0_cpuInfo_1_oneMinAvgUser gauge
f5_detailedCPU_sys_host_info_0_sys_hostInfo_0_cpuInfo_sys_hostInfo_0_cpuInfo_1_oneMinAvgUser 14

It also makes it pretty hard to find which endpoints have the metrics I need.
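The same post-processing idea can turn the CPU index buried in this flattened name into a label, giving all per-CPU series one shared name that can be matched and filtered. This is another hypothetical Python sketch: `cpu_sample`, `CPU_RE`, and the label name `cpu` are assumptions, not anything Telemetry Streaming emits, and the regex expects the endpoint name (here `detailedCPU`) to contain no underscores.

```python
import re

# Hypothetical pattern: "f5_<endpoint>_..._cpuInfo_<index>_<stat>".
# The greedy ".*" backtracks to the last "_cpuInfo_<digits>_", so the
# earlier "_cpuInfo_sys_..." segment is skipped.
CPU_RE = re.compile(
    r"^f5_(?P<endpoint>[A-Za-z0-9]+)_.*_cpuInfo_(?P<cpu>\d+)_(?P<stat>[A-Za-z0-9]+)$"
)


def cpu_sample(line: str):
    """Return a relabeled sample line, or None if the line is not a per-CPU metric."""
    name, value = line.rsplit(" ", 1)
    m = CPU_RE.match(name)
    if m is None:
        return None
    return f'f5_{m["endpoint"]}_{m["stat"]}{{cpu="{m["cpu"]}"}} {value}'
```

For the example sample line above, `cpu_sample` returns `f5_detailedCPU_oneMinAvgUser{cpu="1"} 14`, so every core shows up under the same metric name and differs only by the `cpu` label.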

B0go commented Mar 14, 2024

@megamattzilla This has become a blocker for using TS to observe the BigIPs using Prometheus as the metrics engine, mainly because, on the one hand, we can't enable the collection of all metrics without seeing a significant impact on CPU usage. On the other hand, we can't use the custom endpoint approach as the current output doesn't allow for proper label matching, filtering, etc.

If we can't find a solution, we will be forced to use the snmp_exporter. I would gladly avoid that if possible, as it requires more configuration complexity.

Do you have any status updates that can be shared?

@pgouband

Hi @B0go,

Please contact your F5 account team so they can contact us (the product management team).


barakbd commented Mar 20, 2024

It would be really helpful to simply allow TS metrics to have customized labels.


5 participants