tag_from_uri in opentelemetry input plugin has no effect #8734

Closed
trondhindenes opened this issue Apr 19, 2024 · 10 comments · Fixed by #8881 or #8962

Comments

@trondhindenes

Bug Report

Describe the bug
When tag_from_uri is configured on the opentelemetry input plugin, tags are not updated.

To Reproduce

  • Steps to reproduce the problem:
    given this config:
pipeline:
    inputs:
        - name: opentelemetry
          listen: 0.0.0.0
          port: 4320
          tag_from_uri: "true"
    outputs:
        - name: 'stdout'
          match: '*'

And this curl

curl --header "Content-Type: application/json" --request POST --data '{"resourceLogs":[{"resource":{},"scopeLogs":[{"scope":{},"logRecords":[{"timeUnixNano":"1660296023390371588","body":{"stringValue":"{\"message\":\"dummy\"}"},"traceId":"","spanId":""}]}]}]}'   http://localhost:4320/v1/logs

I would expect the message to be tagged with v1_logs. However, it's not retagged; the tag is still opentelemetry.0.

Expected behavior
Tags should be changed to reflect the URI.
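
As a rough illustration of the expected mapping, the sketch below derives a tag from a request path the way the plugin documentation describes it (`/v1/logs` becomes `v1_logs`). The function name `tag_from_uri`, the `default_tag` fallback, and the sanitizing rule are assumptions made for this example; this is not the plugin's actual code.

```python
def tag_from_uri(uri: str, default_tag: str = "opentelemetry.0") -> str:
    """Hypothetical helper: derive a routing tag from an HTTP request path."""
    # Ignore any query string and drop the leading (and trailing) slashes.
    path = uri.split("?", 1)[0].strip("/")
    if not path:
        # Nothing usable in the path, so keep the input's own tag.
        return default_tag
    # Assumption: anything other than letters, digits, '_' and '.' becomes '_'.
    return "".join(c if c.isalnum() or c in "_." else "_" for c in path)
```

Under these assumptions, a POST to `/v1/logs` would be tagged `v1_logs` and could then be matched by an output with `match: 'v1_logs'`.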

Your Environment

  • Version used: current version in cr.fluentbit.io/fluent/fluent-bit
  • Configuration: see above
  • Environment name and version (e.g. Kubernetes? What version?): regular docker

Additional context
Without this functionality, it is very difficult to route OpenTelemetry data (for example, logs to Loki and metrics to Prometheus) using Fluent Bit.

@nuclearpidgeon
Contributor

Moreover, if you use a match rule like v1_logs for output, it just doesn't match at all - which means it's just not possible to route only one particular OpenTelemetry signal type to a particular output.

e.g. with config:

service:
    log_level: debug
pipeline:
    inputs:
        - name: opentelemetry
          tag_from_uri: true
    outputs:
        - name: 'stdout'
          match: 'v1_logs'

Running the same curl against the default port 4318 yields this in the Fluent Bit logs:

[2024/05/16 11:10:11] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2024/05/16 11:10:11] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2024/05/16 11:10:11] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2024/05/16 11:10:11] [debug] [downstream] listening on 0.0.0.0:4318
[2024/05/16 11:10:11] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2024/05/16 11:10:11] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/05/16 11:10:11] [ info] [sp] stream processor started
[2024/05/16 11:10:11] [ info] [output:stdout:stdout.0] worker #0 started
[2024/05/16 11:11:38] [debug] [input:opentelemetry:opentelemetry.0] attributes missing
[2024/05/16 11:11:38] [debug] [task] created task=0x7e5c7de365a0 id=0 without routes, dropping.
[2024/05/16 11:11:38] [debug] [task] destroy task=0x7e5c7de365a0 (task_id=0)

With a match: '*', the output at least comes through:

[2024/05/16 11:14:34] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2024/05/16 11:14:34] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2024/05/16 11:14:34] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2024/05/16 11:14:34] [debug] [downstream] listening on 0.0.0.0:4318
[2024/05/16 11:14:34] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2024/05/16 11:14:34] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/05/16 11:14:34] [ info] [sp] stream processor started
[2024/05/16 11:14:34] [ info] [output:stdout:stdout.0] worker #0 started
[2024/05/16 11:14:35] [debug] [input:opentelemetry:opentelemetry.0] attributes missing
[2024/05/16 11:14:36] [debug] [task] created task=0x723f4e2365a0 id=0 OK
[2024/05/16 11:14:36] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] opentelemetry.0: [[1660296023.1698112429, {}], {"log"=>"{"message":"dummy"}"}]
[2024/05/16 11:14:36] [debug] [out flush] cb_destroy coro_id=0
[2024/05/16 11:14:36] [debug] [task] destroy task=0x723f4e2365a0 (task_id=0)

@nuclearpidgeon
Contributor

nuclearpidgeon commented May 16, 2024

I've been digging into the Fluent Bit C code to try to work out what's going on here.

It turns out there are two totally different sets of handling code for OpenTelemetry input data. You will see in plugins/in_opentelemetry/opentelemetry.c that there is a large if case split on an undocumented http2 option.

The newer _ng function doesn't seem to have any code at all that respects the tag_from_uri config - it just passes through the tag set on the input plugin in all three signal cases. (In the metrics case, the tag isn't even propagated: NULL ptr for tag buf, 0 for tag len.)

The older function has code that respects the tag_from_uri config and does actually prepare a tag. However, in the metrics case it's again not propagated, and in the non-raw trace case it isn't either.
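
The behaviour described in the two paragraphs above can be summarised as a small model. It is based only on the description in this comment, not on the C source, and the function name `propagated_tag` and its parameters are made up for illustration. `None` stands in for the NULL tag pointer.

```python
from typing import Optional


def propagated_tag(signal: str, uri_tag: str, input_tag: str,
                   tag_from_uri: bool = True, http2: bool = True,
                   raw_traces: bool = False) -> Optional[str]:
    """Model of the tag each code path passes on, as described in the comment."""
    if http2:
        # Newer _ng handlers: tag_from_uri is ignored, and metrics get no tag.
        return None if signal == "metrics" else input_tag
    # Older handlers: a tag is prepared, but it is not always passed on.
    if signal == "metrics" or (signal == "traces" and not raw_traces):
        return None
    return uri_tag if tag_from_uri else input_tag
```

Read this way, only logs (and raw traces) on the older code path ever receive the URI-derived tag.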

It does seem that if you switch to the HTTP/1 handler, you can at least route logs using the tag mechanism.
With config:

service:
    log_level: debug
pipeline:
    inputs:
        - name: opentelemetry
          tag_from_uri: true
          http2: false
    outputs:
        - name: 'stdout'
          match: 'v1_logs'

...I get:

[2024/05/16 12:00:58] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2024/05/16 12:00:58] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2024/05/16 12:00:58] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2024/05/16 12:00:58] [debug] [downstream] listening on 0.0.0.0:4318
[2024/05/16 12:00:58] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2024/05/16 12:00:58] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/05/16 12:00:58] [ info] [sp] stream processor started
[2024/05/16 12:00:58] [ info] [output:stdout:stdout.0] worker #0 started
[2024/05/16 12:01:10] [debug] [input:opentelemetry:opentelemetry.0] attributes missing
[2024/05/16 12:01:11] [debug] [task] created task=0x7032ce836640 id=0 OK
[2024/05/16 12:01:11] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] v1_logs: [[1660296023.1698112429, {}], {"log"=>"{"message":"dummy"}"}]
[2024/05/16 12:01:11] [debug] [out flush] cb_destroy coro_id=0
[2024/05/16 12:01:11] [debug] [task] destroy task=0x7032ce836640 (task_id=0)

I may be able to put together a PR for some of this if I get to the end of my digging and can make some code changes that I can validate will work.

nuclearpidgeon added a commit to nuclearpidgeon/fluent-bit that referenced this issue May 19, 2024
@shaohme
Contributor

shaohme commented May 27, 2024

I've got the same problem with my 3.0.3 setup. I have patched the in_opentelemetry plugin to pass the tag with length. Seems to work locally and unit tests pass.

@nuclearpidgeon
Contributor

I was able to get the v1_metrics / v1_logs / v1_traces tags working again for the HTTP2 server codepath in this commit: nuclearpidgeon@39cd65d

The issue with this is that... it's arguably not strictly correct, because it will set those same tags in the GRPC cases, which technically have more complex URIs like:

  • opentelemetry.proto.collector.metrics.v1.MetricsService/Export
  • opentelemetry.proto.collector.trace.v1.TraceService/Export
  • opentelemetry.proto.collector.logs.v1.LogsService/Export

(The non-HTTP2 code path didn't have this issue because... it just didn't support GRPC calls.)

IMO what would make more sense is a "tag from signal" option that just sets metrics/logs/traces depending on the signal type. Additionally, it probably makes sense to have an option for this tag to be appended, not just replaced. I can foresee this behaviour almost definitely being needed in non-trivial routing situations where data is forwarded on to other Fluent Bit instances.
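
A minimal sketch of what such a "tag from signal" option could do is shown below. Nothing here exists in Fluent Bit: `signal_from_path`, `tag_from_signal`, and the `append` flag are hypothetical names used only to illustrate the idea of mapping both OTLP/HTTP paths and gRPC method paths to a signal name.

```python
from typing import Optional

SIGNALS = ("metrics", "logs", "traces")


def signal_from_path(path: str) -> Optional[str]:
    """Return 'metrics', 'logs' or 'traces' for an HTTP or gRPC path, else None."""
    segments = path.strip("/").lower().replace("/", ".").split(".")
    for signal in SIGNALS:
        # Also accept the singular form (e.g. "trace") used in gRPC package names.
        if signal in segments or signal.rstrip("s") in segments:
            return signal
    return None


def tag_from_signal(path: str, base_tag: str, append: bool = False) -> str:
    """Replace the tag with the signal name, or append it to the existing tag."""
    signal = signal_from_path(path)
    if signal is None:
        # Unknown path: leave the tag untouched.
        return base_tag
    return f"{base_tag}.{signal}" if append else signal
```

With `append=True`, a record arriving on `/v1/metrics` with tag `opentelemetry.0` would become `opentelemetry.0.metrics`, which keeps the original tag available for routing on a downstream instance.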

@nuclearpidgeon
Contributor

So, this issue is not really completed, because the fix in #8881 was only for the older non-HTTP/2 codepath (which is not taken by default anyway because the http2 plugin option is undocumented and defaults to on 🙃).

I've just submitted some PRs for all the work I was doing related to fixing this for the HTTP/2 codepath case, including some actual automated tests for the issue. Some bits still need some cleanup but wanted to at least put what I have done out there first.

nuclearpidgeon added a commit to nuclearpidgeon/fluent-bit that referenced this issue Jul 1, 2024
@nuclearpidgeon
Contributor

Just spotted that _ng (HTTP/2) handlers for non-log payloads propagate NULL tags as well... ._.

process_payload_metrics_ng:

result = flb_input_metrics_append(ctx->ins, NULL, 0, context);

process_payload_traces_proto_ng:

result = flb_input_trace_append(ctx->ins, NULL, 0, decoded_context);

Although for some reason, process_payload_raw_traces_ng does propagate the tag...

flb_input_log_append(ctx->ins, tag, flb_sds_len(tag), mp_sbuf.data, mp_sbuf.size);

@alanbrito

I can see this issue is officially closed, but per the previous comment (and our experience), it still doesn't work. We need this in order to receive OpenTelemetry metrics and traces on a single opentelemetry input plugin, and send only the metrics to Grafana Mimir and only the traces to Grafana Tempo. Without this, both output plugins pick up data meant for the other and fail when trying to route it to the correct place.

Are there any plans to address this?

@jpvallez

jpvallez commented Sep 26, 2024

Hi everyone!

I am trying to separate telemetry signals too by using the tag_from_uri config. However, unfortunately it looks like it's still not working.

Fluent Bit version: v3.1.8 (released last week)

Config:

[INPUT]
    Name opentelemetry
    Raw_Traces false
    tag_from_uri true
    Host 0.0.0.0
    Port 4318
 
[OUTPUT]
    Name opentelemetry
    Match v1_traces
    Traces_uri /v1/traces
    Host localhost
    Port 4328
 
[OUTPUT]
    name            prometheus_exporter
    match           v1_metrics
    host            0.0.0.0
    port            2021
    # add user-defined labels
    add_label       app fluent-bit
    add_label       color blue
 
[OUTPUT]
    name  stdout
    match v1_logs

The result is that no OTel data is routed anywhere, as none of the tags match.
e.g. extract from fluent-bit logs:
[2024/09/26 01:34:41] [debug] [task] created task=0x7f0151637180 id=0 without routes, dropping.
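
The "without routes, dropping" message is what one would expect if the record keeps the input's default tag. The sketch below approximates output matching with Python's `fnmatch.fnmatchcase`; that is an assumption for illustration only, and Fluent Bit's own matcher may behave differently in edge cases. The `routes_for` helper is hypothetical; the output names and match patterns are taken from the config above.

```python
from fnmatch import fnmatchcase


def routes_for(tag: str, outputs: dict) -> list:
    """Return the names of the outputs whose match pattern accepts the tag."""
    return [name for name, pattern in outputs.items() if fnmatchcase(tag, pattern)]


# Output name -> Match pattern, as in the config above.
outputs = {
    "opentelemetry": "v1_traces",
    "prometheus_exporter": "v1_metrics",
    "stdout": "v1_logs",
}
```

In this model, `routes_for("opentelemetry.0", outputs)` is empty, so the task has no route, whereas a record tagged `v1_logs` would reach only the stdout output.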

If I add an additional output to the above config as file, with a match all as below:

[OUTPUT]
    name file
    match *

This outputs one file to the / dir, named "opentelemetry.0", with all of the OTel traces, metrics, and logs combined. The other outputs are not routed to, as there's no tag match.

The documentation here https://docs.fluentbit.io/manual/pipeline/inputs/opentelemetry states:

tag_from_uri If true, tag will be created from uri. e.g. v1_metrics from /v1/metrics .

But this seems to have no effect at all.

Am I configuring something wrong? Are there any workarounds available? Unsure why this issue is marked as Closed.

Any help would be greatly appreciated!

Thanks! :)

@shaohme
Contributor

shaohme commented Sep 26, 2024

tag_from_uri

Just to be sure: this was only fixed for the HTTP/1.1 handler. If you're using HTTP/2, the option may be ignored. Try disabling HTTP/2 manually in the OTel [INPUT] section with Http2 False.

@jpvallez

Thanks @shaohme :)

I've just tested with the Http2 False config and it seems to work as expected, but only for HTTP and not gRPC (intended behaviour).
Basically we'd need to send otel to fluentbit using only HTTP and not GRPC in order to use this functionality.
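
For anyone wanting to check the HTTP-only path without an OTel SDK, the sketch below builds the same OTLP/HTTP JSON request as the curl from the original report. The helper name `build_otlp_log_request` is made up for this example, and the host and port defaults (localhost, 4318) are assumptions taken from the configs in this thread.

```python
import json
import urllib.request


def build_otlp_log_request(host: str = "localhost", port: int = 4318) -> urllib.request.Request:
    """Build (but do not send) an OTLP/HTTP JSON POST for the /v1/logs path."""
    payload = {
        "resourceLogs": [{
            "resource": {},
            "scopeLogs": [{
                "scope": {},
                "logRecords": [{
                    "timeUnixNano": "1660296023390371588",
                    "body": {"stringValue": "{\"message\":\"dummy\"}"},
                    "traceId": "",
                    "spanId": "",
                }],
            }],
        }],
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it to a running Fluent Bit instance:
#     with urllib.request.urlopen(build_otlp_log_request()) as resp:
#         print(resp.status)
```

With Http2 False and tag_from_uri enabled, a request like this should show up under the v1_logs tag, matching the behaviour reported earlier in the thread.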


Does anyone know if tag_from_uri on http2 is meant to be supported?

Looks like there's an open PR for this over here #8963 but unsure if it's actively being worked on.

The doco on https://docs.fluentbit.io/manual/pipeline/inputs/opentelemetry should be updated to reflect this limitation as it's unclear.
