dot to underscore replacement in ES output #708

Closed
konstantin-kornienko-epam opened this issue Aug 8, 2018 · 5 comments
@konstantin-kornienko-epam

konstantin-kornienko-epam commented Aug 8, 2018

This code in es.c replaces dots with underscores in keys (field names):

        /*
         * Sanitize key name, Elastic Search 2.x don't allow dots
         * in field names:
         *
         *   https://goo.gl/R5NMTr
         */
        char *p   = ptr_key;
        char *end = ptr_key + key_size;
        while (p != end) {
            if (*p == '.') *p = '_';
            p++;
        }

This behavior is based on the article linked in the comment, which applies to Elasticsearch 2.x.

But in Elasticsearch 5.0 and higher, dots are permitted again :). In fact, Elastic Beats™ use dots in field names, for example in the Filebeat fields for Kubernetes.

So maybe it makes sense to add an option to the es output, something like "Replace dots in field names"?

@edsiper edsiper self-assigned this Aug 8, 2018
edsiper added a commit that referenced this issue Aug 8, 2018
On Elasticsearch 2.0-2.3, key names with dots were not allowed, hence
it was required to replace every dot with an underscore. This
requirement is no longer true on newer versions of Elasticsearch.

The following patch introduces a new option called 'replace_dot' which
is disabled by default. So now the plugin will only replace the dots
if the flag is enabled, for use cases where old versions of Elasticsearch
are in use.

Signed-off-by: Eduardo Silva <[email protected]>
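
The behavior described in the commit message can be sketched roughly as follows. This is only an illustration, not the code from the actual commit: the function name `sanitize_key` and its `replace_dots` parameter are hypothetical, and the loop simply mirrors the es.c snippet quoted in the issue, gated by a flag that is off by default.

```c
#include <stddef.h>

/*
 * Hypothetical helper (assumption, not the real es.c code): replace every
 * '.' in a key with '_' only when replace_dots is non-zero.
 *
 * The key is modified in place. key_size is its length in bytes, so the
 * buffer does not need to be NUL-terminated.
 * Returns the number of characters that were replaced.
 */
size_t sanitize_key(char *key, size_t key_size, int replace_dots)
{
    size_t replaced = 0;
    char *p = key;
    char *end = key + key_size;

    if (!replace_dots) {
        /* Option disabled: leave the key untouched */
        return 0;
    }

    while (p != end) {
        if (*p == '.') {
            *p = '_';
            replaced++;
        }
        p++;
    }

    return replaced;
}
```

With the flag enabled, a key such as `kubernetes.labels.app` would become `kubernetes_labels_app`; with it disabled, the key is passed through unchanged, which is what newer Elasticsearch versions accept.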
@edsiper
Member

edsiper commented Aug 8, 2018

@konstantin-kornienko-epam thanks for raising this topic and suggestion.

I've implemented the improvement in commit 6764ac7.

thanks!

@edsiper edsiper added the fixed label Aug 8, 2018
@edsiper edsiper closed this as completed Aug 8, 2018
@konstantin-kornienko-epam
Author

Wow! So fast, almost at the speed of light ;) Thank you, Eduardo!

@mabrarov

mabrarov commented Jun 14, 2023

Hi @edsiper and @PettitWesley,

Is there a chance to get the same feature for the Forward plugin? I see that #1305 and #1538 were closed 3 years ago, but the new filter is still not implemented. The lack of this feature in the Forward plugin (and in other output plugins), together with #1560 still being pending, complicates communication between Fluent Bit and an Elasticsearch cluster consisting of multiple hosts. We cannot use either of these:

  1. Fluent Bit → Elasticsearch cluster, because "es_out: support Upstream Servers" (#1560) is missing. This is the preferred solution, but #1560 has been pending for almost 4 years.
  2. Fluent Bit (Forward output, possibly with Upstream for HA) → Fluentd → Elasticsearch cluster, because of dots in field names (we need underscores for compatibility). Fluentd supports multiple Elasticsearch hosts and we already have this instance, so for my team it would not bring any additional work.

My team would like to use Fluent Bit (running as a DaemonSet) instead of Fluentd (running as a DaemonSet) for shipping logs from K8s to the Elasticsearch cluster, but so far Fluent Bit brings issues that delay this migration.

Thank you.

@PettitWesley
Contributor

@mabrarov my recommendation to unblock yourself: some sort of custom filter written in Lua or Wasm should be able to remove the dots.

@mabrarov

mabrarov commented Jun 15, 2023

Hi @PettitWesley,

Thank you for the quick reply and suggestion. I found #4651 and used the Lua script below (part of which is taken, slightly modified, from #1305) in my Fluent Bit configuration. It replaces dots with underscores in the problematic fields and flattens some fields, which I use later in Fluentd to decide which Elasticsearch index to route a request to. The flattening is required because the respective Fluentd plugin cannot work with nested fields:

function copy_routing_fields_with_underscore(tag, timestamp, record)
    local kubernetes_fields = record["kubernetes"]
    if kubernetes_fields == nil then
        return 0, 0, 0
    end
    local new_record = record
    new_record["kubernetes_namespace_name"] = kubernetes_fields["namespace_name"]
    local kubernetes_labels = kubernetes_fields["labels"]
    if kubernetes_labels == nil then
        return 1, timestamp, new_record
    end
    new_record["kubernetes_labels_log_space"] = kubernetes_labels["log-space"]
    return 1, timestamp, new_record
end
function dedot(tag, timestamp, record)
    local kubernetes_fields = record["kubernetes"]
    if kubernetes_fields == nil then
        return 0, 0, 0
    end
    dedot_keys(kubernetes_fields["annotations"])
    dedot_keys(kubernetes_fields["labels"])
    return 1, timestamp, record
end
function dedot_keys(map)
    if map == nil then
        return
    end
    local new_map = {}
    local changed_keys = {}
    for k, v in pairs(map) do
        local dedotted = string.gsub(k, "%.", "_")
        if dedotted ~= k then
            new_map[dedotted] = v
            changed_keys[k] = true
        end
    end
    for k in pairs(changed_keys) do
        map[k] = nil
    end
    for k, v in pairs(new_map) do
        map[k] = v
    end
end

The Fluent Bit configuration looks like this:

[PARSER]
    Name        cri
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[FILTER]
    Name         modify
    Match        *
    Rename       log message
[FILTER]
    Name         lua
    Match        kube.*
    script       /fluent-bit/extra/functions.lua
    call         copy_routing_fields_with_underscore
[FILTER]
    Name         lua
    Match        kube.*
    script       /fluent-bit/extra/functions.lua
    call         dedot
[OUTPUT]
    Name            forward
    Match           kube.*
    Host            fluentd-host
    Port            24224
    Shared_Key      fluentd-shared-key

IMHO, Lua scripting doesn't seem to be a performant solution. Maybe I need to try a Wasm plugin instead. Anyway, thank you for your help.
