Description
We are experiencing occasional restarts of Fluent Bit pods running as a DaemonSet in our EKS cluster. The pods restart with exit code 139 (segmentation fault). According to our Prometheus metrics, the issue is not caused by running out of memory or by high CPU usage.
Logs
[2024/10/11 07:19:28] [engine] caught signal (SIGSEGV)
#0 0x562fdf2a5df9 in flb_log_event_encoder_dynamic_field_flush_scopes() at src/flb_log_event_encoder_dyn0
#1 0x562fdf2a5df9 in flb_log_event_encoder_dynamic_field_reset() at src/flb_log_event_encoder_dynamic_fi
#2 0x562fdf2a3d5c in flb_log_event_encoder_reset() at src/flb_log_event_encoder.c:33
#3 0x562fdf2d30cf in ml_stream_buffer_flush() at plugins/in_tail/tail_file.c:418
#4 0x562fdf2d30cf in ml_flush_callback() at plugins/in_tail/tail_file.c:919
#5 0x562fdf288927 in flb_ml_flush_stream_group() at src/multiline/flb_ml.c:1515
#6 0x562fdf289085 in flb_ml_flush_parser_instance() at src/multiline/flb_ml.c:117
#7 0x562fdf2a6dcc in flb_ml_stream_id_destroy_all() at src/multiline/flb_ml_stream.c:316
#8 0x562fdf2d385c in flb_tail_file_remove() at plugins/in_tail/tail_file.c:1249
#9 0x562fdf2cf5b5 in tail_fs_event() at plugins/in_tail/tail_fs_inotify.c:242
#10 0x562fdf2588e4 in flb_input_collector_fd() at src/flb_input.c:1949
#11 0x562fdf2726d7 in flb_engine_handle_event() at src/flb_engine.c:575
#12 0x562fdf2726d7 in flb_engine_start() at src/flb_engine.c:941
#13 0x562fdf24e1a3 in flb_lib_worker() at src/flb_lib.c:674
#14 0x7f7f630f2ea6 in ???() at ???:0
#15 0x7f7f629a6a6e in ???() at ???:0
#16 0xffffffffffffffff in ???() at ???:0
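For context, the backtrace above is hit from the tail input when an inotify event removes a watched file and the pending multiline stream is flushed. Below is a reduced sketch of the input that exercises that code path, distilled from our full configuration further down; we have not confirmed that this reduced form reproduces the crash on its own.
[INPUT]
    # Reduced form of our tail input: multiline parsing plus a tail DB,
    # watching container log files that are rotated/removed by the kubelet
    Name              tail
    Path              /var/log/containers/*.log
    multiline.parser  docker, cri
    DB                /var/log/flb_pods_tail.db
    Mem_Buf_Limit     5MB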
Environment
Fluent Bit Version: version=3.0.6, commit=9af65e2c36. Note: we have already updated to version=3.1.9, commit=431fa79ae2 and we see the same issue.
Kubernetes Version: v1.29.0
EKS Version: v1.29.0-eks-680e576
Node Operating System: Bottlerocket OS 1.21.1 (aws-k8s-1.29) kernel 6.1.102
Container Runtime: containerd://1.7.20+bottlerocket
Node Configuration:
CPU: 4 vCPU
Memory: 8GB
Instance Type: c6a.xlarge
Deployment in EKS
Fluent Bit is deployed as a DaemonSet in an EKS cluster.
Resource limits and requests are set for memory and CPU.
resources:
  limits:
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi
[SERVICE]
    Daemon        Off
    Flush         1
    Log_Level     info
    Parsers_File  /fluent-bit/etc/parsers.conf
    Parsers_File  /fluent-bit/etc/conf/custom_parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020
    Health_Check  On

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # Exclude fluent-bit logs, certain error conditions can cause loops
    # that can effectively DoS outputs with very high logging rates
    # (see https://github.com/fluent/fluent-bit/issues/3829)
    Exclude_Path      /var/log/containers/fluent-bit-*_kube-system_*.log
    multiline.parser  docker, cri
    Tag               kube.<namespace_name>.<pod_name>.<container_name>-<container_id>
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    DB                /var/log/flb_pods_tail.db
    Tag_Regex         (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<container_id>[a-z0-9]{64})\.log$

[INPUT]
    Name      tail
    Path      /usr/share/reactshost/*/ReactsLogs/Metrics/*/*.json
    Tag       reacts-metrics
    Parser    reacts-metrics-parser
    Path_Key  filename
    DB        /usr/share/reactshost/fluentbit/logs.db

[FILTER]
    Name                 kubernetes
    Match                kube.*
    Merge_Log            On
    Keep_Log             Off
    K8S-Logging.Parser   On
    K8S-Logging.Exclude  On
    Kube_Tag_Prefix      kube.
    Regex_Parser         kubePodCustom

[FILTER]
    Name          rewrite_tag
    Match         kube.*
    Rule          $kubernetes['pod_id'] ^.*4.*$ cw.$TAG true
    Emitter_Name  cw_re_emitted

[FILTER]
    Name     grep
    Match    cw.*
    Exclude  $kubernetes['labels']['logging.cloudwatch.aws/enabled'] false

[FILTER]
    Name     grep
    Match    kube.*
    Exclude  $kubernetes['namespace_name'] loki-system

[FILTER]
    Name    modify
    Match   kube.*
    Rename  level level_label
    Rename  instance instance_label

[FILTER]
    Name          parser
    Match         reacts-metrics
    Key_Name      filename
    Parser        filename-parser
    Reserve_Data  On

[OUTPUT]
    Name    loki
    Match   kube.*
    Host    loki-gateway.loki-system
    Port    80
    labels  job=fluentbit, type=logs, namespace=$kubernetes['namespace_name'], component=$kubernetes['container_name'], level=$level_label, instance=$instance_label

[OUTPUT]
    Name    loki
    Match   reacts-metrics
    Host    loki-gateway.loki-system
    Port    80
    Labels  job=fluentbit, component=$component, instance=$instance, type=metrics
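Since the backtrace points at the multiline flush that runs from the inotify file-removal path in the tail input (flb_tail_file_remove -> flb_ml_flush_parser_instance), one workaround we are considering is to switch the tail watcher from inotify to the stat-based watcher. This is only an assumption on our side, not a confirmed fix; other keys of the existing tail input would stay unchanged.
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Exclude_Path      /var/log/containers/fluent-bit-*_kube-system_*.log
    multiline.parser  docker, cri
    # Disable the inotify watcher and fall back to stat-based polling,
    # avoiding the tail_fs_inotify.c removal path shown in the trace
    # (assumed mitigation, not yet verified against this crash)
    Inotify_Watcher   false
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    DB                /var/log/flb_pods_tail.db
We have not yet verified in production whether this avoids the segfault.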
Additional context
Attached are the log files and Fluent Bit configs.
fleuntbitlog.txt
custom_parser.txt
fluent-bit.txt