
[newrelic-logging] Default resource limits cause out of memory errors #1500

Open
hero-david opened this issue Oct 8, 2024 · 1 comment
Labels: bug, triage/pending

Comments

@hero-david

Description

An issue has been opened about this before, and that reporter was instructed to ensure they had upgraded their chart so that the memory limit config on the Fluent Bit input was present.

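For context, the input-level memory limit referred to above is Fluent Bit's Mem_Buf_Limit on the tail input. A minimal sketch of where it would sit in the chart values follows; the fluentBit.config.inputs key and the 7MB figure are assumptions drawn from the chart's defaults, not something verified against this exact chart version:

newrelic-logging:
  fluentBit:
    config:
      # Assumed key name; check the chart's default values.yaml for the exact path.
      inputs: |
        [INPUT]
            Name              tail
            Tag               kube.*
            Path              /var/log/containers/*.log
            Mem_Buf_Limit     7MB
            Skip_Long_Lines   On
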
We have been struggling with OOM errors and restarts on our pods despite having this config present and upping the pod's memory allowance. We run about 50 pods per node.

(Attached screenshots: "fluentbit oom" and "oom".)

The Helm config provided for this was:

newrelic-logging:
  enabled: true
  fluentBit:
    criEnabled: true
  lowDataMode: false
  resources:
    limits:
      memory: 256Mi
  tolerations:
  - effect: NoSchedule
    key: role
    operator: Exists
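
For reference, "upping the memory allowances of the pod" was done through the same resources block; the split below is a purely illustrative sketch with assumed numbers, not a recommended setting:

newrelic-logging:
  resources:
    # Illustrative figures only; assumed values, not recommendations.
    requests:
      memory: 128Mi
    limits:
      memory: 512Mi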
Kernel OOM-killer events (Date, Message):
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1360652 (flb-pipeline) total-vm:1307336kB, anon-rss:259736kB, file-rss:19648kB, shmem-rss:0kB, UID:0 pgtables:1104kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1400772 (fluent-bit) total-vm:1311176kB, anon-rss:259508kB, file-rss:19084kB, shmem-rss:0kB, UID:0 pgtables:1028kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1400790 (flb-pipeline) total-vm:1311176kB, anon-rss:259652kB, file-rss:19468kB, shmem-rss:0kB, UID:0 pgtables:1028kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1360626 (fluent-bit) total-vm:1307336kB, anon-rss:259624kB, file-rss:19264kB, shmem-rss:0kB, UID:0 pgtables:1104kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1201131 (flb-pipeline) total-vm:1483464kB, anon-rss:259504kB, file-rss:19828kB, shmem-rss:0kB, UID:0 pgtables:1324kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1201113 (fluent-bit) total-vm:1483464kB, anon-rss:259392kB, file-rss:19444kB, shmem-rss:0kB, UID:0 pgtables:1324kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1266468 (flb-pipeline) total-vm:1487560kB, anon-rss:259188kB, file-rss:19628kB, shmem-rss:0kB, UID:0 pgtables:1344kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1324063 (fluent-bit) total-vm:1487560kB, anon-rss:259368kB, file-rss:19368kB, shmem-rss:0kB, UID:0 pgtables:1348kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1324081 (flb-pipeline) total-vm:1487560kB, anon-rss:259476kB, file-rss:19752kB, shmem-rss:0kB, UID:0 pgtables:1348kB oom_score_adj:996
2024-10-08 05:11:23 Memory cgroup out of memory: Killed process 1266420 (fluent-bit) total-vm:1487560kB, anon-rss:259084kB, file-rss:19244kB, shmem-rss:0kB, UID:0 pgtables:1344kB oom_score_adj:996

Versions

Helm v3.14.4
Kubernetes (AKS) 1.29.2
Chart: nri-bundle-5.0.81
FluentBit: newrelic/newrelic-fluentbit-output:2.0.0

What happened?

The fluentbit pods were repeatedly killed for using more memory than their limit, which is set very low. Their CPU was never highly utilised, which does not suggest that the memory increase was due to throttling or an inability to keep up.

What you expected to happen?

The fluentbit pods should have few to no restarts, and they should never reach 1.5 GB of memory used per container.

How to reproduce it?

Using the same versions listed above and the same Helm values.yaml, deploy an AKS cluster with 50 production workloads per node (2 vCPU, 8 GB) and observe whether memory issues occur.

@hero-david added the bug and triage/pending labels on Oct 8, 2024
@workato-integration
