
AWS Fargate Memory slowly increasing/leak over time #6006

Open
NevinDry opened this issue Oct 15, 2024 · 4 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@NevinDry

NevinDry commented Oct 15, 2024

Observed behavior

We observe that our AWS Fargate containers serving clustered NATS have their memory usage increasing over time:

[screenshot: ecs.fargate.mem.usage metric]

What is strange is that we observe this behavior on our STG1 environment but not on our DEV environment.
These two environments have no significant activity and are deployed the same way through IaC. NATS is deployed with clustering on AWS Fargate.

The difference in memory usage between the two environments is very significant:
dev:
[screenshot: dev container memory usage]

stg1:
[screenshot: stg1 container memory usage]

If we look at the NATS memory metrics, both environments are stable:
dev:
[screenshot: dev NATS memory metrics]

stg1:
[screenshot: stg1 NATS memory metrics]

There must be a leak somewhere, but we are unable to identify it.

Expected behavior

The Fargate containers' memory shouldn't increase over time; it should track the NATS memory metrics and stay stable.

Server and client version

Server version: 2.10.18
Go: go1.22.5

Host environment

NATS clustering inside AWS Fargate, without JetStream.

Operating system/Architecture: Linux/X86_64
CPU | Memory: 2 vCPU | 4 GB
Platform version: 1.4.0
Launch type: FARGATE

Log router and Datadog run as sidecar containers.

Steps to reproduce

No response

@NevinDry NevinDry added the defect Suspected defect such as a bug or regression label Oct 15, 2024
@NevinDry NevinDry changed the title AWS Fargate Memory slowly increasing over time AWS Fargate Memory slowly increasing/leak over time Oct 15, 2024
@neilalexander
Member

Can you please report the output of free -m within the containers when the memory usage is high?

@NevinDry
Author

NevinDry commented Oct 16, 2024

Thanks for your answer @neilalexander. Our stg1 containers restarted yesterday, so the memory has not increased much yet. I will keep you updated when the memory is high.
Here is the free command output on stg1 at the moment (the environment where memory increases over time):

STG1
CPU | Memory: 1 vCPU | 2 GB
NODE
/ # free -h
total used free shared buff/cache available
Mem: 3.6G 565.7M 292.2M 540.0K 2.8G 2.8G
Swap: 0 0 0

SEED
CPU | Memory: 1 vCPU | 2 GB
/ # free -h
total used free shared buff/cache available
Mem: 3.8G 580.9M 275.3M 540.0K 2.9G 2.9G
Swap: 0 0 0

On dev, where the memory is stable, here is the free command output (note that the CPU/memory provisioning is not the same; could this have an impact?):
DEV

NODE
CPU | Memory: .25 vCPU | .5 GB
/ # free -h
total used free shared buff/cache available
Mem: 927.8M 547.5M 67.0M 540.0K 313.3M 240.1M
Swap: 0 0 0

SEED
CPU | Memory: .25 vCPU | .5 GB
/ # free -h
total used free shared buff/cache available
Mem: 927.8M 559.9M 80.8M 544.0K 287.1M 229.2M
Swap: 0 0 0

(Note that the total/used memory shown in both environments is higher than the memory we provisioned on our containers.)
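As an aside, free reads /proc/meminfo, which seems to reflect the underlying Fargate microVM rather than the task's provisioned memory, which would explain the totals above. Here is a rough sketch of reading the container-level figures from the cgroup instead (this assumes the cgroup v1 paths; on cgroup v2 the equivalents are /sys/fs/cgroup/memory.max and memory.current):

```go
// Rough sketch: read container-level memory figures from the cgroup instead
// of /proc/meminfo. Paths below assume cgroup v1.
package main

import (
	"fmt"
	"os"
	"strings"
)

func read(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return "unavailable: " + err.Error()
	}
	return strings.TrimSpace(string(b))
}

func main() {
	fmt.Println("limit:", read("/sys/fs/cgroup/memory/memory.limit_in_bytes"))
	fmt.Println("usage:", read("/sys/fs/cgroup/memory/memory.usage_in_bytes"))
	// memory.stat breaks the usage down further (rss, cache, ...), which
	// helps separate process memory from page cache.
	fmt.Println(read("/sys/fs/cgroup/memory/memory.stat"))
}
```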

Thank you for your help.

@neilalexander
Member

What stands out to me is the buff/cache utilisation, which makes me think you're falling victim to kubernetes/kubernetes#43916. In short, Kubernetes is considering the kernel page cache when deciding whether a pod is under memory pressure. I suspect if you look at the RSS size (as is reported by nats server ls for example) that you'd see the process utilisation itself is stable.
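If it helps, here is a minimal sketch of reading that process-level figure programmatically from the /varz monitoring endpoint (this assumes the HTTP monitoring port is enabled, here on the default 8222):

```go
// Minimal sketch: poll the NATS monitoring endpoint and print the process
// resident memory ("mem", reported in bytes). Assumes the server runs with
// the HTTP monitoring port enabled (e.g. http_port: 8222); adjust the URL
// for your environment.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type varz struct {
	Mem int64 `json:"mem"` // resident memory of the nats-server process, in bytes
}

func main() {
	resp, err := http.Get("http://localhost:8222/varz")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var v varz
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("nats-server RSS: %.1f MiB\n", float64(v.Mem)/(1024*1024))
}
```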

Do you set both a memory request and a memory limit, or just one or the other?

@NevinDry
Author

Hi @neilalexander, we did not have memory soft/hard limits set on our Fargate containers. We are going to configure them and see what happens; I will keep you updated.
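For reference, a rough sketch of what we have in mind (using aws-sdk-go-v2, with hypothetical family, image, and size values) for setting both a soft limit (memoryReservation) and a hard limit (memory) on the container definition:

```go
// Rough sketch (hypothetical family/image names and sizes): register a
// Fargate task definition with both a soft limit (MemoryReservation) and a
// hard limit (Memory) on the NATS container.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ecs"
	"github.com/aws/aws-sdk-go-v2/service/ecs/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := ecs.NewFromConfig(cfg)

	_, err = client.RegisterTaskDefinition(ctx, &ecs.RegisterTaskDefinitionInput{
		Family:                  aws.String("nats-cluster"), // hypothetical name
		RequiresCompatibilities: []types.Compatibility{types.CompatibilityFargate},
		NetworkMode:             types.NetworkModeAwsvpc,
		Cpu:                     aws.String("1024"), // task level: 1 vCPU
		Memory:                  aws.String("2048"), // task level: 2 GB
		ContainerDefinitions: []types.ContainerDefinition{{
			Name:              aws.String("nats"),
			Image:             aws.String("nats:2.10.18"),
			MemoryReservation: aws.Int32(512),  // soft limit (MiB)
			Memory:            aws.Int32(1024), // hard limit (MiB)
		}},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```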
On another note, we observed that only containers with more than the default memory value (512 MB) have their memory increasing over time.
Thanks for your guidance!
