Excessive RAM usage. #977

Open
elephantpanda opened this issue Oct 14, 2024 · 0 comments

@elephantpanda

I have noticed that even in DML mode, genai uses a lot of RAM.

I am not sure why it keeps a copy of the model both in RAM and in VRAM.

Previously I used onnxruntime directly and was able to bind all the weights to VRAM only, without using any RAM at all.

In CPU mode it is even worse: it uses so much RAM that it is not really usable.

If possible, it would be nice if it freed the RAM once the model has been loaded onto the GPU. Even better, it could avoid loading the whole model into RAM at once and instead stream it onto the GPU in a more memory-efficient way.
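To illustrate the kind of streaming I mean, here is a minimal Python sketch. It is not how genai or onnxruntime actually loads a model. The names `iter_chunks`, `stream_weights`, `CHUNK_BYTES` and the `upload` callback are all hypothetical; `upload` stands in for whatever would copy bytes into VRAM. The point is only that the file passes through one fixed-size buffer, so host memory use is bounded by the chunk size rather than the model size.

```python
import os
import tempfile
from typing import Callable, Iterator

CHUNK_BYTES = 4 * 1024 * 1024  # hypothetical staging-buffer size (4 MiB)


def iter_chunks(path: str, chunk_bytes: int = CHUNK_BYTES) -> Iterator[memoryview]:
    """Yield the file's contents through a single reusable buffer."""
    buf = bytearray(chunk_bytes)  # the only large host-side allocation
    view = memoryview(buf)
    # buffering=0 gives a raw file object, so no second buffer is kept by Python.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 bytes read means end of file
                break
            # The view is only valid until the next iteration overwrites buf.
            yield view[:n]


def stream_weights(
    path: str,
    upload: Callable[[int, memoryview], None],
    chunk_bytes: int = CHUNK_BYTES,
) -> int:
    """Pass the file to `upload(offset, chunk)` one chunk at a time.

    `upload` is a placeholder (assumption) for a host-to-device copy.
    Returns the total number of bytes streamed.
    """
    offset = 0
    for chunk in iter_chunks(path, chunk_bytes):
        upload(offset, chunk)
        offset += len(chunk)
    return offset


if __name__ == "__main__":
    # Demo with a throwaway file instead of real model weights.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(10 * 1024))
    largest = 0

    def fake_upload(offset: int, chunk: memoryview) -> None:
        global largest
        largest = max(largest, len(chunk))

    total = stream_weights(tmp.name, fake_upload, chunk_bytes=4096)
    os.unlink(tmp.name)
    print(f"streamed {total} bytes, largest chunk {largest} bytes")
```

Since the buffer is reused, a real `upload` would have to finish its copy (or take its own copy of the chunk) before the next read overwrites it.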

Please be aware that most users have limited RAM and that the less memory it uses the better.

Having said that, the new onnxruntime library does seem to be a bit better at avoiding memory spikes, but I think the RAM usage could still be improved greatly.
