09-small-models-road-to-the-top-part-2 runs very slow on AMD Radeon 7900XTX using ROCm PyTorch #96
I just tried the PyTorch nightly build with ROCm 5.7.1 (not released yet), and it now runs much faster. The PyTorch microbenchmark numbers have improved a lot as well.
fp16 performance is still very poor compared to the 3080 Ti. The fp16 microbenchmark is still far slower than on the 3080 Ti (558 img/sec), but the fastai example now runs slightly faster than on the 3080 Ti. Good job AMD! I'm not closing the issue yet, since I believe the performance can still be improved further.
It seems the performance improvement came with the recent PyTorch update, not with the libraries that ship with ROCm 5.7.1: I reran the same code in the 5.7 environment and the improvement is still there. The code I'm using is here. It contains two of the fastai quickstart examples. The first section uses the convnext_small model, which just saw a huge improvement. The second section is a text-processing example that has not seen any speedup. Could whoever shipped the big speed improvement also take a look at the text-processing case? I'm not sure if anyone is reading this, or whether my issue report was what brought the 7900XTX performance problem to the developers' attention, but I'm so excited I just had to post this :)
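Since the speedup seems to track the PyTorch build rather than the ROCm libraries, it helps to paste the exact build details next to any benchmark result. A minimal sketch, assuming a standard PyTorch install; `describe_torch_build` is a hypothetical helper, while `torch.__version__`, `torch.version.hip` (set on ROCm builds, `None` otherwise), `torch.version.cuda`, and `torch.cuda.get_device_name` are existing PyTorch attributes and functions. It falls back gracefully when PyTorch is not installed.

```python
import platform
from typing import Dict, Optional


def describe_torch_build() -> Dict[str, Optional[str]]:
    """Collect the version details worth including in a performance report."""
    info: Dict[str, Optional[str]] = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = str(torch.__version__)
    # torch.version.hip is populated on ROCm builds; torch.version.cuda on CUDA builds.
    info["hip"] = getattr(torch.version, "hip", None)
    info["cuda"] = getattr(torch.version, "cuda", None)
    # ROCm builds expose AMD GPUs through the torch.cuda namespace.
    info["device"] = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key:>7}: {value}")
```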
Hi,
I'm not sure where to start, so I'm posting here in the hope that someone with more knowledge can help me out. I'm trying to run these notebooks on my system with a 7900XTX, and they run very slowly. The code that uses resnet26d seems OK, but the code that uses convnext_small_in22k is very slow. I also tried convnext_small so that it would use the torchvision model, but that runs just as slowly.
At first I thought ROCm PyTorch simply hadn't optimized this model yet. But the PyTorch microbenchmark (https://github.com/ROCmSoftwarePlatform/pytorch-micro-benchmarking) actually shows the 7900XTX running faster when using the torchvision model.
Running the same test on my 3080ti gives
Could anyone please help me figure out what is going on? Any help would be appreciated.