09-small-models-road-to-the-top-part-2 runs very slow on AMD Radeon 7900XTX using ROCm PyTorch #96

briansp2020 · 2023-09-18T04:57:30Z

Hi,
I'm not sure where to start, so I'm posting here hoping that someone with more knowledge can help me out. I'm trying to run these notebooks on my system with a 7900XTX and they run very slowly. The code that uses resnet26d seems ok, but the code that uses convnext_small_in22k is very slow. I also tried convnext_small to make it use the model from torchvision, but that seems to run just as slowly.

I first thought that ROCm PyTorch had not been optimized for this model yet. But I found out that the PyTorch micro-benchmark (https://github.com/ROCmSoftwarePlatform/pytorch-micro-benchmarking) actually shows the 7900XTX running faster than my 3080ti when using the torchvision model.

(pt) root@rocm:~/pytorch-micro-benchmarking# python3 micro_benchmarking_pytorch.py --network convnext_small
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : convnext_small
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 2.088879442214966
Throughput [img/sec] : 30.638436429886497

Running the same test on my 3080ti gives

(pt) bsp2020@Ryzen5950X:~/pytorch-micro-benchmarking$ python3 micro_benchmarking_pytorch.py --network convnext_small
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : convnext_small
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 18.948059797286987
Throughput [img/sec] : 3.3776545295241056
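
For anyone comparing these summaries: the throughput figure appears to be just the mini-batch size divided by the time per mini-batch. Below is a minimal sketch under that assumption. The function name `throughput_img_per_sec` is hypothetical and is not part of the benchmark script; the numbers are copied from the two runs above.

```python
def throughput_img_per_sec(batch_size: int, seconds_per_batch: float) -> float:
    """Images processed per second, assuming throughput = batch size / time per batch."""
    if seconds_per_batch <= 0:
        raise ValueError("seconds_per_batch must be positive")
    return batch_size / seconds_per_batch


# Values copied from the FP32 summaries above (mini-batch size 64).
radeon_7900xtx = throughput_img_per_sec(64, 2.088879442214966)
rtx_3080ti = throughput_img_per_sec(64, 18.948059797286987)

print(f"7900XTX: {radeon_7900xtx:.2f} img/sec")
print(f"3080ti:  {rtx_3080ti:.2f} img/sec")
print(f"ratio 7900XTX / 3080ti: {radeon_7900xtx / rtx_3080ti:.2f}x")
```

This only reproduces the arithmetic in the summary; it does not explain why the notebook itself is slow.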

Could anyone please help me figure out what is going on? Any help would be appreciated.

briansp2020 · 2023-10-07T05:11:39Z

I just tried a PyTorch nightly build with ROCm 5.7.1 (not released yet) and it now runs much faster. The PyTorch micro-benchmark result has improved a lot as well.

(pt) root@rocm:~/pytorch-micro-benchmarking# python micro_benchmarking_pytorch.py --network convnext_small
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : convnext_small
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.43914319276809693
Throughput [img/sec] : 145.73834014500406

fp16 performance is still very poor compared to 3080ti

(pt) root@rocm:~/pytorch-micro-benchmarking# python micro_benchmarking_pytorch.py --network convnext_small --fp16 1
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : convnext_small
Num devices: 1
Dtype: FP16
Mini batch size [img] : 64
Time per mini-batch : 0.2943673968315125
Throughput [img/sec] : 217.41538189649373

fp16 microbenchmark is still far slower than 3080ti (558 img/sec) but the fastai example runs slightly faster than 3080ti now.
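
To put the numbers in this thread side by side, here is a small sketch that computes the speedup ratios. The dictionary keys and the helper `speedup` are hypothetical names used only for illustration; the throughput values are the ones reported above (the 3080ti FP16 figure is rounded to 558 img/sec as quoted).

```python
# Throughput in img/sec for convnext_small, mini-batch size 64, as reported in this thread.
results = {
    "7900xtx_fp32_before": 30.638436429886497,   # first run on the 7900XTX
    "7900xtx_fp32_nightly": 145.73834014500406,  # PyTorch nightly build
    "7900xtx_fp16_nightly": 217.41538189649373,  # nightly build with --fp16 1
    "3080ti_fp16": 558.0,                        # figure quoted for the 3080ti
}


def speedup(new: float, old: float) -> float:
    """Return how many times faster `new` is than `old` (both in img/sec)."""
    return new / old


fp32_gain = speedup(results["7900xtx_fp32_nightly"], results["7900xtx_fp32_before"])
fp16_over_fp32 = speedup(results["7900xtx_fp16_nightly"], results["7900xtx_fp32_nightly"])
fp16_gap = speedup(results["3080ti_fp16"], results["7900xtx_fp16_nightly"])

print(f"FP32 gain from the nightly build: {fp32_gain:.2f}x")
print(f"FP16 over FP32 on the 7900XTX:    {fp16_over_fp32:.2f}x")
print(f"3080ti FP16 over 7900XTX FP16:    {fp16_gap:.2f}x")
```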

Good job AMD!
I'm looking forward to official 7900XTX support!

Not closing the issue yet since I believe the performance can still be improved further.

briansp2020 · 2023-10-07T13:26:35Z

It seems like the performance improvement came with the recent update to PyTorch and not with the libraries that came with ROCm 5.7.1. I reran the same code in a ROCm 5.7 environment and the performance improvement is still there.
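
When comparing environments like this, it helps to record which PyTorch build is actually in use. The following is a minimal sketch, assuming a reasonably recent PyTorch where `torch.version.hip` is set on ROCm builds and is `None` on CUDA builds. The function name `describe_torch_build` is hypothetical.

```python
def describe_torch_build() -> dict:
    """Collect basic facts about the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # Set on ROCm builds, None on CUDA builds (getattr guards older versions).
        "hip": getattr(torch.version, "hip", None),
        "cuda": getattr(torch.version, "cuda", None),
        # ROCm builds also report their GPUs through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

Posting this output next to each benchmark run makes it easier to tell whether a change came from the PyTorch build or from the ROCm installation.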

The code I'm using is here. It contains 2 examples from the fastai quickstart. The first section uses the convnext_small model that just saw a huge improvement. The second section is a text-processing example that has not seen any speed improvement. Could whoever just released the huge speed improvement also take a look at the text-processing example?

I'm not sure if anyone is reading this or whether my issue report was the one that brought the 7900XTX performance issue to the developer's attention. But I'm so excited I just had to post this :)

briansp2020 mentioned this issue Oct 7, 2023

ROCM5.7 build pytorch failed evshiron/rocm_lab#14

Closed

briansp2020 mentioned this issue Oct 25, 2023

MI100 performance ROCm/rocWMMA#289

Closed
