ROCM5.7 build pytorch failed #14
Is ROCm 5.7 released? Btw, the source code of PyTorch used in this repo comes from: I am not sure if the repo you use will work or not. |
Yes, it's released but not announced yet. I use this repo https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing because it has many optimizations for 5.7, but the build always fails. |
Interesting! I'll give it a try too. |
This issue should be caused by the outdated hipify script; you can fix it by running `git checkout torch/csrc/jit/ir/ir.h` afterwards. UPDATE: While it compiles pretty well, linking fails for some reason. |
I think 5.7 is still under testing. I had issues with the 5.7 version of the kernel when I tried it the other day. Hopefully, they will fix it before they officially release it. |
I built https://github.com/pytorch/pytorch against ROCm 5.7 just now and it succeeded. I suspect that the |
https://github.com/AUTOMATIC1111/stable-diffusion-webui failed to work with the PyTorch built upon the main repo, but https://github.com/vladmandic/automatic worked fine, although I didn't see a performance difference. |
Have you tried 5.7 amdgpu-dkms? It causes issues for me. But I'm not sure whether it's my setup or a problem with dkms modules since I seem to have a problematic setup. When I was running TF in a VM, AI benchmark would hang when running test 14. Using my current setup, it would crash X windows and log me out. |
I haven't tried running ROCm in a virtualized environment yet. I did not call I do think that ROCm 5.7 might be a bit strange because |
I found this branch from a PyTorch PR, but it's closed now. |
I tried the 5.7 amdgpu-dkms only, and it works fine. |
I'm trying out 5.7 and started getting the following messages in dmesg.
Does anyone know what this means? It's in red. So, it makes me feel nervous. :( |
ROCm 5.7 is officially released, and I can't wait to see if there's any improvement compared with ROCm 5.6. |
Unfortunately, I am on a business trip and don't have the chance to test them out. With PyTorch built above, I didn't see a performance improvement compared to ROCm 5.6. We should re-evaluate it when the official builds come out. I have looked through the changelog of ROCm 5.7 and it seems to mainly focus on new features, with less emphasis on performance optimizations related to AI. The list I am interested in currently:
I haven't tested Triton for ROCm in a long time, and I am curious about its performance and whether the Fused Attention example works on Navi 3x or not. |
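For a quick check of whether a fused attention path runs at all on a given build, a minimal sketch might look like the following (assuming a PyTorch 2.x build; the shapes are arbitrary, and this exercises PyTorch's built-in scaled-dot-product attention rather than the Triton tutorial kernel):

```python
# Minimal sketch: exercise PyTorch's fused scaled-dot-product attention on the GPU.
# Shapes are arbitrary; ROCm builds expose the GPU through the "cuda" device name.
import torch
import torch.nn.functional as F

device = "cuda"
q = torch.randn(4, 8, 1024, 64, device=device, dtype=torch.float16)
k = torch.randn(4, 8, 1024, 64, device=device, dtype=torch.float16)
v = torch.randn(4, 8, 1024, 64, device=device, dtype=torch.float16)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()
print("attention output:", out.shape, out.dtype)
```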
I ran a TF benchmark (https://pypi.org/project/new-ai-benchmark/) again and got a Device AI Score of 37824, which is about 10% better than ROCm 5.6. Looking at the details, inference seems better than my NVidia 3080ti, but training seems really bad. A simple benchmark from https://cprimozic.net/notes/posts/machine-learning-benchmarks-on-the-7900-xtx/ is still really bad compared to the 3080ti. I have not been able to run any pytorch benchmark yet. Does anyone know a simple pytorch benchmark that runs on pytorch & ROCm? So far, the issue I noticed when trying out fastai (ROCm/pytorch#1276 (comment)) is still there. |
I was able to run the pytorch micro-benchmark from the ROCm project. See ROCm/pytorch#1276. I built pytorch & torchvision on my machine before running the benchmark and do not know if they run with the ROCm 5.6 nightly build or not. |
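For anyone looking for an even simpler sanity-check benchmark, a rough sketch of a forward/backward timing loop is below (the model, batch size, and iteration counts are arbitrary illustrative choices, not part of the micro-benchmark suite):

```python
# Rough sketch: time forward+backward passes of a torchvision model.
# The model, batch size, and iteration counts are arbitrary illustrative choices.
import time
import torch
import torch.nn.functional as F
import torchvision

device = "cuda"  # ROCm PyTorch builds expose the GPU as "cuda"
model = torchvision.models.resnet50().to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(32, 3, 224, 224, device=device)
y = torch.randint(0, 1000, (32,), device=device)

def step():
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

for _ in range(5):          # warm-up iterations
    step()
torch.cuda.synchronize()

iters = 20
t0 = time.time()
for _ in range(iters):
    step()
torch.cuda.synchronize()
print(f"{32 * iters / (time.time() - t0):.1f} images/sec")
```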
In ROCm 5.6, I find the best-performing pytorch is in the rocm-pytorch Docker image. Waiting for this Docker image https://hub.docker.com/r/rocm/pytorch to be updated. |
Does anyone know how to build torchaudio? I get errors when I try to build it. The torch and torchvision builds worked fine for me. It seems ROCm version detection does not work quite right... Full output at https://gist.github.com/briansp2020/484f639aa59ccb308f35cf9ad6542881
|
https://github.com/evshiron/rocm_lab/blob/master/scripts/build_torchaudio.sh#L25 The following command might work: `echo 5.7.0-63 > /opt/rocm/.info/version-dev`. Otherwise, you might want to locate that file in one of the official Docker images (probably https://hub.docker.com/r/rocm/dev-ubuntu-20.04). |
That helped with detecting ROCm version. But it still can't find rocrand. |
I built the master-branch pytorch and got some performance improvement (rocm5.6 vs rocm5.7, with an rtx3090 as reference). running benchmark for frameworks ['pytorch'] |
Nano GPT benchmark. 3090 (ngc 23.06): Infer type is torch.bfloat16; Infer type is torch.float32. 7900xtx (rocm-5.6.1): Infer type is torch.bfloat16; Infer type is torch.float32. 7900xtx (rocm-5.7): Infer type is torch.float16; Infer type is torch.bfloat16; Infer type is torch.float32. |
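For anyone who wants to reproduce a dtype comparison like this one, a rough sketch that times the same inference pass in float16/bfloat16/float32 (the model below is a stand-in MLP, not the actual NanoGPT benchmark):

```python
# Rough sketch: time the same inference pass in float16 / bfloat16 / float32.
# The model here is a stand-in MLP, not the actual NanoGPT benchmark.
import time
import torch

device = "cuda"
for dtype in (torch.float16, torch.bfloat16, torch.float32):
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).to(device=device, dtype=dtype)
    x = torch.randn(64, 1024, device=device, dtype=dtype)

    with torch.no_grad():
        for _ in range(10):        # warm-up
            model(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(100):
            model(x)
        torch.cuda.synchronize()
    print(f"Infer type is {dtype}: {(time.time() - t0) * 10:.3f} ms/iter")
```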
Stable Diffusion ControlNet Pipeline worked! Compared to the 3080ti I have, it's still about 20% slower, but the code works unmodified. I wonder how much more performance they can squeeze out of the 7900. The 7900XTX does about 4.84 it/s. Does anyone know of a way to measure the performance in absolute flops/bandwidth/etc. using pytorch and/or tensorflow? Edited to add performance numbers. |
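On measuring absolute rates from PyTorch, a rough sketch that estimates achieved GEMM TFLOPS and copy bandwidth (the sizes are arbitrary, and these are achieved rates on one kernel each, not theoretical peaks):

```python
# Rough sketch: estimate achieved GEMM TFLOPS and copy bandwidth with PyTorch.
# Sizes are arbitrary; the numbers are achieved rates, not theoretical peaks.
import time
import torch

device = "cuda"

def timeit(fn, iters=50):
    fn()                      # warm-up
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.time() - t0) / iters

# GEMM throughput: an (n x n) @ (n x n) matmul does roughly 2*n^3 FLOPs.
n = 8192
a = torch.randn(n, n, device=device, dtype=torch.float16)
b = torch.randn(n, n, device=device, dtype=torch.float16)
t = timeit(lambda: a @ b)
print(f"fp16 GEMM: {2 * n**3 / t / 1e12:.1f} TFLOPS")

# Memory bandwidth: cloning a large buffer reads and writes it once each.
x = torch.empty(1 << 28, device=device, dtype=torch.float32)  # ~1 GiB
t = timeit(lambda: x.clone())
print(f"copy bandwidth: {2 * x.numel() * 4 / t / 1e9:.1f} GB/s")
```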
It should be caused by the changes to CMake files in ROCm 5.7. As I don't have a chance to try it at the moment, my recommendation is to dig into |
I built the ROCmSoftwarePlatform pytorch 2.0.1 and it performs well on some networks like convnext_large. It seems the rocm_lab repo can be updated; I built it the same way as your repo
|
I noticed a massive performance improvement using the latest pytorch nightly build. fastai/course22#96 (comment) Is anyone else interested in trying out the latest pytorch build and reporting if they see any performance improvements? The improvement in the convnext_small benchmark is massive and results in the fastai example performance of the 7900XTX catching up to 3080ti performance. Before, it was 4 to 5 times slower. Sure, the 7900XTX should be even faster, but I'm so glad to see that AMD consumer graphics card support is coming along nicely! |
Seems that FP32 performance improves a lot; hope that FP16 performance will be improved in the next ROCm 6.0. |
Does anyone have an idea how fast 7900XTX should be when all optimization is in place? I'm wondering whether "python micro_benchmarking_pytorch.py --network convnext_small --fp16 1" performance is about as good as it will get. Looking at the raw fp16 numbers, 7900XTX is said to have 123 TFLOPs (here) vs 3080ti's 272 TFLOPs tensorcore performance (here). Ratio of measured performance seems about right.... Also, unlike Nvidia, AMD does not list different numbers for fp16 vector peak performance vs matrix engine peak performance. Does anyone know whether 123 TFLOPs is from the vector engine or the matrix engine? |
Based on my limited understanding, it is possible that RDNA 3 may not have processing units similar to Tensor Cores, and WMMA may just be an optimized instruction. In comparison, CDNA has its own Matrix Cores and XDL instructions. Here is an article that may help in understanding the differences between GPUs from these vendors: I am not a professional in this field, so as a consumer, I am satisfied as long as the RX 7900 XTX is comparable to the likes of RTX 4080 or RTX 3090 in terms of AI applications that consumers will use. |
The 3080ti has only 75T of fp16 performance. WMMA is likely similar to a tensor core, but it's not as useful as a tensor core because it's memory bound. Here is my test of fp16 GEMM on the 7900xtx:
If you care about fp16 precision performance, the focus may need to be on WMMA op development. |
I re-ran the TF benchmark (https://pypi.org/project/new-ai-benchmark/) and got a Device AI Score of 40996, which is about 8% better than what I got before and is now better than the 3080ti. |
In my previous replies, I mentioned that I felt ROCm 5.7 was not working properly on my end, so I have been sticking with ROCm 5.6 for the time being. Today, I tried updating to ROCm 5.7.1, but the situation did not improve: text and images disappear in Google Chrome, and running AI applications easily results in GPU resets. There are many reset logs. Therefore, I would like to ask whether your ROCm 5.7 is working fine out of the box. Have you encountered the issues I mentioned above? And if so, how did you solve them? |
@evshiron |
I am currently using Ubuntu 22.04.1 with Linux kernel 6.2.0. As you mentioned, it is possible that the kernel version could be the reason. Anyway, ROCm 5.6 is working fine on my end and PyTorch now distributes their stable version for ROCm 5.6, so I might stick with this version for a longer time. I haven't used SHARK yet, but I think SHARK for AMD is a Vulkan thing. |
I used the ROCmSoftwarePlatform pytorch repository to build the latest pytorch-rocm and it fails.
I used the build script command from rocm_lab:
The error log is: