Fix AMD GPUs not being detected #7147

max-maag · 2024-10-18T14:25:30Z

Summary

Each version of torch is only available for specific versions of CUDA and ROCm. The Invoke installer tries to install torch 2.4.1 with ROCm 5.6 support, which does not exist. As a result, the installation falls back to the default CUDA version, so AMD GPUs aren't detected. This commit fixes that by bumping the ROCm version to 6.1, as suggested by the PyTorch documentation.¹ Torch 2.4.1 does not appear to be available for ROCm 6.2.

The specified CUDA version of 12.4 is still correct according to ¹, so it does not need to be changed.
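To illustrate the failure mode, here is a minimal sketch of a check that rejects an unknown torch/accelerator pairing instead of silently falling back. It is not the installer's actual code: `KNOWN_BUILDS` and `torch_index_url` are hypothetical names, the table only lists the two combinations mentioned in this PR, and the URL layout is assumed to follow the PyTorch wheel index at `https://download.pytorch.org/whl/<tag>`.

```python
# Hypothetical table: torch version -> accelerator tags assumed to have wheels.
# Only the two combinations named in this PR are listed; extend it from the
# PyTorch "previous versions" page before relying on it.
KNOWN_BUILDS = {
    "2.4.1": {"rocm6.1", "cu124"},
}

# Assumed root of the PyTorch wheel index.
WHEEL_INDEX_ROOT = "https://download.pytorch.org/whl"


def torch_index_url(torch_version: str, accelerator_tag: str) -> str:
    """Return the wheel index URL, or raise if the pairing is not in the table."""
    known_tags = KNOWN_BUILDS.get(torch_version, set())
    if accelerator_tag not in known_tags:
        # Failing loudly avoids ending up with a build for the wrong backend.
        raise ValueError(
            f"torch {torch_version} is not listed for {accelerator_tag}; "
            f"listed tags: {sorted(known_tags)}"
        )
    return f"{WHEEL_INDEX_ROOT}/{accelerator_tag}"


if __name__ == "__main__":
    print(torch_index_url("2.4.1", "rocm6.1"))
    try:
        torch_index_url("2.4.1", "rocm5.6")
    except ValueError as err:
        print(f"rejected: {err}")
```

With a check like this, the old ROCm 5.6 setting would have produced an error at install time rather than a CUDA build that cannot see an AMD GPU.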

Related Issues / Discussions

Closes #7006
Closes #7146

QA Instructions

Install Invoke 5.1.1 or later with ROCm support using the installer on a system with an AMD GPU.
Start the server.
Generate any image.

Without this fix, the CPU is used to generate images. This can be seen in the log output. Image generation also takes forever.
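Besides reading the log output, the installed torch build can be inspected directly. The following is a rough sketch, assuming it is run inside the Invoke virtual environment; `classify_torch_build` is a hypothetical helper, while `torch.version.hip`, `torch.version.cuda`, and `torch.cuda.is_available()` are the standard PyTorch attributes (ROCm builds also report through the `torch.cuda` API).

```python
from typing import Optional


def classify_torch_build(
    hip_version: Optional[str],
    cuda_version: Optional[str],
    gpu_available: bool,
) -> str:
    """Summarise which backend a torch build targets and whether it sees a GPU."""
    if hip_version is not None:
        backend = f"ROCm/HIP {hip_version}"
    elif cuda_version is not None:
        backend = f"CUDA {cuda_version}"
    else:
        backend = "CPU-only"
    return f"{backend} build; GPU visible: {gpu_available}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(
            classify_torch_build(
                getattr(torch.version, "hip", None),
                getattr(torch.version, "cuda", None),
                torch.cuda.is_available(),
            )
        )
```

On an AMD system, a CUDA build with no visible GPU would match the symptom described above, while a ROCm/HIP build with a visible GPU would suggest the fix took effect.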

I did not test the changes to the Dockerfile since I am not familiar with Docker.

Merge Plan

n/a

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
Documentation added / updated (if applicable)

¹ https://pytorch.org/get-started/previous-versions/#v241

ebr · 2024-10-18T20:40:47Z

We don't have any way of testing this on a 6xxx architecture, but this seems like a worthwhile and correct change nonetheless. We do know that 7xxx (Navi32/33 / RDNA3) chips will need some more special handling, but that shouldn't block this update.

@max-maag - we also have this index url set in the Dockerfile - could you please update that in this PR?
Thanks for your contribution!

max-maag · 2024-10-19T00:32:01Z

> we also have this index url set in the Dockerfile - could you please update that in this PR? Thanks for your contribution!

I saw that someone reported that in #7006, but because I don't use Docker and am not familiar with it at all, I had kept it out of this PR until now. I have now changed the URL in the Dockerfile and added the relevant issue to this PR's list of related issues.

I didn't test the Dockerfile change, though. I don't see any reason why it shouldn't work, but maybe someone should verify the fix just to be sure.

ebr · 2024-10-20T07:09:13Z

Approved - LGTM from my perspective. Thanks again for the contribution.

Just FYI - I finally got it to generate on a recent AMD GPU (W7900). Here's a full write-up: https://gist.github.com/ebr/e4e4118b603bd95bfd2408ee30c27f0a. It's not pretty, but it works.

Each version of torch is only available for specific versions of CUDA and ROCm. The Invoke installer and Dockerfile try to install torch 2.4.1 with ROCm 5.6 support, which does not exist. As a result, the installation falls back to the default CUDA version, so AMD GPUs aren't detected. This commit fixes that by bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1], so it does not need to be changed.

Closes invoke-ai#7006
Closes invoke-ai#7146

[1]: https://pytorch.org/get-started/previous-versions/#v241

max-maag requested review from lstein, ebr and hipsterusername as code owners October 18, 2024 14:25

github-actions bot added the installer PRs that change the installer label Oct 18, 2024

max-maag force-pushed the fix/incompatible-torch-rocm-versions branch from 9b9632e to e3a7e5f Compare October 19, 2024 00:28

max-maag requested a review from blessedcoolant as a code owner October 19, 2024 00:28

github-actions bot added the docker label Oct 19, 2024

max-maag force-pushed the fix/incompatible-torch-rocm-versions branch from e3a7e5f to d8b0730 Compare October 19, 2024 00:33

ebr approved these changes Oct 20, 2024

hipsterusername force-pushed the fix/incompatible-torch-rocm-versions branch from d8b0730 to b41762c Compare October 20, 2024 13:34
