Support for ROCM 6 #82
I am also trying to reproduce the build using the provided Dockerfiles, but I always get errors:
Did you try building by setting the XLA revision as in #63 (comment)? Setting up the right environment for building was an issue before, which is why we have the Dockerfile. I don't know about ROCm 6; my best bet is that updating to a newer XLA could fix the build, but that usually involves changes to EXLA too. I think it would be a good idea to update sometime soon anyway, but no guarantees. You could perhaps use Docker with 5.6 for computations/experimentation altogether, though I get it's not very convenient.
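For reference, a from-source build is driven by the environment variables documented by the `xla` package; a minimal sketch, where the `ROCM_PATH` value is an assumption for illustration (adjust to your install):

```shell
# Sketch of a from-source ROCm build using the xla package's documented env vars.
export XLA_BUILD=true        # compile XLA from source instead of fetching a precompiled archive
export XLA_TARGET=rocm       # target the ROCm runtime rather than CPU/CUDA
export ROCM_PATH=/opt/rocm   # where the ROCm toolkit lives (assumed default location)

mix deps.clean xla exla      # drop any previously compiled artifacts
mix deps.compile xla exla    # rebuild against the settings above
```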
@jalberto I updated to the latest XLA revision, and EXLA main already uses it. I tried building with ROCm 5.7, but there were errors indicating that XLA already assumes 6.0 (it uses symbols defined in 6.0+). So I updated the Docker image and managed to build successfully with ROCm 6.0. Please try it.
Thanks @jonatanklosko, will test and report back.
@jonatanklosko sorry for the delay; now I have a different error:
@jalberto is it when loading the precompiled binary or during the build?
@jonatanklosko in case it helps:
As a sanity check, try without
Yes, that worked as expected, no issues. As a side note: I have the same issues building with the new Dockerfile.
I see. I have no idea where this LLVM error is coming from; I didn't find
Not sure if this is completely related, but I'm trying to get ROCm 6 working too and built xla with the Dockerized build.sh script, which gave me a tarball. When I set
Just wanted to ask if this is a known problem, or if someone has some pointers for debugging this? And are there plans to provide pre-built ROCm packages like for CUDA?
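For anyone else landing here: the usual way to point the `xla` package at a locally built tarball is `XLA_ARCHIVE_URL`. A sketch follows, where the archive path/name is a placeholder for whatever build.sh emitted, and `file://` acceptance is an assumption on my part:

```shell
# Point the xla package at the tarball produced by the Dockerized build.sh.
# The path below is a placeholder; whether file:// URLs are accepted here is
# an assumption, and serving the same file over plain HTTP is an alternative.
export XLA_ARCHIVE_URL=file:///path/to/xla_extension-x86_64-linux-gnu-rocm.tar.gz
export XLA_TARGET=rocm

mix deps.clean xla exla
mix deps.compile xla exla
```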
@monorkin it may not be related, but the only thing I can think of is to also set
I've just added support for
Not at the moment. ROCm support is somewhat experimental, in the sense that we don't have the capacity to test it on every release and maintain possibly multiple precompiled builds. JAX (the Python library using XLA) also considers it experimental. This may change in the future, depending on how ROCm's prominence evolves upstream.
@jonatanklosko that did the trick, thank you! Now I have a different problem: after creating a serving, the runtime crashes. I added
Is there a way to increase the verbosity? Or another way to check why the runtime crashed?
UPDATE:
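Regarding verbosity in general: the native XLA code uses TensorFlow/TSL-style logging, so the environment variables below usually make it chattier; whether they surface the cause of a hard crash depends on where it happens. A sketch:

```shell
# Show INFO-level native logs (the default level 1 hides them).
export TF_CPP_MIN_LOG_LEVEL=0
# Also enable VLOG output up to the given level for all modules.
export TF_CPP_MAX_VLOG_LEVEL=2

# Rerun the crashing serving with logging enabled, e.g. in an IEx session.
iex -S mix
```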
It seems ROCm 5.6 kind of works, but it requires too much back and forth to get everything working. The new Fedora 40 brings official ROCm support, but starting with ROCm 6.
I am using this config from #63
I managed to find every package it was asking for (this took a while of back and forth) until I reached this:
My guess is xla_extension needs to be built for ROCm 7 (librocblas.so.4). I tried to build it myself, but the requirements are way too far off from the current system (gcc versions and so on). It would be great if there were official xla binaries for different ROCm versions, as there are for CUDA.
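One way to confirm which shared libraries the precompiled `xla_extension` expects, and which ones the system cannot resolve, is plain `ldd`; the path below is an assumption, point it at wherever the archive was unpacked:

```shell
# List dynamic dependencies of the extension and keep only unresolved ones.
ldd /path/to/xla_extension/lib/libxla_extension.so | grep "not found"
```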
I understand ROCm support is low priority, but it is a really nice way to get started with AI, since it works nicely on Linux.