[bug report][4090 attn] cudaCheckError(): too many resources requested for launch #37

Open
kexve opened this issue Jun 4, 2024 · 1 comment


kexve commented Jun 4, 2024

Dear Developers,
When I run the attn kernel on a 4090 GPU with the default parameters, it fails with "too many resources requested for launch."
I found that the 4090 reports 65,536 "Registers per block" and that each thread of the kernel uses 162 registers. BLOCK_SIZE therefore cannot exceed 65,536 / 162 ≈ 404 threads.
Have any of you faced this issue, and do you have any solutions?
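
The register arithmetic in this comment can be checked with a short host-side sketch. This is only an illustration of the division described above, not code from the repository: the function names `max_threads_per_block` and `fits_register_budget` are hypothetical, and the two constants are the values reported in this issue. It also ignores register allocation granularity, which is an assumption; the hardware may round per-thread register use up, so the real limit can be slightly lower.

```cpp
#include <cstdint>

// Largest thread count per block that fits in the per-block register budget.
// Assumption: no allocation granularity, i.e. plain integer division.
constexpr std::uint32_t max_threads_per_block(std::uint32_t regs_per_block,
                                              std::uint32_t regs_per_thread) {
    return regs_per_block / regs_per_thread;
}

// True if `threads` threads, each using `regs_per_thread` registers,
// stay within `regs_per_block`.
constexpr bool fits_register_budget(std::uint32_t regs_per_block,
                                    std::uint32_t regs_per_thread,
                                    std::uint32_t threads) {
    return static_cast<std::uint64_t>(regs_per_thread) * threads <= regs_per_block;
}

// Values reported in this issue for the 4090 attention kernel.
constexpr std::uint32_t kRegsPerBlock = 65536;
constexpr std::uint32_t kRegsPerThread = 162;

// 65,536 / 162 rounds down to 404 threads.
static_assert(max_threads_per_block(kRegsPerBlock, kRegsPerThread) == 404,
              "expected a 404-thread limit");
```

Because the checks are `constexpr`, a block size that exceeds the budget could be rejected at compile time with a `static_assert` rather than at launch.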

@kexve changed the title from "[4090 attn] cudaCheckError(): too many resources requested for launch" to "[bug report][4090 attn] cudaCheckError(): too many resources requested for launch" on Jun 4, 2024

ahepp commented Jul 19, 2024

I ran into this as well. Thanks for writing this up; I would have had no idea how to debug it otherwise. I noticed that harness.impl defines BLOCK_SIZE as 32 * NUM_WORKERS, and 4090_ker.cu defines NUM_WORKERS as 16, so the default BLOCK_SIZE is 512, which is above the limit of about 404. I was able to compile and run the kernel after setting NUM_WORKERS to 8. I don't know whether floor(404/32) = 12 would be a better value; I left it as a power of 2 in case that matters.

I also ran this without a data file, using whatever uninitialized memory new gave me for k/q/v/o_ref, so it might still fail with real data.
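
For reference, the NUM_WORKERS choice discussed above can be worked out with a small host-side sketch (C++14 or later). The helper names `max_num_workers`, `floor_pow2`, and `block_size` are hypothetical and not part of the repository; the sketch only assumes the relationship BLOCK_SIZE = 32 * NUM_WORKERS described in this comment and, as before, ignores any register allocation granularity.

```cpp
#include <cstdint>

constexpr std::uint32_t kWarpSize = 32;

// BLOCK_SIZE as described in this thread: 32 threads per worker (warp).
constexpr std::uint32_t block_size(std::uint32_t num_workers) {
    return kWarpSize * num_workers;
}

// Largest NUM_WORKERS whose BLOCK_SIZE stays within the register budget.
// Assumption: plain integer division, no allocation granularity.
constexpr std::uint32_t max_num_workers(std::uint32_t regs_per_block,
                                        std::uint32_t regs_per_thread) {
    return (regs_per_block / regs_per_thread) / kWarpSize;
}

// Largest power of two that is <= n (returns 0 for n == 0).
constexpr std::uint32_t floor_pow2(std::uint32_t n) {
    std::uint32_t p = 0;
    // c wraps to 0 after 2^31, which ends the loop.
    for (std::uint32_t c = 1; c != 0 && c <= n; c <<= 1) {
        p = c;
    }
    return p;
}

// With the values from this issue: at most 12 workers fit,
// and 8 is the largest power of two that does not exceed 12.
static_assert(max_num_workers(65536, 162) == 12, "expected 12 workers");
static_assert(floor_pow2(max_num_workers(65536, 162)) == 8, "expected 8 workers");
```

Under these assumptions both 8 and 12 workers fit the register budget and the default of 16 does not. Whether the kernel itself requires NUM_WORKERS to be a power of two is not something this sketch can answer.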
