[bug report][4090 attn] cudaCheckError(): too many resources requested for launch #37
Comments
I ran into this as well. Thanks for writing this up; I would have had no idea how to debug it otherwise. I notice that harness.impl defines BLOCK_SIZE as 32 * NUM_WORKERS, and that NUM_WORKERS is defined as 16 in 4090_ker.cu, which gives a BLOCK_SIZE of 512. I was able to compile and run the kernel after setting NUM_WORKERS to 8. I don't know whether floor(404/32) = 12 would be a better value; I left it as a power of 2 in case that was important. I also ran this without a data file, just using whatever
Dear Developers,
When I run the attn kernel on a 4090 GPU with the default parameters, I get the error "too many resources requested for launch."
I found that "Registers per block" on the 4090 is 65,536 and that each thread uses 162 registers. This means BLOCK_SIZE cannot exceed floor(65,536 / 162) = 404.
Have any of you faced this issue, and do you have any solutions?
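The register arithmetic discussed in this thread can be sketched as a small host-side C++ check. This is only an illustration under stated assumptions: the helper names `max_block_threads` and `workers_fit` are hypothetical (they are not part of the project), the figures 65,536 and 162 are the ones reported in this issue, and the check ignores register allocation granularity and any other launch limits such as shared memory.

```cpp
#include <cassert>

// Hypothetical helper: largest thread count per block that fits the
// register budget, rounded down to a whole number of warps.
int max_block_threads(int regs_per_block, int regs_per_thread, int warp_size = 32) {
    assert(regs_per_block > 0 && regs_per_thread > 0 && warp_size > 0);
    int raw_limit = regs_per_block / regs_per_thread;  // 65536 / 162 = 404
    return (raw_limit / warp_size) * warp_size;        // 12 warps = 384 threads
}

// Hypothetical helper: does BLOCK_SIZE = warp_size * num_workers stay within
// the per-block register budget? Assumes every thread uses regs_per_thread.
bool workers_fit(int num_workers, int regs_per_block, int regs_per_thread,
                 int warp_size = 32) {
    assert(num_workers > 0);
    long long needed = 1LL * num_workers * warp_size * regs_per_thread;
    return needed <= regs_per_block;
}
```

With the numbers from this issue, `workers_fit(16, 65536, 162)` is false (512 threads would need 82,944 registers), while 8 and 12 workers both fit on register count alone. Whether 12 works as well as a power of 2 for the kernel itself is a separate question that this check does not answer.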