
CUDA memory illegal access with large batch size #27

Open
francescocarzaniga opened this issue Jun 24, 2024 · 2 comments

@francescocarzaniga

I'm trying to adopt this implementation of FFT convolutions inside my model, and initial testing yields great results across the board. Unfortunately, at large batch sizes (≥140,000) it crashes with:

CUDA Runtime Error at: /flash-fft-conv/csrc/flashfftconv/monarch_cuda/monarch_cuda_interface_fwd_bf16.cu:386
an illegal memory access was encountered

Here is an MWE:

import torch
from flashfftconv import FlashFFTConv

# signal is (B, H, L) in bf16; the kernel is (H, L) in fp32
signal = torch.rand((140000, 4, 4096), dtype=torch.bfloat16, device="cuda")
kernel = torch.rand((4, 4096), dtype=torch.float32, device="cuda")

conv = FlashFFTConv(4096).to("cuda")

res = conv(signal, kernel)
res += 1  # surfaces the asynchronous CUDA error
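
Note: the res += 1 is needed because CUDA reports kernel errors asynchronously; equivalently, an explicit synchronize right after the conv pins the failure to the conv call itself:

res = conv(signal, kernel)
torch.cuda.synchronize()  # the pending illegal-access error is raised here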

This crashes with batch size 140,000 but works with 130,000. For what it's worth, 140,000 × 4 × 4096 = 2,293,760,000 elements, which exceeds 2^31 − 1 = 2,147,483,647, while 130,000 × 4 × 4096 = 2,129,920,000 stays below it, so a 32-bit index overflow inside the kernel seems plausible.
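
In the meantime, running the convolution in chunks below the failing threshold should sidestep the issue, since the convolution is independent across batch elements. A minimal sketch (the chunk size here is arbitrary; anything comfortably below the threshold should do):

chunk = 65536  # arbitrary chunk size, well below the failing batch size
res = torch.cat(
    [conv(signal[i:i + chunk], kernel) for i in range(0, signal.shape[0], chunk)],
    dim=0,
)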

Do you have an idea where this could come from?

@DanFu09 (Contributor) commented Jun 24, 2024 via email

@francescocarzaniga (Author)

Wonderful! Thank you.
