RuntimeError: Expected fft_size >= 16 && fft_size <= 16384 #6

Open

JCBrouwer opened this issue Mar 14, 2023 · 4 comments

Comments

@JCBrouwer

Hello, thanks for the interesting research and open source repo!

I'm trying to integrate the HyenaOperator (with default settings) in a sequence modeling task and am running into the error in the title when using the fftconv extension.

My sequence (u in the trace below) has the shape (batch=10, channels=32, seq_len=8760) which apparently leads to an fft_size of 32768.

  File ".../hyena.py", line 31, in fftconv_fused
    return fftconv_func(u, k, D, gelu=False, force_fp16_output=torch.is_autocast_enabled())
  File ".../extensions/fftconv/fftconv.py", line 175, in fftconv_func
    return FFTConvFunc.apply(
  File ".../extensions/fftconv/fftconv.py", line 98, in forward
    out = fftconv_fwd(
RuntimeError: Expected fft_size >= 16 && fft_size <= 16384 && (fft_size == 1 << int(log2(float(fft_size)))) to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)
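If I'm reading the extension right (my assumption, not something stated in the repo), the fused kernel pads to the next power of two of 2 * seq_len so that the circular FFT convolution acts as a linear one, which would explain where 32768 comes from:

import math

def expected_fft_size(seq_len: int) -> int:
    # Assumption: pad to the smallest power of two >= 2 * seq_len so the
    # circular FFT convolution behaves like a linear convolution.
    return 2 ** math.ceil(math.log2(2 * seq_len))

print(expected_fft_size(8760))  # 32768 -> over the kernel's 16384 cap
print(expected_fft_size(8192))  # 16384 -> longest length the fused kernel would accept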

Is the maximum supported sequence length 8192? Is this a theoretical or hardware limitation, or just a limitation of the current implementation? Would it be possible to support longer sequences?

Thanks!
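In case it helps anyone hitting the same limit: a plain PyTorch FFT convolution has no such cap. This is only a sketch of the kind of non-fused fallback I mean (the D skip term follows my reading of the signature in the trace above), not the repo's actual code path:

import torch

def fftconv_torch(u, k, D):
    # u: (batch, channels, seq_len), k: (channels, seq_len), D: (channels,)
    seqlen = u.shape[-1]
    fft_size = 2 * seqlen  # zero-pad so the circular convolution becomes linear
    u_f = torch.fft.rfft(u, n=fft_size)
    k_f = torch.fft.rfft(k, n=fft_size)
    y = torch.fft.irfft(u_f * k_f, n=fft_size)[..., :seqlen]
    return y + u * D.unsqueeze(-1)  # elementwise skip connection

# Works fine at seq_len = 8760:
y = fftconv_torch(torch.randn(10, 32, 8760), torch.randn(32, 8760), torch.randn(32))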

@DanFu09
Contributor

DanFu09 commented Mar 14, 2023 via email

@calclavia

@DanFu09 I also noticed that the fftconv extension here doesn't seem to achieve the speed gains claimed in the paper (it does give memory savings, though!).

@DanFu09
Contributor

DanFu09 commented Apr 1, 2023

Can you give more details on the workload you’re using to measure the speedup?

@calclavia

@DanFu09 It's a regular Transformer with the self-attention layers replaced by Hyena, using FFTConv. The overall training time per step doesn't seem to decrease when switching from the PyTorch cuFFT implementation to this extension; it might be dominated by the other layers. The sequence length is ~1K.

Let me know if there are any specific details you're looking for.
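One way to take the other layers out of the picture is to time just the conv op in isolation. A rough sketch (assuming CUDA; the fused-kernel call is commented out and mirrors the call shown in the trace at the top of this issue, with the module path being my guess):

import torch

def time_op(fn, iters=100):
    # Average CUDA-event time of a single op in milliseconds.
    for _ in range(10):  # warmup
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

B, C, L = 10, 32, 1024
u = torch.randn(B, C, L, device="cuda")
k = torch.randn(C, L, device="cuda")
D = torch.randn(C, device="cuda")  # used only by the fused-kernel call below

def torch_fft_conv(u, k):
    # Plain torch.fft path for comparison.
    n = 2 * u.shape[-1]
    return torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(k, n=n), n=n)[..., :u.shape[-1]]

print("torch.fft path:", time_op(lambda: torch_fft_conv(u, k)), "ms")
# Fused kernel, called as in the trace above:
# from extensions.fftconv.fftconv import fftconv_func
# print("fused kernel:  ", time_op(lambda: fftconv_func(u, k, D, gelu=False)), "ms")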
