Hello, just FYI in case you didn't know: apparently Hugging Face changed the flag/parameter for specifying Flash Attention 2. Here's the message I got:
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
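For what it's worth, the first message just asks for a different keyword argument at load time, and the second asks for the model to be moved to the GPU after loading. A rough sketch of the new-style call (the model ID below is a placeholder, and this assumes the flash-attn package is installed):

```python
import torch
from transformers import AutoModelForCausalLM

# Old, deprecated form (this is what triggers the deprecation warning):
# model = AutoModelForCausalLM.from_pretrained(
#     "org/model-name", torch_dtype=torch.float16, use_flash_attention_2=True
# )

# New form: pass attn_implementation instead of use_flash_attention_2
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",                          # placeholder model ID
    torch_dtype=torch.float16,                 # Flash Attention 2 needs fp16 or bf16
    attn_implementation="flash_attention_2",
)
model = model.to("cuda")                       # then move it to the GPU, per the second message
```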
BTW, I tried using the newer `attn_implementation="flash_attention_2"` with Bark and COULD NOT get it to work... yet with your program, which uses the old `use_flash_attention_2=True`, it works. I don't know if it was my script or the different flags, but just be aware in case.
And here's the script I am testing:
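A minimal sketch of that kind of test, assuming the small Bark checkpoint loaded through transformers (the model ID, prompt, and precision here are placeholders):

```python
import torch
from transformers import AutoProcessor, BarkModel

device = "cuda"  # Flash Attention 2 only runs on a GPU

processor = AutoProcessor.from_pretrained("suno/bark-small")   # placeholder checkpoint
model = BarkModel.from_pretrained(
    "suno/bark-small",
    torch_dtype=torch.float16,                 # half precision, as Flash Attention 2 requires
    attn_implementation="flash_attention_2",   # new-style argument
).to(device)

inputs = processor("Hello, this is a Flash Attention 2 test.").to(device)
audio = model.generate(**inputs)
print(audio.shape, model.generation_config.sample_rate)
```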