This is just a question of whether there are such plans or not...
Right now, Flashinfer lib requires Q (query) and KV (kv-cache) to have the same dtype.
Just an example from the code: `q` and `paged_kv` have the same `DTypeIn`.
Are there any plans to support different dtypes for KV-cache and Q (query)?
My personal interest is `fp8` for kv-cache and `fp16` for query.
Thank you in advance!
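To make the request concrete, here is a minimal sketch of what decoupled input types could look like. It is not FlashInfer code: `DTypeQ`, `DTypeKV`, `NarrowKV`, and `qk_dot_mixed` are hypothetical names, and because standard C++ has no `fp16` or `fp8` types, `float` stands in for the query type and a small 8-bit fixed-point struct stands in for the kv-cache type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a narrow KV-cache element type.
// This is NOT a real fp8 format: it is an 8-bit fixed-point value
// (stored integer / 16) used only to keep the sketch standard C++.
struct NarrowKV {
  std::int8_t bits;
  explicit operator float() const { return static_cast<float>(bits) / 16.0f; }
};

// Hypothetical signature: instead of one shared DTypeIn for q and the
// KV-cache, the query and KV element types are separate template parameters.
// Both are upcast to float before the multiply-accumulate.
template <typename DTypeQ, typename DTypeKV>
float qk_dot_mixed(const DTypeQ* q, const DTypeKV* k, std::size_t head_dim) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < head_dim; ++i) {
    acc += static_cast<float>(q[i]) * static_cast<float>(k[i]);
  }
  return acc;
}
```

Calling `qk_dot_mixed(q, k, head_dim)` with a `float` query and a `NarrowKV` key deduces the two types independently, which is the behavior being asked about for `fp16` queries and an `fp8` kv-cache.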
cc @yzh119