Kernel fusion for Llama-v2 #538
TejaGollapudi
started this conversation in
Ideas
Hi,
https://twitter.com/pommedeterre33/status/1681935636129873920?t=VaxYpkbwNLKxly7icie8kw&s=19
I came across this great thread showing the benefits of kernel fusion for Llama-2: up to a 1.8x speedup using OpenAI's Triton kernels. (It may work with torch kernel fusion too.)
Not sure if this would be beneficial for vLLM, but it might be worth taking a look at 😄
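For anyone unfamiliar with the idea: fusion replaces several small kernels, each of which reads and writes whole tensors to memory, with one kernel that does all the arithmetic in a single pass. Here's a minimal CPU-only sketch of that principle (not actual Triton or vLLM code, and the function names are made up for illustration), using the SiLU-gated product that appears in Llama's SwiGLU MLP:

```python
import math

def swiglu_unfused(a, b):
    # Three separate "kernels", each materializing an intermediate list
    # (analogous to three GPU kernel launches plus extra memory traffic).
    sig = [1.0 / (1.0 + math.exp(-x)) for x in a]   # kernel 1: sigmoid(a)
    silu = [x * s for x, s in zip(a, sig)]          # kernel 2: a * sigmoid(a)
    return [s * y for s, y in zip(silu, b)]         # kernel 3: silu(a) * b

def swiglu_fused(a, b):
    # One "kernel": a single pass over the data, no intermediate buffers.
    return [x * (1.0 / (1.0 + math.exp(-x))) * y for x, y in zip(a, b)]

a = [0.5, -1.0, 2.0]
b = [1.0, 2.0, 3.0]
assert all(abs(u - f) < 1e-12
           for u, f in zip(swiglu_unfused(a, b), swiglu_fused(a, b)))
```

On a GPU the win comes from fewer kernel launches and far less global-memory traffic, which is presumably where the reported 1.8x is coming from.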