
Regarding RUN_CUDA_RWKV6: it would be better to implement this part in PyTorch, otherwise porting is inconvenient #252

Open
bobo-wmdigit opened this issue Aug 27, 2024 · 5 comments


@bobo-wmdigit

I looked at the direction of the paper and it's great, but the overall design is quite unfriendly to anyone who wants to take it further in practice. Most people who want to use this framework hope to port it to edge devices, yet the core code is implemented in CUDA, which makes porting very troublesome and requires manual alignment by hand. It seems every generation except v1 has been done this way?
I also tested the demo, and the handling of the stop token doesn't seem great either. For such a strong theoretical framework, it would be best to design things so that people can experiment with it more easily; only then does it have a real chance of being adopted in production.
Just my two cents.

@BlinkDL
Owner

BlinkDL commented Aug 29, 2024

Thanks for your interest. Inference does not require CUDA (although with CUDA, prefill is faster): https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_v6_demo.py

And here is the chat demo (it uses \n\n as the stop token, because I replace every \n\n in the user's input with \n):
https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py
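The stop-token convention described above (collapse \n\n in user input, then stop generation at \n\n) can be sketched in plain Python. The function names here are illustrative, not the actual API of API_DEMO_CHAT.py:

```python
def sanitize_user_input(text: str) -> str:
    """Collapse blank lines in user input to single newlines, so the
    sequence "\n\n" can never appear inside a user message and remains
    an unambiguous terminator for the model's reply."""
    while "\n\n" in text:
        text = text.replace("\n\n", "\n")
    return text.strip()

def generate_until_stop(token_stream, stop: str = "\n\n") -> str:
    """Accumulate streamed tokens and cut the reply at the first
    occurrence of the stop sequence."""
    out = ""
    for tok in token_stream:
        out += tok
        if stop in out:
            return out[: out.index(stop)]
    return out
```

With this convention, the only \n\n the model ever sees marks the boundary between turns, which is why it works reliably as a terminator.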

@BlinkDL
Owner

BlinkDL commented Sep 5, 2024

Also, please see https://github.com/TorchRWKV/rwkv-kit

@ustbzgn

ustbzgn commented Sep 20, 2024

@uniartisan

uniartisan commented Oct 18, 2024

> A joint request: we hope the official maintainers will reproduce a PyTorch version. The reproduction at https://github.com/TorchRWKV/rwkv-kit requires loading pretrained model weights; we hope for an official PyTorch version that can train from scratch, so that RWKV has the basic ecosystem needed to truly compete with Transformers.

Hi, this repo is currently maintained by me, and I'm working with the RWKV team, so you can treat it as an official version.

The whole model is in PyTorch except for the wkv kernel. The same is true for Transformers: in PyTorch, the attention kernel is written in CUDA/C++ or in Triton. This applies equally to RWKV and Transformers, because implementing the same computation in native torch is extremely slow, roughly 50x slower. If you look into torch's eager mode, it launches on the order of 10,000 small kernels for a 4096-token prefill. Writing a fused kernel in CUDA or Triton is necessary.
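To illustrate why a fused kernel matters, here is a deliberately naive, pure-Python sketch of an RWKV-6-style per-timestep recurrence (a simplified single-head form with the exp(-exp(w)) decay parametrization; the exact formulation and names are illustrative, not the production kernel). Each timestep touches the full DxD state, and in eager-mode torch each of these small operations becomes a separate kernel launch:

```python
import math

def naive_wkv6(r, k, v, w, u):
    """Naive RWKV-6-style recurrence for one head, for illustration only.
    r, k, v: lists of T vectors of length D; w: per-step decay vectors;
    u: per-channel "bonus" vector applied to the current timestep.
    The state S is a D x D matrix updated sequentially over time."""
    T, D = len(r), len(r[0])
    S = [[0.0] * D for _ in range(D)]  # running state matrix
    out = []
    for t in range(T):
        o = [0.0] * D
        for i in range(D):      # key dimension
            for j in range(D):  # value dimension
                # output reads the past state plus the u-weighted current term
                o[j] += r[t][i] * (S[i][j] + u[i] * k[t][i] * v[t][j])
        for i in range(D):
            for j in range(D):
                # decay the state, then add the new key/value outer product
                decay = math.exp(-math.exp(w[t][i]))
                S[i][j] = decay * S[i][j] + k[t][i] * v[t][j]
        out.append(o)
    return out
```

The sequential dependence on S across timesteps is exactly what a fused CUDA/Triton kernel keeps in registers or shared memory, instead of round-tripping through global memory thousands of times.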

You can also look at rwkv-fla for more details. Thank you!

By the way, rwkv-kit can initialize RWKV-6 (0x60) from scratch:
https://github.com/TorchRWKV/rwkv-kit/blob/dev/rwkvkit/utils/rwkv6.py#L543

@uniartisan

> I looked at the direction of the paper and it's great, but the overall design is quite unfriendly to anyone who wants to take it further in practice. Most people who want to use this framework hope to port it to edge devices, yet the core code is implemented in CUDA, which makes porting very troublesome and requires manual alignment by hand. It seems every generation except v1 has been done this way? I also tested the demo, and the handling of the stop token doesn't seem great either. For such a strong theoretical framework, it would be best to design things so that people can experiment with it more easily; only then does it have a real chance of being adopted in production. Just my two cents.

You can also consider rwkv.cpp / llama.cpp, and we provide ONNX and pure-torch code as well: https://github.com/TorchRWKV/flash-linear-attention/blob/main/fla/ops/rwkv6/recurrent_naive.py
