RWKV only shows lower GPU memory occupancy during inference? #250
I tried to use RWKV (e.g., Vision-RWKV) in CV tasks, but I found that RWKV shows GPU memory occupancy similar to a full-attention Transformer (like ViT) during training. Both the RWKV and Vision-RWKV papers only report memory occupancy at inference time. The high memory consumption is not friendly for my tasks. Do you have any advice?
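The lower memory footprint usually quoted for RWKV applies to its recurrent inference mode; a training step still stores per-layer activations for backpropagation, so training memory can look much like a Transformer's. As a rough way to see the gap yourself, the sketch below measures peak CUDA memory for one training step (forward + backward) versus one inference-only forward pass. The `nn.TransformerEncoder` stand-in and the tensor shapes are placeholders; swap in your actual Vision-RWKV or ViT backbone and input size.

```python
import torch
import torch.nn as nn

def peak_memory_mb(step_fn):
    """Run step_fn once and return the peak CUDA memory it used, in MB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    step_fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

# Placeholder backbone; replace with your Vision-RWKV / ViT model.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
).cuda()
x = torch.randn(8, 1024, 768, device="cuda")  # (batch, tokens, dim)

def train_step():
    model.train()
    model(x).mean().backward()            # activations kept for backward dominate memory here
    model.zero_grad(set_to_none=True)

@torch.no_grad()
def infer_step():
    model.eval()
    model(x)                              # nothing is saved for a backward pass

print(f"training  peak: {peak_memory_mb(train_step):.0f} MB")
print(f"inference peak: {peak_memory_mb(infer_step):.0f} MB")
```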
Comments

Hi, may I know your ctx_len?

ctx_len is 8192.

Please check whether attention/RWKV is your bottleneck.
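On the bottleneck question: one rough way to see which blocks account for the memory is to register forward hooks that record the change in allocated CUDA memory across each top-level submodule during a grad-enabled forward pass, so that activations saved for backward are counted. This is only a sketch; the depth filter and the usage lines assume a generic PyTorch vision backbone, and `my_vision_rwkv` is a hypothetical model name.

```python
import torch

def profile_module_memory(model, inputs):
    """Rough per-module CUDA memory growth (MB) during one grad-enabled forward pass."""
    deltas, starts = {}, {}

    def pre_hook(name):
        def fn(module, args):
            torch.cuda.synchronize()
            starts[name] = torch.cuda.memory_allocated()
        return fn

    def post_hook(name):
        def fn(module, args, output):
            torch.cuda.synchronize()
            deltas[name] = (torch.cuda.memory_allocated() - starts[name]) / 1024 ** 2
        return fn

    handles = []
    for name, module in model.named_modules():
        if name and name.count(".") <= 1:   # top-level blocks only; adjust for your model
            handles.append(module.register_forward_pre_hook(pre_hook(name)))
            handles.append(module.register_forward_hook(post_hook(name)))

    model.train()
    model(inputs)                           # grad mode, so activations saved for backward count
    for h in handles:
        h.remove()
    return dict(sorted(deltas.items(), key=lambda kv: kv[1], reverse=True))

# Usage (model and shapes are placeholders):
# stats = profile_module_memory(my_vision_rwkv.cuda(), torch.randn(8, 3, 224, 224, device="cuda"))
# for name, mb in list(stats.items())[:10]:
#     print(f"{name:40s} {mb:8.1f} MB")
```

If the token-mixing (attention/RWKV) blocks do not dominate this breakdown, the remaining memory is likely activation storage elsewhere in the network rather than the sequence-mixing mechanism itself.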