Some questions about FP16 training #1354
-
Hi! I see that mmediting supports fp16 training. How can I use it?
Replies: 8 comments
-
Some of the code is ready, but we haven't fully implemented this feature. You will need to modify some source code and possibly fix some bugs along the way. Currently, mmediting supports fp16 based on […]. To enable fp16, you need to modify the […]. There was an earlier PR, #320, but it is somewhat out of date.
-
I set fp16_enabled to True, but nothing happened. GPU memory usage during training did not go down.
-
Oh, my mistake. Depending on whether you are using distributed or non-distributed training, you need to register the hook around line 179 or 304 in […].
-
https://github.com/open-mmlab/mmdetection/blob/98949809b7179fab9391663ee5a4ab5978425f90/mmdet/apis/train.py#L153
-
It's a dictionary like […]. At the current stage, mmdetection is probably a good reference. A good trick is to search for the relevant keywords in the repo.
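For reference, the pattern used around the linked line of mmdetection's `train.py` can be sketched roughly as below. This is a simplified illustration from memory, not the mmediting implementation. The two hook classes are stand-ins so the snippet runs without mmcv installed (in a real setup they would presumably come from `mmcv.runner`), `build_optimizer_config` is a hypothetical helper name, and `loss_scale=512.0` is only an example value.

```python
# Stand-in classes so the sketch is self-contained. In a real setup these
# would be the mmcv hooks (assumption: mmcv.runner.OptimizerHook and
# mmcv.runner.Fp16OptimizerHook).
class OptimizerHook:
    def __init__(self, grad_clip=None):
        self.grad_clip = grad_clip


class Fp16OptimizerHook(OptimizerHook):
    def __init__(self, grad_clip=None, loss_scale=512.0, distributed=True):
        super().__init__(grad_clip=grad_clip)
        self.loss_scale = loss_scale
        self.distributed = distributed


def build_optimizer_config(cfg, distributed):
    """Hypothetical helper: pick the hook from an optional `fp16` entry."""
    optimizer_config = cfg.get('optimizer_config', {})
    fp16_cfg = cfg.get('fp16', None)
    if fp16_cfg is not None:
        # fp16 requested: wrap the optimizer settings in the fp16 hook.
        return Fp16OptimizerHook(
            **optimizer_config, **fp16_cfg, distributed=distributed)
    # No fp16 entry: fall back to the regular optimizer hook.
    return OptimizerHook(**optimizer_config)


# Example config: the `fp16` dictionary is what switches the hook.
cfg = dict(optimizer_config=dict(grad_clip=None), fp16=dict(loss_scale=512.0))
hook = build_optimizer_config(cfg, distributed=False)
print(type(hook).__name__, hook.loss_scale)
```

The point of the pattern is that merely setting a flag on the model is not enough: the fp16 hook has to be the one that gets registered with the runner, otherwise the backward pass runs unchanged.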
-
Hello. I have tried fp16 training, but the loss becomes NaN after around 300~500 iterations. I tried loss_scale=512/128/64/32, and none of them worked. I also tried gradient clipping. Do you have any ideas on how to solve this problem?
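One approach sometimes tried when every fixed loss_scale diverges is dynamic loss scaling: skip the step and shrink the scale when the scaled gradients overflow, then grow the scale again after a run of clean steps. The sketch below shows only that bookkeeping in plain Python. `DynamicLossScaler` and all of its parameters are hypothetical names for illustration, not an mmediting or mmcv API, and whether this resolves the NaN reported here is not established.

```python
import math


class DynamicLossScaler:
    """Bookkeeping for dynamic loss scaling (illustrative only)."""

    def __init__(self, init_scale=512.0, factor=2.0,
                 growth_interval=1000, min_scale=1.0):
        self.scale = init_scale
        self.factor = factor
        self.growth_interval = growth_interval
        self.min_scale = min_scale
        self.good_steps = 0

    @staticmethod
    def has_overflow(grads):
        # A step counts as overflowed if any scaled gradient is inf or NaN.
        return any(not math.isfinite(g) for g in grads)

    def update(self, scaled_grads):
        """Return unscaled grads, or None if the step should be skipped."""
        if self.has_overflow(scaled_grads):
            # Overflow: skip this step and back off the scale.
            self.scale = max(self.scale / self.factor, self.min_scale)
            self.good_steps = 0
            return None
        # Unscale with the scale that was used for this step.
        grads = [g / self.scale for g in scaled_grads]
        self.good_steps += 1
        if self.good_steps % self.growth_interval == 0:
            # Long run of clean steps: try a larger scale again.
            self.scale *= self.factor
        return grads


scaler = DynamicLossScaler(init_scale=512.0, growth_interval=2)
print(scaler.update([float('inf'), 1.0]), scaler.scale)  # overflow, step skipped
print(scaler.update([256.0, 512.0]), scaler.scale)       # clean step, unscaled grads
```

If the loss still turns NaN once the scale has dropped to its minimum, the overflow is likely happening in the forward pass rather than in the gradients, and changing the loss scale alone will not help.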
-
@LeoXing1996
-
Hey @Shuweis and @ArchipLab-LinfengZhang, MMEdit 1.x now supports auto-fp16 training, and you are welcome to give it a try.
If you still have any problems with FP16 training, please paste your configs and training log so that we can help you better.