Thanks for your excellent work!
Training has three steps: the first two use a linear head and the last one uses a DPT head. Is it necessary to train with the linear head first?
Would it be fine to train the DPT head directly, skipping the linear-head steps?
Hi,
In our limited experiments, we got slightly better results by training with the linear head first (I don't have numbers on hand to back that up, though).
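For anyone who wants to compare the two options, here is a minimal sketch of the three-step schedule from the question and of which weights could carry over between steps. It is an illustration, not the repository's training code: `Stage`, `SCHEDULE`, `weights_to_carry_over`, and the `head.` key prefix are all hypothetical names, and it assumes that a head of a different type has to start from a fresh initialisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    head: str  # "linear" or "dpt"


# The schedule described in the question: two linear-head steps, then DPT.
SCHEDULE = [Stage("step1", "linear"), Stage("step2", "linear"), Stage("step3", "dpt")]


def weights_to_carry_over(prev_state, prev_head, next_head, head_prefix="head."):
    """Return the part of the previous checkpoint usable to initialise the next step.

    Assumption: head parameters live under `head_prefix` (hypothetical naming).
    If the head type is unchanged, everything is reused; otherwise the head
    entries are dropped so the new head starts from its own initialisation.
    """
    if prev_head == next_head:
        return dict(prev_state)
    return {k: v for k, v in prev_state.items() if not k.startswith(head_prefix)}


if __name__ == "__main__":
    # Toy checkpoint standing in for a real state dict.
    ckpt = {"encoder.w": 1, "decoder.w": 2, "head.proj": 3}
    for prev, nxt in zip(SCHEDULE, SCHEDULE[1:]):
        kept = weights_to_carry_over(ckpt, prev.head, nxt.head)
        print(f"{prev.name} -> {nxt.name}: {sorted(kept)}")
```

Under these assumptions, training the DPT head directly amounts to skipping the first two steps, so the last step would start without the backbone weights tuned during the linear-head steps.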