Fine-tuning SVD on four 4090 GPUs: training saves the model and exits early with no errors #78
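Following the multi-GPU configuration, I prepared a dataset of a few dozen samples to try fine-tuning SVD. The model trains normally, but after a fixed number of training rounds it saves the model and exits without any error. I would appreciate any help figuring this out. The environment list and the training details are below:
Environment list:
DeepSpeed training config (YAML):
Start of the training log, which contains overflow INFO messages:
End of the training log: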