
Loss becomes NaN during llava-llama3-8b fine-tuning #942

Open
liboaccn opened this issue Oct 6, 2024 · 1 comment
liboaccn commented Oct 6, 2024

[Screenshot 2024-10-06 14:22:48: training log showing loss=nan]

When fine-tuning llava-llama3-8b, the loss becomes NaN after just a few steps. What could be the cause? I've seen others hit similar problems in GitHub issues, and the official reply was to change the lr. Here is what I currently have set:

from torch.optim import AdamW
from mmengine.optim import CosineAnnealingLR, LinearLR

# Scheduler & Optimizer
batch_size = 4  # per_device
accumulative_counts = 32 * 4
dataloader_num_workers = 32
max_epochs = 1
optim_type = AdamW
lr = 2e-6



# warmup_ratio is set earlier in the config (not shown in this snippet)
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]
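
(For reference: besides lowering the lr, a common mitigation for NaN loss in this kind of setup is gradient clipping, which XTuner/mmengine configs set through the optim_wrapper. Below is a minimal sketch; the max_norm, betas, and weight_decay values are illustrative defaults, not taken from this issue's config:)

from mmengine.optim import AmpOptimWrapper

# Sketch of an AMP optimizer wrapper with gradient clipping.
# max_norm=1, betas, and weight_decay are assumed values here.
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=(0.9, 0.999), weight_decay=0),
    # Clip gradients to a fixed norm; tolerate a non-finite grad norm
    # instead of raising an error.
    clip_grad=dict(max_norm=1.0, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',  # dynamic scaling backs off on fp16 overflow
    dtype='float16')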

liboaccn commented Oct 6, 2024

Addendum: I also switched the visual encoder from CLIP to SigLIP.


from transformers import (AutoModelForCausalLM, SiglipImageProcessor,
                          SiglipVisionModel)
from xtuner.model import LLaVAModel

# llm_name_or_path and visual_encoder_name_or_path are set earlier in the
# config (not shown in this snippet)
image_processor = dict(
    type=SiglipImageProcessor.from_pretrained,
    pretrained_model_name_or_path=visual_encoder_name_or_path,
    trust_remote_code=True)

model = dict(
    type=LLaVAModel,
    freeze_llm=True,
    freeze_visual_encoder=True,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=llm_name_or_path,
        trust_remote_code=True),
    visual_encoder=dict(
        type=SiglipVisionModel.from_pretrained,
        pretrained_model_name_or_path=visual_encoder_name_or_path))
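
(Since the CLIP→SigLIP swap coincides with the NaN, it may be worth isolating the visual encoder and checking whether its fp16 outputs are already non-finite before training touches them. A minimal standalone sketch; the checkpoint name is a placeholder, substitute the visual_encoder_name_or_path from the config above:)

import torch
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

# Placeholder checkpoint; replace with the actual visual encoder path.
visual_encoder_name_or_path = "google/siglip-so400m-patch14-384"

processor = SiglipImageProcessor.from_pretrained(visual_encoder_name_or_path)
encoder = SiglipVisionModel.from_pretrained(
    visual_encoder_name_or_path, torch_dtype=torch.float16).cuda().eval()

# Run a dummy image through the encoder in fp16 and check for NaN/Inf.
image = Image.new("RGB", (384, 384), (127, 127, 127))
inputs = processor(images=image, return_tensors="pt").to("cuda")
with torch.no_grad():
    feats = encoder(pixel_values=inputs.pixel_values.half()).last_hidden_state

print("NaN in visual features:", torch.isnan(feats).any().item())
print("Inf in visual features:", torch.isinf(feats).any().item())

If the features are already non-finite here, the problem is in the encoder/dtype combination rather than the lr schedule.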
