
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! #30

Open
yufengzhe1 opened this issue Jul 3, 2023 · 6 comments

Comments

@yufengzhe1

How do I solve this?

@yufengzhe1
Author

Traceback (most recent call last):
File "/data/falcontune-main/falcontune/run.py", line 93, in
main()
File "/data/falcontune-main/falcontune/run.py", line 89, in main
args.func(args)
File "/data/falcontune-main/falcontune/finetune.py", line 162, in finetune
trainer.train()
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 827, in forward
return self.base_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/falcontune-main/falcontune/model/falcon/model.py", line 1070, in forward
transformer_outputs = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/falcontune-main/falcontune/model/falcon/model.py", line 965, in forward
outputs = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/falcontune-main/falcontune/model/falcon/model.py", line 652, in forward
mlp_output + attention_output, residual, self.config.hidden_dropout, training=self.training
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
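
The error points at the residual add in model.py line 652. The usual workaround for this class of error, when accelerate's device_map splits the model across GPUs, is to move all operands onto one device before combining them. Below is a minimal sketch of that pattern; the function name and arguments are illustrative, not falcontune's actual code.

```python
import torch

# Minimal sketch of the workaround: bring all operands onto one device
# before the residual combination that fails in model.py line 652.
# residual_add is a hypothetical helper, not falcontune's function.
def residual_add(mlp_output: torch.Tensor,
                 attention_output: torch.Tensor,
                 residual: torch.Tensor) -> torch.Tensor:
    device = mlp_output.device
    # .to() is a no-op when the tensor is already on `device`
    return mlp_output + attention_output.to(device) + residual.to(device)
```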

@yufengzhe1
Author

@rmihaylov @rmmihaylov

@clechristophe

I am running into the same issue when trying to finetune with LoRA on multiple GPUs. It works well if I apply LoRA only on target_modules = query_key_value, but as soon as I want to apply it to other layers, I hit the same error.
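
For reference, extending LoRA beyond the attention projection typically looks like the PEFT config sketched below. The extra module names are the standard Falcon ones and are assumptions here, not taken from falcontune's source; with a device_map-sharded model, these extra adapters may be what ends up on a different device than the block output.

```python
from peft import LoraConfig

# Hedged example: extending target_modules beyond query_key_value.
# "dense", "dense_h_to_4h", "dense_4h_to_h" are the usual Falcon module
# names and may not match falcontune's wrapped model exactly.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
)
```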

@rcshubhadeep

rcshubhadeep commented Jul 6, 2023

I have a multi-GPU setup with A100 40GB cards and I am getting the same problem. Here is the command I am using:

falcontune finetune --model=falcon-40b --weights=tiiuae/falcon-40b --dataset=./alpaca_data_cleaned.json --data_type=alpaca --lora_out_dir=./falcon-40b-alpaca/ --mbatch_size=1 --batch_size=16 --epochs=3 --lr=3e-4 --cutoff_len=256 --lora_r=8 --lora_alpha=16 --lora_dropout=0.05 --warmup_steps=5 --save_steps=100 --save_total_limit=1 --logging_steps=5 --target_modules='["query_key_value"]'

I have set WORLD_SIZE=8 as an environment variable.

How do we solve this? It is preventing me from using this library to fine-tune anything.

I tried to run using torchrun as mentioned here; the command I tried is the following: OMP_NUM_THREADS=8 WORLD_SIZE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64:/usr/lib/x86_64-linux-gnu torchrun --nproc_per_node=8 --master_port=1234 falcontune/run.py finetune --model=falcon-40b --weights=tiiuae/falcon-40b --dataset=./alpaca_data_cleaned.json --data_type=alpaca --lora_out_dir=./falcon-40b-alpaca/ --mbatch_size=1 --batch_size=16 --epochs=3 --lr=3e-4 --cutoff_len=256 --lora_r=8 --lora_alpha=16 --lora_dropout=0.05 --warmup_steps=5 --save_steps=100 --save_total_limit=1 --logging_steps=5 --target_modules='["query_key_value"]'

This throws a CUDA OOM error... How can I run it in a distributed setting?

Please help
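
One note on the OOM: torchrun starts one process per GPU and each process loads its own full copy of the model, so data parallelism alone will not fit falcon-40b in 40 GB per card. The single-process, model-parallel route is to shard one copy across the GPUs with accelerate's device_map, roughly as sketched below (plain transformers shown for illustration, not falcontune's own loading path).

```python
from transformers import AutoModelForCausalLM

# Sketch only: load one sharded copy of the model across all visible GPUs
# instead of one full replica per torchrun rank.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    device_map="auto",        # accelerate places layers on cuda:0..cuda:7
    torch_dtype="auto",
    trust_remote_code=True,   # Falcon originally shipped custom modeling code
)
```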

@zepmck

zepmck commented Jul 12, 2023

Reduce the batch size.

However, is the multi-GPU setting working?
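
For reducing memory, the knob that matters is --mbatch_size rather than --batch_size, assuming falcontune follows the usual alpaca-lora pattern of accumulating micro-batches into the effective batch (an assumption, not verified against its source):

```python
# Hedged arithmetic, assuming --batch_size is reached via gradient accumulation:
mbatch_size = 1                               # --mbatch_size: samples resident on GPU per step
batch_size = 16                               # --batch_size: effective optimizer batch
grad_accum_steps = batch_size // mbatch_size  # 16 accumulation steps, no extra activation memory
print(grad_accum_steps)
```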

@RYANSTOBBE

Will multiple GPUs work? Has anyone been able to use this with 2 GPUs? I ask because, if 40B only requires 40GB of VRAM, I would assume (but could be wrong) that 2x 3090s or 2x 4090s should work.
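
Rough numbers, hedged: at 16-bit precision the 40B weights alone are about 80 GB, which does not fit on one 40 GB card or on 2x 24 GB cards; with a 4-bit GPTQ checkpoint (which falcontune can load) the weights shrink to roughly 20 GB, which is why a pair of 3090s/4090s is plausible once activations and LoRA optimizer state are added on top.

```python
# Back-of-envelope VRAM for the weights only (activations, KV cache and
# optimizer state come on top); numbers are approximate.
params = 40e9
print(params * 2 / 1e9)    # fp16/bf16 weights: ~80 GB
print(params * 0.5 / 1e9)  # 4-bit quantized weights: ~20 GB
```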
