
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! #30

Open
yufengzhe1 opened this issue Jul 3, 2023 · 6 comments

Comments

@yufengzhe1

How do I solve this?

@yufengzhe1
Author

Traceback (most recent call last):
File "/data/falcontune-main/falcontune/run.py", line 93, in
main()
File "/data/falcontune-main/falcontune/run.py", line 89, in main
args.func(args)
File "/data/falcontune-main/falcontune/finetune.py", line 162, in finetune
trainer.train()
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 827, in forward
return self.base_model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/data/falcontune-main/falcontune/model/falcon/model.py", line 1070, in forward
transformer_outputs = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/falcontune-main/falcontune/model/falcon/model.py", line 965, in forward
outputs = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/falcontune-main/falcontune/model/falcon/model.py", line 652, in forward
mlp_output + attention_output, residual, self.config.hidden_dropout, training=self.training
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
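
The error points at the residual add in model.py line 652. The usual workaround for this class of error, when accelerate's device_map splits the model across GPUs, is to move all operands onto one device before combining them. Below is a minimal sketch of that pattern; the function name and arguments are illustrative, not falcontune's actual code.

```python
import torch

# Minimal sketch of the workaround: bring all operands onto one device
# before the residual combination that fails in model.py line 652.
# residual_add is a hypothetical helper, not falcontune's function.
def residual_add(mlp_output: torch.Tensor,
                 attention_output: torch.Tensor,
                 residual: torch.Tensor) -> torch.Tensor:
    device = mlp_output.device
    # .to() is a no-op when the tensor is already on `device`
    return mlp_output + attention_output.to(device) + residual.to(device)
```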

@yufengzhe1
Author

@rmihaylov @rmmihaylov

@clechristophe

I am running into the same issue when trying to finetune with LoRA on multiple GPUs. It works well if I apply LoRA only on target_modules = query_key_value, but as soon as I want to apply it to other layers, I hit the same error.
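
For reference, extending LoRA beyond the attention projection typically looks like the PEFT config sketched below. The extra module names are the standard Falcon ones and are assumptions here, not taken from falcontune's source; with a device_map-sharded model, these extra adapters may be what ends up on a different device than the block output.

```python
from peft import LoraConfig

# Hedged example: extending target_modules beyond query_key_value.
# "dense", "dense_h_to_4h", "dense_4h_to_h" are the usual Falcon module
# names and may not match falcontune's wrapped model exactly.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
)
```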

@rcshubhadeep

rcshubhadeep commented Jul 6, 2023

I have a multi-GPU setup with A100 40GB cards and I am getting the same problem. Here is the command I am using:

falcontune finetune --model=falcon-40b --weights=tiiuae/falcon-40b --dataset=./alpaca_data_cleaned.json --data_type=alpaca --lora_out_dir=./falcon-40b-alpaca/ --mbatch_size=1 --batch_size=16 --epochs=3 --lr=3e-4 --cutoff_len=256 --lora_r=8 --lora_alpha=16 --lora_dropout=0.05 --warmup_steps=5 --save_steps=100 --save_total_limit=1 --logging_steps=5 --target_modules='["query_key_value"]'

I have set WORLD_SIZE=8 as an environment variable.

How do we solve this? It is preventing me from using this library to fine-tune anything.

I tried to run using torchrun as mentioned here; the command I tried is the following: OMP_NUM_THREADS=8 WORLD_SIZE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64:/usr/lib/x86_64-linux-gnu torchrun --nproc_per_node=8 --master_port=1234 falcontune/run.py finetune --model=falcon-40b --weights=tiiuae/falcon-40b --dataset=./alpaca_data_cleaned.json --data_type=alpaca --lora_out_dir=./falcon-40b-alpaca/ --mbatch_size=1 --batch_size=16 --epochs=3 --lr=3e-4 --cutoff_len=256 --lora_r=8 --lora_alpha=16 --lora_dropout=0.05 --warmup_steps=5 --save_steps=100 --save_total_limit=1 --logging_steps=5 --target_modules='["query_key_value"]'

This throws a CUDA OOM error... How can I run it in a distributed setting?

Please help
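
One note on the OOM: torchrun starts one process per GPU and each process loads its own full copy of the model, so data parallelism alone will not fit falcon-40b in 40 GB per card. The single-process, model-parallel route is to shard one copy across the GPUs with accelerate's device_map, roughly as sketched below (plain transformers shown for illustration, not falcontune's own loading path).

```python
from transformers import AutoModelForCausalLM

# Sketch only: load one sharded copy of the model across all visible GPUs
# instead of one full replica per torchrun rank.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    device_map="auto",        # accelerate places layers on cuda:0..cuda:7
    torch_dtype="auto",
    trust_remote_code=True,   # Falcon originally shipped custom modeling code
)
```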

@zepmck

zepmck commented Jul 12, 2023

Reduce the batch size.

However, is the multi-GPU setting working?
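
For reducing memory, the knob that matters is --mbatch_size rather than --batch_size, assuming falcontune follows the usual alpaca-lora pattern of accumulating micro-batches into the effective batch (an assumption, not verified against its source):

```python
# Hedged arithmetic, assuming --batch_size is reached via gradient accumulation:
mbatch_size = 1                               # --mbatch_size: samples resident on GPU per step
batch_size = 16                               # --batch_size: effective optimizer batch
grad_accum_steps = batch_size // mbatch_size  # 16 accumulation steps, no extra activation memory
print(grad_accum_steps)
```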

@RYANSTOBBE

Will multiple GPUs work? Has anyone been able to use this with 2 GPUs? I ask because, if 40B only requires 40GB of VRAM, I would assume (but could be wrong) that 2x 3090s or 2x 4090s should work.
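
Rough numbers, hedged: at 16-bit precision the 40B weights alone are about 80 GB, which does not fit on one 40 GB card or on 2x 24 GB cards; with a 4-bit GPTQ checkpoint (which falcontune can load) the weights shrink to roughly 20 GB, which is why a pair of 3090s/4090s is plausible once activations and LoRA optimizer state are added on top.

```python
# Back-of-envelope VRAM for the weights only (activations, KV cache and
# optimizer state come on top); numbers are approximate.
params = 40e9
print(params * 2 / 1e9)    # fp16/bf16 weights: ~80 GB
print(params * 0.5 / 1e9)  # 4-bit quantized weights: ~20 GB
```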
