
generate gets error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! #12

Open · 631068264 opened this issue Jun 5, 2023 · 4 comments

@631068264

Run

falcontune generate \
    --interactive \
    --model falcon-7b \
    --weights tiiuae/falcon-7b \
    --lora_apply_dir falcon-7b-alpaca \
    --max_new_tokens 500 \
    --use_cache \
    --do_sample \
    --instruction "List five possible applications of artificial intelligence"

Error

Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 2/2 [00:42<00:00, 21.04s/it]
Device map for lora: auto
falcon-7b-alpaca loaded
/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:318: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "/data/home/yaokj5/anaconda3/envs/falcon/bin/falcontune", line 33, in <module>
    sys.exit(load_entry_point('falcontune==0.1.0', 'console_scripts', 'falcontune')())
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/run.py", line 88, in main
    args.func(args)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/generate.py", line 71, in generate
    generated_ids = model.generate(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/generate.py", line 27, in autocast_generate
    return self.model.non_autocast_generate(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/peft/peft_model.py", line 731, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/transformers/generation/utils.py", line 1565, in generate
    return self.sample(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/transformers/generation/utils.py", line 2612, in sample
    outputs = self(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 1072, in forward
    transformer_outputs = self.transformer(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 967, in forward
    outputs = block(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 722, in forward
    mlp_output += attention_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
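The failing statement is an in-place add between mlp_output and attention_output, which device_map="auto" has evidently placed on different GPUs. A minimal sketch of the same failure mode, assuming a machine with at least two CUDA devices:

import torch

# Reproduce the error pattern: an in-place add between tensors that
# live on different devices raises the same RuntimeError as above.
a = torch.zeros(4, device="cuda:0")
b = torch.ones(4, device="cuda:1")
try:
    a += b
except RuntimeError as e:
    print(e)  # Expected all tensors to be on the same device ...

# The usual fix at the point of failure is to move one operand first:
a += b.to(a.device)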
@MohamedAliRashad

I have the same problem
@rmihaylov

@gpravi commented Jun 9, 2023

Running into the same issue. Any luck on this?

@631068264 (Author) commented Jun 9, 2023

export CUDA_VISIBLE_DEVICES=1 works as a workaround.
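For reference, the same restriction can be applied from inside Python, as long as it happens before CUDA is initialized; a minimal sketch (the device index 1 simply mirrors the export above):

import os

# Expose only GPU 1 to this process. This must run before importing
# torch (or anything that imports it), or CUDA will already have
# enumerated every device.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.device_count())  # 1, so device_map="auto" has a single target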

@gpravi commented Jun 9, 2023

Thanks, that seems to work.

But I ran into an OutOfMemory issue after fine-tuning:

File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 336, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 96, in undo_layout
outputs = torch.empty_like(tensor) # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory.

Not sure which knob to adjust.
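For what it's worth, the OOM happens while bitsandbytes undoes the int8 tile layout of every base-model weight so the full state dict can be serialized. One workaround often suggested for 8-bit + LoRA setups (not verified against falcontune's own save path) is to save only the small adapter through peft, so the quantized base weights are never converted back on the GPU. A sketch, with hypothetical directory names:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the 8-bit base model plus the trained adapter. The
# "falcon-7b-alpaca" directory name is just the one used in this issue.
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    load_in_8bit=True,
    device_map={"": 0},
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "falcon-7b-alpaca")

# peft's save_pretrained writes only adapter_config.json and
# adapter_model.bin, so no int8 weight has to be de-quantized.
model.save_pretrained("falcon-7b-alpaca-adapter")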
