Ran into CUDA OOM issue during fine tuning #19

OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.35 GiB total capacity; 77.18 GiB already allocated; 57.19 MiB free; 77.97 GiB reserved in total by PyTorch)
File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 336, in _save_to_state_dict
self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 96, in undo_layout
outputs = torch.empty_like(tensor) # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory.
Any ideas to fix this?
This happened to me as well with the 40B model. The error only occurs when trying to save a checkpoint. I tried saving the model once after all steps instead of every 50 steps and still got the error.
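(Illustrative note, not from this thread: since the OOM only appears at checkpoint time, one commonly tried workaround is to release cached allocator blocks and collect garbage right before building the state dict. A minimal sketch, where model and the output path are placeholders:

    import gc
    import torch

    gc.collect()               # drop Python references to stale tensors
    torch.cuda.empty_cache()   # return cached, unused blocks to the CUDA driver
    torch.save(model.state_dict(), "checkpoint.pt")  # hypothetical save call

This only helps if the allocator is fragmented; if the de-quantization buffers genuinely do not fit next to the weights, it will still OOM.)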
Thanks @angelovAlex. I applied the patch from that issue and now run into a similar error at a different line:
File "/root/.conda/envs/falcontune/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1815, in state_dict
self._save_to_state_dict(destination, prefix, keep_vars)
File "/root/.conda/envs/falcontune/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 330, in _save_to_state_dict
weight_clone = self.weight.data.clone()
^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.35 GiB total capacity; 75.73 GiB already allocated; 127.19 MiB free; 77.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
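(Illustrative note, not from this thread: as the error message itself suggests, fragmentation can sometimes be reduced by capping the allocator's split size through PYTORCH_CUDA_ALLOC_CONF, either exported in the shell before launching training or set from Python before the first CUDA allocation. A minimal sketch; the value 128 is an arbitrary example:

    import os

    # Must run before any CUDA tensors are created in this process.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

This does not reduce the total memory needed, it only limits how large cached blocks may be split, which can avoid fragmentation-driven OOMs when reserved memory is much larger than allocated memory.)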