Replies: 2 comments
-
I was able to work around this. In addition to adjusting batch size and gradient accumulation downward, I also needed to reduce the max sequence length, which I had increased for Llama 3.2 since the longer context is one of the new features we were interested in.
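For reference, the knobs I'm talking about live in the recipe YAML. A rough sketch of the relevant fields (names follow the stock torchtune single-device LoRA configs; the path and values here are placeholders, not my exact settings):

```yaml
# Illustrative excerpt of a torchtune LoRA recipe config - values are placeholders.
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /data/models/Llama-3.2-3B/original/tokenizer.model  # placeholder path
  max_seq_len: 4096            # capped, instead of the 128K-class value I had raised it to

batch_size: 2                  # adjusted downward
gradient_accumulation_steps: 8 # adjusted downward as well
```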
-
Hey @troy256 - sorry for taking so long to get round to this. I think you're right that the main culprit here is maximum sequence length - the llama3.2 model you're using still has a 130K sequence length, and it was going to be my first suggestion. Are the samples in your dataset long enough that they use sufficient memory to OOM on an A100? If you appropriately constrain your sequence length, you might not need to reduce your batch size or lean on gradient accumulation as much. One thing I'd also suggest is sample packing if there's variance in sequence length between samples in your dataset (https://pytorch.org/torchtune/stable/tutorials/datasets.html#sample-packing). This could help speed things up for you.
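For the packing part, it's just a flag on the dataset builder in your config, assuming you're using one of the built-in torchtune dataset builders that expose it (the stock configs do). Something like:

```yaml
# Minimal sketch of enabling sample packing - component name is illustrative,
# use whichever dataset builder your config already points at.
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: True   # packs multiple shorter samples into each max_seq_len window
```

Packing needs a concrete tokenizer.max_seq_len to pack to, so it pairs naturally with capping the sequence length. You can also set both from the command line without editing the YAML, e.g. by appending dataset.packed=True and tokenizer.max_seq_len=4096 as overrides to your tune run command.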
-
I've been fine-tuning successfully with torchtune on Llama 3.1 8B and am now trying to do the same with Llama 3.2 3B. I upgraded to the latest nightly build of torchtune and am getting an out-of-memory error during fine-tuning. The system has an NVIDIA A100 with 80 GB.
What I've tried: smaller batch sizes, setting the env variable recommended in the error message, and upgrading to the latest torchtune nightly.
Any suggestions would be appreciated.
tune run lora_finetune_single_device --config /data/torchtune/tune-recipes/llama3.2-3b-LoRA.yaml
Output: