[DO NOT REVIEW] gaps to enable FSDP2 cpu offloading #622

Open
wants to merge 1 commit into base: main

Conversation

weifengpy
Contributor

@weifengpy commented Oct 16, 2024

command: CONFIG_FILE="./train_configs/llama3_8b.toml" ./run_llama_train.sh
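For context, FSDP2 CPU offload is enabled by passing an offload policy to fully_shard. The snippet below is a minimal, illustrative sketch of that wiring using the composable FSDP2 API; the exact torchtitan config/flag plumbing is not shown in this description, so helper and attribute names are assumptions, not the change in this PR:

```python
# Illustrative sketch only; the actual torchtitan wiring in this PR may differ.
# Uses the composable FSDP2 API from torch.distributed._composable.fsdp.
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard, CPUOffloadPolicy

def apply_fsdp2_with_cpu_offload(model: nn.Module) -> nn.Module:
    # CPUOffloadPolicy keeps sharded parameters, gradients, and optimizer
    # state on CPU; pinned memory speeds up the H2D/D2H copies issued around
    # all-gather / reduce-scatter.
    policy = CPUOffloadPolicy(pin_memory=True)
    # Shard each transformer block, then the root module (the usual FSDP2
    # recipe); `model.layers` holding the blocks is an assumption about the
    # model structure.
    layers = model.layers.values() if isinstance(model.layers, nn.ModuleDict) else model.layers
    for block in layers:
        fully_shard(block, offload_policy=policy)
    fully_shard(model, offload_policy=policy)
    return model
```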

resolve #620

This PR runs FSDP2 CPU offload. There are two hacks / gaps we should resolve:

  • freqs_cis does not belong to model.parameters(). This PR moves it to the CUDA device since FSDP2 cannot manage it. Another option is to manage it as part of model.parameters() (sketched below).
  • Gradient clipping does not work for CPU tensors (DTensor dispatch + all_reduce on CPU; also sketched below).
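
Rough sketches of the two gaps above, assuming the torchtitan Llama model where freqs_cis is a non-parameter buffer on the root module; the helper names are illustrative, not the actual diff:

```python
import torch
import torch.nn as nn

def move_freqs_cis_to_gpu(model: nn.Module) -> None:
    # Gap 1: freqs_cis is a buffer, not a parameter, so FSDP2's CPU offload
    # does not manage it. The hack is to move it to the GPU explicitly; the
    # alternative noted above is to register it as a parameter so FSDP2
    # manages (and offloads) it like the rest of the model.
    device = torch.device("cuda", torch.cuda.current_device())
    model.freqs_cis = model.freqs_cis.to(device)  # assumes a plain tensor buffer on the root module

def clip_grads(model: nn.Module, max_norm: float = 1.0) -> torch.Tensor:
    # Gap 2: with CPU offload the sharded gradients are CPU DTensors, and
    # clip_grad_norm_ goes through DTensor dispatch plus an all_reduce on
    # CPU tensors, which is where it currently breaks.
    return torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
```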

@facebook-github-bot added the CLA Signed label Oct 16, 2024
@weifengpy
Contributor Author

This task can be helpful for ramp-up @mori360

Labels
CLA Signed
Development

Successfully merging this pull request may close these issues.

Is there a way to offload training memory to DRAM (using FSDP2?) for training Llama3-8B with torchtitan?
2 participants