distributed init updates (addition of non-meta-tensor/rank0-broadcast path) #674

gsethi523 · 2024-08-09T02:31:59Z

Summary:
Add a non-meta-tensor initialization path to distributed init.

In this path, every process materializes the full model on the CPU, instead of only rank 0. The model is therefore duplicated once per process, and CPU memory requirements grow proportionally. Afterwards, all modules, buffers, and params are iterated over and sharded as before. However, instead of rank 0 first broadcasting its tensor data to the other (meta-tensor) ranks, each rank now simply copies its own CPU-initialized data onto its corresponding GPU and continues.
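The difference between the two paths can be sketched with a toy simulation in plain Python. This is not the code from this diff: ranks are simulated in one process, plain lists stand in for tensors, and `make_weights`, `shard`, `init_rank0_broadcast`, `init_all_ranks`, and `InitCost` are hypothetical names. It also assumes every rank produces identical values when it initializes on its own (here via a shared seed); the summary does not say how that is ensured.

```python
import random
from dataclasses import dataclass


@dataclass
class InitCost:
    cpu_copies: int  # full model copies materialized on CPU across all ranks
    broadcasts: int  # rank 0 -> other ranks transfers of tensor data


def make_weights(seed=0, n=8):
    """Hypothetical deterministic init: the same seed gives the same values."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def shard(values, rank, world_size):
    """Return the contiguous slice owned by `rank` (assumes an even split)."""
    per_rank = len(values) // world_size
    return values[rank * per_rank:(rank + 1) * per_rank]


def init_rank0_broadcast(world_size):
    """Meta-tensor path: only rank 0 materializes, then broadcasts its data."""
    full = make_weights()  # the single CPU copy, held by rank 0
    # Simulated broadcast: every rank receives rank 0's data.
    received = [list(full) for _ in range(world_size)]
    shards = [shard(received[r], r, world_size) for r in range(world_size)]
    return shards, InitCost(cpu_copies=1, broadcasts=1)


def init_all_ranks(world_size):
    """Non-meta path: each rank materializes its own copy and keeps its shard."""
    shards = []
    for rank in range(world_size):
        local = make_weights()  # one full CPU copy per rank
        shards.append(shard(local, rank, world_size))  # local copy, no broadcast
    return shards, InitCost(cpu_copies=world_size, broadcasts=0)


if __name__ == "__main__":
    shards_a, cost_a = init_rank0_broadcast(world_size=4)
    shards_b, cost_b = init_all_ranks(world_size=4)
    print(shards_a == shards_b, cost_a, cost_b)
```

Under these assumptions both paths end with the same shards on every rank. The non-meta path trades the broadcast for `world_size` full CPU copies of the model.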

Reviewed By: xiecong

Differential Revision: D57022318

facebook-github-bot · 2024-08-09T02:32:17Z

This pull request was exported from Phabricator. Differential Revision: D57022318

facebook-github-bot added the CLA Signed label Aug 9, 2024

facebook-github-bot added the fb-exported label Aug 9, 2024
