Hello,

I'm struggling to understand a few things about the simulator, given that there is no documentation around it.
As I understand it, the simulation runtime is determined by the requests (either trace files or the synthetic request generator, with arrival times drawn from a Poisson distribution). There is also the execution time, i.e. the actual time it takes to process a batch or pipeline stage during model inference. My question is: how does having multiple GPUs (added through the replica_config_num_pipeline_stages and replica_config_tensor_parallel_size parameters) affect the simulation runtime and/or the request execution time? From the stats extractor script, GPU hours seem to be calculated as runtime * number of GPUs / 3600; however, I would expect the runtime (or the execution time) to decrease with more GPUs due to load balancing, and therefore the total GPU hours to decrease as well. Is this incorrect?
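To make the arithmetic I am referring to concrete, here is a minimal sketch; the function and variable names are my own and may not match what the stats extractor script actually does:

```python
def gpu_hours(runtime_seconds: float,
              num_pipeline_stages: int,
              tensor_parallel_size: int) -> float:
    # Assuming the GPU count per replica is simply the product of the two
    # parallelism parameters (i.e. the world_size).
    num_gpus = num_pipeline_stages * tensor_parallel_size
    return runtime_seconds * num_gpus / 3600.0

# A 1-hour simulated run with 4 pipeline stages and 2-way tensor parallelism
# is billed as 8 GPU hours -- unless the runtime itself shrinks with more
# GPUs, which is exactly what I am unsure about.
print(gpu_hours(3600.0, num_pipeline_stages=4, tensor_parallel_size=2))  # -> 8.0
```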
Also, where in the code should I look to find out how load balancing is handled across multiple GPUs? Is there a load-balancing mechanism across GPUs, or are the GPUs fully independent of each other? I am just curious how tasks are distributed across GPUs when we increase world_size by increasing the replica_config_num_pipeline_stages and replica_config_tensor_parallel_size parameters; a toy sketch of what I mean follows below.
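By "load balancing" I mean something like the purely hypothetical round-robin dispatcher below, which spreads requests over otherwise independent GPU groups. None of these names come from the simulator's code base; I could not find anything like this, hence the question:

```python
from itertools import cycle

class ToyDispatcher:
    def __init__(self, num_gpu_groups: int):
        # each "group" would be one pipeline-parallel x tensor-parallel unit
        self.queues = [[] for _ in range(num_gpu_groups)]
        self._next_group = cycle(range(num_gpu_groups))

    def dispatch(self, request_id: int) -> int:
        group = next(self._next_group)
        self.queues[group].append(request_id)
        return group

dispatcher = ToyDispatcher(num_gpu_groups=2)
print([dispatcher.dispatch(r) for r in range(6)])  # -> [0, 1, 0, 1, 0, 1]
```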
Finally, how are batches determined, and is each batch allocated to a specific GPU? I am asking because the MFU metric (based on utils/mfu_calculator.py) appears to be calculated batch by batch, and I am curious whether it reports the MFU aggregated over all GPUs or for the specific GPU each batch is assigned to.
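Here is a sketch of the MFU definition as I understand it (achieved FLOP/s over peak FLOP/s), just to make the question concrete; this function is mine and I am not claiming it matches utils/mfu_calculator.py:

```python
def batch_mfu(batch_flops: float,
              batch_execution_time_s: float,
              peak_flops_per_gpu: float,
              num_gpus: int = 1) -> float:
    # The num_gpus factor is exactly the ambiguity I am asking about: is the
    # per-batch MFU normalized by a single GPU's peak, or by the combined
    # peak of all GPUs the batch runs on?
    achieved_flops_per_s = batch_flops / batch_execution_time_s
    return achieved_flops_per_s / (peak_flops_per_gpu * num_gpus)

# e.g. 1e14 FLOPs executed in 1 s against a 312 TFLOP/s peak -> ~0.32 (32% MFU)
print(batch_mfu(1e14, 1.0, peak_flops_per_gpu=312e12))
```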
Thanks a lot!