
problem about batch_size and gradient_accumulation_steps #218

Open
ShowLo opened this issue Nov 5, 2024 · 0 comments

Comments


ShowLo commented Nov 5, 2024

The paper states that MuseTalk was trained on 2 NVIDIA H20 GPUs and that the UNet model was initially trained with L1 and perceptual losses for 200,000 steps. However, the paper doesn't specify the batch_size and gradient_accumulation_steps, which affect training speed. Could you provide the specific values used?
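For reference, these two settings (together with the GPU count) determine the effective batch size per optimizer update. Below is a minimal PyTorch sketch of the standard gradient-accumulation pattern; the values `batch_size=4` and `gradient_accumulation_steps=8`, along with the toy model and data, are hypothetical placeholders for illustration, not the numbers used in the paper (which is exactly what this issue asks about).

```python
import torch

# Hypothetical values for illustration only -- NOT the settings used
# in the MuseTalk paper.
batch_size = 4                     # per-GPU micro-batch size
gradient_accumulation_steps = 8    # micro-batches per optimizer update
num_gpus = 2                       # paper reports 2 NVIDIA H20 GPUs

# Effective (global) batch size seen by each optimizer update:
effective_batch_size = batch_size * gradient_accumulation_steps * num_gpus
print(f"effective batch size: {effective_batch_size}")  # -> 64 here

# Standard gradient-accumulation loop, with a toy model and random data
# standing in for the UNet and the real dataloader.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.L1Loss()        # paper mentions an L1 loss

for step in range(gradient_accumulation_steps):
    x = torch.randn(batch_size, 16)       # dummy micro-batch
    target = torch.randn(batch_size, 16)
    loss = loss_fn(model(x), target)
    # Divide so the summed gradients match one large-batch update.
    (loss / gradient_accumulation_steps).backward()

optimizer.step()        # one parameter update per accumulation cycle
optimizer.zero_grad()
```

So the same effective batch size can come from many (batch_size, gradient_accumulation_steps) combinations, but each combination trades memory use against wall-clock speed differently, which is why the exact values matter for reproducing the reported training time.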
