
Control over batch-norm running_mean/var buffers #767

Open

adefazio opened this issue May 30, 2024 · 2 comments

Comments

@adefazio

Following up on the request from the recent working group meeting regarding future improvements to the challenge: it would be extremely useful to have control over the running_mean/var buffers of batch-norm layers. Currently, if different iterates are used for evaluation than for training (e.g., when EMA or Schedule-Free averaging is used), the running_mean/var values will be incorrect, because they are accumulated over the training iterates.
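
For context, a minimal PyTorch sketch of what averaging methods currently have to do by hand: reset the batch-norm buffers and re-estimate them under the evaluation weights before each eval. `eval_model` and `train_batches` are hypothetical stand-ins here, not part of the benchmark API.

```python
import torch

def reestimate_bn_stats(eval_model, train_batches, num_batches=100):
    """Re-estimate running_mean/var so they match the evaluation weights."""
    # Clear the statistics accumulated under the training iterates.
    for m in eval_model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # use a cumulative average over the passes below
    eval_model.train()  # running stats are only updated in train mode
    with torch.no_grad():
        for i, batch in enumerate(train_batches):
            if i >= num_batches:
                break
            eval_model(batch)
    eval_model.eval()
```

Note that the `train()` call here also re-enables dropout during the re-estimation passes, which is exactly the problem raised below.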

This, together with the eval() support requested in #758, would make it much easier to implement averaging approaches.

In terms of control, it would be useful to be able to turn the updating of the running mean/var on and off during forward passes, and to access their values directly. Currently there is an update_batch_norm switch that calls update_batch_norm_fn in pytorch_utils, but it doesn't let us update the batch-norm stats while in eval mode (eval mode changes the behavior of dropout, so we want to be in eval mode when updating BN statistics right before a model evaluation); see the sketch below.
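
As a sketch of the workaround this request would make unnecessary (assuming a plain `torch.nn.Module`, not the benchmark's wrapped models): put the whole model in eval mode so dropout is off, then flip only the batch-norm layers back to train mode so their statistics keep updating.

```python
import torch

def eval_with_bn_updates(model):
    model.eval()  # dropout disabled, other layers in inference behavior
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.train()  # BN only updates running_mean/var in train mode
    return model
```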

Also, having the batch-norm running-mean/var provided directly in the API would give a model-agnostic way to access them; currently we would need to loop over all modules and check whether each one is a PyTorch batch-norm layer or one of the custom ConformerBatchNorm and DeepspeechBatchNorm layers.
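
For illustration, that manual loop looks roughly like this; the commented-out branch for the custom layers is an assumption about their class names and buffers, which is exactly the model-specific knowledge an API-level accessor would remove.

```python
import torch

def collect_bn_buffers(model):
    stats = {}
    for name, m in model.named_modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            stats[name] = (m.running_mean.clone(), m.running_var.clone())
        # elif isinstance(m, (ConformerBatchNorm, DeepspeechBatchNorm)):
        #     ...  # hypothetical: the custom layers would need their own handling
    return stats
```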

A third point is clarity in the rules around batch-norm layers. Are we freely allowed to change the batch-norm momentum during training (which lets us freeze the running stats, reset them, or otherwise change the rate at which they are updated), as well as to modify the running-mean/var buffers directly?
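
Concretely, with standard `torch.nn` batch-norm semantics (not necessarily how the benchmark wraps them), momentum and buffer control would allow things like:

```python
import torch

bn = torch.nn.BatchNorm1d(16)

bn.momentum = 0.0          # freeze: new batches contribute nothing to the running stats
bn.momentum = 0.1          # restore the default exponential moving average
bn.reset_running_stats()   # reset running_mean to 0 and running_var to 1
bn.running_mean.zero_()    # or edit the buffers directly
```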

@priyakasimbeg (Contributor)

We're planning to discuss feature requests like these for the benchmark code during the WG meeting on Thursday, 9/5.

@adefazio (Author) commented Sep 5, 2024

I've created PR #783.
