There are multiple implementations of batch norm, but here, three different cases are relevant:

1. RF with `use_mask=True`
2. `LayerBase.batch_norm`
3. `torch.nn.functional.batch_norm`
In cases 1 and 2, we use the biased estimate of the batch variance to update the running variance.
In case 3, the unbiased estimate of the batch variance is used to update the running variance. Specifically, this is a difference of a factor (n-1)/n.
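To make the factor concrete, here is a minimal PyTorch check that the biased and unbiased batch variance differ by exactly (n-1)/n:

```python
import torch

torch.manual_seed(0)
n = 8
x = torch.randn(n)

biased = x.var(unbiased=False)    # sum of squared deviations divided by n
unbiased = x.var(unbiased=True)   # divided by n - 1 (Bessel's correction)

# The biased estimate is exactly (n-1)/n times the unbiased one.
assert torch.allclose(biased, unbiased * (n - 1) / n)
```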
It's unclear what is really better. See my question here. What do other frameworks use to update the running variance?
- Keras (`ops.moments`, PyTorch backend, TF backend via `tf.nn.moments`): biased batch variance

It's also a bit unclear how to solve this. I guess we want RF to behave the same for every backend. Case 1 is independent of the backend, so there is no problem. But cases 2 and 3 are different, which is a problem, i.e. basically a bug in RF that it differs across backends. But also, when using `use_mask=True`, you would expect the same behavior as with `use_mask=False` (regarding whether the biased or unbiased variance is used).

In any case, there should be a new option for this, and there should be a meaningful default. I guess we need a new behavior version for the new default here, and for the old behavior version, keep the current (inconsistent) behavior (i.e. the default depends on case and backend).
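As a sketch of what such an option could look like (the function and argument names here are hypothetical, not existing RF API), the running-variance update would only differ in a rescaling of the batch variance:

```python
import torch

def update_running_var(running_var, x, momentum=0.1, unbiased=True):
    # Hypothetical helper, not existing RF API; `unbiased` would be the new option.
    # x: (batch, features); moments are reduced over the batch dim.
    n = x.shape[0]
    batch_var = x.var(dim=0, unbiased=False)  # biased estimate, as used for normalization
    if unbiased:
        batch_var = batch_var * n / (n - 1)   # rescale to the unbiased estimate
    return (1.0 - momentum) * running_var + momentum * batch_var
```

With `unbiased=True` this matches what `torch.nn.functional.batch_norm` does to the running variance; with `unbiased=False` it matches the biased variant of cases 1 and 2.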
What would be a meaningful default? If there is a clear answer as to what is better, then it should be that one, but it's probably not clear. Maybe it actually does not really matter at all. Then just the question remains which variant is faster. In PyTorch, the fused function `torch.nn.functional.batch_norm` only supports this one variant, i.e. using the unbiased batch variance to update the running variance.

I just skimmed over Batch Renormalization, and this variant sounds more meaningful. Not sure if this should be the default, but it should definitely be a possible variant in RF. Not sure whether this should be an option for our existing `rf.BatchNorm` or a separate implementation.
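For reference, the PyTorch behavior is easy to verify: with `momentum=1.0`, the running statistics are fully replaced by the current batch statistics, and the running variance then ends up being the unbiased estimate (a quick check, assuming a plain `(batch, features)` input):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n = 8
x = torch.randn(n, 3)  # (batch, features)

running_mean = torch.zeros(3)
running_var = torch.ones(3)

# momentum=1.0: running stats are fully replaced by this batch's stats.
F.batch_norm(x, running_mean, running_var, training=True, momentum=1.0)

# The running variance was updated with the *unbiased* batch variance.
assert torch.allclose(running_var, x.var(dim=0, unbiased=True), atol=1e-6)
```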