
Wrong implementation about EMA ? #1

Open
YilinLiu97 opened this issue Feb 13, 2019 · 8 comments


YilinLiu97 commented Feb 13, 2019

Hi @perone, I found that the teacher model's weights don't seem to be updated: it performed as badly as when it was first initialized.

def update_ema_variables(model, ema_model, alpha, global_step):
    alpha = min(1 - 1 / (global_step + 1), alpha)
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        ema_param.data.mul_(alpha).add_(1 - alpha, param.data)

About this line:

ema_param.data.mul_(alpha).add_(1 - alpha, param.data)

shouldn't this be ema_param.data.mul_(alpha).add_((1 - alpha) * param.data)?
Thanks!

@YilinLiu97 YilinLiu97 changed the title About EMA Wrong implementation about EMA ? Feb 13, 2019

perone commented Feb 14, 2019

Hi @YilinLiu97, why do you think it's wrong? Can you provide a concrete case where it fails? As far as I can tell, our implementation of the update rule is definitely correct.


YilinLiu97 commented Feb 14, 2019

Sorry, no offense meant. I could be wrong, but these are the parameters printed out during training, and they don't seem to follow the rule alpha * teacher_param + (1 - alpha) * student_param.

I think that line should be ema_param.data.mul_(alpha).add_((1 - alpha) * param.data) instead of ema_param.data.mul_(alpha).add_(1 - alpha, param.data). Maybe there was a typo.

('t_p: ', Parameter containing:
tensor([ 0.0007, -0.0006, 0.0046, -0.0033, 0.0004, 0.0262, 0.0153, -0.0259,
-0.0115, -0.0015, -0.0117, -0.0060, 0.0161, 0.0104, 0.0080, -0.0015,
-0.0116, -0.0160, 0.0247, -0.0227, 0.0077, 0.0052, 0.0217, 0.0111,
-0.0036, -0.0176, -0.0188, 0.0026, -0.0163, 0.0155],
device='cuda:0'))
('p: ', Parameter containing:
tensor([-0.0322, -0.0153, 0.0206, -0.0212, -0.0274, 0.0293, 0.0225, -0.0279,
-0.0272, -0.0282, -0.0272, -0.0261, 0.0275, 0.0261, 0.0274, -0.0251,
0.0014, -0.0285, 0.0296, -0.0296, 0.0105, -0.0209, 0.0123, 0.0227,
-0.0162, -0.0081, -0.0079, -0.0233, -0.0145, 0.0030],
device='cuda:0', requires_grad=True))
('(after) t_p: ', Parameter containing:
tensor([ 0.0007, -0.0006, 0.0046, -0.0033, 0.0004, 0.0262, 0.0153, -0.0259,
-0.0115, -0.0016, -0.0117, -0.0060, 0.0161, 0.0104, 0.0080, -0.0015,
-0.0116, -0.0160, 0.0247, -0.0227, 0.0077, 0.0052, 0.0217, 0.0111,
-0.0036, -0.0176, -0.0187, 0.0026, -0.0163, 0.0155],
device='cuda:0'))

@YilinLiu97

t_p: teacher's weights before the update
p: student's weights
(after) t_p: teacher's weights after the update

@YilinLiu97

I used alpha=0.5 for this test.


perone commented Feb 15, 2019

Hi @YilinLiu97, no problem at all. Let me give an example that should make it clear:

import torch

alpha = 0.99
params = torch.randn(100, 100)
ema_params = torch.randn(100, 100)
ema_params_2 = ema_params.clone()

# Case 1: the form used in the repository
tensor_a = ema_params.data.mul_(alpha).add_(1 - alpha, params.data)

# Case 2: the form you suggested
tensor_b = ema_params_2.data.mul_(alpha).add_((1 - alpha) * params.data)

torch.allclose(tensor_a, tensor_b)

If you execute that, you'll see that they give exactly the same results, and allclose returns True.
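As a side note (an illustration, not code from the repo): newer PyTorch releases deprecate the positional add_(scalar, tensor) overload in favor of the keyword form, which reads closer to the EMA formula:

import torch

alpha = 0.99
param = torch.randn(10)
ema_param = torch.randn(10)

# x.add_(y, alpha=c) computes x += c * y, so this performs
# ema_param = alpha * ema_param + (1 - alpha) * param
ema_param.mul_(alpha).add_(param, alpha=1 - alpha)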


YilinLiu97 commented Feb 18, 2019

Thanks very much for the detailed explanation! Yeah, you're right. I forgot that I 'deep copied' the model's weights into the teacher's weights at the very beginning, so with an alpha of 0.5 the parameters indeed looked very similar.

After some extensive experiments, I found that the key to domain adaptation in this experiment seems to be the use of group normalization(?), because of the discrepancy in running_mean/std between the source and target domains. Without it, the EMA model doesn't seem to work at all. I also found that with GN it's hard to tune the hyperparameters; the results were worse than with BN even in a supervised setting.

Theoretically, if alpha = 0, batches of source data are forwarded only to the student model, and batches of target data are forwarded only to the teacher, this would be equivalent to AdaBN, right? I've tried this too, but the EMA model still doesn't give reasonable results. Do you possibly know why? Thanks!
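For reference, a minimal sketch of the AdaBN-style baseline I mean (adapt_bn_stats is my own illustrative helper, assuming a plain PyTorch model with BatchNorm2d layers and a target-domain DataLoader):

import torch
import torch.nn as nn

def adapt_bn_stats(model, target_loader, device="cuda"):
    # AdaBN idea: keep all learned weights, but re-estimate the
    # BatchNorm running_mean / running_var on the target domain.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
    model.train()  # BN only updates its running stats in train mode
    with torch.no_grad():  # no gradient updates, stats only
        for x, _ in target_loader:
            model(x.to(device))
    return model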


perone commented Mar 1, 2019

Hi @YilinLiu97, sorry for the late reply. This depends a lot on the task and its requirements. You can actually use BatchNorm, but you need to be careful with the internally estimated statistics because you're dealing with two different domains; AdaBN is one way to help with that. However, in many cases where you have large inputs you need small batches because of memory, and for small batches BatchNorm underperforms GroupNorm, so it really depends on the task.
I would recommend going through the recommendations at the bottom of this page, where they give some very good advice that helped us get good results: https://github.com/CuriousAI/mean-teacher.
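In case it helps, a minimal sketch of swapping BatchNorm2d for GroupNorm in an existing model (batchnorm_to_groupnorm is an illustrative helper; num_groups is a hyperparameter and must divide each layer's channel count):

import torch.nn as nn

def batchnorm_to_groupnorm(module, num_groups=32):
    # Recursively replace BatchNorm2d with GroupNorm, so normalization
    # no longer depends on (small-)batch statistics.
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            batchnorm_to_groupnorm(child, num_groups)
    return module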


KingMV commented Jun 12, 2021

Mean-Teacher only uses model.parameters() to update the EMA model, but model.parameters() does not contain the running statistics of BatchNorm. You can check this by printing the BatchNorm buffers of the model and the EMA model.
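A sketch of one common way to handle this (copying the BatchNorm buffers in addition to the EMA over the parameters; an illustration, not the repo's code):

import torch

def update_ema_variables(model, ema_model, alpha, global_step):
    alpha = min(1 - 1 / (global_step + 1), alpha)
    # EMA over the learnable parameters, as in the original code
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        ema_param.data.mul_(alpha).add_(param.data, alpha=1 - alpha)
    # BatchNorm running_mean / running_var live in buffers, not in
    # parameters, so they must be handled separately; here they are
    # simply copied from the student to the teacher.
    for ema_buf, buf in zip(ema_model.buffers(), model.buffers()):
        ema_buf.data.copy_(buf.data)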
