
Conformer: Missing dropout after the self-attention #245

Open
albertz opened this issue Nov 13, 2022 · 2 comments

albertz commented Nov 13, 2022

I noticed that we do not have dropout after the self-attention:

    # MHSA
    x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
    x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
    x_mhsa_out = x_mhsa + x_ffn1_out

This differs from the standard Transformer.
It also differs from the Conformer paper.

Originally posted by @albertz in #233 (comment)
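For comparison, a minimal sketch of the variant with dropout on the residual branch, as in the standard Transformer and the Conformer paper (self.drop is an assumed dropout module here; the actual attribute name in the code may differ):

    # MHSA, with dropout applied to the self-attention output
    # before it is added to the residual
    x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
    x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
    x_mhsa = self.drop(x_mhsa)  # assumed dropout module, missing in the code above
    x_mhsa_out = x_mhsa + x_ffn1_out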

@albertz albertz added this to the first-release milestone Nov 13, 2022

albertz commented Nov 13, 2022

This is clearly wrong. I wonder how we should deal with such cases. I have lots of experiments with the old code, so changing the behavior is very inconvenient for me, as I need to properly track the difference. On the other hand, we said that changing the behavior is still OK now, and better now than later, especially in a case like this.

Introducing a special option for this might simplify the transition a bit, but it is very ugly.

albertz added a commit that referenced this issue Nov 13, 2022
albertz added a commit to rwth-i6/i6_experiments that referenced this issue Nov 13, 2022

albertz commented Nov 13, 2022

OK, I temporarily introduced the flag use_dropout_after_self_att. It is really only temporary, and its default is already the correct variant. It is also not meant for normal use, so it is not a normal argument but a global class attribute instead. My current experiments set it, but I try to run all new experiments without it, so that later we can just remove the flag.
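A minimal sketch of how such a temporary global class attribute could gate the dropout (class name, base class, and self.drop are assumptions, not the actual returnn_common code):

    class ConformerEncoderLayer(nn.Module):
        # Temporary global class attribute, intentionally not an __init__ argument.
        # The default True is the correct variant: dropout after self-attention.
        # Old setups can set this to False to reproduce the previous behavior.
        use_dropout_after_self_att = True

        def __call__(self, x_ffn1_out, *, spatial_dim):
            # MHSA block: pre-layer-norm, self-attention, optional dropout, residual
            x_mhsa_ln = self.self_att_layer_norm(x_ffn1_out)
            x_mhsa = self.self_att(x_mhsa_ln, axis=spatial_dim)
            if self.use_dropout_after_self_att:
                x_mhsa = self.drop(x_mhsa)
            return x_mhsa + x_ffn1_out

Old experiment configs would then set ConformerEncoderLayer.use_dropout_after_self_att = False globally, which is why it is a class attribute rather than a constructor argument.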

albertz added a commit to rwth-i6/i6_experiments that referenced this issue Nov 13, 2022
Explicitly without use_dropout_after_self_att.

rwth-i6/returnn_common#245
Atticus1806 pushed a commit to rwth-i6/i6_experiments that referenced this issue Nov 17, 2022
Explicitly without use_dropout_after_self_att.

rwth-i6/returnn_common#245