Restored masked scaled dot attention #2542

Merged

Conversation

l-k-11235 (Contributor)

Revert the changes introduced by #2538 and adapt to #2539.

@vince62s (Member)

For the sake of clarity:
When using left padding at inference with LM models, we need a mask even for step > 0, in order to mask the padded positions on the left-hand side of the batch. However, PyTorch SDPA does not accept a mask in the non-causal case, so this mask is applied only in the "regular" manual scaled dot-product attention computation.

Merging.
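
As an illustration of the manual path described above, here is a minimal sketch, not the OpenNMT-py implementation: the function name, tensor shapes, and the `pad_mask` argument are assumptions for the example. It shows scaled dot-product attention where a key-padding mask hides left-padded positions, which matters even at a decoding step > 0 when the query length is 1.

```python
import math
import torch

def masked_scaled_dot_attention(query, key, value, pad_mask=None):
    """Manual scaled dot-product attention with an optional padding mask.

    query/key/value: (batch, heads, q_len/k_len, dim).
    pad_mask: bool tensor broadcastable to (batch, heads, q_len, k_len),
    True where the key position is padding and must be ignored.
    """
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(query.size(-1))
    if pad_mask is not None:
        # Padded key positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(pad_mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, value)

# Decoding step > 0 with left padding: q_len is 1, but the padded keys
# at the start of the first batch entry still have to be masked.
batch, heads, k_len, dim = 2, 4, 6, 8
q = torch.randn(batch, heads, 1, dim)
k = torch.randn(batch, heads, k_len, dim)
v = torch.randn(batch, heads, k_len, dim)
pad_mask = torch.zeros(batch, 1, 1, k_len, dtype=torch.bool)
pad_mask[0, ..., :2] = True  # first sequence has 2 left-padded positions
out = masked_scaled_dot_attention(q, k, v, pad_mask)
```

Without the mask, the padded keys of left-padded sequences would receive non-zero attention weight at every step, which is what the revert restores protection against.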

vince62s merged commit 05cd7cd into OpenNMT:master on Dec 29, 2023
2 checks passed