Hello!
As the title says, I used a model (here, opus+bt-2021-04-30, multilingual) to translate a sentence such as "【测试】哎呀?我的台本哪里去了?我现在应该说啥?" into English. I noticed that some characters (【 and 】) are modified even in the input tab.
【测试】哎呀?我的台本哪里去了?我现在应该说啥?
... becomes:
[测试]哎呀?我的台本哪里去了?我现在应该说啥?
So I suppose there's no way the translation can be exact. And indeed, it outputs:
[Test] Oh? Where is my script? What should I say now?
Is it possible to fix this so that, for example, 【 and 】 are not transformed before generation?
Thanks!
It seems the lenticular brackets are converted to standard square brackets by the OPUS MT model preprocessing, so the model just treats them as ordinary square brackets. This is probably a mistake. At least according to Wikipedia, lenticular brackets mark headings and similar elements, so they are not equivalent to standard brackets.
This can only be fixed by retraining the models, so I'll make a note of it. Hopefully we can modify the preprocessing script for the next training run.
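To illustrate why the model cannot get the original brackets back, here is a minimal Python sketch. It assumes the preprocessing applies a simple character map. `BRACKET_MAP` and `normalize` are hypothetical names, not part of the OPUS MT scripts, and the real table may be larger. The sketch also shows that standard Unicode NFKC normalization leaves 【 and 】 untouched. This suggests the replacement comes from a custom punctuation table.

```python
import unicodedata

# Hypothetical character map: assumes the preprocessing simply replaces CJK
# lenticular brackets with ASCII square brackets. The actual OPUS MT
# preprocessing script may use a different or larger table.
BRACKET_MAP = str.maketrans({"【": "[", "】": "]"})


def normalize(text: str) -> str:
    """Apply the assumed bracket normalization to an input sentence."""
    return text.translate(BRACKET_MAP)


src = "【测试】哎呀?我的台本哪里去了?我现在应该说啥?"
print(normalize(src))  # → [测试]哎呀?我的台本哪里去了?我现在应该说啥?

# The mapping is many-to-one: input that already used "[...]" normalizes to
# the same string, so after preprocessing the model cannot tell them apart.
print(normalize("【测试】") == normalize("[测试]"))  # → True

# Standard Unicode NFKC does not perform this replacement.
print(unicodedata.normalize("NFKC", "【测试】"))  # → 【测试】
```

Because the map is not invertible, the output cannot tell you which bracket style the input used. That is why the fix has to happen in the preprocessing itself (and therefore in a retrained model) rather than in post-processing.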