Can we add a Termbase/glossary to the MT engine #68
This is something I have been working on for quite some time. I have experimented with glossary support in a development version of OPUS-CAT, but the problem is that it requires models that have been specifically trained to use glossaries. Training those models takes time, and before starting, I want to make sure the glossary functionality works as it should and doesn't degrade translation quality in other ways. Btw., in the scenario you mention, "the Client" would not end up in the translation memory: the modified source segment is only used internally in OPUS-CAT, and the CAT tool stores the original source segment.
Hello Tommi
So does that mean that I can use pre-editing as a substitute for a TB?
Regards
Dave Neve (SafeTex)
Pre-edit rules can be used to inject term translations into the source text (in which case they work as a kind of TB substitute), and the NMT will often carry the term translation over to the target text. But this is not the behavior the MT models have been trained to replicate, so it's going to be hit and miss. The planned terminology support in OPUS-CAT will use a similar method of injecting term translations into the source text, but in that case the models will be trained specifically to transfer the term translation from source to target. Also, I suspect that using terms with MT will always require some manual work beyond just selecting a TB. One thing I've noticed while working on term support is that the TBs provided to translators are not well suited to MT as such, since they contain too many terms, many of which overlap. So term support for MT generally seems to require more carefully managed TBs.
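A minimal sketch of the injection idea, assuming a plain-text pre-edit step that places each glossary term's required translation right after the source term. The `GLOSSARY` dict, the `inject_terms` function and the `[[...]]` marker are hypothetical; they are not OPUS-CAT's rule syntax or the input format its planned term-aware models will use.

```python
import re

# Hypothetical glossary: source term -> required target term.
GLOSSARY = {"Beställaren": "the Client"}


def inject_terms(source: str, glossary: dict[str, str]) -> str:
    """Insert the target term after each whole-word match of a source term.

    The "[[...]]" wrapper is a made-up marker for illustration only; a model
    trained for term support would expect its own specific format.
    """
    for src_term, tgt_term in glossary.items():
        # \b restricts matches to whole words (inflected forms such as
        # "Beställarens" are NOT matched); re.escape guards metacharacters.
        pattern = re.compile(rf"\b{re.escape(src_term)}\b")
        # A function replacement avoids backslash handling in tgt_term.
        source = pattern.sub(lambda m: f"{m.group(0)} [[{tgt_term}]]", source)
    return source


print(inject_terms("Beställaren ska betala.", GLOSSARY))
```

With an untrained model, whether "the Client" survives into the output depends on the model happening to copy the bracketed span, which is the hit-and-miss behavior described above.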
Hello Tommi and all
We've touched on this indirectly before.
For example, I have a term "Beställaren" that must always be translated as "the Client", but I get lots of variation such as "the Client", "the Customer", "the Mandator", "the Undertaker", "the Contracting Party", etc.
I'm reluctant to write a pre-editing rule, as I don't want "the Client" in the source segments of the TM(X).
I could write a post-editing rule, but some of the above terms appear legitimately in other segments, in particular "the Customer" (the Client has customers!).
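One way to narrow the post-edit problem is to condition the rule on the source segment, assuming the rule can see both source and target: rewrite the variants only when the source contains "Beställaren". `VARIANTS` and `post_edit` are hypothetical names, and this is a sketch of the idea rather than an existing OPUS-CAT feature.

```python
import re

# Hypothetical list of unwanted renderings observed in the MT output.
VARIANTS = ["the Customer", "the Mandator", "the Undertaker", "the Contracting Party"]
VARIANT_PATTERN = re.compile("|".join(re.escape(v) for v in VARIANTS))


def post_edit(source: str, target: str) -> str:
    """Normalise variants to "the Client", but only if the source has "Beställaren".

    Segments whose source does not mention the term are returned unchanged,
    so a legitimate "the Customer" there is left alone. Matching is
    case-sensitive, so a sentence-initial "The Customer" is not rewritten,
    and the rule still misfires when one segment mentions both the Client
    and its customers.
    """
    if not re.search(r"\bBeställaren\b", source):
        return target
    return VARIANT_PATTERN.sub("the Client", target)


print(post_edit("Beställaren ska betala.", "Payment is made by the Mandator."))
```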
So unless I've misunderstood or missed something, I think I need to be able to add a Termbase/Glossary that simply tells Opus to translate "Beställaren" as "the Client".
On paper, this looks easy (Beställaren = the Client) and rather necessary.
Am I right that this feature does not exist at present?
Do you think such a feature will be added at some point?