[ENH] UDPIPE - Use ISO language codes #1030
c83c8a6 to 058e521
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1030      +/-   ##
==========================================
+ Coverage   82.16%   82.46%   +0.30%
==========================================
  Files          92       92
  Lines       12283    12340      +57
  Branches     1670     1683      +13
==========================================
+ Hits        10092    10176      +84
+ Misses       1882     1854      -28
- Partials      309      310       +1
6697c13 to 4c6b9b0
Failing tests are fixed in #1031
/rebase
4c6b9b0 to 78bbb2f
When I open an old workflow I get
The issue should be solved in #1034.
/rebase
78bbb2f to 68966f4
68966f4 to 3a6f9b7
@VesnaT, I fixed migrations, so I think it is ready for review.
Issue
This PR is part of #963, which I am splitting into smaller pieces for easier review.
The primary motivation behind this is to make Preprocess work with the language from Corpus.
Description of changes
This PR prepares the UDPIPE normalizer to communicate in ISO codes (it both accepts and returns languages as ISO codes), which is necessary to enable using the language from Corpus (Corpus stores languages in ISO format).
After changing UDPIPE to work with ISO language codes, I also had to adapt the Preprocess widget to store its settings as ISO codes and to call the Lemmagen filter with an ISO language code.
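For illustration only, a settings migration of this kind could look roughly like the sketch below. The mapping table, the function name `migrate_language_setting`, and the `"language"` settings key are hypothetical and are not taken from the PR; the real widget would need a table covering every supported language.

```python
# Hypothetical name-to-ISO table (assumption, not the mapping used in the PR).
LANGUAGE_TO_ISO = {"English": "en", "Slovenian": "sl", "German": "de"}


def migrate_language_setting(settings: dict, key: str = "language") -> dict:
    """Return a copy of `settings` with the language stored as an ISO code.

    Values that are not known language names (for example, values that are
    already ISO codes) are left untouched, so the migration is safe to run
    more than once.
    """
    migrated = dict(settings)
    value = migrated.get(key)
    if value in LANGUAGE_TO_ISO:
        migrated[key] = LANGUAGE_TO_ISO[value]
    return migrated


old_settings = {"language": "Slovenian", "use_tokenizer": True}
print(migrate_language_setting(old_settings))
```

Leaving unknown values unchanged is a design choice of this sketch: it keeps workflows saved after the change working without a separate version check.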
This PR also slightly changes the names of UDPIPE languages in the dropdown: the names of language variations (different models for the same language) are now written in parentheses, and each word in a multi-word language name is capitalized (to match the ISO standard).
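A minimal sketch of the naming rule described above, assuming model identifiers of the form `language-variation` with underscores between the words of a language name (for example `ancient_greek-proiel`). The identifier format and the function name `display_name` are assumptions for illustration; the PR's actual code may format names differently.

```python
def display_name(model_id: str) -> str:
    """Build a dropdown label from an assumed `language-variation` identifier."""
    # Split off the optional variation after the first hyphen.
    language, _, variation = model_id.partition("-")
    # Capitalize every word of a multi-word language name.
    name = " ".join(word.capitalize() for word in language.split("_"))
    # Put the variation, if any, in parentheses.
    return f"{name} ({variation})" if variation else name


print(display_name("ancient_greek-proiel"))  # Ancient Greek (proiel)
print(display_name("english"))               # English
```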
Udpipe will be implemented in separate PRs.
Includes