Problem of disambiguation of ENs according to the case or spelling of terms #132
Comments
Thanks for the issue @aa303554! Currently each spelling variant relies on its own statistics as an anchor in the French Wikidata, which explains these different behaviours. If we don't have enough context for a spelling variant, the term will never be linked to the Wikidata entity in the current state of entity-fishing. To improve this, my idea is to use better smoothing and priors for variants and for Wikidata labels (currently not used), #72. Please note: the example you are using is too short for the normal text disambiguation field (the normal unit for the text field is closer to a paragraph), so you need to use the short text input. It might not solve the spelling error/variant problem in general, but it will work better. I think, however, that it solves your first example:
In French, entity-fishing has difficulty recognising Ireland depending on case and spelling. "Irlande" is the correct spelling, "Ireland" is the English spelling, and the others are misspellings of "Irlande". The behaviour is not consistent across case: the same spelling is recognized differently depending on whether it starts with a capital letter.
With other countries there is no such problem (I tested with Japan).
Correctly recognized: irlande, irland (incorrect), irelande (incorrect), Irelande (incorrect)
Recognized only as NER type LOCATION: Irlande, Ireland (incorrect)
Not recognized: Irland, ireland
See below for example.
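For anyone who wants to re-test the variants with the short text input suggested in the comment, here is a minimal sketch that only builds one query payload per spelling. The field names `shortText` and `language.lang` are my understanding of the entity-fishing query format and should be checked against the project documentation; the helper names `build_short_text_query` and `build_all_queries` are hypothetical, and nothing is sent to a server here.

```python
import json

# Spelling variants reported in this issue, grouped by the behaviour observed.
VARIANTS = {
    "linked": ["irlande", "irland", "irelande", "Irelande"],
    "ner_only": ["Irlande", "Ireland"],
    "not_recognized": ["Irland", "ireland"],
}


def build_short_text_query(term, lang="fr"):
    """Build a query dict that puts the term in the short text field.

    Assumption: the service accepts "shortText" and "language": {"lang": ...}.
    """
    return {
        "shortText": term,
        "language": {"lang": lang},
    }


def build_all_queries(variants=VARIANTS, lang="fr"):
    """Return a mapping from each variant to its JSON-encoded query."""
    queries = {}
    for terms in variants.values():
        for term in terms:
            query = build_short_text_query(term, lang)
            queries[term] = json.dumps(query, ensure_ascii=False)
    return queries


if __name__ == "__main__":
    for term, payload in build_all_queries().items():
        print(term, payload)
```

Each payload can then be submitted to the disambiguation service of a running entity-fishing instance, which makes it easy to compare the result for each capitalization and spelling side by side.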