The text "Hello China" is detected as 'it' #83

Open
gaowenxin95 opened this issue Sep 7, 2020 · 6 comments

Comments

@gaowenxin95

When I run langid on "Hello China":
print(langid.classify("Hello China"))
the result is:
('it', -37.309250354766846)
@Paczesiowa @pquentin @martinth @jnothman @saffsd

@pquentin
Contributor

pquentin commented Sep 7, 2020

This can happen on short texts; try a longer one.

@gaowenxin95
Author

gaowenxin95 commented Sep 8, 2020

> This can happen on short texts, try a longer one

Thanks.

I tried a sentence containing five words and it is still detected wrongly, for example "hello China you are great":

('it', -31.29085063934326)

With six words, as in "hello China you are my sunshine", the result is right:

('en', -49.038776874542236)

Another example, "hello China hello China hello China ", is wrong:

('it', -27.979979038238525)

How many words, at minimum, should a sentence contain for reliable detection?
@pquentin @martinth @jnothman
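
The thread does not give a fixed minimum word count, so one way to answer the question for your own data is to measure it. Below is a minimal sketch, not part of langid itself: `accuracy_by_length` is a hypothetical helper that truncates labelled sentences to their first n words and reports how often a classifier gets the language right at each length. The classifier is passed in as a function, on the assumption that it returns a `(lang, score)` tuple the way `langid.classify` does in the outputs above.

```python
from collections import defaultdict


def accuracy_by_length(samples, classify, max_words=10):
    """Return {word_count: accuracy} for prefixes of each labelled sample.

    samples:  iterable of (text, expected_language_code) pairs
    classify: function text -> (language_code, score); hypothetical hook,
              e.g. langid.classify if langid is installed
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, expected in samples:
        words = text.split()
        # Classify the first 1, 2, ... n words of the sentence.
        for n in range(1, min(len(words), max_words) + 1):
            prefix = " ".join(words[:n])
            lang, _score = classify(prefix)
            totals[n] += 1
            hits[n] += int(lang == expected)
    return {n: hits[n] / totals[n] for n in sorted(totals)}
```

With langid installed, something like `accuracy_by_length(my_labelled_sentences, langid.classify)` should show where accuracy levels off for your kind of text; the result depends heavily on the sample you feed it.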

@KoenVanDuin

I am dealing with the same issue. In my case, supplying larger pieces of text is no problem, but I would like to know how much the reliability improves as the amount of text increases.
Moreover, does it have to be real running text, or is a bunch of words from the language also fine?
Lastly, I wonder what the returned negative number says about the reliability of the detection. I couldn't find information about what this number actually means.

Many thanks in advance.
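
On the meaning of the negative number: as far as I understand the langid.py README, the score is an unnormalized log-probability, so it is only meaningful relative to the scores of the other candidate languages for the same text, and it tends to become more negative as the text gets longer. The sketch below shows the general idea of turning such log-scores into probabilities with a softmax. The `it` score is rounded from the first post; the `en` and `fr` scores are made up for illustration, and `normalize_log_scores` is a hypothetical helper, not langid's own code.

```python
import math


def normalize_log_scores(scores):
    """Convert {lang: log-score} into {lang: probability} via a stable softmax."""
    top = max(scores.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    exps = {lang: math.exp(score - top) for lang, score in scores.items()}
    total = sum(exps.values())
    return {lang: value / total for lang, value in exps.items()}


# 'it' is rounded from the thread; 'en' and 'fr' are HYPOTHETICAL values.
scores = {"it": -37.3, "en": -38.0, "fr": -45.0}
probs = normalize_log_scores(scores)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

If langid is installed, the README describes a built-in equivalent: `LanguageIdentifier.from_modelstring(model, norm_probs=True)` from `langid.langid` returns a value in the 0 to 1 range instead of the raw log-score.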

@ffreemt

ffreemt commented Aug 5, 2021

Try my fastlid: pip install fastlid

Fast and accurate, though it depends on fasttext (on Windows systems without a C compiler, a prebuilt fasttext*.whl is available at https://www.lfd.uci.edu/~gohlke/pythonlibs/).

fastlid also tries to imitate two of langid's functionalities.

@yuviabhi

yuviabhi commented Sep 7, 2021

Having the same issue.
The text "Our fifth module explains some key calculus skills" is detected as 'no' even though it has 8 words.
In another example, the 4-word text "Discover some angle relationships" is detected as 'sw', but when I changed it to "Discover some angle relationships between them" (6 words), it was detected as 'en' as expected.
So what is the minimum number of words needed for reliable detection?
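
Since the examples in this thread show that word count alone does not settle it (8 words failed here, 6 worked elsewhere), a possible workaround is to narrow the candidate languages and to reject low-confidence answers instead of trusting every result. The sketch below assumes the `LanguageIdentifier`, `model`, `norm_probs` and `set_languages` names described in the langid.py README; `build_identifier`, `classify_or_none` and the 0.9 threshold are hypothetical choices for illustration, and I have not verified that this fixes the specific sentences above.

```python
def build_identifier(languages=None):
    """Create a langid identifier that returns normalized probabilities.

    The third-party import is kept local so the rest of this sketch can be
    used (and tested) without langid installed.
    """
    from langid.langid import LanguageIdentifier, model

    identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    if languages:
        # Restrict the candidate set, e.g. ["en", "zh"].
        identifier.set_languages(languages)
    return identifier


def classify_or_none(text, classify, min_prob=0.9):
    """Return the language code, or None if the classifier is not confident.

    classify: function text -> (language_code, probability in 0..1)
    min_prob: HYPOTHETICAL threshold; tune it on your own data.
    """
    lang, prob = classify(text)
    return lang if prob >= min_prob else None
```

With langid installed, usage would look like `classify_or_none(text, build_identifier(["en", "zh"]).classify)`; a `None` result means the text should be handled some other way, for example by asking for more input.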

@everdrone

+1
