The text "Hello China" is detected as 'it' #83

gaowenxin95 · 2020-09-07T09:53:21Z

When I detect "Hello China":
print(langid.classify("Hello China"))
the result is:
('it', -37.309250354766846)
@Paczesiowa @pquentin @martinth @jnothman @saffsd

pquentin · 2020-09-07T10:22:27Z

This can happen on short texts, try a longer one
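
One way to guard against this on short inputs is to ask for a normalized probability and treat low-confidence predictions as undecided. A minimal sketch, with assumptions stated up front: `classify_or_none`, `fake_classify`, and the 0.9 threshold are hypothetical names and values chosen for illustration, and the classifier is passed in as a plain callable so the helper does not depend on any particular library.

```python
from typing import Callable, Optional, Tuple


# Hypothetical helper, not part of langid: wraps any classifier that
# returns (language_code, probability in [0, 1]).
def classify_or_none(
    classify: Callable[[str], Tuple[str, float]],
    text: str,
    min_prob: float = 0.9,  # assumed threshold; tune it on your own data
) -> Optional[str]:
    """Return the predicted language, or None if confidence is too low."""
    lang, prob = classify(text)
    return lang if prob >= min_prob else None


# Stand-in classifier used only to demonstrate the helper.
def fake_classify(text: str) -> Tuple[str, float]:
    # Pretend that short inputs yield low-confidence guesses.
    return ("en", 0.99) if len(text.split()) >= 6 else ("it", 0.4)


print(classify_or_none(fake_classify, "Hello China"))
print(classify_or_none(fake_classify, "hello China you are my sunshine"))
```

With langid itself, the README describes building a classifier that returns normalized probabilities via `LanguageIdentifier.from_modelstring(model, norm_probs=True)` (both names imported from `langid.langid`); its `classify` method could be passed in place of the stand-in. If the expected languages are known in advance, `langid.set_languages([...])` narrows the candidates, which may also help on short texts.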

gaowenxin95 · 2020-09-08T02:14:25Z

> This can happen on short texts, try a longer one

Thanks.

I tried a five-word sentence and it is still detected wrongly,
for example "hello China you are great":

('it', -31.29085063934326)

With six words, as in "hello China you are my sunshine",
the result is right:

('en', -49.038776874542236)

Another example, "hello China hello China hello China",
is wrong:

('it', -27.979979038238525)

I would like to know: how many words, at a minimum, should the sentence contain?
@pquentin @martinth @jnothman

KoenVanDuin · 2020-11-29T07:51:55Z

I am dealing with the same issue. In my case, supplying larger pieces of text is no problem, but I want to know how much the reliability improves as the amount of text increases.
Moreover, does it have to be real text, or is a bunch of words from the language also fine?
Lastly, I wonder what the returned negative number says about the reliability of the detection. I couldn't find information about what this number actually means.

Many thanks in advance.
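
On the meaning of the score: according to the langid README, the value returned by default is an unnormalized log-probability, so it is mainly useful for ranking candidate languages against each other for the same input, not as an absolute measure of reliability. A minimal sketch of how a set of such log-scores can be turned into probabilities that sum to 1 (a softmax) follows. `normalize_log_scores` is a hypothetical helper, and the scores are made-up illustration values, not langid output.

```python
import math
from typing import Dict


def normalize_log_scores(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert unnormalized log-scores into probabilities (softmax)."""
    # Subtract the maximum first so exp() stays numerically stable.
    top = max(log_scores.values())
    exps = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


# Made-up scores for illustration only.
scores = {"it": -37.3, "en": -38.1, "fr": -45.0}
probs = normalize_log_scores(scores)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

When the top two log-scores are close, as in this made-up case, the normalized probability of the winner stays well below 1, which is one way to see that a prediction on a short text is shaky.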

ffreemt · 2021-08-05T15:43:16Z

Try my fastlid: pip install fastlid

Fast and accurate, though it depends on fasttext (on Windows systems without a C compiler, a fasttext*.whl is available at https://www.lfd.uci.edu/~gohlke/pythonlibs/).

fastlid also tries to imitate two of langid's functionalities.

yuviabhi · 2021-09-07T05:36:04Z

I'm having the same issue.
The text "Our fifth module explains some key calculus skills" is detected as 'no' even though it has 8 words.
In another example, the 4-word text "Discover some angle relationships" is detected as 'sw', but when I changed it to "Discover some angle relationships between them" (6 words), it was detected as 'en' as expected.
So what is the minimum number of words needed for detection?

everdrone · 2022-04-17T19:51:51Z

+1
