Does liwcalike() handle proper regular expressions? #31

Open
cjvanlissa opened this issue Jun 4, 2020 · 3 comments

Comments

@cjvanlissa

cjvanlissa commented Jun 4, 2020

Dear Dr. Benoit,

I tried to run the following:

txt <- c("The red-shirted lawyer gave her yellow-haired, red nose ex-boyfriend $300
            out of pity:(.")
dict <- quanteda::dictionary(list(lawyer = c("\\blawyer\\b", "law.er")))
liwcalike(txt, dict, what = "word", valuetype = "regex")

But the word lawyer is not matched:

docname Segment WPS WC Sixltr Dic lawyer AllPunc Period Comma Colon SemiC QMark Exclam Dash Quote
1   text1       1  24 24   8.33   0      0   29.17   4.17  4.17  4.17     0     0      0 12.5     0
  Apostro Parenth OtherP
1       0       0   12.5

Is this expected behavior? To what extent are regular expressions supported by liwcalike() and, downstream, by tokens_lookup.tokens()?

Thank you sincerely,
Caspar

@kbenoit
Owner

kbenoit commented Jun 5, 2020

Currently, liwcalike() only takes "glob" dictionary patterns, but it would be a reasonable feature request to add valuetype to the function.

To get the equivalent glob patterns, you would use:

library("quanteda.dictionaries")

txt <- c("The red-shirted lawyer gave her yellow-haired, 
          red nose ex-boyfriend $300 out of pity:(.")
dict <- quanteda::dictionary(list(lawyer = c("lawyer", "law?er")))
liwcalike(txt, dict)
##   docname Segment WPS WC Sixltr  Dic lawyer AllPunc Period Comma Colon SemiC
## 1   text1       1  24 24   8.33 4.17   4.17   29.17   4.17  4.17  4.17     0
##   QMark Exclam Dash Quote Apostro Parenth OtherP
## 1     0      0 12.5     0       0       0   12.5
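
For anyone porting a regex dictionary to globs in the meantime, the correspondence between the two pattern types can be checked in base R. The following is only a minimal sketch: it uses `utils::glob2rx()` and `grepl()` rather than liwcalike() itself, and it assumes that a glob is matched against the whole token, which is how `glob2rx()` translates patterns. The variable names are illustrative.

```r
# Glob patterns taken from the dictionary above
globs <- c("lawyer", "law?er")

# glob2rx() translates each glob into an anchored regular expression:
# "?" becomes "." and the pattern is wrapped in "^" and "$"
rx <- utils::glob2rx(globs)
print(rx)

# Check each translated pattern against a whole token
matches_lawyer  <- vapply(rx, grepl, logical(1), x = "lawyer")
matches_lawyers <- vapply(rx, grepl, logical(1), x = "lawyers")
print(matches_lawyer)
print(matches_lawyers)
```

Because the translated patterns are anchored at both ends, a glob such as "law?er" would need a trailing "*" to also cover inflected forms like "lawyers".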

@cjvanlissa
Author

Thank you for clarifying! I have a dictionary that makes extensive use of Perl regex, so I would indeed like to put my name down for this feature request :)

Sincerely,
Caspar

@kbenoit
Owner

kbenoit commented Jun 6, 2020

Noted! This will not be hard to add.
