Big dataset of unsolicited reviews found #98
Comments
@Mahmoud-s-programs @Sepideh-Ahmadian
Dataset link: https://github.com/mohamedadaly/LABR/tree/master/data
Filename: reviews.tsv
Code:
I have written a Python script that filters out reviews containing explicit terms and stores the remaining data in a new file with the same format. As a result, the number of reviews dropped from 63,000 to 11,000. I checked the results, and sure enough, the remaining reviews do not contain any of the explicit terms specified in the script, including forms with prefixes and suffixes.
My issue is that some of the terms are not aspects, as noted by Dr. Fani. Here is the list of terms, with English translations, that resulted in the deletion of a review:
كتاب - book
كتب - books
مؤلف - author (male)
كاتب - writer (male)
مؤلفة - author (female)
كاتبة - writer (female)
رواية - novel
روايات - novels
قصة - story
قصص - stories
حكاية - tale
حكايات - tales
مجلد - volume
جزء - part or section
فصول - chapters
فصل - chapter
شخصية - character
بطل - hero or protagonist (male)
بطلة - heroine or protagonist (female)
أبطال - heroes or protagonists
عدو - enemy or antagonist
أعداء - enemies or antagonists
صديق - friend (male)
صديقة - friend (female)
أصدقاء - friends
حبكة - plot
حدث - event
أحداث - events
نهاية - ending or conclusion
بداية - beginning or start
ذروة - climax
حل - resolution or solution
عقدة - conflict or knot (in a story)
أسلوب - style
لغة - language
تعبير - expression
وصف - description
سرد - narration
حوار - dialogue
كلمة - word
جملة - sentence
مفردات - vocabulary
مصطلحات - terminology or terms
As you can see, some of these terms may be implicit, or may be aspects whose presence would affect the review.
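For reference, the filtering step described above could be sketched roughly as follows. This is not the actual script, only a minimal illustration under stated assumptions: the review text is assumed to be the last tab-separated column of reviews.tsv, the term list is a small subset of the one above, and the names `EXPLICIT_TERMS`, `contains_explicit_term`, `keep_line`, and `filter_file` are hypothetical. Plain substring matching is used so that forms with attached prefixes or suffixes are also caught.

```python
# Hypothetical subset of the explicit terms listed above; extend as needed.
EXPLICIT_TERMS = ["كتاب", "كتب", "رواية", "قصة", "حبكة", "أسلوب"]


def contains_explicit_term(text, terms=EXPLICIT_TERMS):
    """Return True if any term occurs anywhere in the text.

    Substring matching means affixed forms (e.g. a term preceded by
    the definite article) are matched as well.
    """
    return any(term in text for term in terms)


def keep_line(line, text_col=-1, terms=EXPLICIT_TERMS):
    """Return True if the review on this TSV line has no explicit term.

    Assumption: the review text is in column `text_col` (last by default).
    """
    fields = line.rstrip("\n").split("\t")
    return not contains_explicit_term(fields[text_col], terms)


def filter_file(src, dst, text_col=-1, terms=EXPLICIT_TERMS):
    """Copy the lines of `src` that pass the filter to `dst`, unchanged."""
    kept = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if keep_line(line, text_col, terms):
                fout.write(line)  # same format as the input file
                kept += 1
    return kept
```

A call such as `filter_file("reviews.tsv", "reviews_filtered.tsv")` would write the surviving reviews to a new file (the output filename is made up here). Note that substring matching is broad: a short term such as حل also matches inside unrelated words like رحلة (trip), so a stricter token-level match may be worth considering for the shorter terms.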