Understanding the factors that improve Business ratings on Yelp (Status Report)
Objective:
Model a system that suggests how businesses that consistently get 1/2 ratings on Yelp can earn more positive reviews and improve their ratings, that is, get more 3/4/5 ratings.
We focus on businesses for which 70% of ratings are 1 or 2 and 30% are 3, 4, or 5, and try to narrow down the factors that improve ratings.
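The selection rule above can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual code: it assumes each review is a dict with `business_id` and `stars` fields, and it treats 70% as a minimum share of 1/2 ratings. The function names `low_rating_share` and `target_businesses` are hypothetical.

```python
from collections import defaultdict


def low_rating_share(reviews):
    """Return {business_id: fraction of its ratings that are 1 or 2}."""
    counts = defaultdict(lambda: [0, 0])  # business_id -> [low, total]
    for review in reviews:
        entry = counts[review["business_id"]]
        if review["stars"] <= 2:
            entry[0] += 1
        entry[1] += 1
    return {biz: low / total for biz, (low, total) in counts.items()}


def target_businesses(reviews, threshold=0.7):
    """Return sorted ids of businesses whose low-rating share meets the threshold.

    Assumption: "70% 1 or 2 ratings" is read as "at least 70%".
    """
    shares = low_rating_share(reviews)
    return sorted(biz for biz, share in shares.items() if share >= threshold)
```

Calling `target_businesses(reviews)` on the parsed review records would give the list of businesses to study; the threshold can be loosened if too few businesses qualify.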
Tools and Environment setup
Most of the time so far has gone into setting up the tools required for analysing the data.
Due to the large volumes of data involved, I decided to use Amazon EC2 cluster services to parse the data.
I have successfully set up Amazon EC2 instances.
Also, to run tests on sample data, I installed some popular data mining tools (RapidMiner, Gephi, Tableau Public 8.3) and am trying to run aggregate functions on the required attributes.
I had to write Python code to convert the JSON files to CSV so that the data mining tools can interpret the data.
I am currently writing code to parse the complex (nested) attributes in the JSON files.
I also performed data wrangling on some of the data files:
yelp_academic_dataset_business
yelp_academic_dataset_checkin
yelp_academic_dataset_tip
yelp_academic_dataset_user
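The JSON-to-CSV conversion mentioned above, including nested attributes, could look roughly like the following. This is a minimal sketch under stated assumptions, not the code written for the project: it assumes each input line holds one JSON object, flattens nested objects into dotted column names, and joins list values with ";". The helper names `flatten` and `json_lines_to_csv` are hypothetical.

```python
import csv
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Assumption: lists are short and can be joined into one cell.
            flat[name] = ";".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(lines, out_file):
    """Write one CSV row per JSON line; return the column names used."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    # Union of all keys, in first-seen order, so sparse attributes get a column.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out_file, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns
```

This version holds all rows in memory in order to discover every column first. For the full dataset, a two-pass variant (one pass to collect column names, one to write rows) would be more appropriate.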
Next steps:
1) Parse the review data and run sentiment analysis.
2) Extract the factors that lead a business to get poor (1, 2) and good (3, 4, 5) ratings (grouped by business type and area).
3) Analyse the top negative and positive factors and their effect on ratings (example: how much of a negative effect does the factor "not having wifi" have?).
4) Measure how much influence one factor has on other factors (distance vs taste, distance vs prices, taste vs ambience).
5) Extract the top factors that have the potential to increase the share of good ratings from 30% to at least 50%.
6) Create a website and list the top factors for each business type.
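One simple way to approach step 3 is to compare the share of good ratings between businesses that have a factor and those that do not. The sketch below is only an illustration of that idea, not a method chosen for the project: the record layout (a `factors` set and a `ratings` list per business) and the names `good_rating_share` and `factor_effect` are assumptions made for the example.

```python
def good_rating_share(ratings):
    """Fraction of ratings that are good (3, 4, or 5)."""
    return sum(1 for stars in ratings if stars >= 3) / len(ratings)


def factor_effect(businesses, factor):
    """Difference in good-rating share: businesses with the factor minus those without.

    Returns None when either group has no ratings to compare.
    """
    with_factor, without_factor = [], []
    for business in businesses:
        group = with_factor if factor in business["factors"] else without_factor
        group.extend(business["ratings"])
    if not with_factor or not without_factor:
        return None
    return good_rating_share(with_factor) - good_rating_share(without_factor)
```

A negative result for a factor such as "not having wifi" would mean that businesses with that factor receive a smaller share of good ratings. This is a descriptive difference only; it does not control for business type, area, or other factors.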