A Study of the Contributing Factors of Yelp Reviews

Chaoran Liu

Introduction:

Using the Yelp review data, this report addresses:

  1. Differences between businesses in Europe and North America.
  2. The main driving factors of high-rating businesses.
  3. Whether ratings are more objective or more subjective.

Methods and Data

  1. Data:
    • Only the business and user files of the Yelp dataset are used
  2. Methods:
    • Transformation: JSON files converted to data.frame using jsonlite
    • Feature Engineering: created postcode and continent features
    • Data Imputation: missing numeric values replaced with the median
    • Model: a boosted linear model was chosen to evaluate variable importance
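
The transformation and imputation steps can be sketched roughly as follows. This is a minimal illustration, not the code used for the report: the inline records, their field values, and the helper name `impute_median` are assumptions, and the real input would be the Yelp JSON files rather than a text connection.

```r
library(jsonlite)

# Toy stand-in for a Yelp business file: one JSON object per line.
# These records are illustrative values, not taken from the dataset.
ndjson <- c(
  '{"business_id": "a", "stars": 4.0, "review_count": 10}',
  '{"business_id": "b", "stars": 3.5}',
  '{"business_id": "c", "stars": 5.0, "review_count": 30}'
)

# stream_in() parses line-delimited JSON into a data.frame;
# fields missing from a record become NA.
business <- stream_in(textConnection(ndjson), verbose = FALSE)

# Hypothetical helper: replace missing values in every numeric column
# with that column's median, leaving non-numeric columns untouched.
impute_median <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  df
}

business <- impute_median(business)
```

The median is used rather than the mean because count-like fields such as review counts tend to be heavily skewed.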

Difference between Europe and N. America

European restaurants (mean 3.79) generally have higher ratings than those in North America (mean 3.66).
Some of the interesting differences are parking lot, smoking area, and divey.
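
A per-continent mean of this kind can be computed along these lines. The data frame below holds made-up toy values, and the column names `continent` and `stars` are assumptions about how the engineered feature and the rating are stored.

```r
# Toy ratings with an engineered continent label (illustrative values only).
reviews <- data.frame(
  continent = c("Europe", "Europe",
                "North America", "North America", "North America"),
  stars = c(4.0, 3.5, 3.0, 4.5, 3.5)
)

# Mean star rating within each continent.
mean_by_continent <- aggregate(stars ~ continent, data = reviews, FUN = mean)
print(mean_by_continent)
```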


Driving factors of high-rating businesses

  • Europeans care more about parking lot, outdoor seating, noise level, TV, and wheelchair accessibility.
  • Americans care more about by appointment only, drive-thru, good for kids, and takes reservations.


More objective or more subjective?

Four of the top five contributors are Yelp rater characteristics, which implies that ratings are very likely to be biased by the individual rater.
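
How a boosted linear model can rank contributors is sketched below. The report does not say which package or settings were used, so this is one possible reading: component-wise L2 boosting written in base R, with importance measured as each variable's share of the total reduction in squared error. The function `boost_linear`, its parameters, and the simulated predictors are all hypothetical.

```r
# Component-wise L2 boosting with linear base learners (base R only).
# Each round fits every standardised predictor to the current residuals,
# keeps the single best one, and takes a small step (nu) in its direction.
boost_linear <- function(X, y, rounds = 200, nu = 0.1) {
  X <- scale(X)                       # centre and scale each column
  resid <- y - mean(y)                # start from the intercept-only fit
  coefs <- setNames(numeric(ncol(X)), colnames(X))
  gain <- coefs                       # accumulated drop in squared error
  for (i in seq_len(rounds)) {
    # Least-squares slope of the residuals on each column separately.
    slopes <- colSums(X * resid) / colSums(X^2)
    # Squared error left after fitting each column on its own.
    sse <- vapply(seq_len(ncol(X)), function(j) {
      sum((resid - slopes[j] * X[, j])^2)
    }, numeric(1))
    best <- which.min(sse)
    before <- sum(resid^2)
    resid <- resid - nu * slopes[best] * X[, best]
    coefs[best] <- coefs[best] + nu * slopes[best]
    gain[best] <- gain[best] + before - sum(resid^2)
  }
  list(coefficients = coefs,
       importance = sort(gain / sum(gain), decreasing = TRUE))
}

# Simulated predictors (not Yelp data): one strong, one weak, one pure noise.
set.seed(1)
n <- 200
X <- cbind(strong = rnorm(n), weak = rnorm(n), noise = rnorm(n))
y <- 2 * X[, "strong"] + 0.5 * X[, "weak"] + rnorm(n)

fit <- boost_linear(X, y)
print(fit$importance)
```

Because the predictors are standardised first, the importance shares are comparable across variables measured on different scales, which is what makes a "top five contributors" ranking meaningful.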
