Do More Experienced Yelpers Write Better Reviews?

Sumeet Shah
November 21, 2015

Introduction

In this analysis, I ask whether more experienced Yelp users write better reviews and, if so, whether the quality of a user's reviews can be predicted from their experience.

If more experienced users do indeed write better reviews, then Yelp can take action to incentivize new users to keep writing and stay on the site, and can offer incentives for its more experienced users to keep using Yelp and producing high-quality reviews. If not, Yelp can shift its focus to simply growing its user base in the hope of attracting people who are already good review writers.

Methods and Data

  • My analysis uses the Review Data, User Data, and Business Data sets provided by Yelp.
  • The quality of each review is calculated by normalizing its funny, cool, and useful vote counts to a maximum of 1 within each business and then averaging the three facets. Each user's overall score is the mean of all of their review scores (a sketch of this computation follows this list).
  • I checked how each user's Review Count and number of Weeks on Yelp correlate with their Overall Score (see the correlation sketch below).
  • Using Review Count and Weeks on Yelp as predictors of Overall Score, I constructed a predictive model with the random forest method (see the model sketch below).
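
The score computation can be sketched as follows. This is a minimal illustration in pandas, not the exact pipeline: the flat column names (business_id, user_id, funny, cool, useful) are assumptions, and the real Yelp data sets ship as JSON with a nested votes field that would need to be flattened first.

    import pandas as pd

    # Assumed flat schema; the actual Yelp review data is JSON with a
    # nested "votes" field (votes.funny, votes.cool, votes.useful).
    reviews = pd.read_csv("yelp_reviews.csv")

    facets = ["funny", "cool", "useful"]

    # Normalize each facet to a maximum of 1 within each business, so the
    # most-voted review at a business scores 1 on that facet.
    for f in facets:
        per_business_max = reviews.groupby("business_id")[f].transform("max")
        reviews[f + "_score"] = (reviews[f] / per_business_max).fillna(0)

    # A review's quality is the mean of its three normalized facet scores.
    reviews["review_score"] = reviews[[f + "_score" for f in facets]].mean(axis=1)

    # A user's overall score is the mean of all of their review scores.
    overall = (reviews.groupby("user_id")["review_score"]
                      .mean()
                      .rename("Overall_Score")
                      .reset_index())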
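
The correlation check then joins that score with each user's review count and tenure. Again a sketch under assumptions: review_count and yelping_since are the field names in the Yelp user data, and the snapshot date used to compute weeks on Yelp is arbitrary here.

    # User-level predictors from the Yelp user data set.
    users = pd.read_csv("yelp_users.csv", parse_dates=["yelping_since"])

    # Weeks on Yelp, measured from an assumed snapshot date.
    snapshot = pd.Timestamp("2015-11-21")
    users["Time_On_Yelp"] = (snapshot - users["yelping_since"]).dt.days / 7

    merged = users.merge(overall, on="user_id")
    print(merged[["Overall_Score", "review_count", "Time_On_Yelp"]].corr())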
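
Finally, the random forest. Because the results report an accuracy against a no-information rate, the continuous overall score must have been discretized into classes; the binary median split below is my assumption, purely for illustration, as is the use of scikit-learn rather than the original tooling.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Assumed discretization: label a user as writing "good" reviews if
    # their overall score is above the median.
    X = merged[["review_count", "Time_On_Yelp"]]
    y = (merged["Overall_Score"] > merged["Overall_Score"].median()).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(n_estimators=500, random_state=0)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    # The no-information rate is the accuracy achievable by always
    # predicting the most common class.
    nir = y_test.value_counts(normalize=True).max()
    print(f"Accuracy: {accuracy:.4f}  No Information Rate: {nir:.4f}")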

Results

Correlations

                      Overall_Score  Funny_Score  Cool_Score  Useful_Score  Review_Count  Time_On_Yelp
Time on Yelp (weeks)      0.1259446    0.0775225   0.0955114     0.0785376     0.3202479     1.0000000
Review Count              0.1383375    0.0970268   0.1199511     0.0628900     1.0000000     0.3202479

Random Forest Models

  • Unfiltered Data Accuracy: 0.7146
  • Unfiltered Data No Information Rate: 0.725
  • Filtered Data Accuracy: 0.6620
  • Filtered Data No Information Rate: 0.6676

Discussion

  • The correlation table above shows the correlations for the unfiltered data: there is no significant correlation between either Review Count or Weeks on Yelp and the quality of a user's reviews. The filtered data had even weaker correlations.
  • The random forest results show that the unfiltered-data model performed very poorly: its accuracy (0.7146) fell below the no-information rate (0.725), the accuracy achievable by always predicting the most common class, so it was effectively guessing based on the class probabilities in the training data. The filtered-data model fared marginally better relative to its baseline, but its accuracy (0.6620) still did not beat its no-information rate (0.6676).
  • Do more experienced Yelpers write better reviews? The evidence from this analysis suggests probably not. If they do, then more than just Review Count and Weeks on Yelp is needed to prove it.