Yelp Star Rating System Reviewed: Are Star Ratings in Line with Textual Reviews?

Eduardo Magalhaes Barbosa
18/11/2015

2 - Introduction

Star rating features are ubiquitous in the app world, but how well do the star ratings people give businesses on Yelp agree with the textual reviews they write? Do they send the same message? Using Yelp data and NLP sentiment analysis, this paper compares the stars given in each review with a sentiment score computed from the review text.

The underlying question is: are the differences between the mean sentiment scores of the star levels statistically significant?

An analysis of variance (ANOVA) compares the mean sentiment analysis scores of the five star levels, where μk is the mean score of k-star reviews:

  • Null hypothesis (all five star-level means are equal)
    • H0: μ1 = μ2 = μ3 = μ4 = μ5
  • Alternative hypothesis (not all star-level means are equal)
    • H1: at least one μk differs from the others
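For reference, the standard one-way ANOVA statistic used to test H0 can be written as below. The notation is introduced here and is not the author's: k = 5 star groups, n_i reviews and mean score \bar{x}_i in group i, overall mean \bar{x}, and N reviews in total.

```latex
F = \frac{\mathrm{MS}_{\text{between}}}{\mathrm{MS}_{\text{within}}}
  = \frac{\frac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}
         {\frac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2},
\qquad F \sim F_{k-1,\,N-k} \text{ under } H_0
```

A large F (between-star spread much larger than within-star spread) makes H0 implausible.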

3 - Methods

The NLP sentiment analysis lexicon of Hu and Liu (KDD-2004), which lists positive and negative opinion words, was used to score the Yelp review texts. The mean scores differ by star level, as shown below:
[Figure: mean sentiment analysis score by star rating]
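A minimal sketch of lexicon-based scoring follows. The paper does not spell out its scoring rule, so this assumes a common one: score = number of positive words minus number of negative words. The word sets below are hypothetical miniature stand-ins for the much larger Hu and Liu lists, and `sentiment_score` and `mean_score_by_star` are illustrative names.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical tiny word sets; the real Hu and Liu lexicon is far larger.
POSITIVE = {"good", "great", "delicious", "friendly", "love"}
NEGATIVE = {"bad", "rude", "cold", "slow", "terrible"}

def sentiment_score(text: str) -> int:
    """Assumed rule: count of positive words minus count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mean_score_by_star(reviews):
    """reviews: iterable of (stars, text) pairs -> {stars: mean score}."""
    scores = defaultdict(list)
    for stars, text in reviews:
        scores[stars].append(sentiment_score(text))
    return {stars: mean(vals) for stars, vals in sorted(scores.items())}

if __name__ == "__main__":
    sample = [  # made-up reviews, for illustration only
        (1, "Rude staff and cold, terrible food."),
        (1, "Slow service, bad coffee."),
        (5, "Great food, friendly staff, love it!"),
        (5, "Delicious and good value."),
    ]
    print(mean_score_by_star(sample))  # {1: -2.5, 5: 2.5}
```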

However, the scores vary widely within each star level and their ranges overlap between stars, so the differences in means could have arisen by chance. An ANOVA test was used to answer this question.
[Figure: spread of sentiment analysis scores within each star rating]
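The ANOVA comparison of between-star and within-star variation can be sketched in a few lines of standard-library Python. This is an illustrative implementation of the one-way F statistic, not the author's code; if SciPy is available, `scipy.stats.f_oneway` returns the same statistic together with its p-value and is a convenient cross-check.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists, one per star level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of star-level means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own star-level mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

if __name__ == "__main__":
    # Made-up scores for three star levels, for illustration only.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```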

4 - Results

The ANOVA produced a large F value and a low p-value. In plain words, the variation of the score means between star levels (the numerator of F) is much larger than the variation of scores within each star level (the denominator), and the p-value is below 0.05, the conventional significance threshold. Hence H0 was rejected in favor of the alternative hypothesis H1: there is a significant relationship between stars and sentiment score means. A Tukey post-hoc test confirmed that the difference is significant for every pair of star levels.
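The p-value answers "how often would an F this large occur if H0 were true?". The paper takes it from the F distribution; as a distribution-free illustration of the same idea, the sketch below uses a permutation test instead (a swapped-in technique, not the method the paper used). It shuffles the star labels, recomputes F, and counts how often the shuffled F reaches the observed one. `permutation_p_value`, `n_perm` and `seed` are illustrative names and defaults.

```python
import math
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of score lists."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return math.inf  # every group constant: F is unbounded
    k = len(groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Estimate P(F >= observed F) when star labels are assigned at random."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled scores into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With clearly separated groups, e.g. `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, only 6 of the 1680 possible label assignments reach the observed F, so the estimate lands near 0.004; groups with identical means give a p-value of 1.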

In the output of the TukeyHSD function (Tukey Honest Significant Differences), the 'diff' column shows the difference in mean score for each pair of star levels, and the 'p adj' column gives the adjusted p-value that confirms its significance. Thus, this study concludes that:

There was a significant difference in sentiment analysis scores between star levels (adjusted p = 0.00 when rounded to two decimals, i.e. p < 0.005) for every pair of star levels studied.
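The 'diff' column and the Tukey statistic behind 'p adj' can be sketched as follows, using the Tukey-Kramer form that handles unequal group sizes. This is illustrative code, not the TukeyHSD implementation: it follows the "b-a" labelling with diff = mean(b) - mean(a), and it returns the studentized range statistic q rather than an adjusted p-value. Converting q to 'p adj' needs the studentized range distribution with k groups and N - k degrees of freedom, for example via `scipy.stats.tukey_hsd` in recent SciPy versions. `tukey_pairwise` is a hypothetical name.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def tukey_pairwise(groups_by_star):
    """groups_by_star: {star: [scores]} -> list of (label, diff, q) per star pair."""
    stars = sorted(groups_by_star)
    n_total = sum(len(groups_by_star[s]) for s in stars)
    means = {s: mean(groups_by_star[s]) for s in stars}
    # Pooled within-group mean square, as in the ANOVA denominator.
    ss_within = sum((x - means[s]) ** 2 for s in stars for x in groups_by_star[s])
    ms_within = ss_within / (n_total - len(stars))
    rows = []
    for a, b in combinations(stars, 2):
        diff = means[b] - means[a]  # label "b-a" means mean(b) minus mean(a)
        n_a, n_b = len(groups_by_star[a]), len(groups_by_star[b])
        se = sqrt(ms_within / 2 * (1 / n_a + 1 / n_b))  # Tukey-Kramer standard error
        rows.append((f"{b}-{a}", diff, abs(diff) / se))
    return rows

if __name__ == "__main__":
    # Made-up scores for three star levels, for illustration only.
    for label, diff, q in tukey_pairwise({1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}):
        print(label, diff, round(q, 3))
```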

5 - Conclusion

[Figure: sentiment analysis scores by star rating]
The results demonstrate that sentiment analysis review scores are aligned with star ratings: the stars users give are in line with the textual reviews they write.

Business owners should not disregard these implications, as reviews are replicated through social networks and by the ranking algorithms of applications such as Yelp. Better-rated businesses are positioned at the top of listings, generating more business and more revenue.

Note that this analysis compares the mean of all reviews for each star level; future studies could apply the same test using demographic variables of businesses and users.