Yelp Sentiment Model

Matthew Landowski
November 2015

Completed in Accordance with the Coursera.org Johns Hopkins Data Science Specialization Capstone Course

The Question

Can a model be developed to predict sentiment from the text of a review?


Yelp Review - What's in the data?

  • The Yelp data set contains 1.6 million reviews.
  • Each review includes free-form text and a star rating given by the reviewer.

Example text of a review

“Mad Max Pizza was amazing!!!”

Stars   Count     Percentage of data
1       159811    10.18%
2       140608     8.96%
3       222719    14.19%
4       466599    29.73%
5       579527    36.93%

Summary of star ratings in the data set

Methodology - Feature Engineering

  • The text of each review was tokenized into individual words.
  • Unhelpful text elements were removed, such as numbers, punctuation, and stopwords (e.g., the, is, it).

e.g. “Mad Max Pizza was amazing!!!” => [mad, max, pizza, amazing]

  • In addition, a sentiment score was calculated for each review as the number of positive words minus the number of negative words (a sketch of the tokenization and scoring steps follows this list).
  • A random 10% sample of the source Yelp data was taken due to computational limitations.
  • 3-star reviews were removed, since the focus of this work was on positive vs. negative sentiment.
  • Each observation in the generated data set represented one review; the variables were the engineered features and the review's star rating.
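
A minimal sketch of this feature-engineering step, assuming plain Python with regular expressions and small illustrative stopword and sentiment word lists (the actual work used full stopword and opinion lexicons, so the lists here are placeholders):

    import re

    # Placeholder word lists; the real pipeline used full lexicon files.
    STOPWORDS = {"the", "is", "it", "was", "a", "and"}
    POSITIVE = {"amazing", "great", "good", "delicious"}
    NEGATIVE = {"bad", "terrible", "awful", "bland"}

    def tokenize(text):
        """Lowercase the review, keep only letters, and drop stopwords."""
        words = re.findall(r"[a-z]+", text.lower())
        return [w for w in words if w not in STOPWORDS]

    def sentiment_score(tokens):
        """Number of positive words minus number of negative words."""
        return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

    tokens = tokenize("Mad Max Pizza was amazing!!!")
    print(tokens)                   # ['mad', 'max', 'pizza', 'amazing']
    print(sentiment_score(tokens))  # 1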

Model - Logistic Regression

  • The data set was run through the Amazon Web Services (AWS) Machine Learning service.
  • AWS trained a logistic regression model on the data; the results are shown in the tables below, and a rough local training sketch follows them.
  • Good ratings are 5 and 4 stars; bad ratings are 2 and 1 stars.
Star-level results (rows = predicted rating, columns = reference rating):

Pred \ Ref       5      4      2      1   Total    F-score   Accuracy
5            14203   2612    118    313   17246  0.7087325  0.8235533
4             6666   6632    402    306   14006  0.5347525  0.4735114
2              894   1156   1234    958    4242  0.3790508  0.2258369
1             1071    398    515   2871    4855  0.6172203  0.1060762

Good/bad results (rows = predicted rating, columns = reference rating):

Pred \ Ref     Good.Rating  Bad.Rating   Total    F-score   Accuracy
Good.Rating          30016        1236   31252  0.9294894  0.9604505
Bad.Rating            3318        5779    9097  0.7173535  0.6352644
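
The model itself was trained through the AWS Machine Learning service rather than with local code. As a rough local equivalent, the sketch below trains a logistic regression with scikit-learn; the file name and column layout (yelp_features_sample.csv, a stars column plus engineered feature columns) are hypothetical stand-ins for the generated data set:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical feature table: one row per review, columns for the
    # engineered features (token counts, sentiment score) plus the star rating.
    reviews = pd.read_csv("yelp_features_sample.csv")
    reviews = reviews[reviews["stars"] != 3]                 # drop neutral reviews
    reviews["good"] = (reviews["stars"] >= 4).astype(int)    # 4-5 stars = good, 1-2 = bad

    X = reviews.drop(columns=["stars", "good"])
    y = reviews["good"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print("Held-out accuracy:", model.score(X_test, y_test))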

Summary

Good and bad ratings can be predicted with good accuracy.

  • Good ratings (5 and 4 stars) had an accuracy of 96%.
  • Bad ratings (2 and 1 stars) had an accuracy of 64%.
  • When the ratings were broken down further into 5, 4, 2, and 1 stars, accuracy decreased overall. Good ratings made up the majority of the reviews, which might explain the imbalance (the calculation after this list reproduces these figures from the confusion-matrix counts).
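
As a check, these figures can be reproduced from the good/bad table above. The per-row Accuracy column appears to be the fraction of correct predictions within each predicted class, and the F-score combines that value with recall; under that reading the snippet below matches the reported numbers:

    # Counts from the good/bad table (rows = predicted, columns = reference).
    good_good, good_bad = 30016, 1236   # predicted good: actually good / actually bad
    bad_good, bad_bad = 3318, 5779      # predicted bad:  actually good / actually bad

    def row_metrics(correct, row_total, col_total):
        accuracy = correct / row_total          # per-class accuracy, as reported
        recall = correct / col_total
        f_score = 2 * accuracy * recall / (accuracy + recall)
        return accuracy, f_score

    print(row_metrics(good_good, good_good + good_bad, good_good + bad_good))
    # -> (0.9604..., 0.9294...)
    print(row_metrics(bad_bad, bad_good + bad_bad, good_bad + bad_bad))
    # -> (0.6352..., 0.7173...)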

Future Work

  • Addition of the neutral rating (3 stars) to the model.
  • Research and address the imbalance in star ratings.
  • Use of contextual sentiment features.