Shawn Estes
11/1/2015
Star-rating reviews are inherently flawed. Yelp suffers from the same problem as movie critics who give both Citizen Kane and Ghostbusters 5 stars.
When a Yelper looks to Yelp for ratings, what data is actually useful?
Can we realign the star ratings based on how positive (or negative) a review is?
Users would expect the average review to be 3.5 stars, but that's not the case at all.
Using the Yelp Academic Dataset, I preprocessed the data by reducing it to just the review text, removing punctuation, converting everything to lowercase, and removing stop words taken from a public corpus. After that, I used the Porter stemming algorithm to reduce the remaining words to their stems.
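The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the function name `preprocess`, the tiny `STOP_WORDS` set, and the optional `stem` parameter are all assumptions made for the example. The original stop words came from a public corpus, and the stemmer was Porter's, which is not part of the Python standard library.

```python
import string

# Tiny illustrative stop-word list (an assumption; the project used a public corpus).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stop_words=STOP_WORDS, stem=None):
    """Lowercase, strip punctuation, drop stop words, then optionally stem."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    tokens = [tok for tok in cleaned.split() if tok not in stop_words]
    if stem is not None:
        # `stem` is any callable mapping a word to its stem (hypothetical hook).
        tokens = [stem(tok) for tok in tokens]
    return tokens


tokens = preprocess("The food was GREAT, and the service is amazing!")
```

To apply Porter stemming, a stemmer's method can be passed as `stem`. For example, if NLTK is installed, `nltk.stem.PorterStemmer().stem` fits this hook.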
Naive Bayes was the primary model.
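For readers unfamiliar with the model, here is a small sketch of a Naive Bayes text classifier. The source does not say which variant or library was used, so this assumes a multinomial model with add-one (Laplace) smoothing over token lists. The class name `NaiveBayes`, the labels `"pos"` and `"neg"`, and the toy training data are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing (a sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data: already-preprocessed reviews with sentiment labels.
docs = [["great", "food"], ["amazing", "service"],
        ["terrible", "food"], ["rude", "service"]]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
```

Scores are summed in log space so that long reviews do not underflow to zero probability.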
*70% sounds much worse than it is. Human raters agree on sentiment only 82% of the time.