Capstone Project for Data Science Course

Paul Lim Min Chim
20 Nov 2015

Sentiment Analysis of Yelp Restaurant Reviews

Introduction

Sentiment analysis is the computational study of opinions, sentiments and emotions expressed in text. It has many applications and is useful for social media monitoring, tracking of business reviews and business analytics.

Problem Statement

Using sentiment analysis, I hope to answer the following questions about the Yelp Dataset:

  • What are the most frequently used phrases in reviews?
  • What can we infer from these phrases about customers' motivation for writing reviews?

Who will find this interesting?

This may be of interest to business owners who want to identify their business strengths and weaknesses based on customer reviews. Such analysis can also be used to predict how key review phrases may influence the review rating, and ultimately impact the business.

Methods

For this capstone project, the focus is on restaurant review data from the Yelp Dataset. I take a sample of 1,000 restaurant reviews and build the corpus and term-document matrix from it, applying the following cleaning steps:

  • convert text to lower case
  • remove punctuation
  • remove numbers
  • remove white space
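
The cleaning steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the project: the function name `clean_review` is hypothetical, and "remove white space" is assumed to mean collapsing runs of whitespace into a single space.

```python
import re
import string


def clean_review(text: str) -> str:
    """Apply the four cleaning steps to one review (hypothetical helper)."""
    # 1. Convert text to lower case.
    text = text.lower()
    # 2. Remove punctuation (ASCII punctuation only; curly quotes are left in).
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 3. Remove numbers.
    text = re.sub(r"\d+", "", text)
    # 4. Collapse extra white space (assumed meaning of "remove white space").
    text = re.sub(r"\s+", " ", text).strip()
    return text


# Made-up review text, for illustration only.
print(clean_review("The food was GOOD!!  We paid $25, twice."))
# prints: the food was good we paid twice
```

The order matters slightly: removing punctuation and numbers can leave double spaces behind, so the whitespace step runs last.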

I skip stemming and the removal of sparse terms so that all words from the reviews are considered. I then build and sort data frames of 4-gram tokens (sequences of four consecutive words) and plot bar plots and wordclouds for visualization and analysis. The most frequently used phrases, analysed separately for reviews with different star ratings, are used to answer the two questions above.
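
As a rough sketch of the 4-gram step, the snippet below counts 4-grams for reviews with a given star rating. It is an assumption-laden illustration rather than the project's code: it uses a plain `collections.Counter` instead of data frames, the helper names `four_grams` and `top_four_grams` are hypothetical, and the sample reviews are made up.

```python
from collections import Counter


def four_grams(cleaned_text: str) -> list[str]:
    """Return every run of four consecutive words in an already-cleaned review."""
    words = cleaned_text.split()
    # Reviews shorter than four words yield an empty list.
    return [" ".join(words[i:i + 4]) for i in range(len(words) - 3)]


def top_four_grams(reviews: list[tuple[int, str]], stars: int, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent 4-grams among reviews with the given star rating."""
    counts = Counter()
    for rating, text in reviews:
        if rating == stars:
            counts.update(four_grams(text))
    # most_common sorts by descending frequency.
    return counts.most_common(n)


# Made-up (stars, cleaned text) pairs for illustration; not taken from the Yelp Dataset.
sample = [
    (5, "i love this place the food was good"),
    (5, "the food was good and the staff were friendly"),
    (1, "i will never go back to this place"),
]
print(top_four_grams(sample, stars=5, n=1))
# prints: [('the food was good', 2)]
```

The sorted `(phrase, count)` pairs are the kind of input a bar plot or wordcloud would take, one set per star rating.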

Exploratory Analysis & Results

  • Figure 1: The number of ratings drops as the star rating decreases.
  • Figures 2 and 3: 1- and 2-star reviews contain many negative words, along with food-related and service-related references.
  • Figures 4, 5 and 6: 3-, 4- and 5-star reviews contain more positive descriptive words, but far fewer references to specifics of food or service.
  • All reviews: The top words contain no references to car parks, wi-fi, coat check, music, distance, convenience, etc.

The full report and plots can be viewed here.

Figure 1: Number of ratings by star rating
Figure 2: 1-star reviews
Figure 3: 2-star reviews
Figure 4: 3-star reviews
Figure 5: 4-star reviews
Figure 6: 5-star reviews

Discussion & Conclusion

Most frequently used phrases:

  • Common positive descriptive words: good, great, best and like.
  • Common business-related nouns: food, place, service and experience.
  • Common 4-gram phrases: “i will not be”, “will never go back”, “the rest of the”, “my husband and i”, “the quality of the”, “the food was good”, “one of the best”, “a great place to”, “i love this place”, “i have ever had”.

Motivation for writing reviews:

  • The data suggest a trend: customers with a positive experience are more likely to write reviews, while those with a negative experience may not bother to write about it.
  • Customers write most frequently about food quality (e.g. their favorite food) and service quality (e.g. waitress, orders, people), with no references to other attributes such as car parks, wi-fi or music. Inference: outstanding food and/or customer service is likely to increase the probability that a review is written.