22 November 2015

Introduction

Using sentiment analysis, I hope to answer the following questions:

Which phrases are frequently associated with positive (higher) and negative (lower) ratings?

How do key review phrases influence review ratings, and thereby reflect service quality?

These questions may interest customers who want to know which frequently used phrases to look for when choosing a higher-rated business. Another use case is helping business owners identify their strengths and weaknesses.

Methods

For this capstone project, the focus is on restaurant review data from the Yelp Dataset. I take a sample of 1,000 restaurant reviews and build a corpus and term-document matrix from them, applying the following preprocessing steps:

- convert text to lower case
- remove punctuation
- remove numbers
- strip extra white space

I skip stemming and the removal of sparse terms so that all words from the reviews are considered. I then build and sort data frames of 4-gram tokens before plotting bar plots and word clouds for visualization and analysis. The most frequently used phrases, broken down by star rating, are used to answer the questions above.
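The report does not include its code, so the following is only an illustrative Python sketch of the preprocessing and 4-gram counting described above, not the original implementation. The function names clean, ngrams and count_4grams are hypothetical; punctuation is deleted outright (so "wi-fi" becomes "wifi"), and the sorted frequency table is a collections.Counter rather than a data frame.

```python
import re
import string
from collections import Counter


def clean(text):
    """Lower-case, delete ASCII punctuation and digits, collapse white space.
    No stemming and no sparse-term removal, as in the steps above."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return " ".join(text.split())


def ngrams(tokens, n=4):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_4grams(reviews):
    """Count 4-gram frequencies across a collection of review texts."""
    counts = Counter()
    for review in reviews:
        counts.update(ngrams(clean(review).split(), 4))
    return counts


if __name__ == "__main__":
    sample = [
        "The food was good, and the food was good!",
        "I love this place. I love this place 100%.",
    ]
    # most_common() returns phrases sorted by descending frequency
    for phrase, freq in count_4grams(sample).most_common(5):
        print(freq, phrase)
```

Because punctuation is removed before tokenizing, 4-grams can span sentence boundaries; that is a simplification of this sketch.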

Results

Figure 1: The number of ratings drops with the star rating, i.e. lower star ratings receive fewer reviews.
Figures 2 and 3: 1- and 2-star reviews contain many negative words, as well as food-related and service-related references.
Figures 4, 5 and 6: 3-, 4- and 5-star reviews contain more positive descriptive words, but far fewer references to specific aspects of food or service.
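Assuming each review is available as a record with a star rating and a text field (the keys "stars" and "text" below are assumptions), the per-star breakdown behind these figures could be sketched as follows: count reviews per star rating, then tally 4-grams separately within each rating group. This is a hypothetical reconstruction, not the code that produced the figures; summarize_by_star and top_n are illustrative names.

```python
import re
import string
from collections import Counter, defaultdict


def clean(text):
    """Assumed preprocessing: lower case, no punctuation, no digits, single spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return " ".join(text.split())


def four_grams(text):
    """Return the 4-grams of a cleaned review as space-joined strings."""
    tokens = clean(text).split()
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def summarize_by_star(reviews, top_n=10):
    """reviews: iterable of dicts with hypothetical keys 'stars' and 'text'.
    Returns (number of reviews per star, top_n 4-grams per star)."""
    reviews_per_star = Counter()
    grams_per_star = defaultdict(Counter)
    for review in reviews:
        reviews_per_star[review["stars"]] += 1
        grams_per_star[review["stars"]].update(four_grams(review["text"]))
    top = {star: grams.most_common(top_n) for star, grams in grams_per_star.items()}
    return reviews_per_star, top
```

The first return value corresponds to a ratings histogram like Figure 1; the second gives one ranked phrase list per star rating, which is what the per-star bar plots and word clouds visualize.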

All reviews: among the top words, there are no references to car parks, wi-fi, coat check, music, distance, convenience, etc.
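One way to make this observation checkable is to test whether any word from a list of attribute terms occurs in the top-N 4-grams. The term list ATTRIBUTE_TERMS, the cutoff top_n and the helper absent_attributes are hypothetical choices for illustration, and the cleaning function repeats the assumed preprocessing from the Methods section.

```python
import re
import string
from collections import Counter

# Hypothetical attribute keywords. Punctuation is deleted during cleaning,
# so "wi-fi" in a review is matched as "wifi".
ATTRIBUTE_TERMS = ["parking", "wifi", "coat", "music", "distance", "convenience"]


def clean(text):
    """Assumed preprocessing: lower case, no punctuation, no digits, single spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return " ".join(text.split())


def absent_attributes(reviews, attribute_terms, top_n=100):
    """Return the attribute terms that appear in none of the top_n 4-grams."""
    counts = Counter()
    for text in reviews:
        tokens = clean(text).split()
        counts.update(" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3))
    top_words = {w for phrase, _ in counts.most_common(top_n) for w in phrase.split()}
    return [t for t in attribute_terms if t not in top_words]
```

If the returned list equals the full attribute list, none of those attributes reaches the top phrases, which is the pattern reported above.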

The graphs can be found on GitHub: https://github.com/sreekanth207/Data-Science/commit/ce9be2a874c24a1fea87435ecb172369dfce2842

Discussion and Conclusion

Most frequently used phrases:

Common positive descriptive words: good, great, best and like.
Common business-related nouns: food, place, service and experience.
Frequent 4-gram phrases: "i will not be", "will never go back", "the rest of the", "my husband and i", "the quality of the", "the food was good", "one of the best", "a great place to", "i love this place", "i have ever had"

Inferences:

The data suggest a trend: customers with a positive experience are more likely to write reviews. Customers write most frequently about food quality (e.g. their favorite dishes) and service quality (e.g. waitress, orders, people), with no references among the top terms to other attributes such as car parks, wi-fi or music. Inference: providing outstanding food and/or customer service makes it more likely that a review will be written.

We can also infer that the phrases in 4- and 5-star reviews are a clear indication of customer satisfaction, so a new customer choosing a restaurant can look for those key phrases.