Data Science Capstone: Analysis of Yelp data

Sandeep Patil
Nov 2015

Introduction

I had two primary questions to answer:

1. Do people who write verbose reviews tend to leave more positive or more negative ratings?

2. Do frequent reviewers tend to leave more positive or more negative ratings?

Methods and Data

The data was downloaded, read from its original JSON format, and parsed into R data frames.
Basic summary statistics were calculated to understand the data.
Exploratory analysis was performed first to look for patterns.
A generalized linear model was then fitted to measure the strength of the relationships.
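
The preparation steps above can be sketched roughly as follows. The original work was done in R; this illustration uses Python with only the standard library. It assumes the review file holds one JSON object per line with user_id, stars and text fields. The sample records, the function names and the use of word count as the verbosity measure are all hypothetical choices made for the example, not taken from the original analysis.

```python
import json
from collections import Counter

# Hypothetical sample records; the one-object-per-line layout and the
# field names (user_id, stars, text) are assumptions for illustration.
SAMPLE_LINES = [
    '{"user_id": "u1", "stars": 5, "text": "Great food and friendly staff"}',
    '{"user_id": "u1", "stars": 2, "text": "Slow service"}',
    '{"user_id": "u2", "stars": 4, "text": "Good coffee, would come back again soon"}',
]


def parse_reviews(lines):
    """Parse one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def add_features(reviews):
    """Build one row per review with its star rating, its word count
    (assumed verbosity measure) and the author's total review count."""
    reviews_per_user = Counter(r["user_id"] for r in reviews)
    rows = []
    for r in reviews:
        rows.append({
            "stars": r["stars"],
            "word_count": len(r["text"].split()),
            "user_review_count": reviews_per_user[r["user_id"]],
        })
    return rows


rows = add_features(parse_reviews(SAMPLE_LINES))
for row in rows:
    print(row)
```

Each row then carries the two candidate predictors (word count and reviewer frequency) next to the star rating, which is the shape needed for both the exploratory plots and the model.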

Results

Only a very weak correlation was observed between the verbosity of a review and its star rating. In fact, no relationship could be made out between these two variables.
No correlation was observed between the number of reviews written by a reviewer and the star ratings they gave.

None of the subcategories showed a significant correlation for either relationship. This was confirmed by the exploratory data analysis (a trellis graph and visual inspection) as well as by the linear model. I believe that this insight (the lack of a good fit) is useful in itself.
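
To make the idea of "no correlation" concrete, here is a minimal sketch of the two quantities involved: the Pearson correlation coefficient and the slope of a straight-line fit. It is written in Python rather than the R used for the original analysis. The line is fitted by ordinary least squares, which corresponds to a generalized linear model with a Gaussian family and identity link; the report does not say which family was actually used, so that choice is an assumption. The function names and toy numbers are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data only: word counts against star ratings.
word_counts = [20, 45, 80, 150, 300]
stars = [4, 2, 5, 3, 4]
print(pearson_r(word_counts, stars))
print(fit_line(word_counts, stars))
```

A coefficient near zero together with a slope near zero is what "no relationship" looks like numerically. Both measures capture only linear association, which is why the visual checks mentioned above remain a useful complement.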

Discussion

The idea behind looking for these patterns was to see whether more verbose and/or more frequent reviewers show biases in the ratings they give. Any such bias was assumed to be unintentional: certain factors (a bad experience, writing a review for the sake of it, etc.) might simply influence the quality of reviews. Looking at the data and slicing it by different categories gives the impression that this is not the case and that the ratings are fairly evenly distributed. This is a good observation in itself, as it rules out certain biases. On the other hand, if patterns between these variables could be observed, they could be used internally to categorize the quality of reviews. This matters because filtering out bad and unhelpful reviews is a very important part of the process at many modern companies (Amazon, Yelp, Uber) and is integral to running them.