Predicting the Star Rating of Restaurants on Yelp

Sanjay Meena
21/11/2015

Introduction

This project addresses two questions:

  • Can the star rating of a restaurant be predicted from its attributes and other information?
  • Which attributes of a restaurant have a large effect on its rating on Yelp?

The underlying idea is to identify which attributes of a restaurant have the most impact on its user ratings. This can help restaurant owners improve their business.

Methods and Data

The data comes from the Yelp Dataset Challenge. It was converted from JSON to RDS format, then cleaned, aggregated, and analyzed.

We used a random forest model to find the most relevant features among more than 150 restaurant-related features. We determined variable importance using a variable importance plot and displayed the prediction results using a confusion matrix.
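To illustrate how a confusion matrix summarizes prediction results, here is a minimal Python sketch. It is not the project's code. The function name `confusion_matrix`, the integer star labels, and the toy data are all assumptions made for illustration, and the numbers are not results from the Yelp data.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return matrix[a][p]: count of cases with true label a predicted as p."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


# Hypothetical toy ratings (assumed integer stars), not data from the project.
actual = [3, 4, 4, 5, 2, 3, 4, 1]
predicted = [3, 4, 3, 4, 2, 5, 4, 2]
labels = [1, 2, 3, 4, 5]

matrix = confusion_matrix(actual, predicted, labels)

# Print one row per true rating; columns follow the order of `labels`.
for a in labels:
    print(a, [matrix[a][p] for p in labels])
```

Each row corresponds to a true rating and each column to a predicted rating, so counts on the diagonal are exact matches and counts next to the diagonal are predictions that miss by one star.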

Results

[Figure: alt-text-1]

Conclusion/Discussion

We find that about 30 attributes, such as review count, city, waiter service, and good for kids, have the highest impact on restaurant ratings.

If we are willing to relax the accuracy requirement, the model may serve as a good rough predictor: 91% of its predictions fall within +/- 1 star of the actual rating.
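The relaxed accuracy measure can be read off a confusion matrix by counting every cell whose predicted rating lies within a given tolerance of the true rating. The sketch below shows one way to do this in Python. The helper `accuracy_within` and the matrix values are hypothetical, chosen only to illustrate the calculation, and do not reproduce the project's results.

```python
def accuracy_within(matrix, tolerance):
    """Share of predictions whose rating is within `tolerance` stars of the truth."""
    total = 0
    hits = 0
    for actual, row in matrix.items():
        for predicted, count in row.items():
            total += count
            if abs(actual - predicted) <= tolerance:
                hits += count
    return hits / total if total else 0.0


# Hypothetical confusion matrix: matrix[actual][predicted] = count.
matrix = {
    1: {1: 1, 2: 1, 3: 0, 4: 0, 5: 0},
    2: {1: 0, 2: 2, 3: 1, 4: 0, 5: 0},
    3: {1: 0, 2: 1, 3: 3, 4: 1, 5: 1},
    4: {1: 0, 2: 1, 3: 1, 4: 4, 5: 1},
    5: {1: 0, 2: 0, 3: 0, 4: 1, 5: 1},
}

exact = accuracy_within(matrix, 0)       # only exact matches count
within_one = accuracy_within(matrix, 1)  # misses of one star also count
print(f"exact: {exact:.2f}, within one star: {within_one:.2f}")
```

A tolerance of 0 gives the usual exact accuracy, while a tolerance of 1 gives the relaxed figure, which is always at least as high because it also credits near misses.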