Sanjay Meena
21/11/2015
This project aims to answer the following question: which attributes of a restaurant have the most impact on its user ratings? The answer can help restaurant owners improve their business.
The data comes from the Yelp Dataset Challenge. It was converted from JSON to RDS format, then cleaned, aggregated, and analyzed.
We used a random forest model to identify the most relevant features among more than 150 restaurant-related features. We assessed variable importance with a variable importance plot and summarized the prediction results with a confusion matrix.
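One common way to score variable importance is permutation importance: shuffle one feature's values across rows and measure how much accuracy drops. The minimal stdlib sketch below illustrates the idea. It is not the project's code. `toy_model` is a hypothetical stand-in for a fitted random forest, and the feature `parking` is a hypothetical attribute that the toy model ignores on purpose. Unlike a real random forest, the sketch does not score on out-of-bag samples.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of exact matches between true and predicted ratings."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature's column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(y_true, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Copy each row, replacing only the shuffled feature
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(y_true, [predict(r) for r in shuffled]))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical stand-in for a fitted model: predicts a whole-star rating."""
    stars = 3
    if row["review_count"] > 100:
        stars += 1
    if row["waiter_service"]:
        stars += 1
    return min(stars, 5)


# Hypothetical synthetic restaurants (not Yelp data)
data_rng = random.Random(42)
rows = [
    {
        "review_count": data_rng.randint(0, 300),
        "waiter_service": data_rng.random() < 0.5,
        "parking": data_rng.random() < 0.5,  # deliberately ignored by toy_model
    }
    for _ in range(200)
]
# Labels equal the model's own predictions, so baseline accuracy is 1.0
y_true = [toy_model(r) for r in rows]

importances = {f: permutation_importance(toy_model, rows, y_true, f) for f in rows[0]}
for feature, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature:>15}: {score:.3f}")
```

Features the model relies on show a positive drop, while the ignored `parking` feature scores exactly zero. This ranking is the information a variable importance plot displays.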
We find that about 30 attributes, such as review count, city, waiter service, and good for kids, have the highest impact on restaurant ratings.
If we are willing to relax the accuracy requirement, the model may serve as a good rough predictor: 91% of its predictions fall within +/- 1 star of the actual rating.
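The +/- 1 star figure can be read off a confusion matrix by summing the cells near its diagonal. The stdlib sketch below shows that calculation. It assumes, for simplicity, that ratings are whole-star classes 1 to 5, and the ratings are hypothetical values, not the project's results.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """counts[a][p] = number of rows with actual rating a and predicted rating p."""
    pairs = Counter(zip(actual, predicted))
    return {a: {p: pairs[(a, p)] for p in labels} for a in labels}


def band_accuracy(matrix, tolerance):
    """Share of predictions within `tolerance` stars: diagonal band sum / total."""
    total = sum(sum(row.values()) for row in matrix.values())
    hits = sum(
        n
        for a, row in matrix.items()
        for p, n in row.items()
        if abs(a - p) <= tolerance
    )
    return hits / total


# Hypothetical whole-star ratings (not from the Yelp data)
labels = [1, 2, 3, 4, 5]
actual = [4, 3, 5, 2, 4, 3, 1, 5, 2, 4]
predicted = [4, 4, 3, 2, 3, 3, 3, 4, 3, 4]

cm = confusion_matrix(actual, predicted, labels)
print("actual\\pred", *labels)
for a in labels:
    print(f"{a:>11}", *(cm[a][p] for p in labels))
print(f"exact: {band_accuracy(cm, 0):.0%}, within +/- 1 star: {band_accuracy(cm, 1):.0%}")
```

With these made-up ratings, exact accuracy is 40% but accuracy within +/- 1 star is 80%. This shows how relaxing the tolerance can turn a weak exact-match classifier into a usable rough predictor.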