Introduction

The success of a restaurant is tied to, among other things, its online ratings and its location. The search for a new restaurant to try out typically starts with an online search, followed by a look at each candidate's ratings and location.

The purpose of this project is to determine whether, and which, cultural and locational factors play a part in restaurant ratings and reviews. Additionally, we seek to relate these ratings to the health inspection grades issued by New York City and to test whether those grades correlate with user reviews that mention hygiene.

Data Sources

  1. Primary source: Yelp Data Set - https://www.yelp.com/dataset_challenge

  2. New York City inspection ratings: http://a816-restaurantinspection.nyc.gov/RestaurantInspection/SearchBrowse.do

  3. Secondary sources (desirable): Foursquare, Zomato, Tripadvisor, Zagat, Timeout
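Relating Yelp reviews to inspection grades first requires matching businesses across the two sources, which share no common identifier. The sketch below shows one naive way to do this: join on a normalized name plus address. The field names (business_id, name, address on the Yelp side; dba, address, grade on the inspection side) and the suffix table are assumptions for illustration, not the actual schema of either dataset, and the sample rows are made up.

```python
import re

# Hypothetical suffix table; real address normalization needs many more rules.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd", "place": "pl"}


def normalize(text):
    """Lower-case, replace punctuation with spaces, and abbreviate street suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


def link_records(yelp_businesses, inspections):
    """Map each Yelp business_id to inspection rows with the same normalized name and address.

    Assumed keys: Yelp rows have "business_id", "name", "address";
    inspection rows have "dba", "address", "grade" (hypothetical names).
    """
    index = {}
    for row in inspections:
        key = (normalize(row["dba"]), normalize(row["address"]))
        index.setdefault(key, []).append(row)

    matches = {}
    for biz in yelp_businesses:
        key = (normalize(biz["name"]), normalize(biz["address"]))
        if key in index:
            matches[biz["business_id"]] = index[key]
    return matches


# Made-up sample rows for illustration only.
yelp = [{"business_id": "b1", "name": "Luigi's Trattoria", "address": "120 Example Avenue"}]
inspections = [{"dba": "LUIGI'S TRATTORIA", "address": "120 EXAMPLE AVE", "grade": "A"}]
matches = link_records(yelp, inspections)
print(matches["b1"][0]["grade"])  # → A
```

Exact matching on normalized keys will miss restaurants whose names differ slightly between sources; fuzzy string matching or matching on coordinates would likely be needed in practice.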


Analysis

Cultural factors

Questions we intend to ask include, but are not limited to, the following:

  1. Which cuisines do Yelpers rave about in different cities? Does each city show a distinct preference?

  2. In which cities are Yelpers more particular about service quality?

  3. In cities outside the United States, does local culture affect how reviews are written and scored?

  4. Do factors such as type of cuisine play a part in restaurant closures?
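For the first question, a natural starting point is the average star rating for each (city, cuisine) pair. The sketch below assumes each business record carries "city", "stars" and a list of "categories"; this shape is an assumption (some Yelp dataset releases store categories as a single comma-separated string, and not every category is a cuisine), and the sample rows are made up.

```python
from collections import defaultdict


def mean_stars_by_city_cuisine(businesses, min_count=1):
    """Average star rating for every (city, cuisine) pair seen at least min_count times.

    Each business is a dict with assumed keys "city", "stars", and "categories"
    (a list of category names such as ["Italian", "Pizza"]).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (city, cuisine) -> [sum of stars, count]
    for biz in businesses:
        for cuisine in biz["categories"]:
            entry = totals[(biz["city"], cuisine)]
            entry[0] += biz["stars"]
            entry[1] += 1
    return {key: s / n for key, (s, n) in totals.items() if n >= min_count}


def top_cuisine_per_city(averages):
    """Pick the highest-rated cuisine in each city (ties keep the first one seen)."""
    best = {}
    for (city, cuisine), avg in averages.items():
        if city not in best or avg > best[city][1]:
            best[city] = (cuisine, avg)
    return best


# Made-up sample rows for illustration only.
sample = [
    {"city": "Las Vegas", "stars": 4.5, "categories": ["Japanese", "Sushi Bars"]},
    {"city": "Las Vegas", "stars": 3.0, "categories": ["Italian"]},
    {"city": "Toronto", "stars": 4.0, "categories": ["Italian"]},
    {"city": "Toronto", "stars": 3.5, "categories": ["Japanese"]},
]
print(top_cuisine_per_city(mean_stars_by_city_cuisine(sample)))
# → {'Las Vegas': ('Japanese', 4.5), 'Toronto': ('Italian', 4.0)}
```

Averages over only a handful of businesses are noisy, which is why min_count is exposed; a sensible threshold would have to be chosen from the real data.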

Locational factors

Questions we intend to ask include, but are not limited to, the following:

  1. How much of a business’ success is really just location, location, location?

  2. Are restaurants with positive reviews and good health inspection standing surrounded by restaurants with similar cuisines?

  3. Do restaurants at particular addresses, streets, or areas close down more frequently?
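The second question depends on defining "surrounded by". One possible operationalization, sketched below, is the fraction of restaurants within a fixed radius that share at least one category with the target, using the haversine great-circle distance. The 0.5 km default radius and the latitude/longitude/categories keys are assumptions, and the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def same_cuisine_share(target, others, radius_km=0.5):
    """Fraction of restaurants within radius_km of target sharing at least one category.

    Returns None when no other restaurant lies within the radius.
    """
    nearby = [
        o for o in others
        if o is not target
        and haversine_km(target["latitude"], target["longitude"], o["latitude"], o["longitude"]) <= radius_km
    ]
    if not nearby:
        return None
    target_cats = set(target["categories"])
    shared = sum(1 for o in nearby if target_cats & set(o["categories"]))
    return shared / len(nearby)


# Made-up coordinates for illustration only.
target = {"latitude": 40.7306, "longitude": -74.0027, "categories": ["Pizza"]}
others = [
    {"latitude": 40.7310, "longitude": -74.0030, "categories": ["Pizza", "Italian"]},
    {"latitude": 40.7312, "longitude": -74.0020, "categories": ["Sushi Bars"]},
    {"latitude": 40.8000, "longitude": -73.9500, "categories": ["Pizza"]},
]
print(same_cuisine_share(target, others))  # → 0.5
```

This compares the target against every other restaurant, which is fine for a sketch but quadratic over a whole city; a spatial index would be the usual remedy at scale.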

Inspection Ratings and Other Factors

  1. Is there a relationship between below-par health inspection ratings and user reviews that rate a restaurant's hygiene poorly?

  2. If there is a disconnect between reviews and health inspection ratings, is the date of the inspection a factor?
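A first pass at the hygiene question could flag reviews that mention hygiene keywords, compute each restaurant's complaint rate, and correlate that rate with its inspection score. The sketch below does exactly that with a simple keyword regex and a Pearson correlation; the keyword list is hypothetical and the data are made up. Because NYC inspection scores count violation points (lower is better), a positive correlation here would support the hypothesis.

```python
import re
from math import sqrt

# Hypothetical keyword list; it would need validating against hand-labelled reviews.
HYGIENE_TERMS = re.compile(
    r"\b(dirty|filthy|unsanitary|roach(es)?|mice|rats?|food poisoning)\b", re.IGNORECASE
)


def hygiene_complaint_rate(reviews):
    """Share of review texts that mention at least one hygiene keyword."""
    if not reviews:
        return 0.0
    return sum(1 for text in reviews if HYGIENE_TERMS.search(text)) / len(reviews)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up data: (inspection score, review texts) per restaurant.
restaurants = [
    (10, ["Great pasta", "Lovely staff"]),
    (30, ["Saw a roach near the kitchen", "Tasty but dirty tables"]),
    (20, ["Floors were filthy", "Good value"]),
]
scores = [score for score, _ in restaurants]
rates = [hygiene_complaint_rate(texts) for _, texts in restaurants]
print(rates)  # → [0.0, 1.0, 0.5]
print(round(pearson(scores, rates), 3))  # → 1.0
```

Three restaurants prove nothing statistically; a real analysis would use many restaurants, a significance test, and probably a sentiment or NLP model (as listed under Tools) in place of bare keywords.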

Tools

  1. Standard R packages for data acquisition and charting

  2. Cloudera or AWS, with MySQL as the data repository

  3. R packages for sentiment analysis and NLP

Desirable

  1. Time permitting, we may build a ranking system based on the cultural parameters that matter most to a given user. To do this, we may aggregate data from several sources and combine it using an aggregator-based algorithm.

  2. The impact of Twitter and newspaper reviews on ratings
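One simple reading of the aggregator idea is a weighted average of ratings after rescaling each source onto a common [0, 1] range, with per-user weights expressing how much that user trusts each source. The sketch below is only that reading, not a settled design; the per-source rating scales are assumptions to be checked against each provider, and the ratings and weights are made up.

```python
# Assumed rating scales (min, max) per source; verify against each provider before use.
SCALES = {"yelp": (1.0, 5.0), "zomato": (1.0, 5.0), "foursquare": (0.0, 10.0)}


def normalize_rating(source, value):
    """Min-max rescale a raw rating from its source's scale onto [0, 1]."""
    low, high = SCALES[source]
    return (value - low) / (high - low)


def aggregate_score(ratings, weights):
    """Weighted mean of normalized ratings, skipping sources with no positive weight.

    ratings: {source: raw rating}; weights: {source: non-negative weight},
    e.g. how much a particular user trusts each source.
    Returns None if no rated source carries positive weight.
    """
    used = [s for s in ratings if weights.get(s, 0) > 0]
    total = sum(weights[s] for s in used)
    if total == 0:
        return None
    return sum(weights[s] * normalize_rating(s, ratings[s]) for s in used) / total


ratings = {"yelp": 4.0, "foursquare": 8.0}  # made-up ratings for one restaurant
weights = {"yelp": 2.0, "foursquare": 1.0, "zomato": 1.0}
print(round(aggregate_score(ratings, weights), 3))  # → 0.767
```

Skipping unrated sources and renormalizing by the weights actually used keeps a restaurant from being penalized merely for being absent from one site.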