Georgia Galanopoulos & Jaideep Mehta
May 18, 2017
Relationship between Health Inspection (HI) Data and Restaurant Yelp Reviews
Potential incorporation of reviews into determining health inspection ratings
Health inspectors could do their job more efficiently with the aid of the crowd
Created a hygiene-related lexicon to determine whether reviews included words pertaining to the cleanliness of restaurants
The closer the score was to zero, the less likely the restaurant was to have health issues
User stars are more accurately reflected than business stars
A relationship exists: as the inspection grade decreases, so does the mean score.
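A minimal sketch of lexicon-based hygiene scoring, written in Python rather than the R used in the project. The lexicon below is a hypothetical stand-in for the hand-curated one, and the scoring rule (minus the share of tokens that are hygiene terms, so 0.0 means no hygiene terms and more negative means more of them) is an assumption chosen to match the "closer to zero" reading above. The sample reviews are toy data, not project results.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical hygiene lexicon; the project's hand-curated list is not reproduced here.
HYGIENE_TERMS = {"dirty", "filthy", "roach", "cockroach", "mouse", "hair", "sick", "smelly", "sticky"}

def hygiene_score(review: str) -> float:
    """Assumed scoring rule: -(hygiene terms / tokens); 0.0 means no hygiene terms found."""
    tokens = re.findall(r"[a-z']+", review.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in HYGIENE_TERMS)
    return -hits / len(tokens)

def mean_score_by_grade(reviews):
    """Average the hygiene score of (inspection_grade, review_text) pairs per grade."""
    buckets = defaultdict(list)
    for grade, text in reviews:
        buckets[grade].append(hygiene_score(text))
    return {grade: mean(scores) for grade, scores in sorted(buckets.items())}

# Toy usage (illustrative data only).
sample = [
    ("A", "Great tacos and friendly staff"),
    ("A", "Clean tables, fast service"),
    ("C", "Dirty floors and a roach near the counter"),
    ("C", "Food was fine but the restroom was filthy"),
]
result = mean_score_by_grade(sample)
```

With this toy input, grade "A" averages 0.0 and grade "C" averages -0.1875, so the lower grade has the lower (more negative) mean score.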
Motivation: Yelp Dataset Challenge question
Used dplyr to create a city-cuisine grid
Phoenix is more diverse. Unsurprisingly, the Mexican count is much higher there, given the strong Latino influence.
Charlotte has a number of reviews for Southern cuisine. Surprisingly, its cafe culture has yet to develop.
Toronto comes out as a significantly more diverse city, a world city like New York.
In Montreal, the absence of anything but French and traditional food is very stark.
What stands out for Scotland is gastropubs, and for Stuttgart, beer gardens.
Edinburgh seems to have a large presence of Indian cuisine, in line with the curry influence in the UK.
Vegan and vegetarian culture is mature in Montreal, Toronto and Phoenix, but not in Edinburgh and Stuttgart.
There seems to be no place for juice bars in Stuttgart. Is it because of the strong beer culture?
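The city-cuisine grid was built with dplyr; the sketch below is a rough Python analogue of the same count-then-pivot idea, not the project's code. The record keys `city` and `categories` (a list of strings) and the cuisine list are assumptions, and the businesses are toy data.

```python
from collections import Counter

def city_cuisine_grid(businesses, cuisines):
    """Count businesses per (city, cuisine) and pivot into a dense city x cuisine grid.

    businesses: iterable of dicts with hypothetical keys "city" and "categories".
    cuisines: the cuisine categories to keep as grid columns.
    """
    businesses = list(businesses)  # iterate twice: once for counts, once for the city list
    counts = Counter(
        (b["city"], cat)
        for b in businesses
        for cat in b["categories"]
        if cat in cuisines
    )
    cities = sorted({b["city"] for b in businesses})
    # Every city gets every cuisine column, with 0 where nothing matched.
    return {city: {c: counts.get((city, c), 0) for c in cuisines} for city in cities}

# Toy usage (illustrative data only).
sample = [
    {"city": "Phoenix", "categories": ["Mexican", "Restaurants"]},
    {"city": "Phoenix", "categories": ["Mexican"]},
    {"city": "Stuttgart", "categories": ["Beer Gardens", "German"]},
    {"city": "Toronto", "categories": ["Vegan", "Cafes"]},
]
grid = city_cuisine_grid(sample, ["Mexican", "Beer Gardens", "Vegan"])
```

Keeping zero cells explicit makes gaps (such as no juice bars in a city) visible in the grid rather than silently missing.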
Summarized the results on a map using Leaflet
While the Strip, as expected, has the highest number of high-quality restaurants (268), it came as a surprise that Chinatown (166) and Downtown (157) also have a very large number of such places.
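The map itself was drawn with Leaflet; this Python sketch covers only the summarization step that could feed such a map: counting high-quality restaurants per neighborhood and computing a mean coordinate for each marker. The `min_stars=4.0` threshold and the record keys (`neighborhood`, `stars`, `latitude`, `longitude`) are hypothetical, since the source does not define "high quality"; the records are toy data.

```python
from collections import defaultdict

def neighborhood_summary(restaurants, min_stars=4.0):
    """Per neighborhood: count of restaurants with stars >= min_stars (a hypothetical
    quality threshold) plus their mean latitude/longitude for placing a map marker."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # [count, sum of lat, sum of lon]
    for r in restaurants:
        if r["stars"] >= min_stars:
            a = acc[r["neighborhood"]]
            a[0] += 1
            a[1] += r["latitude"]
            a[2] += r["longitude"]
    return {
        hood: {"count": n, "lat": lat / n, "lon": lon / n}
        for hood, (n, lat, lon) in acc.items()
    }

# Toy usage (illustrative coordinates and ratings only).
sample = [
    {"neighborhood": "The Strip", "stars": 4.5, "latitude": 36.0, "longitude": -115.0},
    {"neighborhood": "The Strip", "stars": 4.0, "latitude": 36.5, "longitude": -115.5},
    {"neighborhood": "Chinatown", "stars": 3.5, "latitude": 36.1, "longitude": -115.2},
    {"neighborhood": "Chinatown", "stars": 4.0, "latitude": 36.125, "longitude": -115.25},
    {"neighborhood": "Downtown", "stars": 3.0, "latitude": 36.17, "longitude": -115.14},
]
summary = neighborhood_summary(sample)
```

Each entry in `summary` maps directly to one marker (position plus a count label) on the map.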
Sentiment Analysis: Yelp reviews reflect the health inspection scores issued by the city of Las Vegas.
Cultural Trends: Yelp reviews are a rich source of consumer preferences, which can be mined to spot cultural trends as seen in our visualizations.
Large number of records, especially reviews (4 million)
Slow performance of RODBC
No hygiene-specific lexicon existed, so we had to hand-curate one
Combining datasets to match restaurants
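The source does not say how restaurants were matched across the Yelp and inspection datasets. One simple approach, sketched here in Python as an assumption rather than the project's method, is to normalize names and street addresses and join on the exact normalized key. The abbreviation map and the record keys `id`, `name` and `address` are hypothetical, and the records are toy data.

```python
import re

# Hypothetical abbreviation map for common street words.
ABBREV = {"street": "st", "avenue": "ave", "boulevard": "blvd", "road": "rd", "drive": "dr"}

def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace, abbreviate street words."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREV.get(w, w) for w in words)

def match_restaurants(yelp, inspections):
    """Pair records whose normalized (name, address) keys are identical.

    Both inputs: iterables of dicts with hypothetical keys "id", "name", "address".
    Returns a list of (yelp_id, inspection_id) pairs.
    """
    index = {}
    for rec in inspections:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        index.setdefault(key, []).append(rec["id"])
    pairs = []
    for rec in yelp:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        for insp_id in index.get(key, []):
            pairs.append((rec["id"], insp_id))
    return pairs

# Toy usage (illustrative records only).
yelp = [
    {"id": "y1", "name": "Joe's Tacos", "address": "123 Main Street"},
    {"id": "y2", "name": "Cafe Uno", "address": "9 Elm Ave"},
]
inspections = [
    {"id": "h1", "name": "JOE'S TACOS", "address": "123 Main St."},
    {"id": "h2", "name": "Other Place", "address": "1 Oak Rd"},
]
pairs = match_restaurants(yelp, inspections)
```

Exact matching on normalized keys is cheap and predictable but misses records with spelling differences; fuzzy string matching would catch more at the cost of false positives.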