Objective

This project is driven by Yelp Academic Data Set, which is offered via Yelp Data Challenge. Yelp is one of the leading online recommendation sites for Restaurants and other local businesses, and it is very popular in North America and Europe.

Our objective with the project is two-fold-

  1. Sentiment Analysis: Analysing if Yelp Reviews for a Restaurants correlate with Health Inspection Grades issued by the city.

  2. Exploratory analysis: Explore the Yelp Dataset to spot
  1. Locational concentrations of high quality restaurants
  2. Variation in Cultural Trends and Preferences across Major Cities

Approach

The method used in this project in order to answer the primary questions was an OSEMN workflow approach. In this approach, there are five distinct stages: Obtain Data, Scrub Data, Explore, Model and iNterpret.

Introduction to the Data

The data set comprises of following files -

YelpFiles

YelpFiles

It is a large dataset with details of about 4 million reviews, 1 million users and 144 thousand businesses.

YelpNumRecords

YelpNumRecords

The schema of the json files is as follows -

YelpDataSchema

YelpDataSchema

More details about the challenge, dataset an its schema can be seen at the Yelp Data Challenge website.

Maintaining the Data on Remote Server

The data was maintained on SQL Server in Microsoft Azure Cloud. The data upload was an extensive exercise which took multiple days due to sheer volume of records. The R packages used to help in this were RODBC and jsonlite. RODBC was chosen because RSQLServer has been deprecated by CRAN-R and is no longer actively supported.

Example of how connection was established -

library(RODBC)
connectionString <- "Driver=ODBC Driver 13 for SQL Server;Server=tcp:data607yelp1.database.windows.net,1433;Database=DATA607_FinalProject_Yelp1;Uid=data607@data607yelp1;Pwd=XXXXX;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
#conn <- odbcDriverConnect(connectionString)

The tables uploaded were, which correspnd to yelp files mentioned above -

YelpBusiness

YelpUser

YelpReview

Once the data was uploaded multiple tables were created on server side as well to help the analysis -

YelpLVBusiness - Restaurants in Las Vegas

YelpLVReview - Reviews for Restaurants in Las Vegas

YelpLVReviewTokens - Tokens from Sample Las Vegas Reviews to be used in Sentiment Analysis

YelpRestaurants - All Yelp Restaurants

YelpRestaurantCategory - Restaurants and their Category (~Cuisine) table

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidytext)
library(stringr)
library(ggplot2)
library(RODBC)
library(ggrepel)
library(knitr)
library(tidyr)
library(jsonlite)
library(leaflet)
library(dplyr)

Analysing the Health Inspection

Obtain Data The first step was to retrieve data regarding health inspection grades, restaurant/food/drink businesses and reviews/ratings of said businesses. The two sources were the Yelp Dataset Challenge data (can be found here: https://www.yelp.com/dataset_challenge) and, under the US City Open Data Census website, the Las Vegas Health Inspection Data, whose location was chosen for its larger data pool in Yelp reviews, (can be found here: http://us-city.census.okfn.org/entry/vegas/food-safety). The Health Inspection data was in a csv format, while the Yelp Dataset came in the form of json files (User, Business and Reviews), all of which were entered into an SQL database.

Scrub Data The primary goal when scrubbing was to first combine multiple datasets while retaining as much information as possible, to ensure that there were no duplicate entries, that entries were formatted correctly, that missing data was removed (to the capacity that it was allowed without making the data pool too small) and that headings and data entries were consistant for future analysis.

Explore Even though there were questions created before combining the datasets, many of our approaches and goals had to be readjusted when finally working with the combined data. After getting a sense of what kind of information we would be working with and the value of each variable in regard to our overall project, we were allowed to shift and narrow down our focus to determining the relationship between Yelp Reviews/Ratings (with an importance on hygiene) and Health Inspection Grades, in addition to analyzing what cuisines were more prominent across major cities within the U.S. and outside it.

Model In order to analyse the Yelp Reviews, a sentiment analysis was run, using a Health Lexicon to determine if the users’ opinion regarding the health status of the establishment matched that of the Health Inspectors. In addition, a regression model was used to determine the importance of the variables against the establishments’ overall rating in order to determine if any factors contributed more to a higher rating. Visual correlations were also made to enhance the conclusion formed by the regression.

Interpret The conclusions made were based on the models and visual analyses created during the “Model” portion of the data science workflow.

It should be noted that, due to the nature of the project, in addition to a vast number of unforseen occurences when handling the data, this workflow structure was not followed as rigidly as stated above. In times when a portion of a dataset that was deemed unecessary (and thus discarded earlier in the project) proved to be important, this workflow was used (within already another step of the primary analysis method timeline) until all necessary data was available for handling.

Yelp Data

###############  YELP   
connectionString = "driver=ODBC Driver 13 for SQL Server;Server=tcp:data607yelp1.database.windows.net,1433;Database=DATA607_FinalProject_Yelp1;Uid=data607@data607yelp1;Pwd=Ozymandias123;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
conn = odbcDriverConnect(connectionString)

# YelpUser
User = sqlQuery(conn, "SELECT * FROM YelpUser;")
# YelpReview
Review = sqlQuery(conn, "SELECT * FROM YelpLVReview;")
# YelpBusiness
LasVBus = sqlQuery(conn, "SELECT * FROM YelpLVBusiness;")

User = User[ c(1,3)]
Review = Review[ c(1:6)]
LasVBus = LasVBus[ c(1:11)]

kable(head(LasVBus))
business_id name neighborhood address city state postal_code latitude longitude stars review_count
3rkxTx8DoZSl7_FryhXCVQ Jugoo Chinese Restaurant University 4747 S Maryland Pkwy Las Vegas NV 89119 36.10413 -115.1377 3.5 34
xkVMIk_Vqh17f48ZQ_6b0w Tom Colicchio’s Craftsteak The Strip 3799 Las Vegas Blvd S Las Vegas NV 89109 36.10292 -115.1698 4.0 1915
n8XlV8E3BUBBF1gCe6mvEg House of Vino Centennial 8053 N Durango Dr, Ste 160 Las Vegas NV 89131 36.30763 -115.2786 3.5 34
2BbFeotL85cIaBjSq1SWiA Springs Cafe by Wolfgang Puck Westside 333 S Valley View Blvd Las Vegas NV 89107 36.16810 -115.1922 2.5 23
w_A1qFeQ3vOl7o4kdBxPXA Muse Asian Cuisine & Lounge Chinatown 5115 W Spring Mountain Rd, Ste 107 Las Vegas NV 89146 36.12604 -115.2114 3.5 14
XlUuMGMvkzy_QLAUTEe-fw Dumpling King Chinatown 5740 Spring Mountain Rd Las Vegas NV 89146 36.12608 -115.2198 3.5 120

Health Inspection Data

###############  INSPECTIONS   ###############  
LVInspection = read.csv("https://raw.githubusercontent.com/Galanopoulog/Final-Project/master/LVInspections.csv", header = T, stringsAsFactors = T)

levels(LVInspection$Inspection.Grade)[2] = "A"
levels(LVInspection$Inspection.Grade)[3] = "B"
levels(LVInspection$Current.Grade)[2] = "A"

kable(head(LVInspection))
Serial.Number Permit.Number Restaurant.Name Location.Name Category.Name Address City State Zip Current.Demerits Current.Grade Date.Current Inspection.Date Inspection.Time Employee.ID Inspection.Type Inspection.Demerits Inspection.Grade Permit.Status Inspection.Result Violations Record.Updated Location.1
DA0001131 PR0000411 Black Mountain Clubhouse Restaurant Black Mountain Clubhouse Restaurant 500 Greenway Rd Henderson Nevada 89015-7464 3 A 10/29/2015 0:00 3/30/2010 0:00 3/30/2010 11:30 EE7000392 Routine Inspection 18 B ‘B’ Downgrade 49122731112 2/21/2013 22:26 (36.019531, -114.9739838)
DA0001132 PR0000411 Black Mountain Clubhouse Restaurant Black Mountain Clubhouse Restaurant 500 Greenway Rd Henderson Nevada 89015-7464 3 A 10/29/2015 0:00 4/13/2010 0:00 4/13/2010 11:15 EE7000392 Routine Inspection 1 A Compliant 37 2/21/2013 22:26 (36.019531, -114.9739838)
DA0001133 PR0000411 Black Mountain Clubhouse Restaurant Black Mountain Clubhouse Restaurant 500 Greenway Rd Henderson Nevada 89015-7464 3 A 10/29/2015 0:00 3/10/2011 0:00 3/10/2011 10:40 EE7000392 Routine Inspection 9 A Compliant NA 2/21/2013 22:26 (36.019531, -114.9739838)
DA0001173 PR0000412 Black Mountain Clubhouse Bar Black Mountain Clubhouse Bar / Tavern 500 Greenway Rd Henderson Nevada 89015-7464 0 A 10/29/2015 0:00 3/30/2010 0:00 3/30/2010 12:45 EE7000392 Routine Inspection 3 A Compliant 313637 2/21/2013 22:26 (36.019531, -114.9739838)
DA0001174 PR0000412 Black Mountain Clubhouse Bar Black Mountain Clubhouse Bar / Tavern 500 Greenway Rd Henderson Nevada 89015-7464 0 A 10/29/2015 0:00 3/10/2011 0:00 3/10/2011 11:40 EE7000392 Routine Inspection 4 A Compliant NA 2/21/2013 22:26 (36.019531, -114.9739838)
DA0001185 PR0000414 Black Mountain Country Club - Golf Cart #1 Black Mountain Clubhouse Self-Service Food Truck 500 Greenway Rd Henderson Nevada 89015-7464 0 A 10/29/2015 0:00 3/30/2010 0:00 3/30/2010 12:40 EE7000392 Routine Inspection 0 A Compliant NA 2/21/2013 22:26 (36.019531, -114.9739838)

Combining Health Inspection data with LasVegas Business data

names(LasVBus)[2]= "Restaurant.Name"
names(LasVBus)[4]= "Address"

nondupl = LVInspection[!duplicated(LVInspection[,c('Restaurant.Name','Address')]),]
nondupl= nondupl[ c(3,6,10,11:13,17:22)]

Combined = merge(LasVBus, nondupl, by = c('Restaurant.Name','Address'))
Combined = Combined[,c(3,1,2,4,5,6,7,8,9,10,11,13,17,12,16,14,15,21,18:20)]
kable(head(Combined))
business_id Restaurant.Name Address neighborhood city state postal_code latitude longitude stars review_count Current.Grade Inspection.Grade Current.Demerits Inspection.Demerits Date.Current Inspection.Date Record.Updated Permit.Status Inspection.Result Violations
0IbboCQAEQlROAv5PKKO_Q Aloha Island Grill 2267 N Rampart Blvd Summerlin Las Vegas NV 89128 36.20120 -115.2811 4.5 15 A A 10 10 6/16/2010 0:00 6/16/2010 0:00 2/21/2013 22:26 Compliant 1.418223e+09
JgkfjCL1NfMBC_M_hax_3Q Aloha Kitchen 4466 E Charleston Blvd Sunrise Las Vegas NV 89104 36.15928 -115.0782 4.0 116 A A 6 10 1/20/2016 0:00 2/22/2010 0:00 2/21/2013 22:26 Compliant 1.420000e+13
21sGRVR7HEs_t6PdB9tGMw Angelina’s Pizzeria 5750 S Decatur Blvd Las Vegas NV 89118 36.08530 -115.2074 2.5 131 A A 8 8 12/3/2015 0:00 2/12/2010 0:00 2/21/2013 22:26 Compliant 1.428314e+09
Oyo5CWthu2upxxaf69MfIg Anna’s Garden 5740 Spring Mountain Rd Chinatown Las Vegas NV 89146 36.12602 -115.2198 3.5 29 A C 10 32 10/10/2011 0:00 2/2/2010 0:00 2/21/2013 22:26 ‘C’ Downgrade 1.210000e+23
Rhz38lsptoxX9IWS641LFA Annie’s Kitchen 1212 D St Las Vegas NV 89106 36.18430 -115.1470 4.0 10 A A 0 5 11/3/2015 0:00 5/3/2010 0:00 2/21/2013 22:26 Compliant 7.353637e+06
7NvIgf7XW55OKa5ZU7gw8Q Arandas Taqueria 2040 N Nellis Blvd Sunrise Las Vegas NV 89115 36.19695 -115.0617 4.0 20 A X 0 55 1/6/2015 0:00 3/10/2010 0:00 2/21/2013 22:26 Closed with Fees 4.710000e+51

Combining User data with Review data based on the Business data

UserReview = merge(User, Review, by = c('user_id'))
UserbyBusiness = semi_join(UserReview, Combined, by = "business_id")

Analyzing the Health Inspection data

# looking at grades and their demerit points
# Histograms
par(mfrow = c(1,2))
ggplot(LVInspection, aes(x=Inspection.Grade, y=Inspection.Demerits)) + geom_bar(stat="identity") + 
  labs(x="Inspection.Grade", y="Inspection.Demerits")

ggplot(LVInspection, aes(x=Current.Grade, y=Current.Demerits)) + geom_bar(stat="identity") + 
  labs(x="Current.Grade", y="Current.Demerits")

# Boxplots
par(mfrow = c(1,2))
boxplot(Inspection.Demerits ~ Inspection.Grade,data=LVInspection, main="Inspection Demerit Points by Grade",
        xlab="Inspection Grade", ylab="Inspection Demerits", ylim = c(0,120))

boxplot(Current.Demerits ~ Current.Grade,data=LVInspection, main="Demerit Points by Grade",
        xlab="Current Grade", ylab="Current Demerits",ylim = c(0,120))

We notice that during the Inspection, the majority of the business were classified as “A” grade, though it might have been expected to see a normal distribution in which, perhaps a “B” grade was more common. However, after the inspection, there was a dramatic shift in the grades in that the majority of businesses were upgraded to “A” grade, which may imply that after their grading, the businesses focused to reslove the issues that caused them demerit points. This seems more evident with the boxplots, where the upper outliers for demerits in each grades have radically decreased.

Adressing Questions

1. Relationship exists between Yelp ratings and Inspection Ratings

pairs(Combined[,c("stars","review_count","Current.Grade","Current.Demerits","Inspection.Grade","Inspection.Demerits")])

We see here that there doesn’t visually appear to exist a correlation between these variables. Nonetheless, we run a multiple regression.

# Regression
regstar = lm(stars ~ review_count + Current.Grade + Current.Demerits + Inspection.Grade + Inspection.Demerits, data = Combined)

summary(regstar)
## 
## Call:
## lm(formula = stars ~ review_count + Current.Grade + Current.Demerits + 
##     Inspection.Grade + Inspection.Demerits, data = Combined)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89014 -0.39232  0.05882  0.51465  1.16622 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.4106715  0.7067370   4.826 3.38e-06 ***
## review_count         0.0007989  0.0003249   2.459   0.0151 *  
## Current.GradeB      -0.0262220  0.4404514  -0.060   0.9526    
## Current.GradeC       0.3585021  0.7468499   0.480   0.6319    
## Current.GradeO      -1.4592595  0.6681818  -2.184   0.0305 *  
## Current.GradeX      -1.5123761  0.9769126  -1.548   0.1237    
## Current.Demerits    -0.0095496  0.0157630  -0.606   0.5455    
## Inspection.GradeA   -0.0654856  0.6841899  -0.096   0.9239    
## Inspection.GradeB   -0.2742315  0.7018883  -0.391   0.6966    
## Inspection.GradeC   -0.1997069  0.7555434  -0.264   0.7919    
## Inspection.GradeX   -0.3885972  0.8807551  -0.441   0.6597    
## Inspection.Demerits  0.0158040  0.0130400   1.212   0.2274    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6506 on 151 degrees of freedom
## Multiple R-squared:  0.1506, Adjusted R-squared:  0.08868 
## F-statistic: 2.433 on 11 and 151 DF,  p-value: 0.008076
hist(regstar$residuals, main = "Regression Residuals")

# normal probability plot
qqnorm(regstar$residuals)
qqline(regstar$residuals)

# constant variability
plot(regstar$residuals ~ Combined$review_count, main = "Variability")
abline(h = 0, lty = 3)

This multiple regression is not fit to be run. The residuals histogram is somewhat left skewed, the data along the qq plot is off the line and there exists a pattern when testing the residuals for constand variability. The regression itself only primarily the “reviews_count” variable to have any significant effect on the “stars” variable. So, we can run another regression with only that variable to take into account.

regstarrev = lm(stars ~ review_count, data = Combined)

summary(regstarrev)
## 
## Call:
## lm(formula = stars ~ review_count, data = Combined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8994 -0.4140  0.0768  0.5406  1.1117 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.3855991  0.0635339  53.288   <2e-16 ***
## review_count 0.0009172  0.0003159   2.904   0.0042 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6664 on 161 degrees of freedom
## Multiple R-squared:  0.04977,    Adjusted R-squared:  0.04387 
## F-statistic: 8.433 on 1 and 161 DF,  p-value: 0.004203
hist(regstarrev$residuals, main = "Regression Residuals")

# normal probability plot
qqnorm(regstarrev$residuals)
qqline(regstarrev$residuals)

# constant variability
plot(regstarrev$residuals ~ Combined$review_count, main = "Variability")
abline(h = 0, lty = 3)

This, too, was a failed attempt at explaining the star ratings. The R-squared of this regression showed that very little variability was explained this and the previous model.

Sentiment analysis

UserbyBusiness = UserbyBusiness[!duplicated(UserbyBusiness), ]
ReviewsGrades = merge(UserbyBusiness, Combined, by = "business_id")

groupings <- ReviewsGrades %>% select(business_id, Inspection.Grade) %>% filter(Inspection.Grade=='C') %>% group_by(business_id, Inspection.Grade) %>% summarise()
groupings$business_id
##  [1] 9f68hfHK6gNyHWmt9guiJw dblGQtHAzNy1ACis1m2shw dFmaYj3_cZpmEaL4WSOkPg
##  [4] eRQwgSQD1KyCmJk_PAGOXQ FvC3W9E6rXUYx-cTxbnUiA iRFaq249LVEFGeXLzPULnQ
##  [7] lOQPeguczAueJnLmDUTfGg MXC9pwIxovWUc9yu1F8OxA Oyo5CWthu2upxxaf69MfIg
## [10] PSRZaGGxXmOmabL2si8pKw q7LG_2s2urLJcD2IsxgTWg Qc-bULSBAyVh_qaKRWkF9A
## [13] So132GP_uy3XbGs0KNyzyw suQugTjSMYMIrK5fAabC_Q t3AWGr757V4Fvcun6sDsuw
## [16] T5UBAJt5Kod_wQGbH6m9hA w0ArxcJ0XsyK1ZwrIf3gpA zfNKdsuT7bWWN-f3v5Rj_g
## 5238 Levels: --q7kSBRb0vWC8lSkXFByA ... Zzl4ffV9kFY3yOSSgyRUBQ
#%>% rapply(function(x)length(unique(x))) 

# Sentiment analysis
reviews = as.list(as.character(ReviewsGrades$text))


analysis <- sqlQuery(conn, "Select * from YelpLVReviewTokens")
# using lexicon to score words
#lexicon = data.frame(get_sentiments("afinn"))
lexicon = read.csv("https://raw.githubusercontent.com/jmehta753/MSDA_JM/master/DATA607/Final%20Project/HygieneLexicon.csv")
af.sent = data.frame(merge(analysis, lexicon, by.x="word", by.y="word") %>% 
                      group_by(review)%>% 
                       mutate(mean.af.sent = mean(score)))

names(af.sent)[2] = "review_id"


# combining scores to business data
review.analysis = merge(af.sent, ReviewsGrades, by = "review_id")

Plots

MEAN SCORE BY STARS

These plots are separated into two sections: Star scores each user alloted to the restaurant and the overall score Yelp gave the restaurant based on the users’ reviews. It will be interesting to see what the connection is between these two variables and the mean sentiment score, in order to determine if, firstly, the sentiment analysis was somewhat accurate, and, second, to see if the overall rating of the business matches the sentiment scores.

# Users' stars for the restaurant
par(mfrow = c(1,2))
boxplot(mean.af.sent~ stars.x, data=review.analysis[!duplicated(review.analysis), ], 
        main="Mean Sentiment Score by Users' Star Scores",xlab="Users' Stars", ylab="Mean Sentiment Score")
abline(h = 0, lty = 3)
# Overall star rating for the restaurant
boxplot(mean.af.sent~ stars.y, data=review.analysis[!duplicated(review.analysis), ], 
        main="Mean Sentiment Score by Restaurants' Star Scores",xlab="Restaurant Stars", ylab="Mean Sentiment Score")
abline(h = 0, lty = 3)

MEAN SCORE BY GRADE

The boxplot is based on Current Inspection Grade. What would be interesting to note here is if there is a relationship between the inspection grades and the review sentiments (meaning, if the users’ opinion match or are related in some way to the inspection grades).

# Current Grade
boxplot(mean.af.sent~ Current.Grade, data=review.analysis[!duplicated(review.analysis), ], 
        main="Mean Sentiment Score by Current Grade",xlab="Current Grade", ylab="Mean Sentiment Score")

# Inspection Grade
boxplot(mean.af.sent~ Inspection.Grade, data=review.analysis[!duplicated(review.analysis), ], 
        main="Mean Sentiment Score by Inspection Grade",xlab="Current Grade", ylab="Mean Sentiment Score")

abline(h = 0, lty = 3)

STARS BY GRADE

The boxplot is based on Current Inspection Grade. What would be interesting to note here is if there is a relationship between the inspection grades and the review sentiments (meaning, if the users’ opinion match or are related in some way to the inspection grades).

boxplot(stars.x ~ Current.Grade, ReviewsGrades, 
        main="Users' Star Scores by Current Inspection Grades",xlab="Current Grade", ylab="Stars")
abline(h = 0, lty = 3)

WORDS BY USERS’ STAR SCORE

Plotting the words that the afinn lexicon determined as positive/negative

The detail to pay attention to here is not only the words that 5-star reviews earned, but whether the words for higher star scores are related to health/cleanliness. At first glance, the majority of the words have more to do with taste than hygiene.

# Plots of the words scores vs the users' star score for each restaurant.
ggplot(review.analysis, aes(score, stars.x, group = score)) + 
  geom_point(aes(colour = factor(score))) +
  geom_label_repel(aes(label=word)) +
  xlab("Word Score") +
  ylab("Users' Stars Scores")

Location Analysis through Visualization

We are interested in knowing how the restaurants are concentrated in Las Vegas and if we can spot where some of its great restaurants are located through visual analysis. It is well known that Las Vegas strip is home to multiple high quality restaurants but are there any other surprises?

Extract the Yelp Restaurants data from database

vegasFrame <- sqlQuery(conn, "select * from YelpLVBusiness")
head(vegasFrame)
##              business_id                          name neighborhood
## 1 3rkxTx8DoZSl7_FryhXCVQ      Jugoo Chinese Restaurant   University
## 2 xkVMIk_Vqh17f48ZQ_6b0w    Tom Colicchio's Craftsteak    The Strip
## 3 n8XlV8E3BUBBF1gCe6mvEg                 House of Vino   Centennial
## 4 2BbFeotL85cIaBjSq1SWiA Springs Cafe by Wolfgang Puck     Westside
## 5 w_A1qFeQ3vOl7o4kdBxPXA   Muse Asian Cuisine & Lounge    Chinatown
## 6 XlUuMGMvkzy_QLAUTEe-fw                 Dumpling King    Chinatown
##                              address      city state postal_code latitude
## 1               4747 S Maryland Pkwy Las Vegas    NV       89119 36.10413
## 2              3799 Las Vegas Blvd S Las Vegas    NV       89109 36.10292
## 3         8053 N Durango Dr, Ste 160 Las Vegas    NV       89131 36.30763
## 4             333 S Valley View Blvd Las Vegas    NV       89107 36.16810
## 5 5115 W Spring Mountain Rd, Ste 107 Las Vegas    NV       89146 36.12604
## 6            5740 Spring Mountain Rd Las Vegas    NV       89146 36.12608
##   longitude stars review_count is_open
## 1 -115.1377   3.5           34       0
## 2 -115.1698   4.0         1915       1
## 3 -115.2786   3.5           34       0
## 4 -115.1922   2.5           23       0
## 5 -115.2114   3.5           14       0
## 6 -115.2198   3.5          120       0
##                                                                                                                                                                                                                                                        attributes
## 1 c("Alcohol: none", "Ambience: {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True}", "BusinessAcceptsCreditCards: True", "BusinessParking: {'garage'
## 2 c("Alcohol: full_bar", "Ambience: {'romantic': False, 'intimate': False, 'classy': True, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': False}", "BYOB: False", "BYOBCorkage: no", "BikeParking: False", "Bu
## 3 c("Alcohol: full_bar", "Ambience: {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True}", "BikeParking: True", "BusinessAcceptsBitcoin: False", "Busi
## 4 c("Alcohol: beer_and_wine", "BusinessAcceptsCreditCards: True", "BusinessParking: {'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}", "GoodForKids: True", "GoodForMeal: {'dessert': False, 'latenight': False, 'lunch': True
## 5 c("Alcohol: full_bar", "Ambience: {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': False}", "BikeParking: True", "BusinessAcceptsBitcoin: False", "Bus
## 6 c("Alcohol: none", "Ambience: {'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': True, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': False}", "BikeParking: False", "BusinessAcceptsBitcoin: False", "Busines
##                                                                                                     categories
## 1                                                                                  c("Restaurants", "Chinese")
## 2                        c("Steakhouses", "Restaurants", "Cheesesteaks", "Food", "American (New)", "Desserts")
## 3                                       c("Restaurants", "Beer", "Wine & Spirits", "Delis", "Food", "Grocery")
## 4                                                                           c("Restaurants", "American (New)")
## 5 c("Karaoke", "Coffee & Tea", "Bars", "Restaurants", "Lounges", "Asian Fusion", "Nightlife", "Food", "Cafes")
## 6                                                  c("Restaurants", "Szechuan", "Chinese", "Halal", "Dim Sum")
##                                                                                                                                                     hours
## 1             c("Monday 10:30-21:30", "Tuesday 10:30-21:30", "Wednesday 10:30-21:30", "Thursday 10:30-21:30", "Friday 10:30-21:30", "Sunday 10:30-21:30")
## 2 c("Monday 17:0-22:0", "Tuesday 17:0-22:0", "Wednesday 17:0-22:0", "Thursday 17:0-22:0", "Friday 17:0-22:30", "Saturday 17:0-22:30", "Sunday 17:0-22:0")
## 3          c("Monday 9:0-21:0", "Tuesday 9:0-21:0", "Wednesday 9:0-21:0", "Thursday 9:0-21:0", "Friday 9:0-21:0", "Saturday 9:0-21:0", "Sunday 9:0-21:0")
## 4   c("Monday 11:0-15:0", "Tuesday 11:0-15:0", "Wednesday 11:0-15:0", "Thursday 11:0-15:0", "Friday 11:0-15:0", "Saturday 11:0-18:0", "Sunday 11:0-18:0")
## 5          c("Monday 19:0-4:0", "Tuesday 19:0-4:0", "Wednesday 19:0-4:0", "Thursday 19:0-4:0", "Friday 13:0-4:0", "Saturday 13:0-4:0", "Sunday 13:0-3:0")
## 6    c("Monday 11:0-0:0", "Tuesday 11:0-22:0", "Wednesday 11:0-22:0", "Thursday 11:0-22:0", "Friday 11:0-22:0", "Saturday 11:0-22:0", "Sunday 11:0-22:0")
##       type
## 1 business
## 2 business
## 3 business
## 4 business
## 5 business
## 6 business
vegasFrame %>% select(stars) %>% group_by(stars)
## Source: local data frame [5,433 x 1]
## Groups: stars [9]
## 
##    stars
## *  <dbl>
## 1    3.5
## 2    4.0
## 3    3.5
## 4    2.5
## 5    3.5
## 6    3.5
## 7    3.5
## 8    2.5
## 9    3.5
## 10   4.0
## # ... with 5,423 more rows
content <- function(name, address, neighbourhood) {
    paste(sep = "<br/>",
    "<b>", name,"</b>",
    address, neighbourhood
  )
}

Plot all the restaruants on the map to view the concentrations

colorDomain <- c(0.0,0.5,1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5)
pal <- colorNumeric("YlGn", colorDomain, na.color = "transparent")
leaflet(vegasFrame) %>% addTiles() %>% setView(lng = -115.1398, 
    lat = 36.1699, zoom = 12) %>% addCircles(lng = ~longitude, lat = ~latitude, 
    weight = 0, radius = ~exp(stars*1.2) , fillOpacity = 0.5,  
    color = ~pal(stars),
    popup = ~content(name, address, neighborhood)) 

Expectedly there are multiple areas of concentrations within las vegas.

Plot only the high ranked restaurants on the map

As our above map was bit crowded we use a different approach to plot the map by clustering the spots together.

vegasTopFrame = vegasFrame %>% filter(stars>=4.0)
leaflet(vegasTopFrame) %>% addTiles() %>% setView(lng = -115.1398, 
    lat = 36.1699, zoom = 12) %>% addCircleMarkers(lng = ~longitude, lat = ~latitude, 
    popup = ~content(name, address, neighborhood), clusterOptions = markerClusterOptions()) 

Conclusion :

While the Strip expectedly has highest number of high quality restaurants (at 268), it came as a surprise that Chinatown (with 166) and Downtown (with 157) has very large number of such places too.