by Davin Kaing

Introduction

The Yelp datasets contain reviews of many businesses, many of which are restaurants. This analysis explores the performance of four types of ethnic restaurant: Chinese, Greek, Italian, and Mexican. This is done by looking at the review counts, restaurant ratings, and review ratings. From this information, the top 10 cities and states with the highest ratings and the greatest number of ethnic restaurants can be determined.

This analysis can be useful to both ethnic restaurant owners and Yelp users. Restaurant owners can use the results presented in this report to understand when consumers are most active by looking at the number of reviews over time. In addition, the restaurant rating data can help owners track the performance of ethnic foods over time, which is valuable information for planning future investments. As for users, the information about the highest ranked cities can be most valuable to food enthusiasts.

Method and Data

Data Acquisition & Processing

The datasets used for this project are taken from the Yelp Dataset Challenge. The following code details how the datasets were acquired and processed.

library("jsonlite")
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View
library(zoo)
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
YelpReview <- read.table("~/Desktop/Coursera/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json", quote="\"")

YelpBusiness <- stream_in(file("~/Desktop/Coursera/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json"))
## opening file input connection.
## closing file input connection.
ExtractedReview <- YelpReview[, c(16,20,23,35)]
colnames(ExtractedReview) <- c("Review_ID", "Stars", "Date", "Business_ID")
ExtractedReview$Stars <- substr(ExtractedReview$Stars,1,1)

ProcessedBusiness <- YelpBusiness[,c("business_id", "name","state", 
                                     "categories", "city", "stars", 
                                     "review_count")]
colnames(ProcessedBusiness) <- c("Business_ID", "Business_Name", "State", 
                                 "Categories", "City", "Business_Stars",
                                 "Review_Count")

ExtractedReview$Date <- as.Date(ExtractedReview$Date)
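
The review file, like the business file, stores one JSON object per line. As an illustration only, the sketch below shows how such records could be parsed by field name with jsonlite's stream_in rather than by column position. The two records and their field names (review_id, stars, date, business_id, text) are made-up assumptions chosen to mirror the column names assigned above, not values from the actual dataset.

```r
library(jsonlite)

# Two hypothetical review records in newline-delimited JSON (one object per line).
# The values are invented for illustration.
review_lines <- c(
  '{"review_id": "r1", "stars": 4, "date": "2014-08-18", "business_id": "b1", "text": "Great dim sum"}',
  '{"review_id": "r2", "stars": 2, "date": "2014-11-04", "business_id": "b1", "text": "Slow service"}'
)

# stream_in() parses each line into one row of a data frame
reviews <- stream_in(textConnection(review_lines), verbose = FALSE)

# Select the needed fields by name and rename them as in the report
extracted <- reviews[, c("review_id", "stars", "date", "business_id")]
colnames(extracted) <- c("Review_ID", "Stars", "Date", "Business_ID")
extracted$Date <- as.Date(extracted$Date)
```

Selecting fields by name avoids depending on the position of a value within a line, which can shift when the review text contains a different number of words.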

The processed business and review data were merged together using the following command.

MergedData <- merge(ExtractedReview, ProcessedBusiness, by = "Business_ID")
MergedData <- MergedData[,-c(1,2)]
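
To make the behavior of the merge concrete, here is a minimal sketch on toy data (all values are hypothetical). By default, merge performs an inner join, so reviews without a matching business and businesses without reviews are dropped. The key column comes first in the result, followed by the remaining review columns, which is why removing columns 1 and 2 discards Business_ID and Review_ID. The sketch drops the same columns by name as a possible alternative.

```r
# Toy review and business tables; all values are made up for illustration
reviews <- data.frame(
  Review_ID = c("r1", "r2", "r3"),
  Stars = c(4, 5, 3),
  Business_ID = c("b1", "b1", "b9"),
  stringsAsFactors = FALSE
)
businesses <- data.frame(
  Business_ID = c("b1", "b2"),
  Business_Name = c("Dim Sum Montreal", "Red Bowl Asian Bistro"),
  stringsAsFactors = FALSE
)

# Inner join on Business_ID: review r3 (unknown business b9) and
# business b2 (no reviews) do not appear in the result
merged <- merge(reviews, businesses, by = "Business_ID")

# Drop the identifier columns by name instead of by position
merged_slim <- merged[, !(names(merged) %in% c("Business_ID", "Review_ID"))]
```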

Exploratory Data Analysis

The following code was used to explore the categories of the first 1,000 businesses in the dataset. From these categories, the following ethnic restaurants were selected: Chinese, Greek, Italian, and Mexican restaurants.

CategoriesCounts <- data.frame(table(unlist(head(ProcessedBusiness$Categories, n = 1000))))
OrderedCat <- CategoriesCounts[order(CategoriesCounts$Freq, decreasing = T),]
head(OrderedCat)
##                          Var1 Freq
## 255               Restaurants  395
## 264                  Shopping  155
## 119                      Food  117
## 208                 Nightlife   79
## 32                       Bars   66
## 106 Event Planning & Services   64
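
The counting step can be illustrated on a small hypothetical list of category vectors, assuming (as in the business data) that each business carries several category labels. Flattening the list with unlist and tabulating it gives one frequency per label.

```r
# Hypothetical category labels for four businesses
categories <- list(
  c("Chinese", "Restaurants"),
  c("Shopping"),
  c("Greek", "Restaurants"),
  c("Bars", "Nightlife")
)

# Flatten all labels into one vector and count each label
counts <- data.frame(table(unlist(categories)))

# Sort from most to least frequent
ordered <- counts[order(counts$Freq, decreasing = TRUE), ]
```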

Data Extraction

After identifying the ethnic restaurants, the data for these restaurants are extracted and combined into one large dataset, cultural_data.

restaurant_data <- MergedData[grepl("Restaurant", MergedData$Categories), ]
Chinese_restaurant <- restaurant_data[grepl("Chinese", restaurant_data$Categories),]
Chinese_restaurant$Type <- "Chinese"
Greek_restaurant <- restaurant_data[grepl("Greek", restaurant_data$Categories),]
Greek_restaurant$Type <- "Greek"
Italian_restaurant <- restaurant_data[grepl("Italian", restaurant_data$Categories),]
Italian_restaurant$Type <- "Italian"
Mexican_restaurant <- restaurant_data[grepl("Mexican", restaurant_data$Categories),]
Mexican_restaurant$Type <- "Mexican"
cultural_data <- rbind(Chinese_restaurant, Greek_restaurant, Italian_restaurant, Mexican_restaurant)

cultural_data$Year <- format(cultural_data$Date, "%Y")
cultural_data$Month <- format(cultural_data$Date, "%m")
cultural_data$Day <- format(cultural_data$Date, "%d")
cultural_data$Stars <- as.numeric(cultural_data$Stars)
cultural_data$Freq <- 1

head(cultural_data)
##     Stars       Date         Business_Name State
## 155     4 2014-08-18      Dim Sum Montreal    QC
## 156     5 2014-08-26      Dim Sum Montreal    QC
## 157     5 2014-10-04      Dim Sum Montreal    QC
## 158     2 2014-11-04      Dim Sum Montreal    QC
## 486     3 2010-04-28 Red Bowl Asian Bistro    NC
## 487     4 2010-05-03 Red Bowl Asian Bistro    NC
##                                                  Categories      City
## 155 Food, Street Vendors, Chinese, Food Trucks, Restaurants  Montréal
## 156 Food, Street Vendors, Chinese, Food Trucks, Restaurants  Montréal
## 157 Food, Street Vendors, Chinese, Food Trucks, Restaurants  Montréal
## 158 Food, Street Vendors, Chinese, Food Trucks, Restaurants  Montréal
## 486                Thai, Asian Fusion, Chinese, Restaurants Charlotte
## 487                Thai, Asian Fusion, Chinese, Restaurants Charlotte
##     Business_Stars Review_Count    Type Year Month Day Freq
## 155            4.0            4 Chinese 2014    08  18    1
## 156            4.0            4 Chinese 2014    08  26    1
## 157            4.0            4 Chinese 2014    10  04    1
## 158            4.0            4 Chinese 2014    11  04    1
## 486            3.5           37 Chinese 2010    04  28    1
## 487            3.5           37 Chinese 2010    05  03    1
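
The same extraction can be written as a loop over the four types. The sketch below is one possible compact form on toy data, not the code used for the report. It assumes the categories are stored as plain comma-separated strings, and the business names are hypothetical. It also shows that a restaurant tagged with two of the types appears once under each type.

```r
# Toy data; Categories is assumed to be a plain comma-separated string
reviews <- data.frame(
  Business_Name = c("A", "B", "C", "D", "E"),
  Categories = c("Chinese, Restaurants", "Greek, Restaurants",
                 "Chinese, Grocery", "Thai, Chinese, Italian, Restaurants",
                 "Mexican, Restaurants"),
  stringsAsFactors = FALSE
)
types <- c("Chinese", "Greek", "Italian", "Mexican")

# Keep restaurants only (business C is a grocery and is excluded)
restaurants <- reviews[grepl("Restaurant", reviews$Categories), ]

# Build one labelled subset per type, then stack the subsets
by_type <- lapply(types, function(type) {
  rows <- restaurants[grepl(type, restaurants$Categories, fixed = TRUE), ]
  rows$Type <- rep(type, nrow(rows))
  rows
})
cultural <- do.call(rbind, by_type)
```

Here business D is both Chinese and Italian, so it contributes one row to each type.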

The following code computes the monthly rating and monthly review counts for each type of restaurant. The aggregate function is used with mean (for performance, the average review stars) and sum (for review counts, the total of the Review_Count variable).

performance_summary <- aggregate(Stars~Type+Year+Month, data = cultural_data, mean)
performance_summary$x <- paste(performance_summary$Year, "-", performance_summary$Month, sep = "")
performance_summary$Time <- as.Date(as.yearmon(performance_summary$x))

review_count_summary <- aggregate(Review_Count~Type+Year+Month, data = cultural_data, sum)
review_count_summary$x <- paste(review_count_summary$Year, "-", review_count_summary$Month, sep = "")
review_count_summary$Time <- as.Date(as.yearmon(review_count_summary$x))
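
As a small worked example of this aggregation, the sketch below computes a monthly mean rating on toy data and builds the time axis with base R instead of zoo's as.yearmon. It also counts review rows per month, which is a different quantity from the sum of Review_Count, because Review_Count is a business-level total repeated on every review row. The data and the N_Reviews column name are hypothetical.

```r
# Toy review-level data; all values are made up for illustration
reviews <- data.frame(
  Type = c("Greek", "Greek", "Greek", "Chinese"),
  Year = c("2014", "2014", "2014", "2014"),
  Month = c("08", "08", "09", "08"),
  Stars = c(4, 2, 5, 3),
  stringsAsFactors = FALSE
)

# Mean review rating per type and month
monthly_mean <- aggregate(Stars ~ Type + Year + Month, data = reviews, FUN = mean)

# Number of review rows per type and month
monthly_n <- aggregate(Stars ~ Type + Year + Month, data = reviews, FUN = length)
names(monthly_n)[names(monthly_n) == "Stars"] <- "N_Reviews"

# First day of each month as a Date, usable as a plotting axis
monthly_mean$Time <- as.Date(paste(monthly_mean$Year, monthly_mean$Month, "01", sep = "-"))
```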

The top 10 cities and states with the highest ratings for ethnic foods are determined using the following code. The “Business Stars” variable is averaged with respect to the type of restaurant, the year of the review, and the city and state of the restaurant. The 2014 averages are then sorted in decreasing order for each type.

performance_by_city <- aggregate(Business_Stars ~ Type + Year+City+State, data = cultural_data, mean)
performance_by_city_year2014 <- performance_by_city[performance_by_city$Year == 2014,]


### Chinese Foods
city_year2014_chinese <- performance_by_city_year2014[performance_by_city_year2014$Type == "Chinese", ]
sort_2014_chinese <- city_year2014_chinese[order(city_year2014_chinese$Business_Stars, decreasing = TRUE), ]
Rank <- 1:10
Top10_city_state_chinese <- data.frame(cbind(Rank, sort_2014_chinese$City[1:10], 
                                             sort_2014_chinese$State[1:10], sort_2014_chinese$Business_Stars[1:10]))
Top10_city_state_chinese[,4] <- round(as.numeric(as.character(Top10_city_state_chinese[,4])),2)
colnames(Top10_city_state_chinese) <- c("Ranking", "City", "State", "Business Star")


### Greek Foods
city_year2014_greek <- performance_by_city_year2014[performance_by_city_year2014$Type == "Greek", ]
sort_2014_greek<- city_year2014_greek[order(city_year2014_greek$Business_Stars, decreasing = TRUE), ]
Top10_city_state_greek <- data.frame(cbind(Rank, sort_2014_greek$City[1:10], 
                                             sort_2014_greek$State[1:10], sort_2014_greek$Business_Stars[1:10]))
Top10_city_state_greek[,4] <- round(as.numeric(as.character(Top10_city_state_greek[,4])),2)
colnames(Top10_city_state_greek) <- c("Ranking", "City", "State", "Business Star")


### Italian Foods
city_year2014_italian <- performance_by_city_year2014[performance_by_city_year2014$Type == "Italian", ]
sort_2014_italian<- city_year2014_italian[order(city_year2014_italian$Business_Stars, decreasing = TRUE), ]
Top10_city_state_italian <- data.frame(cbind(Rank, sort_2014_italian$City[1:10], 
                                           sort_2014_italian$State[1:10], sort_2014_italian$Business_Stars[1:10]))
Top10_city_state_italian[,4] <- round(as.numeric(as.character(Top10_city_state_italian[,4])),2)
colnames(Top10_city_state_italian) <- c("Ranking", "City", "State", "Business Star")


### Mexican Foods
city_year2014_mexican <- performance_by_city_year2014[performance_by_city_year2014$Type == "Mexican", ]
sort_2014_mexican<- city_year2014_mexican[order(city_year2014_mexican$Business_Stars, decreasing = TRUE), ]
Top10_city_state_mexican <- data.frame(cbind(Rank, sort_2014_mexican$City[1:10], 
                                             sort_2014_mexican$State[1:10], sort_2014_mexican$Business_Stars[1:10]))
Top10_city_state_mexican[,4] <- round(as.numeric(as.character(Top10_city_state_mexican[,4])),2)
colnames(Top10_city_state_mexican) <- c("Ranking", "City", "State", "Business Star")
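
The four blocks above repeat the same steps for each type. One way to express those steps once is a small helper function. The function name top_n_by_type and the toy values below are hypothetical, and the sketch builds the table with data.frame directly so that the ratings stay numeric.

```r
# Hypothetical helper: top n cities by average Business_Stars for one type
top_n_by_type <- function(data, type, n = 10) {
  rows <- data[data$Type == type, ]
  rows <- rows[order(rows$Business_Stars, decreasing = TRUE), ]
  rows <- head(rows, n)
  data.frame(
    Ranking = seq_len(nrow(rows)),
    City = rows$City,
    State = rows$State,
    Business_Star = round(rows$Business_Stars, 2),
    stringsAsFactors = FALSE
  )
}

# Toy city-level averages; values are made up for illustration
toy <- data.frame(
  Type = c("Greek", "Greek", "Greek", "Chinese"),
  City = c("Peoria", "Sun City", "Henderson", "Monona"),
  State = c("AZ", "AZ", "NV", "WI"),
  Business_Stars = c(4.412, 4.5, 4.4, 4.06),
  stringsAsFactors = FALSE
)
top_greek <- top_n_by_type(toy, "Greek", n = 2)
```

Because head returns at most n rows, the helper also handles types with fewer than n cities without padding the table with missing values.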

Here, the frequencies of the four types of ethnic restaurants (Chinese, Greek, Italian, and Mexican) are computed for each city and state. Because Freq is set to 1 for every review, the summed Freq is the number of 2014 reviews of restaurants of a given type in a city, and it is used here as a measure of which cities have the most ethnic restaurants.

## City with Most Ethnic Restaurants

restaurant_count <- aggregate(Freq ~ Business_Name+State+City+Type+Year, data = cultural_data, sum)
restaurant_count_business <- aggregate(Freq~Type+City+State+Year, data = restaurant_count, sum)
order_restaurant <- restaurant_count_business[order(restaurant_count_business$Freq, decreasing = TRUE),]
order_restaurant <- order_restaurant[order_restaurant$Year == 2014,]

top_10_freq_restaurant_chinese <- order_restaurant[order_restaurant$Type == "Chinese", ][1:10,]
row.names(top_10_freq_restaurant_chinese) <- NULL


top_10_freq_restaurant_greek <- order_restaurant[order_restaurant$Type == "Greek",][1:10,]
row.names(top_10_freq_restaurant_greek) <- NULL


top_10_freq_restaurant_italian <- order_restaurant[order_restaurant$Type == "Italian",][1:10,]
row.names(top_10_freq_restaurant_italian) <- NULL


top_10_freq_restaurant_mexican <- order_restaurant[order_restaurant$Type == "Mexican",][1:10,]
row.names(top_10_freq_restaurant_mexican) <- NULL
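
The distinction between counting reviews and counting restaurants can be sketched on toy data. Summing Freq over review rows gives the number of reviews per city, while counting distinct businesses gives the number of restaurants. The business names and values below are hypothetical, and identifying a business by name, city, and state is an assumption that would treat two same-named locations in one city as a single restaurant.

```r
# Toy review-level data: business A has three reviews, B and C one each
reviews <- data.frame(
  Business_Name = c("A", "A", "A", "B", "C"),
  City = c("Las Vegas", "Las Vegas", "Las Vegas", "Las Vegas", "Phoenix"),
  State = c("NV", "NV", "NV", "NV", "AZ"),
  Type = "Chinese",
  Freq = 1,
  stringsAsFactors = FALSE
)

# Summing Freq over review rows: number of reviews per city
review_totals <- aggregate(Freq ~ Type + City + State, data = reviews, FUN = sum)

# Counting each business once: number of restaurants per city
distinct_businesses <- unique(reviews[, c("Business_Name", "City", "State", "Type")])
distinct_businesses$Freq <- 1
restaurant_totals <- aggregate(Freq ~ Type + City + State, data = distinct_businesses, FUN = sum)
```

In this toy example Las Vegas has four reviews but only two restaurants.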

Results

Review Counts by Month for Four Ethnic Restaurants

library(ggplot2)
# Month is a discrete (character) variable, so geom_bar() counts the reviews in each month
ggplot(cultural_data, aes(Month, fill = Year)) + geom_bar() + 
        ggtitle("Review Counts by Month for All Ethnic Foods") + 
        labs(x = "Month", y = "Review Counts") +
        theme(text = element_text(size = 13))

Performance of Four Ethnic Foods Over Time

ggplot(performance_summary, aes(x = Time, y = Stars, group = Type, color = Type)) + 
        geom_point() + facet_grid(.~Type) + stat_smooth(method = "lm") + scale_x_date() +
        ggtitle("Performance of Ethnic Foods Over Time") + labs(x = "Time (Year)", y = "Stars (1-5)") +
        theme(text = element_text(size = 13), axis.text.x = element_text(angle = 90, vjust =1 ))

Review Counts for Ethnic Foods Over Time

ggplot(review_count_summary, aes(x = Time, y = Review_Count, group = Type, color = Type)) +
        geom_point() +facet_grid(.~Type) + stat_smooth(method = "lm") + scale_x_date()+
        ggtitle("Review Counts of Ethnic Foods Over Time") +labs(x = "Time (Year)", y = "Review Counts")+
        theme(text = element_text(size = 13), axis.text.x = element_text(angle = 90, vjust =1 ))
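
The linear trend lines drawn by stat_smooth(method = "lm") can also be summarized numerically. The sketch below fits one linear model per type on toy monthly averages and reports the slope in stars per year. The data are made up, so the signs of the slopes illustrate the method only and say nothing about the actual results.

```r
# Toy monthly averages for two types; values are invented for illustration
monthly <- data.frame(
  Type = rep(c("Greek", "Chinese"), each = 4),
  Time = rep(as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01")), times = 2),
  Stars = c(3.5, 3.6, 3.8, 3.9, 3.9, 3.8, 3.6, 3.5)
)

# Fit Stars against time (in days) for each type and convert the slope to stars per year
slopes <- sapply(split(monthly, monthly$Type), function(d) {
  fit <- lm(Stars ~ as.numeric(Time), data = d)
  unname(coef(fit)[2]) * 365.25
})
```

A positive slope indicates ratings that rise over time, and a negative slope indicates ratings that fall.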

Top 10 Cities and States with Highest Ranked Ethnic Foods in 2014

Top10_city_state_chinese
##    Ranking                                City State Business Star
## 1        1                        West Mifflin    PA          5.00
## 2        2                            DeForest    WI          5.00
## 3        3                           Edinburgh   MLN          4.75
## 4        4                            Dalkeith   MLN          4.50
## 5        5                          Harrisburg    NC          4.50
## 6        6 Cote-des-Neiges-Notre-Dame-de-Grace    QC          4.50
## 7        7                       Pointe-Claire    QC          4.50
## 8        8                     Apache Junction    AZ          4.34
## 9        9                              Monona    WI          4.06
## 10      10                           Karlsruhe    BW          4.03
Top10_city_state_greek
##    Ranking                     City State Business Star
## 1        1           San Tan Valley    AZ          4.50
## 2        2                 Sun City    AZ          4.50
## 3        3 Eggenstein-Leopoldshafen    BW          4.50
## 4        4                Stutensee    BW          4.50
## 5        5            Wesley Chapel    NC          4.50
## 6        6              Sun Prairie    WI          4.50
## 7        7                   Peoria    AZ          4.41
## 8        8                Henderson    NV          4.40
## 9        9                 Avondale    AZ          4.37
## 10      10            Mount Lebanon    PA          4.27
Top10_city_state_italian
##    Ranking                   City State Business Star
## 1        1     Stutensee neuthard    BW          5.00
## 2        2               Florence    AZ          4.72
## 3        3              Stutensee    BW          4.50
## 4        4 Bonnyrigg and Lasswade   MLN          4.50
## 5        5             Sharpsburg    PA          4.50
## 6        6    Pointe-Aux-Trembles    QC          4.50
## 7        7               Rosemère    QC          4.50
## 8        8         Sainte-Thérèse    QC          4.50
## 9        9              McFarland    WI          4.50
## 10      10            Sun Prairie    WI          4.23
Top10_city_state_mexican
##    Ranking         City State Business Star
## 1        1    Edinburgh   MLN          5.00
## 2        2 Boulder City    NV          4.79
## 3        3  Bridgeville    PA          4.50
## 4        4     DeForest    WI          4.50
## 5        5 Indian Trail    NC          4.49
## 6        6    Gila Bend    AZ          4.18
## 7        7     Coolidge    AZ          4.11
## 8        8    Guadalupe    AZ          4.00
## 9        9      Belmont    NC          4.00
## 10      10      Concord    NC          4.00

Cities with the Most Ethnic Restaurants

top_10_freq_restaurant_chinese
##       Type       City State Year Freq
## 1  Chinese  Las Vegas    NV 2014 5715
## 2  Chinese    Phoenix    AZ 2014 2117
## 3  Chinese   Chandler    AZ 2014  806
## 4  Chinese Pittsburgh    PA 2014  727
## 5  Chinese  Charlotte    NC 2014  606
## 6  Chinese Scottsdale    AZ 2014  588
## 7  Chinese       Mesa    AZ 2014  553
## 8  Chinese  Henderson    NV 2014  523
## 9  Chinese    Madison    WI 2014  456
## 10 Chinese      Tempe    AZ 2014  406
top_10_freq_restaurant_greek
##     Type       City State Year Freq
## 1  Greek  Las Vegas    NV 2014  945
## 2  Greek    Phoenix    AZ 2014  618
## 3  Greek      Tempe    AZ 2014  399
## 4  Greek  Charlotte    NC 2014  298
## 5  Greek  Henderson    NV 2014  274
## 6  Greek Scottsdale    AZ 2014  223
## 7  Greek   Chandler    AZ 2014  204
## 8  Greek   Avondale    AZ 2014  122
## 9  Greek       Mesa    AZ 2014  118
## 10 Greek Pittsburgh    PA 2014  115
top_10_freq_restaurant_italian
##       Type       City State Year Freq
## 1  Italian  Las Vegas    NV 2014 8559
## 2  Italian    Phoenix    AZ 2014 4637
## 3  Italian Scottsdale    AZ 2014 2428
## 4  Italian  Charlotte    NC 2014 1276
## 5  Italian Pittsburgh    PA 2014 1149
## 6  Italian  Henderson    NV 2014  812
## 7  Italian      Tempe    AZ 2014  684
## 8  Italian    Gilbert    AZ 2014  581
## 9  Italian    Madison    WI 2014  564
## 10 Italian       Mesa    AZ 2014  467
top_10_freq_restaurant_mexican
##       Type       City State Year Freq
## 1  Mexican  Las Vegas    NV 2014 9736
## 2  Mexican    Phoenix    AZ 2014 6746
## 3  Mexican Scottsdale    AZ 2014 2671
## 4  Mexican  Charlotte    NC 2014 1426
## 5  Mexican       Mesa    AZ 2014 1346
## 6  Mexican      Tempe    AZ 2014 1191
## 7  Mexican    Gilbert    AZ 2014 1155
## 8  Mexican   Chandler    AZ 2014 1125
## 9  Mexican  Henderson    NV 2014 1038
## 10 Mexican Pittsburgh    PA 2014  774

Discussion

The above analysis provides useful information about the performance of four types of ethnic restaurant: Chinese, Greek, Italian, and Mexican. The months with the most reviews for all four types are July and August. In terms of performance, the ratings of Greek restaurants tend to improve over time, while the ratings of the other restaurants decrease over time. However, the review counts for Greek restaurants increased more gradually than those of the other restaurants, and Greek restaurants also have the lowest review counts. As for the top 10 cities and states with the highest ranked restaurants, the rankings differ according to the type of restaurant. In contrast, the city and state with the most ethnic restaurants of every type (Chinese, Greek, Italian, and Mexican) is Las Vegas, Nevada.

The information above can be useful to both Yelp users and individuals interested in opening a Chinese, Greek, Italian, or Mexican restaurant. It allows them to understand where these restaurants are thriving and when reviewers are most active.