by Davin Kaing
The Yelp datasets consist of many reviews of many businesses, a large share of which are restaurants. This analysis explores the performance of four types of ethnic restaurants: Chinese, Greek, Italian, and Mexican. This is done by looking at the review counts, restaurant ratings, and review ratings. From this information, the top 10 cities and states with the highest ratings and the greatest number of ethnic restaurants can be determined.
This analysis can be useful to both ethnic restaurant owners and Yelp users. Restaurant owners can use the results in this report to understand when consumers are most active by looking at the number of reviews over time. In addition, the restaurant rating data can help owners gauge the performance of ethnic foods over time, which is valuable information for future investment decisions. As for users, the information about the highest-ranked cities is likely to be most valuable to food enthusiasts.
The datasets used for this project are taken from the Yelp Dataset Challenge. The following details the acquisition and processing of the datasets.
library("jsonlite")
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:utils':
##
## View
library(zoo)
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
YelpReview <- read.table("~/Desktop/Coursera/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json", quote="\"")
YelpBusiness <- stream_in(file("~/Desktop/Coursera/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json"))
## opening file input connection.
## closing file input connection.
ExtractedReview <- YelpReview[, c(16,20,23,35)]
colnames(ExtractedReview) <- c("Review_ID", "Stars", "Date", "Business_ID")
ExtractedReview$Stars <- substr(ExtractedReview$Stars,1,1)
ProcessedBusiness <- YelpBusiness[,c("business_id", "name","state",
"categories", "city", "stars",
"review_count")]
colnames(ProcessedBusiness) <- c("Business_ID", "Business_Name", "State",
"Categories", "City", "Business_Stars",
"Review_Count")
ExtractedReview$Date <- as.Date(ExtractedReview$Date)
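The review file above is read with read.table and its fields are then picked out by column position, which depends on how each line happens to split into tokens. As a hedged alternative, the sketch below reads a small line-delimited JSON sample with stream_in, the same function used for the business file, and selects fields by name. The sample records, the temporary file, and the field names review_id, stars, date, and business_id are assumptions made for illustration; they are not taken from the code above.

```r
library(jsonlite)

# Hypothetical two-record sample, one JSON object per line
# (assumed layout of the review file).
sample_lines <- c(
  '{"review_id":"r1","stars":4,"date":"2014-08-18","business_id":"b1"}',
  '{"review_id":"r2","stars":5,"date":"2014-08-26","business_id":"b1"}'
)
tmp <- tempfile(fileext = ".json")
writeLines(sample_lines, tmp)

# stream_in() parses each line into one row of a data frame.
reviews <- stream_in(file(tmp), verbose = FALSE)

# Select and rename columns by name instead of by position.
ExtractedSample <- reviews[, c("review_id", "stars", "date", "business_id")]
colnames(ExtractedSample) <- c("Review_ID", "Stars", "Date", "Business_ID")
ExtractedSample$Date <- as.Date(ExtractedSample$Date)
```

Read this way, Stars arrives as a number, so the substr step would not be needed.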
The processed business and review data were merged together using the following command.
MergedData <- merge(ExtractedReview, ProcessedBusiness, by = "Business_ID")
MergedData <- MergedData[,-c(1,2)]
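To illustrate what the merge step does, here is a minimal sketch on made-up data. By default merge performs an inner join, so reviews without a matching business are dropped. The toy data frames and the identifiers in them are hypothetical.

```r
# Toy stand-ins for ExtractedReview and ProcessedBusiness (hypothetical values).
reviews <- data.frame(Business_ID = c("b1", "b1", "b2", "b9"),
                      Review_ID   = c("r1", "r2", "r3", "r4"),
                      Stars       = c(4, 5, 3, 2),
                      stringsAsFactors = FALSE)
businesses <- data.frame(Business_ID   = c("b1", "b2"),
                         Business_Name = c("Dim Sum Montreal", "Red Bowl Asian Bistro"),
                         stringsAsFactors = FALSE)

# Inner join on Business_ID: review r4 (business b9) has no match and is dropped.
merged <- merge(reviews, businesses, by = "Business_ID")

# Drop the identifier columns by name rather than by position.
merged <- merged[, !(names(merged) %in% c("Business_ID", "Review_ID"))]
```

Dropping columns by name gives the same result as -c(1,2) here, but keeps working if the column order changes.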
The following code was used to explore the various categories in the dataset. From these categories, the following ethnic restaurant types were selected: Chinese, Greek, Italian, and Mexican.
CategoriesCounts <- data.frame(table(unlist(head(ProcessedBusiness$Categories, n = 1000))))
OrderedCat <- CategoriesCounts[order(CategoriesCounts$Freq, decreasing = TRUE),]
head(OrderedCat)
## Var1 Freq
## 255 Restaurants 395
## 264 Shopping 155
## 119 Food 117
## 208 Nightlife 79
## 32 Bars 66
## 106 Event Planning & Services 64
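The category count relies on Categories being a list column in which each business carries a vector of tags. The small sketch below, on made-up tags, shows how unlist flattens those vectors so that table can count each tag once per business.

```r
# Hypothetical category lists for four businesses.
categories <- list(c("Chinese", "Restaurants"),
                   c("Shopping"),
                   c("Greek", "Restaurants"),
                   c("Food", "Restaurants"))

# Flatten all tags into one vector, tabulate, and sort by frequency.
counts  <- data.frame(table(unlist(categories)))
ordered <- counts[order(counts$Freq, decreasing = TRUE), ]
head(ordered, 3)
```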
After identifying the ethnic restaurant types, the data for these restaurants are extracted and combined into one large dataset, cultural_data.
restaurant_data <- MergedData[grepl("Restaurant", MergedData$Categories), ]
Chinese_restaurant <- restaurant_data[grepl("Chinese", restaurant_data$Categories),]
Chinese_restaurant$Type <- "Chinese"
Greek_restaurant <- restaurant_data[grepl("Greek", restaurant_data$Categories),]
Greek_restaurant$Type <- "Greek"
Italian_restaurant <- restaurant_data[grepl("Italian", restaurant_data$Categories),]
Italian_restaurant$Type <- "Italian"
Mexican_restaurant <- restaurant_data[grepl("Mexican", restaurant_data$Categories),]
Mexican_restaurant$Type <- "Mexican"
cultural_data <- rbind(Chinese_restaurant, Greek_restaurant, Italian_restaurant, Mexican_restaurant)
cultural_data$Year <- format(cultural_data$Date, "%Y")
cultural_data$Month <- format(cultural_data$Date, "%m")
cultural_data$Day <- format(cultural_data$Date, "%d")
cultural_data$Stars <- as.numeric(cultural_data$Stars)
cultural_data$Freq <- 1
head(cultural_data)
## Stars Date Business_Name State
## 155 4 2014-08-18 Dim Sum Montreal QC
## 156 5 2014-08-26 Dim Sum Montreal QC
## 157 5 2014-10-04 Dim Sum Montreal QC
## 158 2 2014-11-04 Dim Sum Montreal QC
## 486 3 2010-04-28 Red Bowl Asian Bistro NC
## 487 4 2010-05-03 Red Bowl Asian Bistro NC
## Categories City
## 155 Food, Street Vendors, Chinese, Food Trucks, Restaurants Montréal
## 156 Food, Street Vendors, Chinese, Food Trucks, Restaurants Montréal
## 157 Food, Street Vendors, Chinese, Food Trucks, Restaurants Montréal
## 158 Food, Street Vendors, Chinese, Food Trucks, Restaurants Montréal
## 486 Thai, Asian Fusion, Chinese, Restaurants Charlotte
## 487 Thai, Asian Fusion, Chinese, Restaurants Charlotte
## Business_Stars Review_Count Type Year Month Day Freq
## 155 4.0 4 Chinese 2014 08 18 1
## 156 4.0 4 Chinese 2014 08 26 1
## 157 4.0 4 Chinese 2014 10 04 1
## 158 4.0 4 Chinese 2014 11 04 1
## 486 3.5 37 Chinese 2010 04 28 1
## 487 3.5 37 Chinese 2010 05 03 1
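The four near-identical filtering steps can also be written as a loop over the type names. The following is a sketch under simplifying assumptions: restaurant_sample is a hypothetical stand-in for restaurant_data, and its categories are stored as plain comma-separated strings rather than as a list column.

```r
# Hypothetical stand-in for restaurant_data.
restaurant_sample <- data.frame(
  Business_Name = c("A", "B", "C"),
  Categories    = c("Chinese, Restaurants",
                    "Italian, Greek, Restaurants",
                    "Burgers, Restaurants"),
  stringsAsFactors = FALSE
)

types <- c("Chinese", "Greek", "Italian", "Mexican")

# For each type, keep the matching rows and label them with that type.
tagged <- lapply(types, function(type) {
  rows <- restaurant_sample[grepl(type, restaurant_sample$Categories, fixed = TRUE), ]
  rows$Type <- rep(type, nrow(rows))  # rep() also handles the zero-row case
  rows
})
cultural_sample <- do.call(rbind, tagged)
```

Business B is tagged both Greek and Italian, so it appears twice in cultural_sample. The same holds for the code above: a restaurant listed under several of the four categories contributes its reviews to each of them.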
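The following code computes the monthly ratings and review counts for each restaurant type. The aggregate command is applied with mean to obtain the average review rating (performance) and with sum to total the Review_Count variable, in both cases grouped by type, year, and month. Note that Review_Count is a business-level total repeated on every review row, so its monthly sum reflects both the number of reviews written that month and the overall popularity of the businesses reviewed.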
performance_summary <- aggregate(Stars~Type+Year+Month, data = cultural_data, mean)
performance_summary$x <- paste(performance_summary$Year, "-", performance_summary$Month, sep = "")
performance_summary$Time <- as.Date(as.yearmon(performance_summary$x))
review_count_summary <- aggregate(Review_Count~Type+Year+Month, data = cultural_data, sum)
review_count_summary$x <- paste(review_count_summary$Year, "-", review_count_summary$Month, sep = "")
review_count_summary$Time <- as.Date(as.yearmon(review_count_summary$x))
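As a small worked example of the aggregation, the sketch below uses made-up reviews to compute a monthly mean rating and, as an alternative to summing Review_Count, a monthly count of reviews by summing a Freq column of ones. The data values are hypothetical, and building the date with paste and as.Date is a base-R substitute for as.yearmon, not the method used above.

```r
# Hypothetical reviews: three Greek, one Chinese.
reviews <- data.frame(
  Type  = c("Greek", "Greek", "Greek", "Chinese"),
  Date  = as.Date(c("2014-07-03", "2014-07-20", "2014-08-01", "2014-07-15")),
  Stars = c(4, 2, 5, 3)
)
reviews$Year  <- format(reviews$Date, "%Y")
reviews$Month <- format(reviews$Date, "%m")
reviews$Freq  <- 1  # one row per review

# Mean rating and number of reviews per type and month.
monthly_mean  <- aggregate(Stars ~ Type + Year + Month, data = reviews, FUN = mean)
monthly_count <- aggregate(Freq  ~ Type + Year + Month, data = reviews, FUN = sum)

# Pin each month to its first day so it can be plotted on a date axis.
monthly_mean$Time <- as.Date(paste(monthly_mean$Year, monthly_mean$Month, "01", sep = "-"))
```

In this sample, Greek restaurants in July 2014 have a mean rating of 3 from 2 reviews.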
The top 10 cities and states with the highest ratings for ethnic foods are determined using the following code. The method averages the “Business Stars” variable by restaurant type, year of review, city, and state, and then ranks the 2014 averages for each type.
performance_by_city <- aggregate(Business_Stars ~ Type + Year+City+State, data = cultural_data, mean)
performance_by_city_year2014 <- performance_by_city[performance_by_city$Year == 2014,]
city_year2014_chinese <- performance_by_city_year2014[performance_by_city_year2014$Type == "Chinese", ]
sort_2014_chinese <- city_year2014_chinese[order(city_year2014_chinese$Business_Stars, decreasing = TRUE), ]
Rank <- 1:10
Top10_city_state_chinese <- data.frame(cbind(Rank, sort_2014_chinese$City[1:10],
sort_2014_chinese$State[1:10], sort_2014_chinese$Business_Stars[1:10]))
Top10_city_state_chinese[,4] <- round(as.numeric(as.character(Top10_city_state_chinese[,4])),2)
colnames(Top10_city_state_chinese) <- c("Ranking", "City", "State", "Business Star")
### Greek Foods
city_year2014_greek <- performance_by_city_year2014[performance_by_city_year2014$Type == "Greek", ]
sort_2014_greek<- city_year2014_greek[order(city_year2014_greek$Business_Stars, decreasing = TRUE), ]
Top10_city_state_greek <- data.frame(cbind(Rank, sort_2014_greek$City[1:10],
sort_2014_greek$State[1:10], sort_2014_greek$Business_Stars[1:10]))
Top10_city_state_greek[,4] <- round(as.numeric(as.character(Top10_city_state_greek[,4])),2)
colnames(Top10_city_state_greek) <- c("Ranking", "City", "State", "Business Star")
### Italian Foods
city_year2014_italian <- performance_by_city_year2014[performance_by_city_year2014$Type == "Italian", ]
sort_2014_italian<- city_year2014_italian[order(city_year2014_italian$Business_Stars, decreasing = TRUE), ]
Top10_city_state_italian <- data.frame(cbind(Rank, sort_2014_italian$City[1:10],
sort_2014_italian$State[1:10], sort_2014_italian$Business_Stars[1:10]))
Top10_city_state_italian[,4] <- round(as.numeric(as.character(Top10_city_state_italian[,4])),2)
colnames(Top10_city_state_italian) <- c("Ranking", "City", "State", "Business Star")
### Mexican Foods
city_year2014_mexican <- performance_by_city_year2014[performance_by_city_year2014$Type == "Mexican", ]
sort_2014_mexican<- city_year2014_mexican[order(city_year2014_mexican$Business_Stars, decreasing = TRUE), ]
Top10_city_state_mexican <- data.frame(cbind(Rank, sort_2014_mexican$City[1:10],
sort_2014_mexican$State[1:10], sort_2014_mexican$Business_Stars[1:10]))
Top10_city_state_mexican[,4] <- round(as.numeric(as.character(Top10_city_state_mexican[,4])),2)
colnames(Top10_city_state_mexican) <- c("Ranking", "City", "State", "Business Star")
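Since the same filter, sort, and rank steps are repeated for each restaurant type, they could be wrapped in one function. The sketch below is one possible way to do so; top_n_cities and perf_sample are hypothetical names and data, not part of the analysis above.

```r
# Hypothetical helper: rank cities for one restaurant type and year.
top_n_cities <- function(perf, type, year, n = 10) {
  rows <- perf[perf$Type == type & perf$Year == year, ]
  rows <- rows[order(rows$Business_Stars, decreasing = TRUE), ]
  rows <- head(rows, n)  # returns fewer rows if fewer than n cities exist
  data.frame(Ranking       = seq_len(nrow(rows)),
             City          = rows$City,
             State         = rows$State,
             Business_Star = round(rows$Business_Stars, 2),
             stringsAsFactors = FALSE)
}

# Made-up input shaped like performance_by_city.
perf_sample <- data.frame(
  Type  = c("Greek", "Greek", "Greek", "Chinese"),
  Year  = c("2014", "2014", "2013", "2014"),
  City  = c("Peoria", "Henderson", "Tempe", "Madison"),
  State = c("AZ", "NV", "AZ", "WI"),
  Business_Stars = c(4.412, 4.5, 5, 4.06),
  stringsAsFactors = FALSE
)
top_greek <- top_n_cities(perf_sample, "Greek", "2014")
```

Building the result with data.frame directly keeps the rating column numeric, so no conversion through as.character is needed. Using head instead of [1:10] also avoids rows of NA when a type has fewer than 10 cities.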
Here, the frequencies for the four types of ethnic restaurants are computed by city, state, and year. Because Freq is 1 for every review, the totals count the reviews of Chinese, Greek, Italian, and Mexican restaurants in each city, which serves as a proxy for how many such restaurants the city has. The result shows the top cities and states for each of the four categories.
## City with Most Ethnic Restaurants
restaurant_count <- aggregate(Freq ~ Business_Name+State+City+Type+Year, data = cultural_data, sum)
restaurant_count_business <- aggregate(Freq~Type+City+State+Year, data = restaurant_count, sum)
order_restaurant <- restaurant_count_business[order(restaurant_count_business$Freq, decreasing = TRUE),]
order_restaurant <- order_restaurant[order_restaurant$Year == 2014,]
top_10_freq_restaurant_chinese <- order_restaurant[order_restaurant$Type == "Chinese", ][1:10,]
row.names(top_10_freq_restaurant_chinese) <- NULL
top_10_freq_restaurant_greek <- order_restaurant[order_restaurant$Type == "Greek",][1:10,]
row.names(top_10_freq_restaurant_greek) <- NULL
top_10_freq_restaurant_italian <- order_restaurant[order_restaurant$Type == "Italian",][1:10,]
row.names(top_10_freq_restaurant_italian) <- NULL
top_10_freq_restaurant_mexican <- order_restaurant[order_restaurant$Type == "Mexican",][1:10,]
row.names(top_10_freq_restaurant_mexican) <- NULL
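Summing Freq counts reviews, so a city with a few heavily reviewed restaurants can outrank one with many lightly reviewed ones. If the goal is to count distinct restaurants, one option is to collapse the data to one row per business before aggregating. This is a sketch on made-up data; it assumes a business is identified by its name, city, and state, which would merge separate branches of a chain within the same city.

```r
# Hypothetical reviews: restaurant A has three reviews, B and C one each.
reviews <- data.frame(
  Business_Name = c("A", "A", "A", "B", "C"),
  City  = c("Las Vegas", "Las Vegas", "Las Vegas", "Las Vegas", "Phoenix"),
  State = c("NV", "NV", "NV", "NV", "AZ"),
  Type  = "Chinese",
  Year  = "2014",
  stringsAsFactors = FALSE
)
reviews$Freq <- 1

# Review totals per city (what summing Freq measures).
review_totals <- aggregate(Freq ~ Type + City + State + Year, data = reviews, FUN = sum)

# Distinct restaurants per city: keep one row per business, then count.
businesses <- unique(reviews[, c("Business_Name", "City", "State", "Type", "Year")])
businesses$Restaurants <- 1
restaurant_totals <- aggregate(Restaurants ~ Type + City + State + Year,
                               data = businesses, FUN = sum)
```

In this sample, Las Vegas has 4 reviews but only 2 distinct restaurants.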
library(ggplot2)
# Month is a character column, so geom_bar() is used to count reviews per month
ggplot(cultural_data, aes(Month, fill = Year)) + geom_bar() +
ggtitle("Review Counts by Month for All Ethnic Foods") +
labs(x = "Month", y = "Review Counts") +
theme(text = element_text(size = 13))
ggplot(performance_summary, aes(x = Time, y = Stars, group = Type, color = Type)) +
geom_point() + facet_grid(.~Type) + stat_smooth(method = "lm") + scale_x_date() +
ggtitle("Performance of Ethnic Foods Over Time") + labs(x = "Time (Year)", y = "Stars (1-5)") +
theme(text = element_text(size = 13), axis.text.x = element_text(angle = 90, vjust =1 ))
ggplot(review_count_summary, aes(x = Time, y = Review_Count, group = Type, color = Type)) +
geom_point() +facet_grid(.~Type) + stat_smooth(method = "lm") + scale_x_date()+
ggtitle("Review Counts of Ethnic Foods Over Time") +labs(x = "Time (Year)", y = "Review Counts")+
theme(text = element_text(size = 13), axis.text.x = element_text(angle = 90, vjust =1 ))
Top10_city_state_chinese
## Ranking City State Business Star
## 1 1 West Mifflin PA 5.00
## 2 2 DeForest WI 5.00
## 3 3 Edinburgh MLN 4.75
## 4 4 Dalkeith MLN 4.50
## 5 5 Harrisburg NC 4.50
## 6 6 Cote-des-Neiges-Notre-Dame-de-Grace QC 4.50
## 7 7 Pointe-Claire QC 4.50
## 8 8 Apache Junction AZ 4.34
## 9 9 Monona WI 4.06
## 10 10 Karlsruhe BW 4.03
Top10_city_state_greek
## Ranking City State Business Star
## 1 1 San Tan Valley AZ 4.50
## 2 2 Sun City AZ 4.50
## 3 3 Eggenstein-Leopoldshafen BW 4.50
## 4 4 Stutensee BW 4.50
## 5 5 Wesley Chapel NC 4.50
## 6 6 Sun Prairie WI 4.50
## 7 7 Peoria AZ 4.41
## 8 8 Henderson NV 4.40
## 9 9 Avondale AZ 4.37
## 10 10 Mount Lebanon PA 4.27
Top10_city_state_italian
## Ranking City State Business Star
## 1 1 Stutensee neuthard BW 5.00
## 2 2 Florence AZ 4.72
## 3 3 Stutensee BW 4.50
## 4 4 Bonnyrigg and Lasswade MLN 4.50
## 5 5 Sharpsburg PA 4.50
## 6 6 Pointe-Aux-Trembles QC 4.50
## 7 7 Rosemère QC 4.50
## 8 8 Sainte-Thérèse QC 4.50
## 9 9 McFarland WI 4.50
## 10 10 Sun Prairie WI 4.23
Top10_city_state_mexican
## Ranking City State Business Star
## 1 1 Edinburgh MLN 5.00
## 2 2 Boulder City NV 4.79
## 3 3 Bridgeville PA 4.50
## 4 4 DeForest WI 4.50
## 5 5 Indian Trail NC 4.49
## 6 6 Gila Bend AZ 4.18
## 7 7 Coolidge AZ 4.11
## 8 8 Guadalupe AZ 4.00
## 9 9 Belmont NC 4.00
## 10 10 Concord NC 4.00
top_10_freq_restaurant_chinese
## Type City State Year Freq
## 1 Chinese Las Vegas NV 2014 5715
## 2 Chinese Phoenix AZ 2014 2117
## 3 Chinese Chandler AZ 2014 806
## 4 Chinese Pittsburgh PA 2014 727
## 5 Chinese Charlotte NC 2014 606
## 6 Chinese Scottsdale AZ 2014 588
## 7 Chinese Mesa AZ 2014 553
## 8 Chinese Henderson NV 2014 523
## 9 Chinese Madison WI 2014 456
## 10 Chinese Tempe AZ 2014 406
top_10_freq_restaurant_greek
## Type City State Year Freq
## 1 Greek Las Vegas NV 2014 945
## 2 Greek Phoenix AZ 2014 618
## 3 Greek Tempe AZ 2014 399
## 4 Greek Charlotte NC 2014 298
## 5 Greek Henderson NV 2014 274
## 6 Greek Scottsdale AZ 2014 223
## 7 Greek Chandler AZ 2014 204
## 8 Greek Avondale AZ 2014 122
## 9 Greek Mesa AZ 2014 118
## 10 Greek Pittsburgh PA 2014 115
top_10_freq_restaurant_italian
## Type City State Year Freq
## 1 Italian Las Vegas NV 2014 8559
## 2 Italian Phoenix AZ 2014 4637
## 3 Italian Scottsdale AZ 2014 2428
## 4 Italian Charlotte NC 2014 1276
## 5 Italian Pittsburgh PA 2014 1149
## 6 Italian Henderson NV 2014 812
## 7 Italian Tempe AZ 2014 684
## 8 Italian Gilbert AZ 2014 581
## 9 Italian Madison WI 2014 564
## 10 Italian Mesa AZ 2014 467
top_10_freq_restaurant_mexican
## Type City State Year Freq
## 1 Mexican Las Vegas NV 2014 9736
## 2 Mexican Phoenix AZ 2014 6746
## 3 Mexican Scottsdale AZ 2014 2671
## 4 Mexican Charlotte NC 2014 1426
## 5 Mexican Mesa AZ 2014 1346
## 6 Mexican Tempe AZ 2014 1191
## 7 Mexican Gilbert AZ 2014 1155
## 8 Mexican Chandler AZ 2014 1125
## 9 Mexican Henderson NV 2014 1038
## 10 Mexican Pittsburgh PA 2014 774
The above analysis provides useful information about the performance of four types of ethnic restaurants: Chinese, Greek, Italian, and Mexican. The months with the most reviews across all four types are July and August. In terms of performance, the ratings of Greek restaurants tend to improve over time, while the ratings of the other restaurant types decline over time. However, the review counts for Greek restaurants increased more gradually than those of the other types, and Greek restaurants also have the lowest review counts. As for the top 10 cities and states with the highest-rated restaurants, the rankings differ by restaurant type. In contrast, the city and state with the most ethnic restaurants of every type (Chinese, Greek, Italian, and Mexican) is Las Vegas, Nevada.
The information above can be useful to both Yelp users and individuals interested in opening a Chinese, Greek, Italian, or Mexican restaurant. It helps them understand where these restaurants are thriving and when reviewers are most active.