Zomato is an Indian restaurant aggregator and food delivery start-up founded by Deepinder Goyal and Pankaj Chaddah in 2008. Zomato provides information, menus and user reviews of restaurants, as well as food delivery from partner restaurants in select cities. Zomato also began grocery delivery amid the COVID-19 outbreak. As of 2019, the service was available in 24 countries and in more than 10,000 cities.
library(ggplot2)
library(ggthemes)
library(scales)
library(tidyr)
library(ggrepel)
zomato <- read.csv("data_input/zomato.csv")
dim(zomato)
## [1] 9551 21
head(zomato)
tail(zomato)
str(zomato)
## 'data.frame': 9551 obs. of 21 variables:
## $ Restaurant.ID : int 6317637 6304287 6300002 6318506 6314302 18189371 6300781 6301290 6300010 6314987 ...
## $ Restaurant.Name : chr "Le Petit Souffle" "Izakaya Kikufuji" "Heat - Edsa Shangri-La" "Ooma" ...
## $ Country.Code : int 162 162 162 162 162 162 162 162 162 162 ...
## $ City : chr "Makati City" "Makati City" "Mandaluyong City" "Mandaluyong City" ...
## $ Address : chr "Third Floor, Century City Mall, Kalayaan Avenue, Poblacion, Makati City" "Little Tokyo, 2277 Chino Roces Avenue, Legaspi Village, Makati City" "Edsa Shangri-La, 1 Garden Way, Ortigas, Mandaluyong City" "Third Floor, Mega Fashion Hall, SM Megamall, Ortigas, Mandaluyong City" ...
## $ Locality : chr "Century City Mall, Poblacion, Makati City" "Little Tokyo, Legaspi Village, Makati City" "Edsa Shangri-La, Ortigas, Mandaluyong City" "SM Megamall, Ortigas, Mandaluyong City" ...
## $ Locality.Verbose : chr "Century City Mall, Poblacion, Makati City, Makati City" "Little Tokyo, Legaspi Village, Makati City, Makati City" "Edsa Shangri-La, Ortigas, Mandaluyong City, Mandaluyong City" "SM Megamall, Ortigas, Mandaluyong City, Mandaluyong City" ...
## $ Longitude : num 121 121 121 121 121 ...
## $ Latitude : num 14.6 14.6 14.6 14.6 14.6 ...
## $ Cuisines : chr "French, Japanese, Desserts" "Japanese" "Seafood, Asian, Filipino, Indian" "Japanese, Sushi" ...
## $ Average.Cost.for.two: int 1100 1200 4000 1500 1500 1000 2000 2000 6000 1100 ...
## $ Currency : chr "Botswana Pula(P)" "Botswana Pula(P)" "Botswana Pula(P)" "Botswana Pula(P)" ...
## $ Has.Table.booking : chr "Yes" "Yes" "Yes" "No" ...
## $ Has.Online.delivery : chr "No" "No" "No" "No" ...
## $ Is.delivering.now : chr "No" "No" "No" "No" ...
## $ Switch.to.order.menu: chr "No" "No" "No" "No" ...
## $ Price.range : int 3 3 4 4 4 3 4 4 4 3 ...
## $ Aggregate.rating : num 4.8 4.5 4.4 4.9 4.8 4.4 4 4.2 4.9 4.8 ...
## $ Rating.color : chr "Dark Green" "Dark Green" "Green" "Dark Green" ...
## $ Rating.text : chr "Excellent" "Excellent" "Very Good" "Excellent" ...
## $ Votes : int 314 591 270 365 229 336 520 677 621 532 ...
anyNA(zomato)
## [1] FALSE
colSums(is.na(zomato))
## Restaurant.ID Restaurant.Name Country.Code
## 0 0 0
## City Address Locality
## 0 0 0
## Locality.Verbose Longitude Latitude
## 0 0 0
## Cuisines Average.Cost.for.two Currency
## 0 0 0
## Has.Table.booking Has.Online.delivery Is.delivering.now
## 0 0 0
## Switch.to.order.menu Price.range Aggregate.rating
## 0 0 0
## Rating.color Rating.text Votes
## 0 0 0
zomato_clean <- zomato[ , -c(1, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 19, 20)]
head(zomato_clean)
unique(zomato_clean$Country.Code)
## [1] 162 30 216 14 37 184 214 1 94 148 215 166 189 191 208
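The country codes printed above have to be turned into country names, which the assignments that follow do one code at a time. As a possible alternative, the same mapping could be expressed as a named lookup vector. The sketch below is only an illustration: `country_lookup` and the small `toy` data frame are hypothetical names that do not appear in the original analysis, and the labels are copied as spelled in this document.

```r
# Named lookup vector: names are country codes (as strings), values are the
# country labels used in this analysis (spelling copied from the document).
country_lookup <- c(
  "1" = "India", "14" = "Australia", "30" = "Brazil", "37" = "Canada",
  "94" = "Indonesia", "148" = "New Zealand", "162" = "Philipine",
  "166" = "Qatar", "184" = "Singapore", "189" = "South Africa",
  "191" = "Sri Lanka", "208" = "Turkey", "214" = "United Arab Emirates",
  "215" = "England", "216" = "USA"
)

# Hypothetical stand-in for the real data: a few known codes plus one unknown
toy <- data.frame(Country.Code = c(162, 1, 216, 94, 999))

# Index the lookup by the code converted to character.
# Codes that are not in the lookup come back as NA.
toy$Country <- unname(country_lookup[as.character(toy$Country.Code)])
toy
```

A lookup vector keeps the whole mapping in one place, and an unexpected code shows up as `NA` in the new column, which is easy to check for.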
zomato_clean$Country[zomato$Country.Code == 162] <- "Philipine"
zomato_clean$Country[zomato$Country.Code == 30] <- "Brazil"
zomato_clean$Country[zomato$Country.Code == 216] <- "USA"
zomato_clean$Country[zomato$Country.Code == 14] <- "Australia"
zomato_clean$Country[zomato$Country.Code == 37] <- "Canada"
zomato_clean$Country[zomato$Country.Code == 184] <- "Singapore"
zomato_clean$Country[zomato$Country.Code == 214] <- "United Arab Emirates"
zomato_clean$Country[zomato$Country.Code == 1] <- "India"
zomato_clean$Country[zomato$Country.Code == 94] <- "Indonesia"
zomato_clean$Country[zomato$Country.Code == 148] <- "New Zealand"
zomato_clean$Country[zomato$Country.Code == 215] <- "England"
zomato_clean$Country[zomato$Country.Code == 166] <- "Qatar"
zomato_clean$Country[zomato$Country.Code == 189] <- "South Africa"
zomato_clean$Country[zomato$Country.Code == 191] <- "Sri Lanka"
zomato_clean$Country[zomato$Country.Code == 208] <- "Turkey"
head(zomato_clean)
zomato_clean$Country.Code <- as.factor(zomato_clean$Country.Code)
zomato_clean$Price.range <- as.factor(zomato_clean$Price.range)
zomato_clean$Country <- as.factor(zomato_clean$Country)
zomato_clean$City <- as.factor(zomato_clean$City)
zomato_clean$Currency <- as.factor(zomato_clean$Currency)
summary(zomato_clean, maxsum = 20)
## Restaurant.Name Country.Code City Average.Cost.for.two
## Length:9551 1 :8652 New Delhi :5473 Min. : 0
## Class :character 14 : 24 Gurgaon :1118 1st Qu.: 250
## Mode :character 30 : 60 Noida :1080 Median : 400
## 37 : 4 Faridabad : 251 Mean : 1199
## 94 : 21 Ghaziabad : 25 3rd Qu.: 700
## 148: 40 Ahmedabad : 21 Max. :800000
## 162: 22 Amritsar : 21
## 166: 20 Bhubaneshwar: 21
## 184: 20 Guwahati : 21
## 189: 60 Lucknow : 21
## 191: 20 Abu Dhabi : 20
## 208: 34 Agra : 20
## 214: 60 Albany : 20
## 215: 80 Allahabad : 20
## 216: 434 Ankara : 20
## Athens : 20
## Auckland : 20
## Augusta : 20
## Aurangabad : 20
## (Other) :1319
## Currency Price.range Aggregate.rating Votes
## Botswana Pula(P) : 22 1:4444 Min. :0.000 Min. : 0.0
## Brazilian Real(R$) : 60 2:3113 1st Qu.:2.500 1st Qu.: 5.0
## Dollar($) : 482 3:1408 Median :3.200 Median : 31.0
## Emirati Diram(AED) : 60 4: 586 Mean :2.666 Mean : 156.9
## Indian Rupees(Rs.) :8652 3rd Qu.:3.700 3rd Qu.: 131.0
## Indonesian Rupiah(IDR): 21 Max. :4.900 Max. :10934.0
## NewZealand($) : 40
## Pounds(£) : 80
## Qatari Rial(QR) : 20
## Rand(R) : 60
## Sri Lankan Rupee(LKR) : 20
## Turkish Lira(TL) : 34
##
##
##
##
##
##
##
##
## Country
## Australia : 24
## Brazil : 60
## Canada : 4
## England : 80
## India :8652
## Indonesia : 21
## New Zealand : 40
## Philipine : 22
## Qatar : 20
## Singapore : 20
## South Africa : 60
## Sri Lanka : 20
## Turkey : 34
## United Arab Emirates: 60
## USA : 434
##
##
##
##
##
From the summary above, we can conclude several things:
1. The dataset contains 9,551 restaurants located in 15 countries around the world.
2. Average cost for two ranges from 0 to 800,000, with a mean of 1,199. Costs are recorded in each country's local currency, so these pooled figures are not directly comparable across countries.
3. New Delhi has by far the most restaurants of any city (5,473).
4. Restaurant ratings on Zomato range from 0 to 4.9.
5. Restaurant votes on Zomato range from 0 to 10,934.
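Because the cost column mixes currencies, a pooled mean such as 1,199 is hard to read. One way to make the point concrete is to summarise cost separately for each currency. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, chosen only to illustrate the idea); on the real data the same `aggregate()` call would be applied to `zomato_clean`.

```r
# Hypothetical toy data: three currencies with very different magnitudes
toy <- data.frame(
  Currency = c("Indian Rupees(Rs.)", "Indian Rupees(Rs.)", "Indian Rupees(Rs.)",
               "Indonesian Rupiah(IDR)", "Indonesian Rupiah(IDR)",
               "Dollar($)", "Dollar($)"),
  Average.Cost.for.two = c(300, 500, 700, 150000, 250000, 20, 40)
)

# The pooled mean is dominated by the currency with the largest numbers
pooled_mean <- mean(toy$Average.Cost.for.two)

# Median cost within each currency is easier to interpret
cost_by_currency <- aggregate(Average.Cost.for.two ~ Currency,
                              data = toy, FUN = median)

pooled_mean
cost_by_currency
```

In this toy example the pooled mean is far above the typical rupee or dollar cost, which is the same effect that makes the overall mean in the summary hard to interpret.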
indonesia <- subset(x = zomato_clean,
Country == "Indonesia")
head(indonesia)
range(indonesia$Average.Cost.for.two)
## [1] 70000 800000
ggplot(indonesia, aes(City, Average.Cost.for.two)) +
geom_boxplot(aes(fill = City)) +
labs(title = "Cost by City",
x = "City",
y = "Cost",
fill = "City",
subtitle = "Red line indicates average cost") +
theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
scale_y_continuous(limits = c(0, 800000),
breaks = seq(from = 0, to = 800000, by = 200000)) +
geom_hline(yintercept = mean(indonesia$Average.Cost.for.two), color = "red", linetype = 5)
Interpretations:
a. As the boxplot above shows, the most expensive restaurants are in Jakarta.
b. Tangerang comes second, Bogor third, and Bandung has the lowest costs.
c. The average cost line crosses only the Jakarta box, and more than half of the restaurants in Jakarta cost more than the average.
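A claim such as interpretation c can also be checked numerically rather than read off the plot. The following sketch shows one way to do that, computing the median cost per city and the share of restaurants in each city that cost more than the overall mean. The `toy` data frame and its values are hypothetical; with the real data, `indonesia` would take its place.

```r
# Hypothetical toy data standing in for the Indonesian subset
toy <- data.frame(
  City = c("Jakarta", "Jakarta", "Jakarta", "Bogor", "Bogor", "Bandung"),
  Average.Cost.for.two = c(800000, 400000, 300000, 200000, 150000, 100000)
)

# Overall mean, the value drawn as the red dashed line in the boxplot
overall_mean <- mean(toy$Average.Cost.for.two)

# Median cost per city (the thick line inside each box)
median_by_city <- tapply(toy$Average.Cost.for.two, toy$City, median)

# Share of restaurants per city that cost more than the overall mean:
# the mean of a logical vector is the proportion of TRUE values
share_above <- tapply(toy$Average.Cost.for.two > overall_mean, toy$City, mean)

median_by_city
share_above
```

A share above 0.5 for a city would support the statement that more than half of its restaurants cost more than the average.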
zomato_agg <- aggregate(Votes ~ Restaurant.Name + Country, data = zomato_clean, FUN = mean)
zomato_urut <- zomato_agg[order(-zomato_agg$Votes), ]
top_15 <- head(zomato_urut, n = 15)
top_15
ggplot(data = top_15,
mapping = aes(y = reorder(Restaurant.Name, Votes),
x = Votes, fill = Votes)) +
geom_col() +
geom_text(aes(label = Country),
size = 3,
col = "white",
fontface = "bold",
hjust = -0.1) +
labs(title = "Top 15 Restaurants by Votes on Zomato",
x = "Votes",
y = NULL,
fill = "Votes") +
scale_fill_gradient(low = "yellow", high = "red") +
scale_x_continuous(limits = c(0, 12000),
breaks = seq(from = 0, to = 12000, by = 3000)) +
theme(plot.background = element_rect(fill = "black"),
plot.title = element_text(colour = "white"),
axis.text = element_text(colour = "white"),
panel.background = element_rect(fill = "black"),
panel.grid = element_blank(),
legend.background = element_rect(fill = "black"),
legend.text = element_text(colour = "white"),
legend.title = element_text(colour = "white"))
Interpretations:
a. The plot shows that Toit is the restaurant with the most votes on Zomato.
b. Hauz Khas Social is second and Peter Cat is third.
c. All of the top 15 restaurants by votes are in India.
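One likely reason the top 15 is entirely Indian is the composition of the dataset itself. The sketch below turns the per-country counts printed by `summary()` earlier into percentage shares. The counts are copied from that output; the variable names `country_counts` and `share_pct` are introduced here only for illustration.

```r
# Restaurant counts per country, copied from the summary() output above
country_counts <- c(
  Australia = 24, Brazil = 60, Canada = 4, England = 80, India = 8652,
  Indonesia = 21, "New Zealand" = 40, Philipine = 22, Qatar = 20,
  Singapore = 20, "South Africa" = 60, "Sri Lanka" = 20, Turkey = 34,
  "United Arab Emirates" = 60, USA = 434
)

# Percentage share of each country, rounded to one decimal place
share_pct <- round(100 * country_counts / sum(country_counts), 1)

# Largest shares first
sort(share_pct, decreasing = TRUE)
```

India accounts for roughly nine out of ten rows, so any ranking by raw vote counts is drawn mostly from Indian restaurants.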
india <- subset(x = zomato_clean,
Country == "India")
head(india)
ggplot(zomato_clean, aes(Average.Cost.for.two, Votes)) +
geom_jitter(aes(col = Votes))+
scale_x_log10()+
facet_wrap(~Country, scales = "free")+
labs(title = "Cost vs Votes", x="Cost", y= "Votes" )+
theme(plot.title = element_text(hjust = 0.5))
## Warning: Transformation introduced infinite values in continuous x-axis
Interpretations:
a. The relationship between cost and votes differs from country to country.
b. We interpret the two countries with the most restaurants (India and the USA), plus Indonesia.
c. In India, mid-cost restaurants get the most votes from customers.
d. In the USA, expensive restaurants still get a considerable number of votes from customers.
e. In Indonesia, mid-cost restaurants get the most votes, but expensive and low-cost restaurants receive vote counts that do not differ much from the mid-cost ones.
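The statement that mid-cost restaurants get the most votes can be checked with a simple grouping, sketched below under stated assumptions. Within each country, restaurants are split into three cost tiers at the tercile boundaries and the median votes per tier are compared. Rows with a cost of 0 are dropped first; these are also the rows that trigger the log-scale warning above, since the logarithm of 0 is not finite. The helper `votes_by_cost_tier()` and the `toy` data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: median votes per cost tier for one country's rows.
# Note: cut() stops with an error if the tercile breaks are not unique,
# which can happen when many restaurants share the same cost.
votes_by_cost_tier <- function(d) {
  d <- d[d$Average.Cost.for.two > 0, ]  # drop zero costs
  breaks <- quantile(d$Average.Cost.for.two, probs = c(0, 1/3, 2/3, 1))
  tier <- cut(d$Average.Cost.for.two, breaks = breaks,
              labels = c("low", "mid", "high"), include.lowest = TRUE)
  tapply(d$Votes, tier, median)
}

# Hypothetical toy data for a single country "A"
toy <- data.frame(
  Country = "A",
  Average.Cost.for.two = c(0, 100, 200, 300, 400, 500, 600),
  Votes = c(1, 10, 20, 90, 80, 30, 40)
)

# Apply the helper to each country separately
tier_medians <- lapply(split(toy, toy$Country), votes_by_cost_tier)
tier_medians
```

A "mid" median clearly above both "low" and "high" would support the mid-cost reading for that country. Countries with only a handful of restaurants give unstable tiers, so the result should be read with the same caution as the sparse scatter plots.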
From the graphs above, we can draw some conclusions:
1. Jakarta is the Indonesian city with the most restaurants listed on Zomato, and its prices vary widely. More than half of the restaurants in Jakarta cost more than the Indonesian average.
2. Because most restaurants in the Zomato dataset are in India, it is not surprising that the 15 restaurants with the most customer votes are all Indian. The restaurant with the most votes on Zomato is Toit.
3. The scatter plots vary with the amount of data each country has. Some countries have many restaurants and others very few, so some plots are informative and others are not. We interpreted India and the USA because they have the most data, and Indonesia because it is our country. The plots suggest that the restaurants that get the most votes are mid-cost ones.