Intro

What is Zomato?

Zomato is an Indian restaurant aggregator and food delivery start-up founded by Deepinder Goyal and Pankaj Chaddah in 2008. It provides information, menus, and user reviews of restaurants, as well as food delivery from partner restaurants in select cities. Zomato also began grocery delivery during the COVID-19 outbreak. As of 2019, the service was available in 24 countries and more than 10,000 cities.

Libraries

library(ggplot2)
library(ggthemes)
library(scales)
library(tidyr)
library(ggrepel)

Data Exploration

Data Input and Structure

zomato <- read.csv("data_input/zomato.csv")
dim(zomato)
## [1] 9551   21
head(zomato)
tail(zomato)
str(zomato)
## 'data.frame':    9551 obs. of  21 variables:
##  $ Restaurant.ID       : int  6317637 6304287 6300002 6318506 6314302 18189371 6300781 6301290 6300010 6314987 ...
##  $ Restaurant.Name     : chr  "Le Petit Souffle" "Izakaya Kikufuji" "Heat - Edsa Shangri-La" "Ooma" ...
##  $ Country.Code        : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ City                : chr  "Makati City" "Makati City" "Mandaluyong City" "Mandaluyong City" ...
##  $ Address             : chr  "Third Floor, Century City Mall, Kalayaan Avenue, Poblacion, Makati City" "Little Tokyo, 2277 Chino Roces Avenue, Legaspi Village, Makati City" "Edsa Shangri-La, 1 Garden Way, Ortigas, Mandaluyong City" "Third Floor, Mega Fashion Hall, SM Megamall, Ortigas, Mandaluyong City" ...
##  $ Locality            : chr  "Century City Mall, Poblacion, Makati City" "Little Tokyo, Legaspi Village, Makati City" "Edsa Shangri-La, Ortigas, Mandaluyong City" "SM Megamall, Ortigas, Mandaluyong City" ...
##  $ Locality.Verbose    : chr  "Century City Mall, Poblacion, Makati City, Makati City" "Little Tokyo, Legaspi Village, Makati City, Makati City" "Edsa Shangri-La, Ortigas, Mandaluyong City, Mandaluyong City" "SM Megamall, Ortigas, Mandaluyong City, Mandaluyong City" ...
##  $ Longitude           : num  121 121 121 121 121 ...
##  $ Latitude            : num  14.6 14.6 14.6 14.6 14.6 ...
##  $ Cuisines            : chr  "French, Japanese, Desserts" "Japanese" "Seafood, Asian, Filipino, Indian" "Japanese, Sushi" ...
##  $ Average.Cost.for.two: int  1100 1200 4000 1500 1500 1000 2000 2000 6000 1100 ...
##  $ Currency            : chr  "Botswana Pula(P)" "Botswana Pula(P)" "Botswana Pula(P)" "Botswana Pula(P)" ...
##  $ Has.Table.booking   : chr  "Yes" "Yes" "Yes" "No" ...
##  $ Has.Online.delivery : chr  "No" "No" "No" "No" ...
##  $ Is.delivering.now   : chr  "No" "No" "No" "No" ...
##  $ Switch.to.order.menu: chr  "No" "No" "No" "No" ...
##  $ Price.range         : int  3 3 4 4 4 3 4 4 4 3 ...
##  $ Aggregate.rating    : num  4.8 4.5 4.4 4.9 4.8 4.4 4 4.2 4.9 4.8 ...
##  $ Rating.color        : chr  "Dark Green" "Dark Green" "Green" "Dark Green" ...
##  $ Rating.text         : chr  "Excellent" "Excellent" "Very Good" "Excellent" ...
##  $ Votes               : int  314 591 270 365 229 336 520 677 621 532 ...

Missing Values

anyNA(zomato)
## [1] FALSE
colSums(is.na(zomato))
##        Restaurant.ID      Restaurant.Name         Country.Code 
##                    0                    0                    0 
##                 City              Address             Locality 
##                    0                    0                    0 
##     Locality.Verbose            Longitude             Latitude 
##                    0                    0                    0 
##             Cuisines Average.Cost.for.two             Currency 
##                    0                    0                    0 
##    Has.Table.booking  Has.Online.delivery    Is.delivering.now 
##                    0                    0                    0 
## Switch.to.order.menu          Price.range     Aggregate.rating 
##                    0                    0                    0 
##         Rating.color          Rating.text                Votes 
##                    0                    0                    0
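
`anyNA()` only detects values that R has parsed as `NA`. In character columns read by `read.csv()`, a blank cell arrives as an empty string rather than `NA`, so it can be worth counting those as well. The sketch below uses a small made-up data frame (`toy`, not the Zomato data) to show the difference; the column names and values are assumptions for illustration only.

```r
# Toy data frame (hypothetical values, not taken from zomato.csv)
toy <- data.frame(
  name     = c("A", "B", "C"),
  cuisines = c("Thai", "", "  "),   # one empty and one whitespace-only entry
  votes    = c(10L, 0L, 25L),
  stringsAsFactors = FALSE
)

# No NA values are present, so this check alone reports a complete data set
anyNA(toy)

# Count empty or whitespace-only strings in each character column
is_chr <- vapply(toy, is.character, logical(1))
blank_counts <- vapply(
  toy[is_chr],
  function(col) sum(trimws(col) == ""),
  integer(1)
)
blank_counts
```

Another option is to pass `na.strings = c("", "NA")` to `read.csv()`, so that blank cells become `NA` at import and the existing `colSums(is.na(...))` check picks them up.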

Subsetting and Changing Data Types

zomato_clean <- zomato[ , -c(1, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 19, 20)]
head(zomato_clean)
unique(zomato_clean$Country.Code)
##  [1] 162  30 216  14  37 184 214   1  94 148 215 166 189 191 208
zomato_clean$Country[zomato$Country.Code == 162] <- "Philipine"
zomato_clean$Country[zomato$Country.Code == 30] <- "Brazil"
zomato_clean$Country[zomato$Country.Code == 216] <- "USA"
zomato_clean$Country[zomato$Country.Code == 14] <- "Australia"
zomato_clean$Country[zomato$Country.Code == 37] <- "Canada"
zomato_clean$Country[zomato$Country.Code == 184] <- "Singapore"
zomato_clean$Country[zomato$Country.Code == 214] <- "United Arab Emirates"
zomato_clean$Country[zomato$Country.Code == 1] <- "India"
zomato_clean$Country[zomato$Country.Code == 94] <- "Indonesia"
zomato_clean$Country[zomato$Country.Code == 148] <- "New Zealand"
zomato_clean$Country[zomato$Country.Code == 215] <- "England"
zomato_clean$Country[zomato$Country.Code == 166] <- "Qatar"
zomato_clean$Country[zomato$Country.Code == 189] <- "South Africa"
zomato_clean$Country[zomato$Country.Code == 191] <- "Sri Lanka"
zomato_clean$Country[zomato$Country.Code == 208] <- "Turkey"
head(zomato_clean)
zomato_clean$Country.Code <- as.factor(zomato_clean$Country.Code)
zomato_clean$Price.range <- as.factor(zomato_clean$Price.range)
zomato_clean$Country <- as.factor(zomato_clean$Country)
zomato_clean$City <- as.factor(zomato_clean$City)
zomato_clean$Currency <- as.factor(zomato_clean$Currency)
summary(zomato_clean, maxsum = 20)
##  Restaurant.Name    Country.Code           City      Average.Cost.for.two
##  Length:9551        1  :8652     New Delhi   :5473   Min.   :     0      
##  Class :character   14 :  24     Gurgaon     :1118   1st Qu.:   250      
##  Mode  :character   30 :  60     Noida       :1080   Median :   400      
##                     37 :   4     Faridabad   : 251   Mean   :  1199      
##                     94 :  21     Ghaziabad   :  25   3rd Qu.:   700      
##                     148:  40     Ahmedabad   :  21   Max.   :800000      
##                     162:  22     Amritsar    :  21                       
##                     166:  20     Bhubaneshwar:  21                       
##                     184:  20     Guwahati    :  21                       
##                     189:  60     Lucknow     :  21                       
##                     191:  20     Abu Dhabi   :  20                       
##                     208:  34     Agra        :  20                       
##                     214:  60     Albany      :  20                       
##                     215:  80     Allahabad   :  20                       
##                     216: 434     Ankara      :  20                       
##                                  Athens      :  20                       
##                                  Auckland    :  20                       
##                                  Augusta     :  20                       
##                                  Aurangabad  :  20                       
##                                  (Other)     :1319                       
##                    Currency    Price.range Aggregate.rating     Votes        
##  Botswana Pula(P)      :  22   1:4444      Min.   :0.000    Min.   :    0.0  
##  Brazilian Real(R$)    :  60   2:3113      1st Qu.:2.500    1st Qu.:    5.0  
##  Dollar($)             : 482   3:1408      Median :3.200    Median :   31.0  
##  Emirati Diram(AED)    :  60   4: 586      Mean   :2.666    Mean   :  156.9  
##  Indian Rupees(Rs.)    :8652               3rd Qu.:3.700    3rd Qu.:  131.0  
##  Indonesian Rupiah(IDR):  21               Max.   :4.900    Max.   :10934.0  
##  NewZealand($)         :  40                                                 
##  Pounds(£)             :  80                                                 
##  Qatari Rial(QR)       :  20                                                 
##  Rand(R)               :  60                                                 
##  Sri Lankan Rupee(LKR) :  20                                                 
##  Turkish Lira(TL)      :  34                                                 
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##                  Country    
##  Australia           :  24  
##  Brazil              :  60  
##  Canada              :   4  
##  England             :  80  
##  India               :8652  
##  Indonesia           :  21  
##  New Zealand         :  40  
##  Philipine           :  22  
##  Qatar               :  20  
##  Singapore           :  20  
##  South Africa        :  60  
##  Sri Lanka           :  20  
##  Turkey              :  34  
##  United Arab Emirates:  60  
##  USA                 : 434  
##                             
##                             
##                             
##                             
## 

From the summary above, we can conclude the following:
1. The dataset contains 9551 restaurants located in 15 countries.
2. Average cost for two ranges from 0 to 800000, with a mean of 1199. The currency differs by country, however, so these figures are not directly comparable.
3. New Delhi has by far the most restaurants of any city (5473).
4. Restaurant ratings range from 0 to 4.9.
5. Restaurant votes range from 0 to 10934.
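
The fifteen separate country assignments earlier in this section could also be written with a named lookup vector, which keeps the code-to-label mapping in one place. This is a minimal sketch rather than the method used in the analysis: `country_lookup` and `codes` are hypothetical names, only three of the fifteen codes are shown, and the labels follow the ones assigned above.

```r
# Named lookup vector: names are country codes (as strings), values are labels.
# Only three of the fifteen codes are shown here.
country_lookup <- c("1" = "India", "94" = "Indonesia", "216" = "USA")

# Toy codes standing in for zomato$Country.Code (hypothetical sample);
# 999 is deliberately not in the lookup
codes <- c(1L, 216L, 94L, 1L, 999L)

# Index the lookup by the code converted to character;
# codes missing from the lookup come back as NA
country <- unname(country_lookup[as.character(codes)])
country
```

A code that is missing from the lookup yields `NA`, which makes an incomplete mapping easy to spot with `anyNA(country)`.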

Case Study

We will examine how restaurant cost varies by city in Indonesia, with the country's average cost overlaid for reference.

indonesia <- subset(x = zomato_clean,
                    Country == "Indonesia")
head(indonesia)
range(indonesia$Average.Cost.for.two)
## [1]  70000 800000
ggplot(indonesia, aes(City, Average.Cost.for.two)) +
  geom_boxplot(aes(fill = City)) +
  labs(title = "Cost by City", 
       x = "City", 
       y = "Cost", 
       fill = "City",
       subtitle = "Red line indicates average cost") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  scale_y_continuous(limits = c(0, 800000),
                     breaks = seq(from = 0, to = 800000, by = 200000)) +
  geom_hline(yintercept = mean(indonesia$Average.Cost.for.two), color = "red", linetype = 5)

Interpretations:
a. As the boxplot shows, Jakarta has the most expensive restaurants.
b. Tangerang comes second and Bogor third, while Bandung is the cheapest.
c. The average-cost line crosses only the Jakarta box, and more than half of Jakarta's restaurants cost more than the Indonesian average.
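
The readings taken from the boxplot can also be checked numerically. The sketch below computes the median cost per city and the share of each city's restaurants priced above the overall mean (the value drawn by `geom_hline()`). It runs on a made-up data frame, `toy_id`, whose costs are hypothetical; substituting the `indonesia` subset would give the real figures.

```r
# Toy stand-in for the `indonesia` subset (hypothetical costs, not real data)
toy_id <- data.frame(
  City = c("Jakarta", "Jakarta", "Jakarta", "Bogor", "Bogor", "Bandung"),
  Average.Cost.for.two = c(400000, 300000, 150000, 180000, 120000, 100000)
)

# Overall mean, i.e. the value the red reference line represents
overall_mean <- mean(toy_id$Average.Cost.for.two)

# Median cost per city
city_median <- tapply(toy_id$Average.Cost.for.two, toy_id$City, median)

# Share of restaurants in each city priced above the overall mean
share_above <- tapply(
  toy_id$Average.Cost.for.two > overall_mean,
  toy_id$City,
  mean
)

city_median
share_above
```

A share above 0.5 for a city corresponds to the statement that more than half of its restaurants cost more than the average.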

Top 15 Restaurants by Votes on Zomato Worldwide

zomato_agg <- aggregate(Votes ~ Restaurant.Name + Country, data = zomato_clean, FUN = mean)

zomato_urut <- zomato_agg[order(-zomato_agg$Votes), ]

top_15 <- head(zomato_urut, n = 15)

top_15
ggplot(data = top_15,
       mapping = aes(y = reorder(Restaurant.Name, Votes),
                     x = Votes, fill = Votes)) +
  geom_col() +
  geom_text(aes(label = Country),
            size = 3, 
            col = "white", 
            fontface = "bold", 
            hjust = -0.1) +
  labs(title = "Top 15 Restaurants by Votes on Zomato",
       x = "Votes",
       y = NULL,
       fill = "Votes") +
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_x_continuous(limits = c(0, 12000), 
                     breaks = seq(from = 0, to = 12000, by = 3000)) +
  theme(plot.background = element_rect(fill = "black"),
        plot.title = element_text(colour = "white"),
        axis.text = element_text(colour = "white"),
        panel.background = element_rect(fill = "black"),
        panel.grid = element_blank(),
        legend.background = element_rect(fill = "black"),
        legend.text = element_text(colour = "white"),
        legend.title = element_text(colour = "white"))

Interpretations:
a. Toit is the restaurant with the most votes on Zomato.
b. Hauz Khas Social comes second and Peter Cat third.
c. All of the top 15 restaurants by votes are in India.
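
One detail of the ranking is worth keeping in mind: `aggregate()` with `FUN = mean` collapses all rows that share a restaurant name and country into one row, so a chain with several outlets is ranked by its average votes per outlet. The sketch below shows the effect on a made-up data frame (`toy_votes`, hypothetical values) and compares it with `FUN = sum`, which is offered only as an alternative and is not what the analysis uses.

```r
# Toy data: "Chain Cafe" has two outlets in the same country (hypothetical values)
toy_votes <- data.frame(
  Restaurant.Name = c("Chain Cafe", "Chain Cafe", "Solo Diner"),
  Country         = c("India", "India", "India"),
  Votes           = c(100, 300, 250)
)

# Same call shape as in the analysis: one row per name/country pair,
# with Votes averaged over all outlets that share the name
by_mean <- aggregate(Votes ~ Restaurant.Name + Country,
                     data = toy_votes, FUN = mean)

# Alternative: total votes across outlets
by_sum <- aggregate(Votes ~ Restaurant.Name + Country,
                    data = toy_votes, FUN = sum)

# Sort each result in descending order of votes
by_mean[order(-by_mean$Votes), ]
by_sum[order(-by_sum$Votes), ]
```

In this toy case the two choices rank the restaurants differently, so the aggregation function should match the question being asked (popularity per outlet versus popularity of the brand).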

Correlation between Cost and Votes

india <- subset(x = zomato_clean,
                Country == "India")
head(india)
ggplot(zomato_clean, aes(Average.Cost.for.two, Votes)) +
  geom_jitter(aes(col = Votes)) +
  scale_x_log10() +
  facet_wrap(~Country, scales = "free") +
  labs(title = "Cost vs Votes", x = "Cost", y = "Votes") +
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Transformation introduced infinite values in continuous x-axis

Interpretations:
a. The relationship between cost and votes differs from country to country.
b. We interpret the two countries with the most restaurants (India and the USA), plus Indonesia.
c. In India, mid-cost restaurants get the most votes from customers.
d. In the USA, expensive restaurants still receive a considerable number of votes.
e. In Indonesia, mid-cost restaurants get the most votes, but expensive and low-cost restaurants receive only slightly fewer.
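
The scatter plots give a visual impression only; a per-country correlation coefficient would put a number on it. The sketch below is one possible approach, not part of the original analysis: it drops zero-cost rows (which have no finite value on a log scale and trigger the warning above) and computes a Spearman rank correlation within each country. The choice of Spearman is an assumption, and `toy_cv` holds hypothetical values.

```r
# Toy data (hypothetical values); the cost of 0 mimics the rows that
# trigger the log-scale warning in the plot
toy_cv <- data.frame(
  Country = c("India", "India", "India", "India", "USA", "USA", "USA"),
  Average.Cost.for.two = c(0, 200, 500, 1500, 10, 25, 40),
  Votes = c(0, 50, 400, 120, 30, 80, 200)
)

# Drop zero-cost rows, which have no finite log10 value
toy_pos <- subset(toy_cv, Average.Cost.for.two > 0)

# Spearman rank correlation between cost and votes within each country
rho <- sapply(
  split(toy_pos, toy_pos$Country),
  function(d) cor(d$Average.Cost.for.two, d$Votes, method = "spearman")
)
rho
```

A rank correlation only captures monotonic trends, so a pattern in which mid-cost restaurants receive the most votes would show up as a weak coefficient rather than a strong one.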

Conclusion

From the graphs above, we can draw some tentative conclusions:
1. Jakarta is the Indonesian city with the most restaurants listed on Zomato, and its prices vary widely. More than half of Jakarta's restaurants cost more than the Indonesian average.
2. Because most restaurants in the Zomato dataset are in India, it is not surprising that all of the top 15 restaurants by votes are Indian. The restaurant with the most votes on Zomato is Toit.
3. The quality of each country's scatter plot depends on how much data it has: countries with many restaurants produce readable plots, while those with few do not. We interpreted India and the USA because they have the most data, and Indonesia because it is our home country. In these plots, the restaurants that receive the most votes are generally mid-cost.