Analytics Test

You work for a startup Asian food ordering app called nood.ly with customers predominantly in Australia. Customers download the app from the iOS App Store or Android’s Google Play. Customers can sign up with their email address, phone number, Google or Facebook. Customers can place an order for various dishes from a restaurant. The restaurant recieves the order, confirms it, cooks it and delivers it themselves. nood.ly takes a 20% commission from the restaurant for each order taken through the platform. Customers may rate the restaurant after the order on a scale of 1-5. 0 indicates that the user did not submit a rating.

1. What is the median rating for restaurants in Queensland?

# Import data
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
restaurant = read_csv('restaurant.csv')
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   name = col_character(),
##   address = col_character(),
##   type = col_character(),
##   state = col_character(),
##   verified_on = col_character(),
##   open_date = col_character(),
##   website_url = col_character(),
##   `phone_number
## ` = col_character(),
##   min_delivery_value = col_character()
## )
## See spec(...) for full column specifications.
head(restaurant)
## # A tibble: 6 x 20
##      id name  address rating type  delivers pick_ups state verified verified_on
##   <dbl> <chr> <chr>    <dbl> <chr>    <dbl>    <dbl> <chr>    <dbl> <chr>      
## 1     1 Chin… 67 Hay…      3 Chin…        1        1 New …        1 6/06/2019  
## 2     2 Pana… 654 Ge…      2 Mala…        1        1 New …        1 6/06/2019  
## 3     3 Hong… 14 For…      5 Chin…        1        1 New …        1 6/06/2019  
## 4     4 Fort… 18 Rai…      4 Chin…        1        1 New …        1 6/06/2019  
## 5     5 Dump… 14 Qua…      5 Chin…        1        1 New …        1 6/06/2019  
## 6     6 Drun… 68 Red…      3 Kore…        1        1 New …        1 6/06/2019  
## # … with 10 more variables: closed <dbl>, open_date <chr>, owner_id <dbl>,
## #   abn <dbl>, is_gst_registered <dbl>, website_url <chr>,
## #   `phone_number\n` <chr>, google_maps_id <dbl>, min_delivery_value <chr>,
## #   is_boosted <dbl>
# Find the median rating
queenland = restaurant %>%
                filter(state == 'Queensland' & rating != 0)
median(queenland$rating)
## [1] 3.5

2. nood.ly has a feature which elevates/boosts a restaurant to the top of the search results for a flat fee of $100 a week. If today is 2020-01-01, and all the boosted restaurants were boosted from the date they were verified on, how much “boost revenue” has nood.ly made to the nearest $1000?

restaurant$verified_on = as.Date(restaurant$verified_on, format = '%d/%m/%Y')
boosted = restaurant %>%
        filter(is_boosted == 1)
sum(as.numeric(as.Date('2020-01-01') - boosted$verified_on))*100/7
## [1] 26471.43

3. Suppose you define an outlier as being in excess of 2 standard deviations to the left or right of the mean value. Which of these restaurants are an outlier in terms of the number of items that they have on their menu?

# Import data
item = read_csv('item.csv')
## Parsed with column specification:
## cols(
##   id = col_double(),
##   restaurant_id = col_double(),
##   name = col_character(),
##   chill_rating = col_double(),
##   ingredents_array = col_character(),
##   category = col_character(),
##   currency_code = col_character(),
##   sizes = col_character(),
##   has_gst = col_double()
## )
head(item)
## # A tibble: 6 x 9
##      id restaurant_id name  chill_rating ingredents_array category currency_code
##   <dbl>         <dbl> <chr>        <dbl> <chr>            <chr>    <chr>        
## 1     1             1 "Sic…            0 34, 67, 99, 12,… "Main\n" AUD          
## 2     2             1 "Spr…            0 2, 66, 13, 12    "Side"   AUD          
## 3     3             1 "Pra…            0 12, 78, 45, 113  "Main\n" AUD          
## 4     4             1 "Sti…            1 12, 54, 88, 19   "Main\n" AUD          
## 5     5             1 "Spe…            1 1, 8, 16, 55     "Main\n" AUD          
## 6     6             1 "Bra…            0 12, 45, 31, 88,… "Main\n" AUD          
## # … with 2 more variables: sizes <chr>, has_gst <dbl>
# Compute bounds
num_items = item %>%
        group_by(restaurant_id) %>%
        summarise(count = n())
## `summarise()` ungrouping output (override with `.groups` argument)
mean_items = mean(num_items$count)
sd_items = sd(num_items$count)
cat("[", mean_items - 2*sd_items, ",", mean_items + 2*sd_items, "]")
## [ 3.143891 , 32.76787 ]
# Find the restaurant
outlier_id = num_items %>%
        filter(count < 4 | count > 32)
outlier_name = restaurant %>%
        filter(id %in% outlier_id$restaurant_id)
print(outlier_name$name)
## [1] "Panang Paradise"       "Fortune Village"       "Lemongrass Thailandia"

4. The distribution of Asian restaurant is as follows: Indonesian: 6%, Chinese: 25%, Thai: 27%, Japanese: 12%, Malaysian: 6%, Indian: 15%, Korean: 5%, Other: 4%. Relative to the overall market, which types of restaurants does nood.ly have relatively more than the overall market?

types = restaurant %>%
        group_by(type) %>%
        summarise(count = n())
## `summarise()` ungrouping output (override with `.groups` argument)
types %>%
        mutate(proportion = count/sum(types$count))
## # A tibble: 9 x 3
##   type         count proportion
##   <chr>        <int>      <dbl>
## 1 "Chinese"       30       0.3 
## 2 "Indian"         9       0.09
## 3 "Indonesian"    11       0.11
## 4 "Japanese"       7       0.07
## 5 "Korean"         5       0.05
## 6 "Malaysian"      3       0.03
## 7 "Thai"          22       0.22
## 8 "Thai\n"         1       0.01
## 9 "Vietnamese"    12       0.12

5. Your developers say there is only space for 20 characters. What proportion of Chinese restaurants’ names will not fit?

restaurant$name = as.character(restaurant$name)
china = restaurant %>%
        filter(type == 'Chinese')
china_long = china %>%
                filter(nchar(name) > 20)
nrow(china_long)/nrow(china)
## [1] 0.06666667

6. Which Chinese restaurant have the lowest average ingredients per item?

library(stringr)
item$num_ingred = 0
for (i in 1:nrow(item)){
        item[i,]$num_ingred = length(str_split(item$ingredents_array, ",")[[i]])
}

item_china = item %>%
                filter(restaurant_id %in% china$id) %>%
                group_by(restaurant_id) %>%
                summarise(average = sum(num_ingred)/n()) %>%
                top_n(-2, average)
## `summarise()` ungrouping output (override with `.groups` argument)
china_name = restaurant %>%
                filter(id %in% item_china$restaurant_id)
print(china_name$name)
## [1] "Sichuan Sydney" "Yummy Yum Cha"

7. What is the most popular dish to have on a menu? Is there any issue with determining this?

item %>%
        group_by(name) %>%
        summarise(count = n()) %>%
        top_n(1, count)
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 1 x 2
##   name         count
##   <chr>        <int>
## 1 Spring Rolls    15

8. Of the restaurants that have been verified and haven’t closed, how many have not set up their menu items yet in the nood.ly system?

restaurant_v_o = restaurant %>%
                        filter(verified == 1 & closed == 0)
summary(restaurant_v_o$id %in% item$restaurant_id)
##    Mode   FALSE    TRUE 
## logical      29      56