RestaurantGrades

RestaurantGrades (RG) is a restaurant review platform (like Yelp) that lets users review restaurants as well as view and order from them. Its main source of revenue is restaurants that pay to advertise on the platform.

RG’s current search algorithm triggers ads by type of cuisine, showing restaurants within a 0.5-mile radius of the user's search. A new search algorithm has been developed that instead triggers ads when a user searches for a specific restaurant: it selects two restaurants with similar ratings and hours to advertise.

RG randomly selects 30,000 restaurants and randomly divides them into three groups of 10,000 restaurants each:

1. Control (no ads)
2. Treatment 1 (ads of current design)
3. Treatment 2 (ads of alternative design)
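The write-up does not say how the assignment was carried out. As a minimal sketch of one way to produce three balanced random groups, the code below shuffles a vector of group labels. The seed, the `assignment` data frame, and the use of `sample()` are illustrative assumptions, not RG's actual procedure.

```r
set.seed(42)  # arbitrary seed, only so the illustration is reproducible

# Hypothetical restaurant identifiers, 1 to 30,000
business_id <- 1:30000

# Build 10,000 copies of each label, then shuffle them so every
# restaurant is equally likely to land in any of the three groups
assignment <- data.frame(
  business_id = business_id,
  treatment   = factor(sample(rep(c("0", "1", "2"), each = 10000)))
)

# Each group should contain exactly 10,000 restaurants
table(assignment$treatment)
```

Shuffling a pre-built label vector guarantees equal group sizes, which drawing each label independently would not.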

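# Load the experiment data
restaurantgrades <- read.csv(file = "Restaurant Grades.csv", header = TRUE)
# Treat the group label (0, 1, 2) as a categorical variable rather than a number
restaurantgrades$treatment <- as.factor(restaurantgrades$treatment)
str(restaurantgrades)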

## 'data.frame':    30000 obs. of  6 variables:
##  $ treatment      : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ pageviews      : int  643 621 581 592 648 519 583 659 507 577 ...
##  $ calls          : int  44 41 40 35 45 37 47 37 40 41 ...
##  $ reservations   : int  39 44 38 31 46 41 42 42 30 35 ...
##  $ business_id    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ restaurant_type: Factor w/ 2 levels "chain","independent": 1 1 1 1 1 1 1 1 1 1 ...

We take reservations as the target variable, since it best represents the advertising objective (sales) of the restaurants. We fit a linear regression to see which treatment group has the most positive impact on reservations. The treatment variable is categorical with three levels, so it has to be encoded as two indicator (dummy) variables for the regression. R does this automatically for factors, using the first level ("0", the control group here) as the reference.
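To make the automatic encoding concrete, the sketch below prints the design matrix that `lm()` builds for a three-level factor. The `toy` data frame is a hypothetical stand-in for the real data, and the sketch assumes R's default treatment contrasts.

```r
# Toy data (hypothetical, not the RestaurantGrades file):
# two rows for each of the three treatment levels
toy <- data.frame(treatment = factor(c("0", "1", "2", "0", "1", "2")))

# model.matrix() returns the design matrix lm() would use.
# With the default treatment contrasts, level "0" is the reference and is
# absorbed into the intercept; levels "1" and "2" each get a 0/1 column.
design <- model.matrix(~ treatment, data = toy)
print(design)
```

A row from the reference group has zeros in both `treatment1` and `treatment2`, which is why the intercept can be read as the reference group's mean.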

model1 <- lm(reservations ~ treatment, data = restaurantgrades)
model1

## 
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades)
## 
## Coefficients:
## (Intercept)   treatment1   treatment2  
##     33.9604       0.0608       7.7201

summary(model1)

## 
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.680  -5.021  -1.021   4.320  37.320 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.96040    0.07123 476.762   <2e-16 ***
## treatment1   0.06080    0.10074   0.604    0.546    
## treatment2   7.72010    0.10074  76.637   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.123 on 29997 degrees of freedom
## Multiple R-squared:  0.2057, Adjusted R-squared:  0.2057 
## F-statistic:  3885 on 2 and 29997 DF,  p-value: < 2.2e-16

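Thus, treatment 2 (the alternative ad design) raises reservations by about 7.72 on average relative to the control group, and the effect is highly significant (p < 2e-16). Treatment 1 (the current design) has a coefficient of only 0.06, which is not statistically distinguishable from zero (p = 0.546). The alternative design therefore appears to have a much better impact on sales than the current one.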
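The coefficients above compare each treatment with the control group, not the two treatments with each other. The sketch below uses simulated data to show two things: each coefficient equals a difference in group means, and re-leveling the factor gives a direct comparison of treatment 2 against treatment 1. The group sizes, means, and the names `toy`, `fit`, and `fit_ref1` are arbitrary and are not taken from the real data.

```r
set.seed(1)  # simulated data, for illustration only

toy <- data.frame(
  treatment    = factor(rep(c("0", "1", "2"), each = 50)),
  reservations = c(rnorm(50, 34, 7), rnorm(50, 34, 7), rnorm(50, 42, 7))
)

fit <- lm(reservations ~ treatment, data = toy)
group_means <- tapply(toy$reservations, toy$treatment, mean)

# The treatment2 coefficient is the treatment 2 mean minus the control mean
coef(fit)["treatment2"]
group_means["2"] - group_means["0"]

# Make treatment 1 the reference level to compare treatment 2 with it directly
toy$treatment_ref1 <- relevel(toy$treatment, ref = "1")
fit_ref1 <- lm(reservations ~ treatment_ref1, data = toy)
summary(fit_ref1)$coefficients
```

In the re-leveled fit, the row named `treatment_ref12` carries the estimate, standard error, and p-value for the difference between treatment 2 and treatment 1.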

Let us see if the same holds true for both restaurant types: chain and independent.

require("dplyr")

## Loading required package: dplyr

## Warning: package 'dplyr' was built under R version 3.3.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

restaurantgrades_chain <- filter(.data = restaurantgrades, restaurant_type == "chain")
model_chain <- lm(reservations ~ treatment, data = restaurantgrades_chain)
model_chain

## 
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_chain)
## 
## Coefficients:
## (Intercept)   treatment1   treatment2  
##      39.925        0.176        8.077

summary(model_chain)

## 
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_chain)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.002  -4.002  -0.002   3.998  30.998 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  39.9250     0.1019 391.862   <2e-16 ***
## treatment1    0.1760     0.1441   1.221    0.222    
## treatment2    8.0770     0.1441  56.056   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.444 on 11997 degrees of freedom
## Multiple R-squared:  0.2547, Adjusted R-squared:  0.2546 
## F-statistic:  2050 on 2 and 11997 DF,  p-value: < 2.2e-16

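The same holds for chain restaurants: treatment 2 adds about 8.08 reservations relative to control (p < 2e-16), while the effect of treatment 1 (0.18) is not significant (p = 0.222).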

restaurantgrades_independent <- filter(.data = restaurantgrades, restaurant_type == "independent")
model_independent <- lm(reservations ~ treatment, data = restaurantgrades_independent)
model_independent

## 
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_independent)
## 
## Coefficients:
## (Intercept)   treatment1   treatment2  
##      29.984       -0.016        7.482

summary(model_independent)

## 
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_independent)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.984  -2.466   0.032   2.534  18.016 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.98400    0.05036 595.388   <2e-16 ***
## treatment1  -0.01600    0.07122  -0.225    0.822    
## treatment2   7.48217    0.07122 105.056   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.901 on 17997 degrees of freedom
## Multiple R-squared:  0.4504, Adjusted R-squared:  0.4503 
## F-statistic:  7374 on 2 and 17997 DF,  p-value: < 2.2e-16

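The pattern is the same for independent restaurants: treatment 2 adds about 7.48 reservations relative to control (p < 2e-16), while treatment 1 has no detectable effect (-0.016, p = 0.822).

Thus, we suggest that RestaurantGrades switch to the alternative ad design, so that restaurants that pay for ads can increase their sales.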
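Fitting separate models per restaurant type shows the effect within each type, but it does not test whether the effect differs between types (8.08 for chains versus 7.48 for independents). A common alternative, sketched here on simulated data, is a single model with a treatment-by-type interaction. This swaps in a different technique from the subsetting used above; the data, effect sizes, and the names `toy`, `additive`, and `interact` are hypothetical.

```r
set.seed(2)  # simulated data, for illustration only
n <- 300

toy <- data.frame(
  treatment       = factor(sample(c("0", "1", "2"), n, replace = TRUE)),
  restaurant_type = factor(sample(c("chain", "independent"), n, replace = TRUE))
)

# Simulated outcome: chains have a higher baseline, and treatment 2 adds
# the same lift for both types (so no true interaction is built in)
toy$reservations <- 30 +
  10 * (toy$restaurant_type == "chain") +
  8 * (toy$treatment == "2") +
  rnorm(n, sd = 4)

# Additive model: one treatment effect shared by both restaurant types
additive <- lm(reservations ~ treatment + restaurant_type, data = toy)

# Interaction model: the treatment effect may differ by restaurant type
interact <- lm(reservations ~ treatment * restaurant_type, data = toy)

# F-test of whether the interaction terms improve the fit
anova(additive, interact)
```

On the real data, the same comparison would use `restaurantgrades` in place of `toy`. A small p-value in the `anova()` table would indicate that the ad effect differs between chain and independent restaurants.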

Ajinkya Deore

March 14, 2017