RestaurantGrades (RG) is a restaurant review platform (similar to Yelp) that allows users to review restaurants as well as view and order from them. Its main source of revenue is restaurants that pay to advertise on the platform.
RG’s current search algorithm shows ads for restaurants of the searched cuisine type within a 0.5-mile radius of the user's search location. A newly developed alternative algorithm instead triggers ads when a user searches for a specific restaurant: it selects two restaurants with similar ratings and hours and shows ads for them.
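The two ad-selection rules can be sketched as follows. This is a minimal illustration under stated assumptions, not RG's implementation: the restaurant table, its column names, measuring distance from the user's location with the haversine formula, and the similarity score (absolute rating difference plus opening and closing hour differences scaled by 24) are all hypothetical choices for how "similar ratings and hours" might be measured.

```r
# Hypothetical restaurant table: names, columns and values are illustrative only.
restaurants <- data.frame(
  name     = c("A", "B", "C", "D", "E"),
  cuisine  = c("thai", "thai", "pizza", "thai", "pizza"),
  lat      = c(40.7128, 40.7150, 40.7130, 40.7500, 40.7129),
  lon      = c(-74.0060, -74.0050, -74.0070, -74.0000, -74.0061),
  rating   = c(4.5, 4.4, 3.0, 4.6, 4.3),
  open_hr  = c(11, 11, 9, 12, 11),
  close_hr = c(22, 22, 23, 21, 22),
  stringsAsFactors = FALSE
)

# Great-circle distance in miles (haversine formula, mean Earth radius 3958.8 mi).
haversine_miles <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  3958.8 * 2 * asin(sqrt(pmin(1, a)))
}

# Current design: restaurants of the searched cuisine within `radius` miles.
current_ads <- function(df, cuisine, user_lat, user_lon, radius = 0.5) {
  d <- haversine_miles(user_lat, user_lon, df$lat, df$lon)
  df$name[df$cuisine == cuisine & d <= radius]
}

# Alternative design: the k other restaurants closest to the searched one
# in rating and hours (hypothetical similarity score).
alternative_ads <- function(df, searched, k = 2) {
  target <- df[df$name == searched, ]
  others <- df[df$name != searched, ]
  score <- abs(others$rating - target$rating) +
    (abs(others$open_hr - target$open_hr) +
       abs(others$close_hr - target$close_hr)) / 24
  others$name[order(score)][seq_len(min(k, nrow(others)))]
}

print(current_ads(restaurants, "thai", 40.7128, -74.0060))
print(alternative_ads(restaurants, "A"))
```

With this toy table, the current rule returns the two nearby Thai restaurants ("A", "B"), while the alternative rule, searching for "A", returns "B" and "D", the restaurants closest to it in rating and hours regardless of cuisine or distance.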
RG randomly selects 30,000 restaurants and divides them at random into three groups of 10,000 restaurants each:
1. Control (no ads)
2. Treatment 1 (ads of the current design)
3. Treatment 2 (ads of the alternative design)
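A minimal sketch of this kind of balanced random assignment is shown below. The candidate pool of 100,000 restaurant ids and the seed are hypothetical; the source does not say how RG's population or randomization was set up.

```r
set.seed(2017)  # arbitrary seed, for reproducibility only

# Hypothetical pool of candidate restaurant ids; RG's real pool is not given.
pool <- 1:100000

# Step 1: draw 30,000 restaurants without replacement.
selected <- sample(pool, 30000)

# Step 2: shuffle a balanced vector of group labels (10,000 each of "0", "1", "2")
# so every restaurant is equally likely to land in any group and the groups are
# exactly equal in size.
assignment <- data.frame(
  business_id = selected,
  treatment   = factor(sample(rep(c("0", "1", "2"), each = 10000)))
)

print(table(assignment$treatment))
```

Shuffling a fixed label vector, rather than drawing each label independently, guarantees exactly 10,000 restaurants per group.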
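# Load the experiment data; stringsAsFactors = TRUE keeps restaurant_type a factor
# (the default was TRUE before R 4.0.0 and FALSE from R 4.0.0 on)
restaurantgrades <- read.csv(file = "Restaurant Grades.csv", header = TRUE, stringsAsFactors = TRUE)
# The group code (0 = control, 1 = current ads, 2 = alternative ads) is categorical, not numeric
restaurantgrades$treatment <- as.factor(restaurantgrades$treatment)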
str(restaurantgrades)
## 'data.frame': 30000 obs. of 6 variables:
## $ treatment : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ pageviews : int 643 621 581 592 648 519 583 659 507 577 ...
## $ calls : int 44 41 40 35 45 37 47 37 40 41 ...
## $ reservations : int 39 44 38 31 46 41 42 42 30 35 ...
## $ business_id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ restaurant_type: Factor w/ 2 levels "chain","independent": 1 1 1 1 1 1 1 1 1 1 ...
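We take reservations as the target variable, since it best represents the restaurants' advertising objective (sales). We regress reservations on the treatment group to see which treatment has the largest positive effect on reservations. Because treatment is a categorical variable with three levels, it must be encoded as two dummy variables for the regression; R does this automatically. With R's default treatment contrasts, level 0 (control) is the baseline: the intercept estimates mean reservations in the control group, and the treatment1 and treatment2 coefficients estimate how much each ad design changes mean reservations relative to control.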
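The dummy encoding that `lm()` performs can be inspected with `model.matrix()`. The sketch below uses a small made-up factor with the same three levels as `treatment`; the values are illustrative only.

```r
# Toy factor with the same levels as `treatment` (illustrative values).
treatment <- factor(c("0", "1", "2", "0", "2"))

# model.matrix() builds the design matrix lm() uses with R's default
# treatment contrasts: an intercept column plus one 0/1 column for each
# non-baseline level. Rows in the baseline level "0" get 0 in both dummies.
X <- model.matrix(~ treatment)
print(X)
```

The columns are named `(Intercept)`, `treatment1` and `treatment2`, which is why the regression output reports coefficients under those names.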
model1 <- lm(reservations ~ treatment, data = restaurantgrades)
model1
##
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades)
##
## Coefficients:
## (Intercept) treatment1 treatment2
## 33.9604 0.0608 7.7201
summary(model1)
##
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.680 -5.021 -1.021 4.320 37.320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.96040 0.07123 476.762 <2e-16 ***
## treatment1 0.06080 0.10074 0.604 0.546
## treatment2 7.72010 0.10074 76.637 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.123 on 29997 degrees of freedom
## Multiple R-squared: 0.2057, Adjusted R-squared: 0.2057
## F-statistic: 3885 on 2 and 29997 DF, p-value: < 2.2e-16
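Thus, treatment 2 (the alternative ad design) increases reservations by about 7.72 per restaurant relative to control, and the effect is highly significant (p < 2e-16). Treatment 1 (the current design) changes reservations by only 0.06, which is not statistically significant (p = 0.546). In other words, the current ads have no detectable effect on reservations, while the alternative design has a large positive one.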
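To see why the coefficients read as differences from control, the sketch below fits the same model to simulated data (not the RG data; the Poisson means are loosely set near the estimates above purely for illustration) and checks that the coefficients equal differences in group means. It then refits with treatment 1 as the baseline, which compares the alternative design directly against the current one.

```r
set.seed(1)

# Simulated data, NOT the RG data; group means are illustrative assumptions.
sim <- data.frame(
  treatment    = factor(rep(c("0", "1", "2"), each = 1000)),
  reservations = rpois(3000, lambda = rep(c(34, 34, 41.7), each = 1000))
)
fit <- lm(reservations ~ treatment, data = sim)
m <- tapply(sim$reservations, sim$treatment, mean)

# Treatment contrasts: intercept = control mean, treatmentK = mean_K - mean_0.
by_hand <- c(m[["0"]], m[["1"]] - m[["0"]], m[["2"]] - m[["0"]])
print(cbind(lm = coef(fit), by_hand = by_hand))

# Make treatment 1 the baseline so the third coefficient is mean_2 - mean_1,
# i.e. the alternative design versus the current design.
sim$treatment_vs_t1 <- relevel(sim$treatment, ref = "1")
fit_vs_t1 <- lm(reservations ~ treatment_vs_t1, data = sim)
print(summary(fit_vs_t1)$coefficients)
```

Applied to the real data, `lm(reservations ~ relevel(treatment, ref = "1"), data = restaurantgrades)` would give a treatment 2 coefficient of 7.7201 - 0.0608 = 7.6593, the estimated gain of the alternative design over the current one, together with its own standard error and p-value.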
Let us check whether the same result holds for both restaurant types, chain and independent, by fitting the same model separately to each subset.
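Splitting the data and fitting one model per subset gives the same point estimates as a single model with a treatment by restaurant-type interaction, which additionally tests whether the ad effects differ between segments. The sketch below illustrates this on simulated data (not the RG data; sample sizes and effect sizes are assumptions).

```r
set.seed(42)

# Simulated data, NOT the RG data; sizes and effects are illustrative assumptions.
n <- 600
sim <- data.frame(
  treatment       = factor(rep(c("0", "1", "2"), times = n / 3)),
  restaurant_type = factor(rep(c("chain", "independent"), each = n / 2))
)
sim$reservations <- rpois(n, lambda = 30 + 10 * (sim$restaurant_type == "chain") +
                                      7.5 * (sim$treatment == "2"))

# Subset fits, as in the text.
fit_chain <- lm(reservations ~ treatment, data = subset(sim, restaurant_type == "chain"))
fit_indep <- lm(reservations ~ treatment, data = subset(sim, restaurant_type == "independent"))

# One fully interacted model; "chain" is the baseline level of restaurant_type.
fit_int <- lm(reservations ~ treatment * restaurant_type, data = sim)
b <- coef(fit_int)

# Chain estimates are the main-effect terms; independent estimates add the
# restaurant_type and interaction terms.
chain_from_int <- b[c("(Intercept)", "treatment1", "treatment2")]
indep_from_int <- chain_from_int + c(b["restaurant_typeindependent"],
                                     b["treatment1:restaurant_typeindependent"],
                                     b["treatment2:restaurant_typeindependent"])
print(rbind(chain_int = unname(chain_from_int), chain_fit = unname(coef(fit_chain)),
            indep_int = unname(indep_from_int), indep_fit = unname(coef(fit_indep))))

# Interaction rows: does each ad effect differ between chains and independents?
print(summary(fit_int)$coefficients[5:6, ])
```

The standard errors differ between the two approaches, because the interacted model assumes one residual variance for both segments. In the RG output below, the residual standard errors differ noticeably (6.444 for chains versus 3.901 for independents), so the separate fits used here, or heteroskedasticity-robust standard errors, are the safer choice for per-segment inference.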
library(dplyr)  # provides filter(); note that dplyr::filter() masks stats::filter()
restaurantgrades_chain <- filter(.data = restaurantgrades, restaurant_type == "chain")
model_chain <- lm(reservations ~ treatment, data = restaurantgrades_chain)
model_chain
##
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_chain)
##
## Coefficients:
## (Intercept) treatment1 treatment2
## 39.925 0.176 8.077
summary(model_chain)
##
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_chain)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.002 -4.002 -0.002 3.998 30.998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.9250 0.1019 391.862 <2e-16 ***
## treatment1 0.1760 0.1441 1.221 0.222
## treatment2 8.0770 0.1441 56.056 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.444 on 11997 degrees of freedom
## Multiple R-squared: 0.2547, Adjusted R-squared: 0.2546
## F-statistic: 2050 on 2 and 11997 DF, p-value: < 2.2e-16
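The result holds for chain restaurants: treatment 2 raises reservations by about 8.08 relative to control (p < 2e-16), while the 0.18 estimate for treatment 1 is not statistically significant (p = 0.222).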
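To make "not statistically significant" concrete, a 95% confidence interval is the estimate plus or minus the t critical value times the standard error, which is what `confint(model_chain)` computes. Using the rounded chain estimates above and $t_{0.975,\,11997} \approx 1.960$, approximately:

```latex
\begin{aligned}
\text{treatment2:}\quad & 8.0770 \pm 1.960 \times 0.1441 = 8.0770 \pm 0.2824 \;\Rightarrow\; [7.795,\ 8.359] \\
\text{treatment1:}\quad & 0.1760 \pm 1.960 \times 0.1441 = 0.1760 \pm 0.2824 \;\Rightarrow\; [-0.106,\ 0.458]
\end{aligned}
```

The treatment 1 interval contains 0, so the data are consistent with the current design having no effect on chain reservations, while the treatment 2 interval lies well above 0.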
restaurantgrades_independent <- filter(.data = restaurantgrades, restaurant_type == "independent")
model_independent <- lm(reservations ~ treatment, data = restaurantgrades_independent)
model_independent
##
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_independent)
##
## Coefficients:
## (Intercept) treatment1 treatment2
## 29.984 -0.016 7.482
summary(model_independent)
##
## Call:
## lm(formula = reservations ~ treatment, data = restaurantgrades_independent)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.984 -2.466 0.032 2.534 18.016
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.98400 0.05036 595.388 <2e-16 ***
## treatment1 -0.01600 0.07122 -0.225 0.822
## treatment2 7.48217 0.07122 105.056 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.901 on 17997 degrees of freedom
## Multiple R-squared: 0.4504, Adjusted R-squared: 0.4503
## F-statistic: 7374 on 2 and 17997 DF, p-value: < 2.2e-16
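The same pattern holds for independent restaurants: treatment 2 raises reservations by about 7.48 relative to control (p < 2e-16), while treatment 1 is indistinguishable from control (estimate -0.016, p = 0.822).
Thus, in both segments the current ad design has no detectable effect on reservations, whereas the alternative design increases them substantially. We therefore recommend that RestaurantGrades switch to the alternative ad mechanism, so that restaurants that pay to have their ads shown see more reservations and, in turn, more sales.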
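To express the alternative design's effect in relative terms, the sketch below divides each treatment 2 coefficient by the corresponding intercept (the control-group mean). The numbers are copied from the three regression summaries above; only the arithmetic is new.

```r
# Estimates copied from the summaries above:
# control = intercept (control mean), treatment2 = alternative-design effect.
est <- data.frame(
  segment    = c("all", "chain", "independent"),
  control    = c(33.96040, 39.9250, 29.98400),
  treatment2 = c(7.72010, 8.0770, 7.48217)
)

# Percentage lift in mean reservations over the no-ads control group.
est$lift_pct <- round(100 * est$treatment2 / est$control, 1)
print(est)
```

This gives a lift of roughly 22.7% overall, 20.2% for chains and 25.0% for independents: a smaller absolute gain for independents, but a larger gain relative to their lower baseline.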