This notebook includes suggested solutions and rubrics for the questions in Assignment 2. For some questions, your answers might not be exactly the same as the suggested solutions. Please download the data file “Assignment_2.Rdata” from Canvas and load it.
## [1] "app.installs" "estimate_bass" "predict_bass" "promo.trial"
Question 1
Question 1a (4.5 points)
To estimate the Bass parameters of the 15 apps, we use the function
estimate_bass provided in the Bass model session. Because
the data are stored in the list app.installs, we use
lapply to apply the function to each element of the list.
# apply estimate_bass to each app in the list app.installs
apps_bass_paras <- lapply(app.installs, estimate_bass)
# transform the list to a table for easy reading:
# one row per app, one column per parameter (p, q, M)
temp <- t(matrix(unlist(apps_bass_paras),
                 nrow = 3,
                 ncol = length(apps_bass_paras)
                 )
          )
apps_bass_paras <- data.frame(apps = names(apps_bass_paras),
                              p = temp[, 1],
                              q = temp[, 2],
                              M = temp[, 3],
                              stringsAsFactors = FALSE)
apps_bass_paras
## apps p q M
## 1 Dogbook 0.0012085370 0.009555393 426147556
## 2 Books 0.0011914796 0.007892291 26127121
## 3 PuzzleBee 0.0011913232 0.011301563 432231050
## 4 Astrology 0.0012175449 0.016335565 391132854
## 5 Languages 0.0012181564 0.017320782 3235610
## 6 BioRhythms 0.0047749840 0.013975515 11855916
## 7 TV Shows 0.0015368375 0.013910218 7208739
## 8 (fluff)Friends 0.0012821425 0.014008677 1110381495
## 9 The Greek Community 0.0023553027 0.013864790 37232112
## 10 Pop Culture Quizzes 0.0008782294 0.014046545 27159228
## 11 My Countdowns 0.0010571332 0.006894111 51303262
## 12 Yo Momma 0.0029939254 0.015094171 53674508
## 13 Fantasy Stock Exchange 0.0022325679 0.009898809 53947020
## 14 PersonalDNA 0.0043451070 0.017478213 54315815
## 15 Weather 0.0007818028 0.018393308 54432287
Grading of Question 1a
There are 45 values in the table in total, and each value is worth 0.1 point. So, the total no. of points is \(0.1\times45=4.5\).
Question 1b (3 points)
Guidelines to the answers:
This is a conceptual question. The general approach is to compare the 15 apps in Table 2 with the Chess app, identify which apps are similar to the Chess app on each parameter, and explain why you think they are similar. For example, you may compare the 15 apps with the Chess app on the market size M. Fantasy.Stock.Exchange seems the most similar to the Chess app because 1) both are gaming apps; and 2) both belong to the genre of strategy games. These two games might be liked by similar groups of people.
Grading Question 1b
There are three parameters to consider: \(\{p, q, M\}\). Each parameter should be discussed separately. The explanation of each parameter is worth 1 point, so in total we have 1 point * 3 parameters = 3 points. For each parameter, the 1 point is allocated according to the general guidelines below:
- Clarity (20% or 0.2 point): Your explanations are clear and easy to understand.
- Relevance (30% or 0.3 point): You must compare the similarities between the 15 apps and the Chess app. The similarities should be those relevant for innovation parameter, imitation parameter and the market size.
- Validity (50% or 0.5 point): Your explanations about the similarities are reasonable and on good grounds.
Question 1c (2.5 points)
As different people may choose different values of \((p,q,M)\), the predicted cumulative no. of
adoptions will naturally differ. What matters is that the mean, median, and
standard deviation are reported, together with a line plot of \(T\) vs. \(Y\). Note that \(T\) must run from 181 to 360. Purely as an
example, I set \(p = 0.001\),
\(q = 0.01\) and \(M = 50,000,000\). This is ONLY an example!
We set T equal to 181:360, which covers 180 days in total.
You can copy and paste the code from the R notebook in the Bass model
session to predict the cumulative installations.
# setting p, q and M
p <- 0.001
q <- 0.01
M <- 50000000
T <- 181:360
N <- M*(1-(p+q)/(p*exp((p+q)*T)+q))
summary(N)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18250034 24977394 31418495 30816673 36924787 41193987
We can also plot the cumulative no. of installations with a line plot:
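The standard deviation and the line plot mentioned above are not shown in the printed output, so here is a minimal sketch of how they could be produced with the same example values. The helper name bass_cumulative and the variable days are ours, not part of the course code; days is used in place of T only because T is also R's shorthand for TRUE.

```r
# Example values only; any defensible (p, q, M) can be substituted.
p <- 0.001
q <- 0.01
M <- 50000000
days <- 181:360  # named 'days' rather than T, which doubles as shorthand for TRUE

# Cumulative adoptions under the Bass model (same formula as in the code above)
bass_cumulative <- function(t, p, q, M) {
  M * (1 - (p + q) / (p * exp((p + q) * t) + q))
}
N <- bass_cumulative(days, p, q, M)

# Summary statistics required by the question
stats_N <- c(mean = mean(N), median = median(N), sd = sd(N))
print(stats_N)

# Line plot of T (days) vs. Y (cumulative installations)
plot(days, N, type = "l",
     xlab = "Day (T)",
     ylab = "Cumulative installations (Y)",
     main = "Predicted cumulative installations, days 181-360")
```

With other parameter choices only the first three assignments need to change.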
Grading Question 1c
For grading, it does not matter which values of \((p,q,M)\) are chosen here. As long as the cumulative no. of installations is reported, it is fine. The breakdown of points is as below:
| Aspects | Points |
|---|---|
| Mean | 0.5 point |
| Median | 0.5 point |
| Std.Dev. | 0.25 point |
| Plot | 1.25 point |
Question 2
This question uses promo.trial as the data.
Question 2a (4.5 points)
For this question, we only need to run three regressions with the treatment variable using different outcomes. As noted in the question, we need to transform the variables in the estimations.
The ATE on logins:
##
## Call:
## lm(formula = log(logins) ~ treatment, data = promo.trial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9265 -0.2784 -0.1499 0.3728 0.7049
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.98306 0.01089 273.88 <2e-16 ***
## treatmentcharity 0.18568 0.01218 15.25 <2e-16 ***
## treatmentdiscount 0.14068 0.01218 11.55 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3444 on 8997 degrees of freedom
## Multiple R-squared: 0.02531, Adjusted R-squared: 0.0251
## F-statistic: 116.8 on 2 and 8997 DF, p-value: < 2.2e-16
The ATE on bookings:
##
## Call:
## lm(formula = log(bookings) ~ treatment, data = promo.trial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72247 -0.10138 -0.02932 0.18630 0.66382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2005714 0.0065711 182.706 <2e-16 ***
## treatmentcharity -0.0005797 0.0073467 -0.079 0.937
## treatmentdiscount 0.2150454 0.0073467 29.271 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2078 on 8997 degrees of freedom
## Multiple R-squared: 0.2099, Adjusted R-squared: 0.2097
## F-statistic: 1195 on 2 and 8997 DF, p-value: < 2.2e-16
The ATE on booking rates:
# create a new variable of booking rates
promo.trial$BR <- promo.trial$bookings/promo.trial$logins
# estimate the ATE on booking rates
mdl_br <- lm(log(BR/(1-BR)) ~ treatment, data = promo.trial)
summary(mdl_br)
##
## Call:
## lm(formula = log(BR/(1 - BR)) ~ treatment, data = promo.trial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.30591 -0.23394 0.02199 0.24469 1.27431
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.589812 0.009686 -164.130 <2e-16 ***
## treatmentcharity -0.223939 0.010830 -20.678 <2e-16 ***
## treatmentdiscount 0.092363 0.010830 8.529 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3063 on 8997 degrees of freedom
## Multiple R-squared: 0.1946, Adjusted R-squared: 0.1944
## F-statistic: 1087 on 2 and 8997 DF, p-value: < 2.2e-16
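Because the outcomes are transformed, the coefficients above are not in raw units. The sketch below shows one way they could be read, using the estimates copied from the three outputs. For the log models, exp(b) - 1 gives the proportional change relative to the control group; for the logit model of booking rates, plogis() maps the fitted log-odds back to a rate, which is a typical rate on the logit scale rather than the sample mean. All object names are ours.

```r
# Treatment coefficients copied from the three regression outputs above
coefs <- data.frame(
  outcome  = c("logins", "bookings", "booking rate (logit)"),
  charity  = c(0.18568, -0.0005797, -0.223939),
  discount = c(0.14068,  0.2150454,  0.092363)
)

# log(y) models: exp(b) - 1 is the proportional change vs. control, in percent
pct_change <- function(b) 100 * (exp(b) - 1)
log_rows <- coefs$outcome %in% c("logins", "bookings")
pct <- data.frame(
  outcome      = coefs$outcome[log_rows],
  charity_pct  = round(pct_change(coefs$charity[log_rows]), 2),
  discount_pct = round(pct_change(coefs$discount[log_rows]), 2)
)
print(pct)

# Logit model: fitted log-odds per group, mapped back to booking rates
intercept_br <- -1.589812
log_odds <- intercept_br + c(control = 0, charity = -0.223939, discount = 0.092363)
implied_rate <- plogis(log_odds)
print(round(implied_rate, 4))
```

Statistical significance still has to be read from the regression outputs; for example, the charity coefficient on bookings is not distinguishable from zero.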
Grading Question 2a
This question is graded based on the estimates and standard errors of the treatment effects (treatmentcharity and treatmentdiscount) in the three models. In total, there are 6 coefficients in the 3 models, as well as 6 standard errors. Each coefficient (and each standard error) is worth 0.375 point. So, in total we have \(12\times 0.375 = 4.5\) points.
Question 2b (3 points)
Notes for this question:
- There is no “correct” answer for this question. You may argue that either ad is more effective.
- Your explanations need not be based only on the estimation results; the marketing situation and related marketing insights may also be used in your explanations.
- The question is graded based on the general criteria (see below).
Grading Question 2b
As this is a conceptual question, I discuss the grading criteria as below. Note that there is no “correct” answer as to which advert is more effective. Choosing either ad is fine. The focus is more on the argumentation.
- Clarity (15% or 0.45 point): the explanations are clearly presented and easy to understand.
- Completeness (20% or 0.6 point): different pros and cons are considered in the explanations. For example, the current ad is more effective in generating traffic (logins), but the alternative ads have higher conversions (bookings).
- Justifiability (30% or 0.9 point): the patterns summarized are consistent with the estimation results, and the explanations of the patterns are well-founded on the marketing contexts of online review platforms and marketing theories/insights about online reviews. The marketing theories or insights should be well-referenced; if not, no points.
- Validity (35% or 1.05 points): the explanations are valid in the sense that they are neither self-contradictory nor contrary to common sense and/or marketing insights.
Question 2c (4.5 points)
For this question, we run regressions which interact the types of customers
(customer_type) with the treatment variable
(treatment).
The heterogeneous treatment effects on logins:
##
## Call:
## lm(formula = log(logins) ~ treatment * customer_type, data = promo.trial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.64086 -0.06549 0.00955 0.07219 0.34602
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.698499 0.004583 588.80 <2e-16
## treatmentcharity 0.225045 0.005117 43.98 <2e-16
## treatmentdiscount 0.139584 0.005124 27.24 <2e-16
## customer_typehigh value 0.699170 0.007184 97.33 <2e-16
## treatmentcharity:customer_typehigh value -0.084630 0.008037 -10.53 <2e-16
## treatmentdiscount:customer_typehigh value 0.002249 0.008032 0.28 0.779
##
## (Intercept) ***
## treatmentcharity ***
## treatmentdiscount ***
## customer_typehigh value ***
## treatmentcharity:customer_typehigh value ***
## treatmentdiscount:customer_typehigh value
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1116 on 8994 degrees of freedom
## Multiple R-squared: 0.8977, Adjusted R-squared: 0.8976
## F-statistic: 1.578e+04 on 5 and 8994 DF, p-value: < 2.2e-16
Given the estimation results, the treatment effects under the 4 conditions (Charity vs. Discount and Low vs. High Value) are as below. Note that you must set the insignificant coefficients to zeros.
## Charity Discount
## Low Value 0.2250448 0.1395843
## High Value 0.1404153 0.1395843
The heterogeneous treatment effects on bookings:
hetero_bookings <- lm(log(bookings)~treatment*customer_type,data = promo.trial)
summary(hetero_bookings)
##
## Call:
## lm(formula = log(bookings) ~ treatment * customer_type, data = promo.trial)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.85937 0.00611 0.01825 0.06474 0.52692
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.0798117 0.0065598 164.611
## treatmentcharity 0.0005516 0.0073243 0.075
## treatmentdiscount 0.2417449 0.0073344 32.961
## customer_typehigh value 0.2967067 0.0102823 28.856
## treatmentcharity:customer_typehigh value 0.0031136 0.0115034 0.271
## treatmentdiscount:customer_typehigh value -0.0657425 0.0114958 -5.719
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## treatmentcharity 0.940
## treatmentdiscount < 2e-16 ***
## customer_typehigh value < 2e-16 ***
## treatmentcharity:customer_typehigh value 0.787
## treatmentdiscount:customer_typehigh value 1.11e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1597 on 8994 degrees of freedom
## Multiple R-squared: 0.5332, Adjusted R-squared: 0.533
## F-statistic: 2055 on 5 and 8994 DF, p-value: < 2.2e-16
Given the estimation results, the treatment effects under the 4 conditions (Charity vs. Discount and Low vs. High Value) are as below. Note that you must set the insignificant coefficients to zeros.
## Charity Discount
## Low Value 0 0.2417449
## High Value 0 0.1760023
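The two tables above can be reproduced from the printed coefficients. The following is a minimal sketch, assuming a 5% significance threshold for "insignificant" (the threshold is our assumption) and using 2e-16 as a stand-in wherever the output prints <2e-16. The helper effects_table and the short coefficient labels are hypothetical names introduced for illustration.

```r
# Build the 2x2 table of treatment effects from an interaction model.
# est, pval: named vectors for the two main effects and the two interactions.
effects_table <- function(est, pval, alpha = 0.05) {
  # Insignificant coefficients are set to zero before adding them up
  est[names(pval)[pval >= alpha]] <- 0
  matrix(
    c(est["charity"],                       est["discount"],
      est["charity"] + est["charity:high"], est["discount"] + est["discount:high"]),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("Low Value", "High Value"), c("Charity", "Discount"))
  )
}

# Estimates and p-values copied from the log(logins) output
logins_tab <- effects_table(
  est  = c(charity = 0.225045, discount = 0.139584,
           "charity:high" = -0.084630, "discount:high" = 0.002249),
  pval = c(charity = 2e-16, discount = 2e-16,
           "charity:high" = 2e-16, "discount:high" = 0.779)
)

# Estimates and p-values copied from the log(bookings) output
bookings_tab <- effects_table(
  est  = c(charity = 0.0005516, discount = 0.2417449,
           "charity:high" = 0.0031136, "discount:high" = -0.0657425),
  pval = c(charity = 0.940, discount = 2e-16,
           "charity:high" = 0.787, "discount:high" = 1.11e-08)
)

print(logins_tab)
print(bookings_tab)
```

The effect for high-value customers is the main effect plus the interaction term, which is why a zeroed interaction leaves both customer types with the same effect.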
Grading Question 2c 1)
For this question, the reported treatment effects table must be similar to the suggested solutions. In total, we have 8 values, and each value is worth 0.375 point. So, we have \(8\times 0.375 = 3\) points.
Grading Question 2c 2)
The grading is based on roughly the same criteria as in the previous conceptual questions. These criteria are:
- Clarity (20% or 0.3 point): The arguments are clear.
- Justifiability (30% or 0.45 point): The arguments are based on the analytical results in Question 2c 1).
- Validity (50% or 0.75 point): The arguments are self-consistent, reasonable, and not against common marketing insights or understanding of consumers.
Note that the ad discussed here should be the one that is thought to be more effective in Question 2b. The grading is based on the quality of argumentation. Either conclusion, Yes or No, is fine.
Question 2d (3 points)
This is a conceptual question. There are two important things to look at: first, your answer must point out the costs for the company, as well as the trade-off in the assignment of units; second, you must make use of the assumption that the treatment effects are known. For example, you can use a power calculator with the estimated treatment effects as the input for the true effects to calculate sample sizes and splits.
Grading Question 2d
The grading is based on the following aspects:
- You must point out the costs for the company (1 point). The idea is that the company incurs a loss (i.e., sunk costs) (0.5 point) for the control group, as the control group is not served any ads. In addition, the company could face an opportunity cost (0.5 point), as one of the treatment ads is better than the other: the treatment group with the less effective ad is underperforming.
- You must point out the trade-off of the company in the assignment of groups (0.5 point). Smaller group sizes (of treatment groups) reduce the loss. Yet, smaller group sizes lead to lower power in detecting the treatment effects.
- You must make use of the assumption that the treatment effects are known (1.5 points). With the known true ATEs (the estimated coefficients and standard errors in Question 2a), we may use tools such as post-hoc power analysis to support the decision of the group assignment. You can use the ATEs in different ways, but this information must be used to your advantage.
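As one illustration of the power-calculator idea, the sketch below uses power.t.test from base R's stats package. It treats the estimated discount effect on log(logins) and the residual standard error from Question 2a as if they were the true values, and it approximates the regression comparison with a two-sample t-test. The 80% power target and the 5% significance level are conventional choices that we assume here; they are not given in the assignment.

```r
# Assumed "true" values, taken from the log(logins) regression in Question 2a
delta <- 0.14068  # discount vs. control effect on log(logins)
sigma <- 0.3444   # residual standard error of that model

# Units per group needed for 80% power at the 5% level (both assumed targets)
needed <- power.t.test(delta = delta, sd = sigma,
                       sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(needed$n)
print(n_per_group)

# Power achieved at several candidate group sizes, to show the trade-off:
# smaller groups reduce the loss but also reduce the power
candidate_n <- c(25, 50, 100, 200)
achieved <- sapply(candidate_n, function(n) {
  power.t.test(n = n, delta = delta, sd = sigma, sig.level = 0.05)$power
})
print(data.frame(n_per_group = candidate_n, power = round(achieved, 3)))
```

The same calculation can be repeated for the other outcomes and ads by swapping in the corresponding coefficient and residual standard error.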