This assignment takes 25% of the overall grade. The total grade
points for this assignment is thus 25 points, with Question 1 - 10
points and Question 2 - 15 points. Please download and load data
Assignment_1.Rdata from Canvas.
## [1] "CLV_data" "Conjoint_data" "lik_bgnbd" "pred_Y"
Question 1
This question uses “Conjoint_data”:
## respondent_id profile form noapply disinfect bio price choice
## 1 1 1 Concentrate 200 times Yes No 35 cents 1
## 2 1 2 Powder 200 times Yes No 35 cents 0
## 3 1 3 Premix 100 times Yes Yes 49 cents 1
## 4 1 4 Powder 200 times Yes Yes 49 cents 0
## 5 1 5 Powder 50 times Yes No 79 cents 0
## 6 1 6 Concentrate 200 times No Yes 79 cents 0
Question 1a (4 points)
We first have to set the baselines for different attributes, using
relevel(). You may also use “mapply” or other packages for
this purpose.
# to set baselines for all attributes
Conjoint_data$form <- relevel(Conjoint_data$form,ref = "Powder")
Conjoint_data$noapply <- relevel(Conjoint_data$noapply,ref = "200 times")
Conjoint_data$disinfect <- relevel(Conjoint_data$disinfect,ref = "No")
Conjoint_data$bio <- relevel(Conjoint_data$bio,ref = "No")
Conjoint_data$price <- relevel(Conjoint_data$price,ref = "35 cents")
# double check if the baselines are set as wanted
lapply(Conjoint_data[,3:7],contrasts)## $form
## Concentrate Premix
## Powder 0 0
## Concentrate 1 0
## Premix 0 1
##
## $noapply
## 100 times 50 times
## 200 times 0 0
## 100 times 1 0
## 50 times 0 1
##
## $disinfect
## Yes
## No 0
## Yes 1
##
## $bio
## Yes
## No 0
## Yes 1
##
## $price
## 49 cents 79 cents
## 35 cents 0 0
## 49 cents 1 0
## 79 cents 0 1
We then run a logistic regression with “Conjoint_data” using “choice” as the binary outcome.
mdl <- glm(choice ~ . - respondent_id - profile,
family = "binomial",
data = Conjoint_data)
summary(mdl)##
## Call:
## glm(formula = choice ~ . - respondent_id - profile, family = "binomial",
## data = Conjoint_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1680 -0.7913 0.3276 0.8173 2.1374
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1357 0.4662 2.436 0.014859 *
## formConcentrate 0.6482 0.3293 1.968 0.049035 *
## formPremix -0.2519 0.3347 -0.753 0.451638
## noapply100 times -0.2335 0.3457 -0.675 0.499416
## noapply50 times -0.5904 0.3601 -1.639 0.101111
## disinfectYes 1.1143 0.2998 3.717 0.000202 ***
## bioYes 0.2762 0.2862 0.965 0.334477
## price49 cents -1.3828 0.3609 -3.831 0.000128 ***
## price79 cents -3.0606 0.3810 -8.034 9.46e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 449.25 on 329 degrees of freedom
## Residual deviance: 348.95 on 321 degrees of freedom
## AIC: 366.95
##
## Number of Fisher Scoring iterations: 4
Grading for Question 1a
For this question, the coefficients and stanard errors should be consistent with the suggested solutions. From the results, we have 8 coefficients with {“formConcentrate”, “formPremix”, “noapply100 times”, “noapply50 times”, “disinfectYes”, bioYes”, “price49 cents”, “price79 cents”}, with each coefficient \(0.25\) point and each standard error \(0.25\) point, so the total is \(8\times0.25+8\times0.25=4\).
No requirements for the no. digits.
Question 1b (4 points)
To calculate the partworths, we must apply the 3 rules:
- Partworths of baselines are zeros;
- Partworths of significant levels are the original coefficients;
- Partworths of the insignificant levels are zeros.
# create a list of partworths with each attribute as an element
# initialize all values to zeros
partworths <- lapply(mdl$xlevels,function(x){x <- rep(0,length(x))})
change_names <- function (x,y) {
names(x) <- y
return(x)
}
partworths <- mapply(change_names,partworths,mdl$xlevels)
# set partworths of significant levels to original coefficients
partworths$form["Concentrate"] <- mdl$coefficients["formConcentrate"]
partworths$disinfect["Yes"] <- mdl$coefficients["disinfectYes"]
partworths$price[c("49 cents","79 cents")] <- mdl$coefficients[c("price49 cents","price79 cents")]
partworths## $form
## Powder Concentrate Premix
## 0.0000000 0.6481925 0.0000000
##
## $noapply
## 200 times 100 times 50 times
## 0 0 0
##
## $disinfect
## No Yes
## 0.000000 1.114257
##
## $bio
## No Yes
## 0 0
##
## $price
## 35 cents 49 cents 79 cents
## 0.000000 -1.382753 -3.060619
Grading Question 1b
For this question, all partworths should be same as the suggested solutions. We have in total 13 partworths, with each 0.31 point.
Question 1c (2 points)
The most preferred product should give consumers the highest utility.
We focus on levels that have the maximum partworth for different
attributes.
| Attributes | Levels |
|---|---|
| Form | Concentrate |
| No of Apps | Any level |
| Disinfect | Yes |
| Biodegradable | Any level |
| Price | 35 cents |
Grading for Question 1c
For this question, there are 5 attributes, the point of each attribute is 0.4, so in total \(5\times0.4=2\). Note for no. of apps and biodegradable, the answer should be “any level is fine” or “consumers do not care about this attribute.”
Question 2
This question uses “CLV_data”, as well as two functions
lik_bgnbd and pred_Y from Case II. You also
need to use the library “hypergeo” for prediction of logins.
Question 2a (6 points)
# to specify the constraint matrix
ui <- diag(4)
ci <- rep(0,4)
# para: the starting values; a vector with 4 elements corresponding to (r,alpha,a,b).
# you can change this to other values or test different values.
para <- rep(.01,4)
# to run the constrained optimization
results <- constrOptim(para, lik_bgnbd, NULL, ui, ci)
# to get the estimated parameters in the order of (r,alpha,a,b)
para <- results$par
names(para) <- c("r","alpha","a","b")
para## r alpha a b
## 0.2324974 8.9573195 1.1935137 3.1910102
Grading Question 2a
For this question, each parameter is \(1.5\) points, so for \({r,\alpha,a,b}\), in total \(1.5\times4=6\) points.
Note: depending on the inital values of para, the estimated values could be somewhat different. As long as the parameters are close to the suggested soultions, should be fine.
Question 2b (5 points)
# To predict the future values of Gamer 1-5 in 30 days
Yhat <- rep(0,5)
names(Yhat) <- paste("Gamer",as.character(1:5),sep = "_")
for (i in 1:5) {
Yhat[i] <- pred_Y(para,CLV_data[i,],30)
}
Yhat## Gamer_1 Gamer_2 Gamer_3 Gamer_4 Gamer_5
## 0.17780954 0.24341699 0.08669631 0.75265953 0.08925523
Grading Question 2b
For this question, each parameter is \(1.0\) point, so for \(5\) gamers, in total \(1.0\times5=5.0\) points.
Note: depending on the estimted values of paras, the predicted consumer values could be somewhat different. If the values are indeed different, check the R codes and see if they are consistent with those in the suggested solutions. If yes, should be fine. Note: here I use a loop for the 5 gamers. Other methods such as
lapplyor those methods intidyversecan be used (e.g.,mappingordoormodify_group).
Question 2c (4 points)
This question is an open question. There could be many different answers. Here, I show an exemplary answer:
Another source of consumer values is social influence, as gamers’ activities are influenced by other gamers. This is because most online games have some elements which incentivize or exploit social influence between gamers. To measure social influence, we can rely on the social network between gamers and apply social network analysis to identify influencers. For example, we may focus on the degree centrality of gamers (i.e., no. of friends), as previous research finds that degree centrality is a good measure of people’s influence on social networks.
Grading Question 2c
By the grading rules, for conceptual questions, we will follow these aspects.
1. Clarity of the proposed source of consumer value (0.5 point): the proposed consumer value should be clearly explained.
2. Validity of the proposed source of consumer value (1.0 point): the proposed consumer value should indirectly influence the active levels of gamers.
3. Clarity of the proposed method (0.5 point): the proposed method should be clearly described and explained.
4. Validity of the proposed method (1.5 points): the proposed method can measure the proposed source of consumer value, if applied empirically.
5. Feasbility of the proposed method (0.5 point): the proposed method and the data needed for the method are feasible in practices.