This assignment takes 25% of the overall grade. The total grade points for this assignment is thus 25 points, with Question 1 - 10 points and Question 2 - 15 points. Please download and load data Assignment_1.Rdata from Canvas.

load("Assignment_1.Rdata")
ls()
## [1] "CLV_data"      "Conjoint_data" "lik_bgnbd"     "pred_Y"

Question 1

This question uses “Conjoint_data”:

head(Conjoint_data)
##   respondent_id profile        form   noapply disinfect bio    price choice
## 1             1       1 Concentrate 200 times       Yes  No 35 cents      1
## 2             1       2      Powder 200 times       Yes  No 35 cents      0
## 3             1       3      Premix 100 times       Yes Yes 49 cents      1
## 4             1       4      Powder 200 times       Yes Yes 49 cents      0
## 5             1       5      Powder  50 times       Yes  No 79 cents      0
## 6             1       6 Concentrate 200 times        No Yes 79 cents      0

Question 1a (4 points)

We first have to set the baselines for different attributes, using relevel(). You may also use “mapply” or other packages for this purpose.

# to set baselines for all attributes
Conjoint_data$form <- relevel(Conjoint_data$form,ref = "Powder")
Conjoint_data$noapply <- relevel(Conjoint_data$noapply,ref = "200 times")
Conjoint_data$disinfect <- relevel(Conjoint_data$disinfect,ref = "No")
Conjoint_data$bio <- relevel(Conjoint_data$bio,ref = "No")
Conjoint_data$price <- relevel(Conjoint_data$price,ref = "35 cents")

# double check if the baselines are set as wanted
lapply(Conjoint_data[,3:7],contrasts)
## $form
##             Concentrate Premix
## Powder                0      0
## Concentrate           1      0
## Premix                0      1
## 
## $noapply
##           100 times 50 times
## 200 times         0        0
## 100 times         1        0
## 50 times          0        1
## 
## $disinfect
##     Yes
## No    0
## Yes   1
## 
## $bio
##     Yes
## No    0
## Yes   1
## 
## $price
##          49 cents 79 cents
## 35 cents        0        0
## 49 cents        1        0
## 79 cents        0        1

We then run a logistic regression with “Conjoint_data” using “choice” as the binary outcome.

mdl <- glm(choice ~ . - respondent_id - profile, 
           family = "binomial", 
           data = Conjoint_data)
summary(mdl)
## 
## Call:
## glm(formula = choice ~ . - respondent_id - profile, family = "binomial", 
##     data = Conjoint_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1680  -0.7913   0.3276   0.8173   2.1374  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.1357     0.4662   2.436 0.014859 *  
## formConcentrate    0.6482     0.3293   1.968 0.049035 *  
## formPremix        -0.2519     0.3347  -0.753 0.451638    
## noapply100 times  -0.2335     0.3457  -0.675 0.499416    
## noapply50 times   -0.5904     0.3601  -1.639 0.101111    
## disinfectYes       1.1143     0.2998   3.717 0.000202 ***
## bioYes             0.2762     0.2862   0.965 0.334477    
## price49 cents     -1.3828     0.3609  -3.831 0.000128 ***
## price79 cents     -3.0606     0.3810  -8.034 9.46e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 449.25  on 329  degrees of freedom
## Residual deviance: 348.95  on 321  degrees of freedom
## AIC: 366.95
## 
## Number of Fisher Scoring iterations: 4

Grading for Question 1a

For this question, the coefficients and stanard errors should be consistent with the suggested solutions. From the results, we have 8 coefficients with {“formConcentrate”, “formPremix”, “noapply100 times”, “noapply50 times”, “disinfectYes”, bioYes”, “price49 cents”, “price79 cents”}, with each coefficient \(0.25\) point and each standard error \(0.25\) point, so the total is \(8\times0.25+8\times0.25=4\).

No requirements for the no. digits.

Question 1b (4 points)

To calculate the partworths, we must apply the 3 rules:

  1. Partworths of baselines are zeros;
  2. Partworths of significant levels are the original coefficients;
  3. Partworths of the insignificant levels are zeros.
# create a list of partworths with each attribute as an element
# initialize all values to zeros
partworths <- lapply(mdl$xlevels,function(x){x <- rep(0,length(x))})
change_names <- function (x,y) {
  names(x) <- y
  return(x)
}
partworths <- mapply(change_names,partworths,mdl$xlevels)

# set partworths of significant levels to original coefficients
partworths$form["Concentrate"] <- mdl$coefficients["formConcentrate"]
partworths$disinfect["Yes"] <- mdl$coefficients["disinfectYes"]
partworths$price[c("49 cents","79 cents")] <- mdl$coefficients[c("price49 cents","price79 cents")]
partworths
## $form
##      Powder Concentrate      Premix 
##   0.0000000   0.6481925   0.0000000 
## 
## $noapply
## 200 times 100 times  50 times 
##         0         0         0 
## 
## $disinfect
##       No      Yes 
## 0.000000 1.114257 
## 
## $bio
##  No Yes 
##   0   0 
## 
## $price
##  35 cents  49 cents  79 cents 
##  0.000000 -1.382753 -3.060619

Grading Question 1b

For this question, all partworths should be same as the suggested solutions. We have in total 13 partworths, with each 0.31 point.

Question 1c (2 points)

The most preferred product should give consumers the highest utility. We focus on levels that have the maximum partworth for different attributes.

Attributes Levels
Form Concentrate
No of Apps Any level
Disinfect Yes
Biodegradable Any level
Price 35 cents


Grading for Question 1c

For this question, there are 5 attributes, the point of each attribute is 0.4, so in total \(5\times0.4=2\). Note for no. of apps and biodegradable, the answer should be “any level is fine” or “consumers do not care about this attribute.”

Question 2

This question uses “CLV_data”, as well as two functions lik_bgnbd and pred_Y from Case II. You also need to use the library “hypergeo” for prediction of logins.

Question 2a (6 points)

# to specify the constraint matrix
ui <- diag(4)
ci <- rep(0,4)

# para: the starting values; a vector with 4 elements corresponding to (r,alpha,a,b). 
# you can change this to other values or test different values.
para <- rep(.01,4)

# to run the constrained optimization 
results <- constrOptim(para, lik_bgnbd, NULL, ui, ci)

# to get the estimated parameters in the order of (r,alpha,a,b)
para <- results$par
names(para) <- c("r","alpha","a","b")
para
##         r     alpha         a         b 
## 0.2324974 8.9573195 1.1935137 3.1910102

Grading Question 2a

For this question, each parameter is \(1.5\) points, so for \({r,\alpha,a,b}\), in total \(1.5\times4=6\) points.

Note: depending on the inital values of para, the estimated values could be somewhat different. As long as the parameters are close to the suggested soultions, should be fine.

Question 2b (5 points)

# To predict the future values of Gamer 1-5 in 30 days
Yhat <- rep(0,5)
names(Yhat) <- paste("Gamer",as.character(1:5),sep = "_")
for (i in 1:5) {
  Yhat[i] <- pred_Y(para,CLV_data[i,],30)
}
Yhat
##    Gamer_1    Gamer_2    Gamer_3    Gamer_4    Gamer_5 
## 0.17780954 0.24341699 0.08669631 0.75265953 0.08925523

Grading Question 2b

For this question, each parameter is \(1.0\) point, so for \(5\) gamers, in total \(1.0\times5=5.0\) points.

Note: depending on the estimted values of paras, the predicted consumer values could be somewhat different. If the values are indeed different, check the R codes and see if they are consistent with those in the suggested solutions. If yes, should be fine. Note: here I use a loop for the 5 gamers. Other methods such as lapply or those methods in tidyverse can be used (e.g., mapping or do or modify_group).

Question 2c (4 points)

This question is an open question. There could be many different answers. Here, I show an exemplary answer:

Another source of consumer values is social influence, as gamers’ activities are influenced by other gamers. This is because most online games have some elements which incentivize or exploit social influence between gamers. To measure social influence, we can rely on the social network between gamers and apply social network analysis to identify influencers. For example, we may focus on the degree centrality of gamers (i.e., no. of friends), as previous research finds that degree centrality is a good measure of people’s influence on social networks.

Grading Question 2c

By the grading rules, for conceptual questions, we will follow these aspects.

1. Clarity of the proposed source of consumer value (0.5 point): the proposed consumer value should be clearly explained.
2. Validity of the proposed source of consumer value (1.0 point): the proposed consumer value should indirectly influence the active levels of gamers.
3. Clarity of the proposed method (0.5 point): the proposed method should be clearly described and explained.
4. Validity of the proposed method (1.5 points): the proposed method can measure the proposed source of consumer value, if applied empirically.
5. Feasbility of the proposed method (0.5 point): the proposed method and the data needed for the method are feasible in practices.