Data Description
Please download the data from Canvas and put it in the same folder as
the .rmd file.
## respondent_id profile_id price brand capacity shape choice
## 1 1 1 70ct Sagres 200ml Spanish 0
## 2 1 2 30ct Cristal 200ml Spanish 0
## 3 1 3 60ct Super Bock 250ml Spanish 0
## 4 1 4 70ct Super Bock 250ml Spanish 0
## 5 1 5 30ct Sagres 330ml Spanish 0
## 6 1 6 50ct Cristal 330ml Spanish 0
The data has 4 product attributes for beers and 3 or 4 levels for each attribute. The levels of these attributes are as below:
## $price
## [1] "30ct" "50ct" "60ct" "70ct"
##
## $brand
## [1] "Cristal" "Sagres" "Super Bock"
##
## $capacity
## [1] "200ml" "250ml" "330ml"
##
## $shape
## [1] "Long Neck" "Rocket" "Spanish"
Estimating the Binary Choice Model
We first estimate a binary choice model (logistic) to obtain the
estimation. The baseline levels are already set for you. In your own
practices, you may use relevel() function to choose your
own baselines. In theory, you can use any level as the baseline for an
attribute.
# running the logistic regression
mdl <- glm(choice ~ price + brand + capacity + shape,
family = "binomial", data)
# get a summary of the model results
results <- summary(mdl)
results##
## Call:
## glm(formula = choice ~ price + brand + capacity + shape, family = "binomial",
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.7294 0.4361 3.965 7.33e-05 ***
## price50ct -2.0558 0.4746 -4.332 1.48e-05 ***
## price60ct -0.5594 0.3778 -1.481 0.139
## price70ct -1.9348 0.3312 -5.842 5.16e-09 ***
## brandSagres 2.4692 0.3181 7.761 8.40e-15 ***
## brandSuper Bock 2.7227 0.4127 6.597 4.19e-11 ***
## capacity250ml -2.5363 0.3263 -7.773 7.69e-15 ***
## capacity330ml -4.1839 0.3899 -10.730 < 2e-16 ***
## shapeRocket -2.0051 0.3050 -6.575 4.87e-11 ***
## shapeSpanish -4.1298 0.3888 -10.621 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1057.06 on 935 degrees of freedom
## Residual deviance: 464.55 on 926 degrees of freedom
## AIC: 484.55
##
## Number of Fisher Scoring iterations: 6
Obtaining the Partworths
Three rules to transform the coefficients to partworths:
- Baseline levels partworths = 0
- Insignificant levels partworths = 0
- Significant levels partworths = coefficients
A list called “partworth” is pre-created and shared with you for the collection of partworths. As different attributes may have different no. of levels, a list is convenient to save the partworths. Of course, you can also create a table elsewhere (like in Excel or Word) and put the partworths in the table for reporting.
You can see that the baselines in the partworths list are already set to zeros (Rule 1).
## $brand
## Cristal Sagres Super Bock
## 0 0 0
##
## $capacity
## 200ml 250ml 330ml
## 0 0 0
##
## $price
## 30ct 50ct 60ct 70ct
## 0 0 0 0
##
## $shape
## Long Neck Rocket Spanish
## 0 0 0
Following the rules, We need to first set insignificant coefficients to zeros (Rule 2 & 3), with the cutoff of p-values as 0.05.
## (Intercept) price50ct price60ct price70ct brandSagres
## 1.729359 -2.055824 0.000000 -1.934842 2.469227
## brandSuper Bock capacity250ml capacity330ml shapeRocket shapeSpanish
## 2.722735 -2.536253 -4.183921 -2.005092 -4.129809
Next, we set the partworths based on the transformed coefficients - “coeffs”.
partworth$brand[2:3] <- coeffs[5:6]
partworth$capacity[2:3] <- coeffs[7:8]
partworth$price[2:4] <- coeffs[2:4]
partworth$shape[2:3] <- coeffs[9:10]
partworth## $brand
## Cristal Sagres Super Bock
## 0.000000 2.469227 2.722735
##
## $capacity
## 200ml 250ml 330ml
## 0.000000 -2.536253 -4.183921
##
## $price
## 30ct 50ct 60ct 70ct
## 0.000000 -2.055824 0.000000 -1.934842
##
## $shape
## Long Neck Rocket Spanish
## 0.000000 -2.005092 -4.129809
Evaluating the 4 Strategies
Given the partworths, we can now evaluate and compare the 4 strategies. The 4 strategies are:
- To attack with Cristal long-neck shape 200ml priced at 50 cents
- To attack with Super Bock rocket shape 330ml priced at 60 cents
- To launch a Super Bock long-neck shape 200ml priced at 50 cents
- To re-launch a Super Bock long-neck shape 250ml priced at 50 cents
Per logistic regression, the binary choice probability is in the logit form: \[Prob(X)=\dfrac{exp(X)}{1+exp(X)}=\dfrac{1}{1+exp(-X)}.\]
To make the calculation easier, we first write a function for to
calculate the choice probabilities. This function takes a \(4\)-element index vector (an integer
vector) as input and outputs the choice probabilities. Each element of
the index vector indicates which levels of an attribute. The elements of
the vector are in the same order as the partworth list. For
example, a vector of c(1,1,2,2) indexes:
- The 1st level of brand
- The 1st level of capacity
- The 2nd level of price
- The 2nd level of shape
f <- function(ind) {
# do not forget to add the intercept or coeffs[1]
x <- coeffs[1] +
# partworths of 4 attributes in the order of
# brand --> capacity --> price --> shape
partworth$brand[ind[1]] +
partworth$capacity[ind[2]] +
partworth$price[ind[3]] +
partworth$shape[ind[4]]
#outputs the choice probabilities
return(1/(1+exp(-x)))
}Using the function, we can calculate the choice probabilities of the
4 strategies. Please check the levels for different attributes for a
strategy. For example, for the 1st strategy
To attack with Cristal long-neck shape 200ml priced at 50 cents.
We have:
- brand - Cristal - 1st level
- capacity - 200ml - 1st level
- price - 50 cents - 2nd level
- shape - long-neck - 1st level
So, we create an index vector c(1,1,2,1) and use it as
input. For other strategies, the procedure is the same.
decision.prob <- c(f(c(1,1,2,1)),
f(c(3,3,3,2)),
f(c(3,1,2,1)),
f(c(3,2,2,1))
)
100*as.numeric(decision.prob)## [1] 41.91009 14.97047 91.65424 46.50612
From the results, the third strategy has the highest choice probabilities \(91.65\%\). So, from the data analytics alone, the launch of Superbock mini seems to be the best option.