Data Description

Please download the data from Canvas and put it in the same folder as the .rmd file.

# get a sneak peak of the data 
load("super_bock_mini.RData")
head(data)
##   respondent_id profile_id price      brand capacity   shape choice
## 1             1          1  70ct     Sagres    200ml Spanish      0
## 2             1          2  30ct    Cristal    200ml Spanish      0
## 3             1          3  60ct Super Bock    250ml Spanish      0
## 4             1          4  70ct Super Bock    250ml Spanish      0
## 5             1          5  30ct     Sagres    330ml Spanish      0
## 6             1          6  50ct    Cristal    330ml Spanish      0

The data has 4 product attributes for beers and 3 or 4 levels for each attribute. The levels of these attributes are as below:

# check the levels of all attributes
lapply(data[,3:6],levels)
## $price
## [1] "30ct" "50ct" "60ct" "70ct"
## 
## $brand
## [1] "Cristal"    "Sagres"     "Super Bock"
## 
## $capacity
## [1] "200ml" "250ml" "330ml"
## 
## $shape
## [1] "Long Neck" "Rocket"    "Spanish"

Estimating the Binary Choice Model

We first estimate a binary choice model (logistic) to obtain the estimation. The baseline levels are already set for you. In your own practices, you may use relevel() function to choose your own baselines. In theory, you can use any level as the baseline for an attribute.

# running the logistic regression 
mdl <- glm(choice ~ price + brand + capacity + shape, 
           family = "binomial", data)

# get a summary of the model results
results <- summary(mdl)
results
## 
## Call:
## glm(formula = choice ~ price + brand + capacity + shape, family = "binomial", 
##     data = data)
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       1.7294     0.4361   3.965 7.33e-05 ***
## price50ct        -2.0558     0.4746  -4.332 1.48e-05 ***
## price60ct        -0.5594     0.3778  -1.481    0.139    
## price70ct        -1.9348     0.3312  -5.842 5.16e-09 ***
## brandSagres       2.4692     0.3181   7.761 8.40e-15 ***
## brandSuper Bock   2.7227     0.4127   6.597 4.19e-11 ***
## capacity250ml    -2.5363     0.3263  -7.773 7.69e-15 ***
## capacity330ml    -4.1839     0.3899 -10.730  < 2e-16 ***
## shapeRocket      -2.0051     0.3050  -6.575 4.87e-11 ***
## shapeSpanish     -4.1298     0.3888 -10.621  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1057.06  on 935  degrees of freedom
## Residual deviance:  464.55  on 926  degrees of freedom
## AIC: 484.55
## 
## Number of Fisher Scoring iterations: 6

Obtaining the Partworths

Three rules to transform the coefficients to partworths:

  1. Baseline levels partworths = 0
  2. Insignificant levels partworths = 0
  3. Significant levels partworths = coefficients

A list called “partworth” is pre-created and shared with you for the collection of partworths. As different attributes may have different no. of levels, a list is convenient to save the partworths. Of course, you can also create a table elsewhere (like in Excel or Word) and put the partworths in the table for reporting.

You can see that the baselines in the partworths list are already set to zeros (Rule 1).

partworth
## $brand
##    Cristal     Sagres Super Bock 
##          0          0          0 
## 
## $capacity
## 200ml 250ml 330ml 
##     0     0     0 
## 
## $price
## 30ct 50ct 60ct 70ct 
##    0    0    0    0 
## 
## $shape
## Long Neck    Rocket   Spanish 
##         0         0         0

Following the rules, We need to first set insignificant coefficients to zeros (Rule 2 & 3), with the cutoff of p-values as 0.05.

coeffs <- results$coefficients[,1]*(results$coefficients[,4]<.05)
coeffs
##     (Intercept)       price50ct       price60ct       price70ct     brandSagres 
##        1.729359       -2.055824        0.000000       -1.934842        2.469227 
## brandSuper Bock   capacity250ml   capacity330ml     shapeRocket    shapeSpanish 
##        2.722735       -2.536253       -4.183921       -2.005092       -4.129809

Next, we set the partworths based on the transformed coefficients - “coeffs”.

partworth$brand[2:3] <- coeffs[5:6]
partworth$capacity[2:3] <- coeffs[7:8]
partworth$price[2:4] <- coeffs[2:4]
partworth$shape[2:3] <- coeffs[9:10]
partworth
## $brand
##    Cristal     Sagres Super Bock 
##   0.000000   2.469227   2.722735 
## 
## $capacity
##     200ml     250ml     330ml 
##  0.000000 -2.536253 -4.183921 
## 
## $price
##      30ct      50ct      60ct      70ct 
##  0.000000 -2.055824  0.000000 -1.934842 
## 
## $shape
## Long Neck    Rocket   Spanish 
##  0.000000 -2.005092 -4.129809

Evaluating the 4 Strategies

Given the partworths, we can now evaluate and compare the 4 strategies. The 4 strategies are:

Per logistic regression, the binary choice probability is in the logit form: \[Prob(X)=\dfrac{exp(X)}{1+exp(X)}=\dfrac{1}{1+exp(-X)}.\]

To make the calculation easier, we first write a function for to calculate the choice probabilities. This function takes a \(4\)-element index vector (an integer vector) as input and outputs the choice probabilities. Each element of the index vector indicates which levels of an attribute. The elements of the vector are in the same order as the partworth list. For example, a vector of c(1,1,2,2) indexes:

f <- function(ind) {
  
  # do not forget to add the intercept or coeffs[1]
  x <- coeffs[1] +  
    # partworths of 4 attributes in the order of 
    # brand --> capacity --> price --> shape
    partworth$brand[ind[1]] + 
    partworth$capacity[ind[2]] + 
    partworth$price[ind[3]] + 
    partworth$shape[ind[4]]
  #outputs the choice probabilities
  return(1/(1+exp(-x)))
}

Using the function, we can calculate the choice probabilities of the 4 strategies. Please check the levels for different attributes for a strategy. For example, for the 1st strategy To attack with Cristal long-neck shape 200ml priced at 50 cents. We have:

So, we create an index vector c(1,1,2,1) and use it as input. For other strategies, the procedure is the same.

decision.prob <- c(f(c(1,1,2,1)),
                   f(c(3,3,3,2)),
                   f(c(3,1,2,1)),
                   f(c(3,2,2,1))
                   )
100*as.numeric(decision.prob)
## [1] 41.91009 14.97047 91.65424 46.50612

From the results, the third strategy has the highest choice probabilities \(91.65\%\). So, from the data analytics alone, the launch of Superbock mini seems to be the best option.