
R analysis script for validating the predictive validity of estimated part-worth utilities from a conjoint analysis with the help of hold-out tasks (HOTs). Estimation uses the simple example we discussed during class (Chapter 5: only 3 consumers and a HOT comprising 3 alternatives). Make sure that the file ‘HOTs data lecture.xlsx’ is located in your RStudio project folder. It contains the predicted choice shares as well as the actual HOT choices for the 3 consumers.

1 Load packages that will be used throughout the analysis

Beware: R is a case-sensitive language. Thus, ‘data’ is not interpreted the same way as ‘Data’.

In R, most functionality is provided by additional packages.
Most of these packages are well documented; see https://cran.r-project.org/

The code chunk below first evaluates whether the package pacman is already installed on your machine. If it is not, R will install it.

Alternatively, you can do this manually by first executing install.packages("pacman") and then library(pacman).

The second line then loads the package pacman via the library() function.

The third line uses the function p_load() from the pacman package to install (if necessary) and load all packages that we provide as arguments (e.g., dplyr).

if(!"pacman" %in% rownames(installed.packages())) install.packages("pacman")

library(pacman)

pacman::p_load(knitr, tidyverse, psych, irr, readxl, BayesFactor, haven, latex2exp)

In case you are not able to load a package, try to install it first via, for example, install.packages("knitr"), where knitr is the name of the package.

Here is the R session info, which gives you information on my machine, all loaded packages, and their versions:

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] latex2exp_0.4.0        haven_2.3.1            BayesFactor_0.9.12-4.2
##  [4] Matrix_1.2-18          coda_0.19-3            readxl_1.3.1          
##  [7] irr_0.84.1             lpSolve_5.6.15         psych_2.0.7           
## [10] forcats_0.5.0          stringr_1.4.0          dplyr_1.0.0           
## [13] purrr_0.3.4            readr_1.3.1            tidyr_1.1.0           
## [16] tibble_3.0.3           ggplot2_3.3.2          tidyverse_1.3.0       
## [19] knitr_1.29             pacman_0.5.1          
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5         mvtnorm_1.1-1      lubridate_1.7.9    lattice_0.20-41   
##  [5] gtools_3.8.2       assertthat_0.2.1   digest_0.6.25      R6_2.4.1          
##  [9] cellranger_1.1.0   MatrixModels_0.4-1 backports_1.1.8    reprex_0.3.0      
## [13] evaluate_0.14      httr_1.4.2         pillar_1.4.6       rlang_0.4.7       
## [17] rstudioapi_0.11    blob_1.2.1         rmarkdown_2.3      munsell_0.5.0     
## [21] broom_0.7.0        compiler_4.0.2     modelr_0.1.8       xfun_0.16         
## [25] pkgconfig_2.0.3    mnormt_2.0.1       tmvnsim_1.0-2      htmltools_0.5.0   
## [29] tidyselect_1.1.0   fansi_0.4.1        crayon_1.3.4       dbplyr_1.4.4      
## [33] withr_2.2.0        grid_4.0.2         nlme_3.1-148       jsonlite_1.7.0    
## [37] gtable_0.3.0       lifecycle_0.2.0    DBI_1.1.0          magrittr_1.5      
## [41] scales_1.1.1       pbapply_1.4-3      cli_2.0.2          stringi_1.4.6     
## [45] fs_1.4.2           xml2_1.3.2         ellipsis_0.3.1     generics_0.0.2    
## [49] vctrs_0.3.2        tools_4.0.2        glue_1.4.1         hms_0.5.3         
## [53] parallel_4.0.2     yaml_2.2.1         colorspace_1.4-1   rvest_0.3.6

2 Import the *.xlsx file from Excel

Make sure that the file ‘HOTs data lecture.xlsx’ (see E-Learning platform) is located at your RStudio project’s directory.

Save the file into a new dataframe called d

R is a functional language, meaning we can define a function that takes some arguments as input and delivers a result as output.

We can see which arguments a function understands by pressing ‘F1’ while the cursor is placed on the function’s name (e.g., read_excel() below).

In the code chunk below, we hand in the file name as well as the argument sheet = 1 to the read_excel() function.
In this way we tell the function to import the data from the first worksheet of the file.

To execute the call, just set the cursor somewhere in the corresponding code line and press ‘Ctrl’+‘Enter’ (on Windows OS).
If you encounter any problems with the read functions, please try to copy your *.xlsx file to the desktop and then use:

d<-as.data.frame(read_excel(file.choose(), sheet = 1))
This will open a dialogue that allows for a manual selection of the file.

d<-as.data.frame(read_excel("HOTs data lecture.xlsx", sheet = 1))

Now, the object d (a dataframe) has become part of our working environment, and we can access it.

For example, we can view the dataframe d via the function View(). We can also preview the first rows of the data set with head().

The code below returns d.

View(d)
Brief Overview
ID OptionA OptionB OptionC Choice
1 98.20 0.00 1.80 1
2 26.89 0.00 73.11 1
3 0.00 11.92 88.08 2

We see that each row corresponds to one consumer. Furthermore, we see that the first column contains a unique respondent ID. The second, third, and fourth columns contain the predicted choice shares drawing on our estimated Hierarchical Bayes part-worth utilities. For example, based on the choice simulation, consumer 1 is predicted to opt for product option A with a probability of 98.2%. The final column presents the actual choice of the consumer. As already discussed during class, the predictive validity is poor, because the conjoint utilities predict the actual consumer choice in only 1 out of 3 cases (consumer #1). In effect, the conjoint model predicts no better than what we would have obtained through guessing.


3 Data transformations

Before we continue to calculate the metrics discussed during class, we have to carry out some data transformations. Do not worry: at the end of the unit, I will show you how to automate this boring stuff for the future.

First, we want to establish a new variable that indicates which holdout product option is predicted to be chosen.

The code chunk below illustrates how to do this.

Explanation:

We establish a new variable called ‘Predicted_Choice’ in d and initialize its values with 0 for each case. (This increases the number of columns in d to 6.)

Then, we apply two for() loops as well as an if() condition, which might seem somewhat complicated. However, I will try to explain each step in plain language.

‘for(i in 1:nrow(d))’ instructs R to apply a certain command for each i from 1 to the number of rows in d. The subsequent command is then applied for each row i in the dataframe d, whereby each row corresponds to a consumer participating in the conjoint study.

Within the outer for() loop we find another one. This time, we tell R that we want to apply a certain command for each k from 1 to the number of columns in d minus 3. In fact, ‘ncol(d)-3’ gives the number of product options included in the HOT (3 in our tiny example). That is, the command within the inner for() loop is applied for values of k ranging from 1 to 3.

Within the inner for() loop, we finally use an if() condition. In R, an if() condition has the general form if(test_expression) {command}: the command is executed only if the test expression is ‘TRUE’. Let’s look at the tested expression more closely. We evaluate whether ‘d[i, (k+1)]==max(d[i, 2:(ncol(d)-2)])’ returns ‘TRUE’ or ‘FALSE’.

The first part ‘d[i, (k+1)]’: Keep in mind that i starts at a value of 1, as does k. Thus, in the first instance, we return the element d[1,2], which is the value 98.20, the predicted choice share for OptionA for consumer #1.

Now, we proceed with the second part ‘max(d[i, 2:(ncol(d)-2)])’: Here, we use the max() function to return the maximum of a range of values. We consider the values ‘d[i, 2:(ncol(d)-2)]’; that is, we are still evaluating the first consumer (i=1), but we apply max() to all columns of d ranging from column 2 (OptionA) to column 4 (OptionC). Thus, the maximum across these columns for consumer #1 is 98.20.

In this first step of the inner loop, ‘d[i, (k+1)]==max(d[i, 2:(ncol(d)-2)])’ therefore returns ‘TRUE’, because the predicted choice share of OptionA for consumer #1 is the highest among all alternatives in the HOT. Consequently, the command within the if() condition is executed. The command simply sets the value of the variable ‘Predicted_Choice’ for consumer #1 to k (which in the first step has the value 1).

Now, imagine how the looping moves on. The inner loop is repeated, each time increasing k by 1, until k reaches the number of alternatives in the HOT (3). Then the outer loop is repeated, increasing i by 1, until i reaches the number of consumers (3). In each new round of the outer loop, the inner loop again starts with k at a value of 1.

d$Predicted_Choice <- 0 # initialize the predicted choice with 0 for every consumer


for (i in 1:nrow(d)) { # loop over consumers (rows of d)
  for (k in 1:(ncol(d) - 3)) { # loop over the product options in the HOT
    # if option k has the highest predicted share for consumer i, store k
    if (d[i, (k + 1)] == max(d[i, 2:(ncol(d) - 2)])) {
      d$Predicted_Choice[i] <- k
    }
  }
}
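
As an aside, the same result can be obtained without loops. The sketch below is my own shorthand (not part of the lecture material); it uses apply() with which.max() to pick, for every row of d, the option column with the highest predicted share. Note that in the (unlikely) case of exactly tied shares, which.max() keeps the first maximum, whereas the loop above keeps the last.

# Loop-free alternative: for each consumer (row), return the index of the
# option column (columns 2 to ncol(d)-2) holding the maximum predicted share
d$Predicted_Choice <- apply(d[, 2:(ncol(d) - 2)], 1, which.max)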

Let’s inspect the resulting dataframe:

View(d)
Brief Overview (this time including predicted option)
ID OptionA OptionB OptionC Choice Predicted_Choice
1 98.20 0.00 1.80 1 1
2 26.89 0.00 73.11 1 3
3 0.00 11.92 88.08 2 3

Second, we want to stack our dataframe from the wide format into a long format. That means we want a dataframe in which every respondent spreads over multiple rows and we keep only five variables: the case ID, the actual choice, the predicted choice, a variable indicating the option, and the predicted choice share. This procedure is often called melting or gathering a data structure.

The code chunk below does the following. We establish a new object called ‘d_stacked’. Next, we use the %>% operator to feed our dataframe d as an argument into a function. The function in this case is pivot_longer() from the tidyr package. We hand in several arguments to the function:

  • ‘names_to = “Option”’ tells the function to create a new variable in the stacked dataframe that will contain the names of the initial columns OptionA to OptionC.
  • ‘values_to = “Predicted_Share”’ tells the function that we need a second variable that will contain the predicted choice shares corresponding to each option.
  • ‘cols = names(d)[2]:names(d)[(ncol(d)-2)]’ selects the range of variables whose values will be stored in ‘Predicted_Share’. We apply the from-x-to-y notation x:y to address a range of variable names of d. Specifically, names(d)[2] returns the second variable name (OptionA), while names(d)[(ncol(d)-2)] returns the variable name OptionC. The call names(d)[(ncol(d))] would, in contrast, have returned the name of the last column in d.
d_stacked <- d %>%
  pivot_longer(cols = names(d)[2]:names(d)[(ncol(d) - 2)], names_to = "Option", values_to = "Predicted_Share")

Let’s inspect the resulting dataframe:

View(d_stacked)
Brief Overview of the stacked dataset
ID Choice Predicted_Choice Option Predicted_Share
1 1 1 OptionA 98.20
1 1 1 OptionB 0.00
1 1 1 OptionC 1.80
2 1 3 OptionA 26.89
2 1 3 OptionB 0.00
2 1 3 OptionC 73.11
3 2 3 OptionA 0.00
3 2 3 OptionB 11.92
3 2 3 OptionC 88.08

We see that we have successfully rearranged d into a long format d_stacked.
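
Should you ever need to undo this step, pivot_wider() from the tidyr package performs the inverse operation. A minimal sketch (my own addition, assuming the column names created above):

# Reverse the stacking: spread the 'Option' names back into separate columns
d_wide <- d_stacked %>%
  pivot_wider(names_from = "Option", values_from = "Predicted_Share")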

The next thing we would like to change is the coding of the variables ‘Choice’ and ‘Predicted_Choice’, which contain the actual consumer holdout choice and the predicted one. At the moment, all we see is that consumers #1 and #2 truly opt for the first product option (OptionA), whereas consumer #3 opts for the second option (OptionB). For our subsequent analysis, however, it will be more convenient if the variables ‘Choice’ and ‘Predicted_Choice’ take the value of 1 only if the corresponding row value of the variable ‘Option’ indicates the actually chosen / the predicted alternative, and the value of 0 otherwise.

The code chunk below illustrates one of the possible solutions for this problem.

Explanation: The first line of code adds a new temporary variable ‘temp’ to d_stacked. We use this variable just to number the available product options in the HOT serially (instead of only seeing OptionA, OptionB, …). For this purpose we use the function rep(), which is R’s generic function for replicating values. Essentially, rep() expects 2 arguments in a call ‘rep(x, times)’:

  • ‘x’ is the value or the sequence to be repeated
  • ‘times’ tells the function how often x should be repeated.

We specify x as the sequence ‘1:(ncol(d)-3)’, which starts at the value 1 and then proceeds with 2, 3, 4, … until it ends at ‘ncol(d)-3’. The last number equals the number of variables in d minus 3 (in our example 3). So in our small-scale example the sequence x is (1, 2, 3). We additionally specify ‘times’ as ‘nrow(d)’, which gives the number of consumers included in the study. To summarize, we repeat a sequence whose length equals the number of choice alternatives in the HOT once for each respondent.

d_stacked$temp<-rep(1:(ncol(d)-3), nrow(d))
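
To see what this call produces in our small example, you can run rep() on its own:

# The sequence 1, 2, 3 (one number per HOT option) is repeated once per consumer
rep(1:3, 3)
## [1] 1 2 3 1 2 3 1 2 3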

Again, we can inspect the result:

View(d_stacked)
Brief Overview
ID Choice Predicted_Choice Option Predicted_Share temp
1 1 1 OptionA 98.20 1
1 1 1 OptionB 0.00 2
1 1 1 OptionC 1.80 3
2 1 3 OptionA 26.89 1
2 1 3 OptionB 0.00 2
2 1 3 OptionC 73.11 3
3 2 3 OptionA 0.00 1
3 2 3 OptionB 11.92 2
3 2 3 OptionC 88.08 3

In a final step, we assign the value of 1 only if the corresponding row value of the variable ‘Option’ indicates the actually chosen / predicted alternative, and the value of 0 otherwise.

We do so by updating the values of ‘Choice’ and ‘Predicted_Choice’ with new values according to the instructions in ifelse().

Here, we make use of the ifelse() function, which has the general form ifelse(test_expression, x, y).

  • test_expression is a logical condition that is either ‘TRUE’ or ‘FALSE’. If we use a dataframe column as the condition, then the evaluation is applied row-wise.
  • x is the value that is returned in case the logical condition is ‘TRUE’
  • y is, analogously, the value that is returned in case the logical condition is ‘FALSE’

In our case, we evaluate whether the actually chosen HOT product in ‘Choice’ is unequal (!= in R) to the corresponding value of our newly added variable ‘temp’. If this is the case, the value of ‘Choice’ is recoded to 0. If, however, the actually chosen alternative in ‘Choice’ is equal to ‘temp’, the call recodes ‘Choice’ to 1. We do the same for ‘Predicted_Choice’.
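
A minimal standalone illustration of this recoding logic (with a made-up vector, purely for intuition):

# Values unequal to 2 become 0, values equal to 2 become 1
ifelse(c(1, 2, 3) != 2, 0, 1)
## [1] 0 1 0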

After this step the variable ‘temp’ is useless, so we remove it via d_stacked$temp <- NULL (a general approach for removing a variable from a dataframe).

d_stacked$Choice <- ifelse(d_stacked$Choice != d_stacked$temp, 0, 1)

d_stacked$Predicted_Choice <- ifelse(d_stacked$Predicted_Choice != d_stacked$temp, 0, 1)


d_stacked$temp <- NULL

Again, we can inspect the result:

View(d_stacked)
Brief Overview
ID Choice Predicted_Choice Option Predicted_Share
1 1 1 OptionA 98.20
1 0 0 OptionB 0.00
1 0 0 OptionC 1.80
2 1 0 OptionA 26.89
2 0 0 OptionB 0.00
2 0 1 OptionC 73.11
3 0 0 OptionA 0.00
3 1 0 OptionB 11.92
3 0 1 OptionC 88.08

4 Predictive validity according to different metrics

Next, we proceed with the calculation of the different metrics to evaluate the predictive validity of our conjoint analysis. We compute each metric we discussed during class:

  1. The hitrate: How many consumers actually decided for the product that is predicted to be their most preferred product within the HOT (e.g., Ding, 2007; Dong, Ding, & Huber, 2010). It is calculated as the fraction of correctly predicted consumers.
  2. The Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) as indications of the predictive performance at an aggregated level (Liu, Lee, & Conrad, 2015)
  3. Cohen’s kappa (e.g., Cohen, 1968; Landis & Koch, 1977)
  4. The mean hit probability: The mean predicted choice probability of the actually chosen option (e.g., Dotson et al., 2018; Voleti, Srinivasan, & Ghosh, 2017)

4.1 Hitrate

Calculating the hitrate is relatively straightforward. In essence, we count via nrow() the rows in d_stacked in which both variables ‘Choice’ and ‘Predicted_Choice’ equal 1 (the correctly predicted choices). Afterwards, we divide by the total number of consumers in the dataset (nrow(d)), which gives the hitrate.

hitrate <- nrow(d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]) / nrow(d)

hitrate
## [1] 0.3333333

Interpretation: The hitrate of correct prediction is 33.3333333%. Since the chance level of prediction is also 1/3, this result indicates insufficient predictive validity.

We can furthermore test whether the observed hitrate is significantly greater than the chance level. The easiest way is to use an exact binomial test.

To do so, we can use the binom.test() function. Recap: you can evaluate how the function works and which arguments it expects by pressing ‘F1’ while the cursor is placed on the function’s name.

binom.test(x, n, p = 0.5, alternative = c("two.sided", "less", "greater"), conf.level = 0.95) expects 5 arguments:

  • x: the number of successes (the number of hits in our case)
  • n: the number of trials (the number of consumers in our conjoint study)
  • p: the assumed probability of success in each trial (the chance level of prediction)
  • alternative: a string indicating whether we want to test undirected or directed for differences from the chance level
  • conf.level: the confidence level of the reported confidence interval (1 minus the accepted alpha error rate).

In the code chunk below, we reuse the code from the hitrate calculation to set x and n. The assumed probability of success p is equal to ‘(1 / (ncol(d) - 3))’, that is, 1 divided by the number of columns in d minus 3 (the number of product alternatives in the HOT is given by the columns in d minus the ID, the actual choice, and the predicted choice). We set the confidence level to 0.95 (alpha = .05). We test directionally whether the observed hitrate is greater than the assumed probability of success (alternative = “greater”).

undirected_binomial_test_chance_level <- binom.test(nrow(x = d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]),
  n = nrow(d),
  alternative = "greater",
  p = (1 / (ncol(d) - 3)),
  conf.level = .95
)

undirected_binomial_test_chance_level
## 
##  Exact binomial test
## 
## data:  nrow(x = d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]) and nrow(d)
## number of successes = 1, number of trials = 3, p-value = 0.7037
## alternative hypothesis: true probability of success is greater than 0.3333333
## 95 percent confidence interval:
##  0.01695243 1.00000000
## sample estimates:
## probability of success 
##              0.3333333

Interpretation: In our example, the hitrate is far from being significantly greater than the chance level. The actual p-value of the directed binomial test is 0.7037037.
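
You can verify this p-value by hand: with n = 3 consumers and a success probability of 1/3 under the null hypothesis, the probability of observing at least one hit is 1 − (2/3)³ = 19/27 ≈ 0.7037. The same sanity check in R (my own addition, not part of the original analysis):

# P(X >= 1) for X ~ Binomial(size = 3, prob = 1/3)
1 - pbinom(0, size = 3, prob = 1/3)
## [1] 0.7037037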

The next code chunk includes an alternative test based on Bayes factors, which we will not discuss in detail during our seminar. For further information, see the proportionBF() function provided by the BayesFactor package. In essence, this Bayes procedure tests the null hypothesis that the probability of a successful prediction is p, the chance level of prediction in the code chunk below. Commonly, it is accepted that the observed hitrate is equal to p if the corresponding Bayes factor exceeds a value of 10 (Jeffreys, 1961, p. 432). Thus, with a lower value we do not accept that the observed hitrate is only at chance level, which is good for us.

BF <- proportionBF(
  y = nrow(d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]),
  N = nrow(d),
  p = (1 / (ncol(d) - 3))
)

BF <- extractBF(BF)

BF$bf
## [1] 0.8226845

Interpretation: The alternative test based on a Bayes factor likewise indicates that there is little support for the hitrate being only as high as the chance level. The actual Bayes factor is 0.8226845.

4.2 Mean Absolute Error (MAE), Median Absolute Error (MedAE) and Root-Mean-Squared-Error (RMSE)

We have to apply several steps to calculate MAE, MedAE, and RMSE.

First, we calculate the observed and the predicted choice shares for all options included in the HOT (3 in our case). For this purpose, we initialize two numeric vectors that can hold as many elements as there are product alternatives in the HOT (ncol(d) - 3).

Afterwards, we use a for() loop to calculate the shares. This time, i starts at the value 1 and ends at the number of product alternatives in the HOT (3 in our example). Thus, in the first step (i=1) we set the first element of ‘observed_share’ to the number of rows in d where the variable ‘Choice’ has the value i (=1 in the first step), divided by the total number of consumers in the study (nrow(d)). Put differently, we count how often alternative i=1 was actually chosen and divide the result by the number of consumers, which gives the empirical choice share of the first alternative. In the second step, the loop calculates the share for the second alternative, and so forth. We apply a similar procedure to calculate the predicted share of each option: in the first step we simply take the mean of the second column in d (which consists of the individual predicted choice shares for the first product alternative), then we proceed with the second product alternative, and so forth.

observed_share <- numeric(ncol(d) - 3)

predicted_share <- numeric(ncol(d) - 3)


for (i in 1:(ncol(d) - 3)) {
  observed_share[i] <- nrow(d[d$Choice == i, ]) / (nrow(d))
  predicted_share[i] <- mean(d[, (i + 1)]) / 100
}

Here are the results:

observed_share
## [1] 0.6666667 0.3333333 0.0000000
predicted_share
## [1] 0.41696667 0.03973333 0.54330000
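
As an aside, the same two vectors can be computed without a loop. A short sketch (my own shorthand, equivalent for our data):

# Count the actual choices per option and divide by the number of consumers
observed_share  <- as.numeric(table(factor(d$Choice, levels = 1:(ncol(d) - 3)))) / nrow(d)
# Average the individual predicted shares per option and rescale from percent to proportions
predicted_share <- colMeans(d[, 2:(ncol(d) - 2)]) / 100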

Think back to what we learned during the lecture (Chapter 5). We can now calculate the absolute (i.e., for the MAE) and squared errors (i.e., for the RMSE) between observed and predicted shares. First, we create a new dataframe named ‘Error_Matrix’ which has just two columns, “observed_share” and “predicted_share”. Then, we calculate the absolute error between observed and predicted shares by applying the abs() function. Third, we calculate the squared error. Finally, we take the mean of the absolute errors, which gives the MAE, the median of the absolute errors, which gives the MedAE, and the square root (sqrt()) of the mean squared errors, which gives the RMSE.
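
In formula terms, with J denoting the number of HOT alternatives, s_j the observed share, and ŝ_j the predicted share of option j, the three metrics computed below are:

$$\mathrm{MAE} = \frac{1}{J}\sum_{j=1}^{J}\left|s_j - \hat{s}_j\right|, \qquad \mathrm{MedAE} = \operatorname{median}_{j}\left|s_j - \hat{s}_j\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{J}\sum_{j=1}^{J}\left(s_j - \hat{s}_j\right)^2}$$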

Error_Matrix <- as.data.frame(cbind(observed_share, predicted_share))

names(Error_Matrix) <- c("observed_share", "predicted_share")


Error_Matrix$Absolute_Error <- abs(Error_Matrix$observed_share - Error_Matrix$predicted_share)
Error_Matrix$Squared_Error <- (Error_Matrix$observed_share - Error_Matrix$predicted_share) * (Error_Matrix$observed_share - Error_Matrix$predicted_share)


MAE <- mean(Error_Matrix$Absolute_Error)

MedAE <- median(Error_Matrix$Absolute_Error)

RMSE <- sqrt(mean(Error_Matrix$Squared_Error))

MAE
## [1] 0.3622
MedAE
## [1] 0.2936
RMSE
## [1] 0.3845889

Interpretation: Based on the MAE (RMSE), we can expect a mean error in predicting the choice shares of product alternatives of 36.22% (38.46%), which is enormous. The MedAE of 29.36% is somewhat lower than the MAE. Thus, we can conclude that the MAE is driven by some exceptionally badly predicted options.

To gain further insight into the predictions, we can create a simple bar chart visualizing the observed vs. the predicted choice shares for each option in the HOT. To do so, we first have to rearrange the data into a ‘longer’ format: we want to stack the created Error_Matrix.

In the code chunk below, we first establish a new object called ‘Error_Matrix_long’. Next, we use the %>% operator to feed the first two columns of our ‘Error_Matrix’ as an argument into a function. The function in this case is mutate(), which helps us to create a new variable named choice_option that collects the original names of the choice options provided by our Excel file in the beginning (dataframe d). The result of this step is then passed on to the function pivot_longer() from the tidyr package. We hand in several arguments to the function:

  • ‘names_to = “Predicted_vs_Observed”’ tells the function to create a new variable in the stacked dataframe that will contain the column names observed_share and predicted_share.
  • ‘values_to = “Shares”’ tells the function that we need a second variable that will contain the choice shares corresponding to each option.
  • ‘cols = 1:2’ selects the range of variables whose values will be stored in ‘Shares’.
Error_Matrix_long <- Error_Matrix[,1:2] %>%
  mutate(choice_option = names(d)[2:(ncol(d)-2)]) %>%
  pivot_longer(
    cols = 1:2,
    names_to = "Predicted_vs_Observed",
    values_to = "Shares"
  ) 

We can briefly review the results:

View(Error_Matrix_long)
Brief Overview
choice_option Predicted_vs_Observed Shares
OptionA observed_share 0.6666667
OptionA predicted_share 0.4169667
OptionB observed_share 0.3333333
OptionB predicted_share 0.0397333
OptionC observed_share 0.0000000
OptionC predicted_share 0.5433000

Next, we can produce a simple bar chart that visualizes the shares. In the code chunk below, the first two lines create such a bar chart. Line three adds a horizontal line at y = 0. Line 4 applies a special theme to the plot. Line 5 adds a label for the y-axis. The arguments listed within theme() just polish some layout options. The arguments within the geom_text() call add rounded value labels to the plot. The last line defines fixed limits for the y-axis.

ggplot(Error_Matrix_long, aes(fill = Predicted_vs_Observed, y = Shares, x = choice_option)) +
  geom_bar(width = 0.7, position = position_dodge(width = 0.8), stat = "identity") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  ylab("Observed and prediected choice shares of options in HOT") +
  theme(
    legend.title = element_text(size = 13),
    legend.text = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    axis.line = element_blank(),
    axis.title.y = element_text(size = 11),
    axis.title.x = element_blank(),
    legend.position = "bottom",
  ) +
  geom_text(
    aes(label = format(round(Shares, 3), nsmall = 3)),
    position = position_dodge(width = 0.8),
    size = 4.5,
    vjust = -0.48
  ) +
  scale_y_continuous(limits = c(0, 1))

4.3 Cohen’s kappa

Next we use the function cohen.kappa() from the psych package to evaluate agreement between predicted and actually chosen products in the HOT.

cohen.kappa(x, alpha) expects 2 arguments:

  • x: a dataframe with two columns representing the observed and the predicted choices
  • alpha: the accepted alpha error rate for inferential statistics

We set x to ‘select(d, Choice, Predicted_Choice)’ in order to select the 2 variables ‘Choice’ and ‘Predicted_Choice’ from the dataframe d. Furthermore, we set alpha to 0.05.

This produces:

kappa <- cohen.kappa(x = select(d, Choice, Predicted_Choice), alpha = .05)


kappa$confid[1, ]
unweighted Cohen’s kappa, with 95% confidence intervals

             Estimate
lower      -0.1163677
estimate    0.1428571
upper       0.4020820

Interpretation: As discussed during class, Cohen’s kappa is exactly 1/7, as the conjoint model predicts not much better than guessing. Furthermore, the confidence interval clearly includes the value 0, meaning that the agreement between observed and predicted choices is not significantly better than the agreement obtained through guessing.
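
For our toy example the value of 1/7 can be reproduced by hand. The observed agreement is p_o = 1/3 (one of three consumers is predicted correctly). The expected agreement under independence follows from the marginal distributions of ‘Choice’ (2/3, 1/3, 0) and ‘Predicted_Choice’ (1/3, 0, 2/3) across the three options: p_e = (2/3)(1/3) + (1/3)(0) + (0)(2/3) = 2/9. Then:

$$\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{1/3 - 2/9}{1 - 2/9} = \frac{1/9}{7/9} = \frac{1}{7} \approx 0.143$$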

Alternatively, we can obtain a z-test for Cohen’s kappa, which tests whether kappa is significantly different from 0. For this purpose, we rely on the function kappa2() from the irr package.

kappa2() expects only 1 mandatory argument:

  • ratings: a 2-dimensional dataframe (or matrix) with two columns representing the observed and the predicted choices

Again, we set ratings to ‘select(d, Choice, Predicted_Choice)’ in order to select the 2 variables ‘Choice’ and ‘Predicted_Choice’ from the dataframe d. In the code below, we assign the results to a new object named kappa2. Afterwards, we call the rounded values of the corresponding test statistic and the p-value.

kappa2 <- kappa2(ratings = select(d, Choice, Predicted_Choice))


round(kappa2$statistic, digits = 4)
round(kappa2$p.value, digits = 4)

z-value: 0.866
p-value: 0.3865

Interpretation: Also according to the z-test, the agreement between observed and predicted choices is not significantly different from 0 (p = 0.3865).

4.4 Mean hit probability (MHP)

Calculating the MHP is straightforward. In ‘d_stacked’, we simply take the mean of the ‘Predicted_Share’ variable for those rows where ‘Choice’ equals 1.

MHP <- mean(d_stacked[d_stacked$Choice == 1, ]$Predicted_Share)

MHP
## [1] 45.67

Interpretation: The mean hit probability is 45.67%. This means that, on average, the conjoint model assigns a choice likelihood of only 45.67% to the HOT alternatives the consumers actually chose.
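
For our three consumers this is simply the average of the predicted shares of the options they actually chose: (98.20 + 26.89 + 11.92) / 3 = 45.67.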


5 Introducing a convenient function to validate HOTs

Now, I will show you a convenient way to automate all the above steps in a single R function.

Remember, R is a functional programming language, meaning we can define a function that takes some arguments as input and delivers a result as an output.

  1. We have to teach R that function.
  2. Afterwards, we can apply (‘call’) the function.

In the code chunk below, I have summarized each of the steps we discussed during this unit in a single function() named ‘Validate_a_single_HOT’. The function expects only one argument (a dataset, such as our dataframe d) and returns a list of results containing the HOT metrics discussed before. MAKE SURE that the variable indicating the actual choice of the consumers is named ‘Choice’, and that the dataset is structured like d: the respondent ID in the first column, one column of predicted shares per HOT alternative, and the actual choice in the last column.

Please, execute the whole code chunk to teach R the function.

Validate_a_single_HOT <- function(dataset) {
  if (!"pacman" %in% rownames(installed.packages())) install.packages("pacman")

  library(pacman)

  pacman::p_load(tidyverse, tidyr, dplyr, tidyselect, psych, irr, readxl, BayesFactor)




  d <- as.data.frame(dataset)

  d$Predicted_Choice <- 0

  for (i in 1:nrow(d)) {
    for (k in 1:(ncol(d) - 3)) {
      if (d[i, (k + 1)] == max(d[i, 2:(ncol(d) - 2)])) {
        d$Predicted_Choice[i] <- k
      }
    }
  }


  d_stacked <- d %>%
    pivot_longer(cols = names(d)[2]:names(d)[(ncol(d) - 2)], names_to = "Option", values_to = "Predicted_Share")

  d_stacked$temp <- rep(1:(ncol(d) - 3), nrow(d))
  d_stacked$Choice <- ifelse(d_stacked$Choice != d_stacked$temp, 0, 1)
  d_stacked$Predicted_Choice <- ifelse(d_stacked$Predicted_Choice != d_stacked$temp, 0, 1)
  d_stacked$temp <- NULL

  ## hitrate

  hitrate <- nrow(d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]) / nrow(d)

  ## binomial test for hitrate

  undirected_binomial_test_chance_level <- binom.test(nrow(d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]),
    nrow(d),
    alternative = "greater",
    p = (1 / (ncol(d) - 3)),
    conf.level = .95
  )

  ## Bayes Factor for hitrate against chance level

  BF <- proportionBF(
    nrow(d_stacked[d_stacked$Choice == 1 & d_stacked$Predicted_Choice == 1, ]),
    nrow(d),
    (1 / (ncol(d) - 3))
  )

  BF <- extractBF(BF)


  ## MAE, MedAE, and RMSE


  observed_share <- numeric(ncol(d) - 3)
  predicted_share <- numeric(ncol(d) - 3)


  for (i in 1:(ncol(d) - 3)) {
    observed_share[i] <- nrow(d[d$Choice == i, ]) / (nrow(d))
    predicted_share[i] <- mean(d[, (i + 1)]) / 100
  }




  Error_Matrix <- as.data.frame(cbind(observed_share, predicted_share))
  names(Error_Matrix) <- c("observed_share", "predicted_share")
  Error_Matrix$Absolute_Error <- abs(Error_Matrix$observed_share - Error_Matrix$predicted_share)
  Error_Matrix$Squared_Error <- (Error_Matrix$observed_share - Error_Matrix$predicted_share) * (Error_Matrix$observed_share - Error_Matrix$predicted_share)

  MAE <- mean(Error_Matrix$Absolute_Error)

  MedAE <- median(Error_Matrix$Absolute_Error)

  RMSE <- sqrt(mean(Error_Matrix$Squared_Error))

  ## Visualization
  
  
Error_Matrix_long <- Error_Matrix[,1:2] %>%
  mutate(choice_option = names(d)[2:(ncol(d)-2)]) %>%
  pivot_longer(
    cols = 1:2,
    names_to = "Predicted_vs_Observed",
    values_to = "Shares"
  ) 


Share_plot <- ggplot(Error_Matrix_long, aes(fill = Predicted_vs_Observed, y = Shares, x = choice_option)) +
  geom_bar(width = 0.7, position = position_dodge(width = 0.8), stat = "identity") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  ylab("Observed and prediected choice shares of options in HOT") +
  theme(
    legend.title = element_text(size = 13),
    legend.text = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    axis.line = element_blank(),
    axis.title.y = element_text(size = 11),
    axis.title.x = element_blank(),
    legend.position = "bottom",
  ) +
  geom_text(
    aes(label = format(round(Shares, 3), nsmall = 3)),
    position = position_dodge(width = 0.8),
    size = 4.5,
    vjust = -0.48
  ) +
  scale_y_continuous(limits = c(0, 1))
  

  ## Cohen's kappa

  kappa <- cohen.kappa(x = select(d, Choice, Predicted_Choice), alpha = .05)

  kappa2 <- kappa2(ratings = select(d, Choice, Predicted_Choice))

  ## Mean hit probability

  MHP <- mean(d_stacked[d_stacked$Choice == 1, ]$Predicted_Share)

  Single_HOT_Validation_Results <- list(
    Alternatives_In_HOT = length(predicted_share),
    Chance_Level_correct_Prediction_Percent = 1 / length(predicted_share) * 100,
    Hitrate = hitrate * 100,
    binomial_test_Hitrate_greater_chance_p_Value = undirected_binomial_test_chance_level$p.value,
    Bayes_Factor_hitrate_unequal_chance = BF$bf,
    MAE = MAE * 100,
    MedAE = MedAE * 100,
    RMSE = RMSE * 100,
    Plot = Share_plot,
    Cohens_Kappa_with_CIs = kappa$confid[1, ],
    z_test_Cohens_Kappa_test_statistic = kappa2$statistic,
    z_test_Cohens_Kappa_p_value = kappa2$p.value,
    MHP = MHP
  )

  return(Single_HOT_Validation_Results)
}

Afterwards, we can use the function by calling Validate_a_single_HOT(d) and assigning the results to a new object ‘HOT_results’. Just calling ‘HOT_results’ then presents the results of the holdout task validation.

HOT_results <- Validate_a_single_HOT(d)

HOT_results
## $Alternatives_In_HOT
## [1] 3
## 
## $Chance_Level_correct_Prediction_Percent
## [1] 33.33333
## 
## $Hitrate
## [1] 33.33333
## 
## $binomial_test_Hitrate_greater_chance_p_Value
## [1] 0.7037037
## 
## $Bayes_Factor_hitrate_unequal_chance
## [1] 0.8226845
## 
## $MAE
## [1] 36.22
## 
## $MedAE
## [1] 29.36
## 
## $RMSE
## [1] 38.45889
## 
## $Plot

## 
## $Cohens_Kappa_with_CIs
##      lower   estimate      upper 
## -0.1163677  0.1428571  0.4020820 
## 
## $z_test_Cohens_Kappa_test_statistic
## [1] 0.8660254
## 
## $z_test_Cohens_Kappa_p_value
## [1] 0.3864762
## 
## $MHP
## [1] 45.67

Finally, we can apply our established function to some real conjoint analysis results.

Make sure that the file ‘Electric_Toothbrush_HOT.xlsx’ is located in your RStudio project folder. It contains the predicted choice shares as well as the actual HOT choices of an ACBC industry study on electric toothbrushes. This time, the corresponding HOT includes 8 product alternatives.

Let us first import the data to a new dataframe d2.

d2 <- as.data.frame(read_excel("Electric_Toothbrush_HOT.xlsx", sheet = 1))

Let’s inspect the first few entries in the resulting dataframe:

View(head(d2[, 1:5]))
Brief Overview
ID A B C D
1 0.0000002 0.0004164 0.0001510 0.0068413
2 0.0052724 4.5679293 20.0590911 52.4080830
4 0.0002115 0.0070856 4.4044042 1.8456768
5 0.0222073 0.7654935 57.1190982 31.6780665
6 0.0000008 0.0021444 0.0000002 0.0002178
7 0.0000431 0.0006004 5.0262212 8.9642278

Finally, we apply the function Validate_a_single_HOT() to d2:

HOT_results <- Validate_a_single_HOT(d2)

HOT_results
## $Alternatives_In_HOT
## [1] 8
## 
## $Chance_Level_correct_Prediction_Percent
## [1] 12.5
## 
## $Hitrate
## [1] 32.01754
## 
## $binomial_test_Hitrate_greater_chance_p_Value
## [1] 1.191898e-14
## 
## $Bayes_Factor_hitrate_unequal_chance
## [1] 381127713120
## 
## $MAE
## [1] 7.642749
## 
## $MedAE
## [1] 3.660551
## 
## $RMSE
## [1] 10.83374
## 
## $Plot

## 
## $Cohens_Kappa_with_CIs
##      lower   estimate      upper 
## 0.05125004 0.11892296 0.18659588 
## 
## $z_test_Cohens_Kappa_test_statistic
## [1] 3.871297
## 
## $z_test_Cohens_Kappa_p_value
## [1] 0.0001082576
## 
## $MHP
## [1] 30.4734

List of References

Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. https://doi.org/10.1037/h0026256

Ding, M. (2007). An incentive-aligned mechanism for conjoint analysis. Journal of Marketing Research, 44(2), 214–223. https://doi.org/10.1509/jmkr.44.2.214

Dong, S., Ding, M., & Huber, J. (2010). A simple mechanism to incentive-align conjoint experiments. International Journal of Research in Marketing, 27(1), 25–32. https://doi.org/10.1016/j.ijresmar.2009.09.004

Dotson, J. P., Howell, J. R., Brazell, J. D., Otter, T., Lenk, P. J., MacEachern, S., & Allenby, G. M. (2018). A probit model with structured covariance for similarity effects and source of volume calculations. Journal of Marketing Research, 55(1), 35–47. https://doi.org/10.1509/jmr.13.0240

Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford: Oxford University Press.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310

Liu, M., Lee, S., & Conrad, F. G. (2015). Comparing extreme response styles between agree-disagree and item-specific scales. Public Opinion Quarterly, 79(4), 952–975. https://doi.org/10.1093/poq/nfv034

Voleti, S., Srinivasan, V., & Ghosh, P. (2017). An approach to improve the predictive power of choice-based conjoint analysis. International Journal of Research in Marketing, 34(2), 325–335. https://doi.org/10.1016/j.ijresmar.2016.08.007