
R analysis script for analyzing Alpha (the distribution of attribute-level-wise means of the participants' utilities) across successive iteration steps. The estimation uses the PlayStation 4 bundle example (see Chapter 4 of the lecture) and was done in Sawtooth Software Lighthouse Studio v9.8.

1 Load packages that will be used throughout the analysis

Beware: R is a case-sensitive language. Thus, 'data' will not be interpreted in the same way as 'Data'.
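A minimal illustration (the object names below are chosen purely for demonstration):

data <- 1
Data <- 2
identical(data, Data) # FALSE, because 'data' and 'Data' are two distinct objects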

In R, most functionality is provided by additional packages.
Most of these packages are well documented; see: https://cran.r-project.org/

The code chunk below first checks whether the package pacman is already installed on your machine. If yes, the package will be loaded. If not, R will install it first.

Alternatively, you can do this manually by first executing install.packages("pacman") and then library(pacman).

The second line then loads the package pacman.

The third line uses the function p_load() from the pacman package to install (if necessary) and load all packages that we provide as arguments. We can see which arguments a function understands by placing the cursor in the function's name and pressing 'F1'.

if(!"pacman" %in% rownames(installed.packages())) install.packages("pacman")

library(pacman)


pacman::p_load(tidyverse, haven, labelled, knitr, bayestestR, bayesplot) 

In case you are not able to load a package, try installing it first, e.g., via install.packages("foreign"), where foreign is the name of the package.

Here is the R session info:

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bayesplot_1.7.2  bayestestR_0.7.2 knitr_1.29       labelled_2.5.0  
##  [5] haven_2.3.1      forcats_0.5.0    stringr_1.4.0    dplyr_1.0.1     
##  [9] purrr_0.3.4      readr_1.3.1      tidyr_1.1.2      tibble_3.0.3    
## [13] ggplot2_3.3.2    tidyverse_1.3.0  pacman_0.5.1    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.0 xfun_0.16        colorspace_1.4-1 vctrs_0.3.2     
##  [5] generics_0.0.2   htmltools_0.5.0  yaml_2.2.1       blob_1.2.1      
##  [9] rlang_0.4.7      pillar_1.4.6     glue_1.4.1       withr_2.2.0     
## [13] DBI_1.1.0        dbplyr_1.4.4     modelr_0.1.8     readxl_1.3.1    
## [17] plyr_1.8.6       lifecycle_0.2.0  munsell_0.5.0    gtable_0.3.0    
## [21] cellranger_1.1.0 rvest_0.3.6      evaluate_0.14    fansi_0.4.1     
## [25] broom_0.7.0.9001 Rcpp_1.0.5       scales_1.1.1     backports_1.1.8 
## [29] jsonlite_1.7.1   fs_1.5.0         hms_0.5.3        digest_0.6.25   
## [33] stringi_1.4.6    insight_0.9.0    grid_4.0.2       cli_2.0.2       
## [37] tools_4.0.2      magrittr_1.5     crayon_1.3.4     pkgconfig_2.0.3 
## [41] ellipsis_0.3.1   xml2_1.3.2       ggridges_0.5.2   reprex_0.3.0    
## [45] lubridate_1.7.9  assertthat_0.2.1 rmarkdown_2.3    httr_1.4.2      
## [49] rstudioapi_0.11  R6_2.4.1         compiler_4.0.2

2 Import the *.csv file as produced by Sawtooth Software Lighthouse Studio.

Make sure that the file 'PlayStation4_hb_alpha.csv' (see E-Learning platform) is located in your RStudio project's directory.

Save the file into a new dataframe called d.

R is a functional language, meaning we can define a function that takes some arguments as input and delivers a result as output.

We can see which arguments a function understands by placing the cursor in the function's name (i.e., read.csv() below) and pressing 'F1'.

In the code chunk below, we pass the file name as well as the arguments header, sep, and dec to the read.csv() function.

In this way we tell the function that:

  • our data file uses a header containing variable names in the first row
  • values are separated by ','
  • the file uses '.' as the decimal sign

To execute the call, just set the cursor somewhere in the corresponding code line and press 'Ctrl'+'Enter'.
If you encounter any problems with the read functions, please try to copy your csv file to the desktop and then use:

d <- read.csv(file.choose(), header = TRUE, sep = ',', dec = '.')
This will open a dialogue that allows for a manual selection of the file.

d<-read.csv("PlayStation4_hb_alpha.csv", header = TRUE, sep = ',', dec = '.')

Now, the object d (a dataframe) has become part of our working environment, and we can access it.

For example, we can view the dataframe d via the function View(). We can also preview the first part of the data set with head().

The code below returns the first six cases (head()) of d, while only considering the first 5 variables (d[,1:5]).

head(d[,1:5])
Brief Overview
ï..Iteration X500.gigabyte X1.terabyte black white
1 -0.0654039 0.0654039 -0.0017922 0.0017922
2 -0.0100344 0.0100344 -0.0024558 0.0024558
3 -0.0346806 0.0346806 0.0019213 -0.0019213
4 0.0197559 -0.0197559 -0.0212265 0.0212265
5 -0.0092769 0.0092769 0.0257134 -0.0257134
6 -0.0850587 0.0850587 -0.0267374 0.0267374

We see that each row corresponds to one iteration step. Keep in mind: We assume the Metropolis-Hastings algorithm to have converged from iteration step 40,001 onward.

Next, we allocate appropriate variable names. The following code chunk sets the names of d (the variable names) to a vector of names we provide as strings. c() is a generic R function that combines its arguments to form a vector.

names(d) <- c("Iteration", "500_gigabyte", "1_terabyte", "black", "white", "OneDualShockController", "TwoDualShockController", "charging_station_for_2_PS4_controller", "PS4_wireless_Stereo_Headset", "no_accessories", "Far_Cry_Primal", "GTA_V", "Life_is_strange", "Tom_Clancy_The_Division", "no_action_adventure_game", "The_Witcher_3", "Fallout_4", "Final Fantasy_X_X2", "Dark_Souls_3", "no_role_playing_game", "Just_Dance_2016", "Guitar_Hero_Live_incl_Guitar", "FIFA_16", "no_game_for_family_companionship", "price_202.3", "price_775.489", "NONE")

Afterwards, we can see the new names of d by calling the names() function.
The call head(names(d)) shows the first few names in the dataframe.

head(names(d))
## [1] "Iteration"              "500_gigabyte"           "1_terabyte"            
## [4] "black"                  "white"                  "OneDualShockController"

In R we can access the elements of a dataframe in multiple ways.

For example, we can access the variable "500_gigabyte" by calling d$`500_gigabyte` (the backticks are needed because the name starts with a digit).
Alternatively, as we know that the variable is the second one in the dataframe, we could also use d[,2].
Analogously, we can use d[2,] to return all variable values of the second case (row) in the dataframe d.
In addition, we can use d[2,2] to return the value of the second variable for the second case.
Finally, we can use d[1:4, 2] to return the values of the second variable for the first four cases (rows) in d.
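A small sketch of these access patterns (nothing is modified here; each call just returns parts of d):

d$`500_gigabyte`[1:6]   # the variable "500_gigabyte", accessed by name
d[1:6, 2]               # the same variable, accessed by its column position
d[2, ]                  # all variable values of the second case (row)
d[2, 2]                 # the value of the second variable for the second case
d[1:4, 2]               # the values of the second variable for the first four cases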


3 Example 1: Highest density intervals (HDIs) for part worth utilities

You might have assumed that Bayesian tests are much harder to understand than classical null hypothesis tests. This does not have to be true!

Statistical testing for ACBC HB models requires a shift in mindset.
We examine the distribution of posterior Alpha draws to see if a strong majority of these draws falls on either one side or the other of the null hypothesis (Kruschke, 2015; Orme & Chrzan, 2017).

One thing market researchers are very familiar with are 95% confidence intervals of estimated parameters in regression models. Bayesian analysis of the posterior distribution of Alphas offers a similar interval, which Bayesians commonly refer to as the Highest Density Interval (HDI).
The HDI indicates which points of the distribution are most credible and jointly cover most of the distribution. Thus, the HDI summarizes the distribution by specifying an interval that spans most of it, say 95%, such that every point inside the interval has higher credibility than any point outside the interval. Therefore, the term credibility interval is also very common.

To develop a 95% HDI, we simply sort the 80,000 draws following the Burn-In (warm-up) stage from lowest to highest and then cut off the lowest 2.5% and the highest 2.5% of the draws. Put differently, we search for the 2.5% and the 97.5% quantiles. That's all!

Let's show how to do this in R for the part worth utility of 1 DualShock Controller included in the PS4 bundle.


3.1 Throw away the first 40,000 iterations of the Burn-In (warm-up) stage

The following code chunk uses the function subset() with the arguments 'd' and 'Iteration>40000'.
The subset() function selects the subset of cases in d for which the variable 'Iteration' has values higher than 40,000.
The resulting subset of data then replaces the old dataframe d.

d <- subset(d, Iteration>40000)

3.2 Extract the 2.5% and the 97.5% quantiles as limits of the 95% HDI

Below, we call the hdi() function of the package bayestestR. We feed the function with 2 arguments:

  • The variable of interest (d$OneDualShockController)
  • The confidence level (0.95).

In the code below, we explicitly call bayestestR::hdi() instead of hdi(). This is because other packages in the R universe also provide a function called hdi() that produces a different output. By calling bayestestR::hdi(), we make sure to apply the correct hdi() function.

bayestestR::hdi(d$OneDualShockController, 0.95)
## # Highest Density Interval
## 
## 95% HDI       
## --------------
## [-1.23, -0.66]

Interpretation: The lower limit of the 95% HDI is -1.2300384. The upper limit is -0.6591412. As the 95% HDI does not include 0, we can conclude that, at a confidence level of 95%, having only 1 instead of 2 DualShock Controllers in the PS4 bundle significantly decreases its likelihood of being chosen1.
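As a cross-check of the sort-and-pick-the-quantiles description above, we can compute the 2.5% and 97.5% quantiles of the post-burn-in draws directly with quantile(). Note that this is, strictly speaking, the equal-tailed credibility interval; for a roughly symmetric posterior such as this one, its limits are very close to (though not necessarily identical to) the HDI reported by bayestestR::hdi().

quantile(d$OneDualShockController, probs = c(0.025, 0.975)) # 2.5% and 97.5% quantiles of the draws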


3.3 Visualization of the entire a posteriori distribution after the warm-up stage (iteration step > 40,000)

R is especially strong in visualizing data. We can generate a plot that shows us the entire a posteriori distribution of Alpha values (in case you do not simply want to trust that HB draws the Alphas in the upper model from a normal distribution), together with the limits of the 95% HDI.

A description of the syntax is given below.

Density_Plot <- ggplot(d, aes(x = d$OneDualShockController)) +
  geom_density(alpha = .2, fill = "#31B404" ) +
  labs(title = "A posteriori density of Alpha estimates") +
  labs(x = "Part worth utilities", y = "Density") +
  geom_vline(aes(xintercept = bayestestR::hdi(d$OneDualShockController, 0.95)$CI_low),
    color = "red", linetype = "dashed", size = 1) +
  geom_vline(aes(xintercept = bayestestR::hdi(d$OneDualShockController, 0.95)$CI_high),
    color = "red", linetype = "dashed", size = 1) +
  annotate(
    geom = "text",
    x = (bayestestR::hdi(d$OneDualShockController, 0.95)$CI_low) + 0.1, y = max(density(d$OneDualShockController)$y), label = round(bayestestR::hdi(d$OneDualShockController, 0.95)$CI_low, digits = 3)) +
  annotate(
    geom = "text",
    x = (bayestestR::hdi(d$OneDualShockController, 0.95)$CI_high) + 0.1, y = max(density(d$OneDualShockController)$y), label = round(bayestestR::hdi(d$OneDualShockController, 0.95)$CI_high, digits = 3))


plot(Density_Plot)

Explanation of the syntax creating the above plot: We use the package ggplot2 to create a density plot of the a posteriori Alpha draws. The first line uses the function ggplot() to set up a new plot named Density_Plot. We feed the function with the dataframe d and furthermore tell it to map the variable 'd$OneDualShockController' to the x-axis.
Line 2 adds the density line to the plot. The arguments alpha = .2 and fill = "#31B404" define a semi-transparent filling for the area under the curve (fill uses a hexadecimal color code).
Lines 3 and 4 add a plot title as well as labels describing both axes.
Lines 5 and 6 add a vertical line at the lower limit of the 95% HDI, which is red, dashed, and of size 1.
Lines 7 and 8 add an analogous line at the upper limit.
In the rest of the code, we use the annotate() function to add text labels with the rounded values of the limits to the plot. The position of these text labels is governed by x and y coordinates. We set x to the limits of the 95% HDI plus a small shift to the right (+0.1), whereas we set y to the maximum of the density function in the diagram.

Whereas in the example above we used the ggplot2 package to create a density plot of the posterior distribution from scratch, we next use the bayesplot package to make things easier.

First, we use the function bayesplot::mcmc_areas() to recreate the above density plot. Here, the function only uses 3 arguments:

  • the dataframe providing the posterior draws (d in our case)
  • pars, this argument specifies which variables (columns of d) to use in the plot. We set pars = vars(OneDualShockController) to include the draws for the attribute level One Dual-Shock Controller
  • prob, this argument defines the probability limits of the credibility interval to draw. We want a 95% HDI.
mcmc_areas(d, pars = vars(OneDualShockController), prob = 0.95)

Interpretation: The bold blue vertical line constitutes the median of the distribution. The limits of the light-blue area represent the 95% HDI of the posterior draws. As in the approaches above, the 95% HDI does not include 0. We can conclude that, at a confidence level of 95%, having only 1 instead of 2 DualShock Controllers in the PS4 bundle significantly decreases its likelihood of being chosen.

The bayesplot package also provides a function to create density plots for multiple attribute levels simultaneously, namely mcmc_areas_ridges(). In the code chunk below, we call mcmc_areas_ridges() with values for 3 of its arguments:

  • the dataframe providing the posterior draws (d in our case)
  • pars, this argument specifies which variables (columns of d) to use in the plot. We set pars = vars(-Iteration, -NONE, -price_202.3, -price_775.489) to use all columns of d except for the variables named 'Iteration', 'NONE', 'price_202.3', and 'price_775.489'. For this example, we deliberately exclude the draws for the none-parameter and the two variables representing the linear price function. These draws have such extreme values that the graph would otherwise be hard to read for the other variables.
  • prob, this argument defines the probability limits of the credibility interval to draw. We want a 95% HDI.

Lines 2-5 of the code below just add a title and a subtitle to the plot.

mcmc_areas_ridges(d, pars = vars(-Iteration, -NONE, -price_202.3, -price_775.489), prob = 0.95) +
  ggplot2::labs(
    title = "Posterior distributions",
    subtitle = "with 95% HDI intervals"
  )

Interpretation: The plot very much mirrors the information seen before, but for multiple attribute levels at once. We can see, for example, that on average the color black significantly increases the choice likelihood of a PlayStation 4 bundle as compared to the color white. Conversely, differences in hard disk capacity (500GB vs. 1TB) are on average not significant, since the HDIs for the corresponding attribute levels both cross 0. Within the category of action-adventures (5 levels: Far Cry Primal, GTA V, Life is Strange, Tom Clancy's The Division, and no action-adventure), GTA V on average contributes significantly more utility than the average utility of the attribute's levels. Furthermore, the plot is very useful to highlight differences in the uncertainty of the point estimates for different attribute levels. By comparing the widths of the density plots of different attribute levels, we see that point estimates for levels with a narrow density are more precise than point estimates with an underlying density of greater width. Thus, we can place more trust in the point estimate for the color white than in the one for Just Dance 2016. Although both attribute levels decrease choice likelihood, the estimate for a white console color is affected by less uncertainty. Uncertainty in this context can also indicate that different consumer segments exist in the sample, which hold different levels of negative utility for Just Dance 2016.

Finally, the bayesplot package also provides a function to solely illustrate HDI’s of multiple attribute levels, namely mcmc_intervals(). In the code chunk below, we call mcmc_intervals() using 4 arguments:

  • the dataframe providing the posterior draws (d in our case)
  • pars, this argument specifies which variables (columns of d) to use in the plot. We set pars = vars(-Iteration, -NONE, -price_202.3, -price_775.489) to use all columns of d except for the variables named 'Iteration', 'NONE', 'price_202.3', and 'price_775.489'. For this example, we deliberately exclude the draws for the none-parameter and the two variables representing the linear price function. These draws have such extreme values that the graph would otherwise be hard to read for the other variables.
  • prob, this argument defines the probability limits of an inner credibility interval to draw. We want an 89% HDI, as often recommended in the literature (Kruschke, 2015).
  • prob_outer, the probability limits of an outer credibility interval to draw. We opt for 95%.
mcmc_intervals(d, pars = vars(-Iteration, -NONE, -price_202.3, -price_775.489), prob_outer = 0.95, prob = 0.89)

Interpretation: We obtain the same information as before, but without information on the shape of the distributions of the attribute levels.


4 Example 2: Confidence that a certain part worth utility falls below a threshold

Next, we can evaluate what percentage of the Alpha draws for a part worth utility actually falls below the value of 0.
This time, we will use the part worth utility of the attribute level 'Wireless Stereo Headset' of the attribute 'Accessories' describing the PS4 bundles. This attribute also has two other levels: 'Charging Station' and 'no accessories'. Thus, evaluating whether we can be very sure that the Alphas for 'Wireless Stereo Headset' fall below 0 means evaluating whether we can be sure that this accessory contributes less utility to consumers than the average utility across the attribute's levels (utilities within an attribute are zero-centered).
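As a quick sanity check of this zero-centering (assuming the usual effects coding of the part worth utilities; the helper vector accessory_levels below is introduced only for this check), the Alpha draws of the three levels of the 'Accessories' attribute should sum to approximately zero in every iteration:

accessory_levels <- c("charging_station_for_2_PS4_controller",
                      "PS4_wireless_Stereo_Headset",
                      "no_accessories")
summary(rowSums(d[, accessory_levels])) # row sums should be (numerically) zero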

In the code chunk below, we use the function nrow() to calculate the fraction of posterior draws that fall below 0. Here, nrow(d[d$PS4_wireless_Stereo_Headset < 0, ]) counts the number of iterations in which the Alpha draw is less than 0. Dividing by nrow(d), which gives the total number of iterations in d, we obtain the fraction of draws falling below 0.

nrow(d[d$PS4_wireless_Stereo_Headset < 0, ]) / nrow(d) * 100
## [1] 98.1325

Interpretation: We see that 98.1325% of the draws are lower than 0. Thus, we can be very sure that adding a Wireless Stereo Headset to the PS4 bundle actually decreases the total utility of the bundle as compared to the other options for the attribute ('Charging Station' and 'no accessories'). The Bayesian p-value would correspond to 0.018675, which one might call a significant result. Keep in mind: if we test whether a certain attribute level's utility is < 0, we are testing whether this level's utility is lower than the average utility across all attribute levels of the corresponding attribute (utilities are mean-centered within an attribute).
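As a side note, the same percentage can be computed more compactly with mean(), because the mean of a logical vector equals the proportion of TRUE values:

mean(d$PS4_wireless_Stereo_Headset < 0) * 100 # identical result to the nrow() approach above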


5 Example 3: Test whether one part worth utility is different from another part worth utility within the same attribute

We can apply the same logic in order to evaluate whether, on average, the utility of one attribute level A is significantly lower than that of another level B within the same attribute (keep in mind: we cannot compare the magnitude of raw Alpha draws across different attributes).

To do this, we just count the fraction of iterations after the Warm-up stage for which the Alpha draw of attribute level A is lower than that of attribute level B.

In this illustration we focus on the attribute 'Action-Adventure'. We assess whether the game 'Tom Clancy's The Division' contributes significantly less utility than the game 'Far Cry Primal'.

nrow(d[d$Tom_Clancy_The_Division < d$Far_Cry_Primal, ]) / nrow(d) * 100
## [1] 59.08625

Interpretation: In only slightly more than half of the iterations do the Alpha draws for 'Tom Clancy's The Division' fall below those obtained for 'Far Cry Primal'. The Bayesian p-value corresponds to 0.409. Thus, the former game cannot be assumed to be significantly less preferred than the latter.


6 Example 4: Test whether two product concepts are liked significantly differently by consumers

Think back to our lecture. Just as we were able to use the individual Beta draws for each consumer to calculate the total utility of a product bundle for that consumer, we can now use the Alpha draws to calculate the total utility for an average consumer.

Again, simply counting the number of iterations after Burn-In for which the total utility of bundle A is higher than the total utility of bundle B allows for an assessment of significant differences from a Bayesian perspective.

Imagine we want to evaluate whether A, on average, is significantly more liked than B.

6.1 Evaluating total utilities

  • A: 500GB, black, 1 Controller, charging station, no Action-Adventure, no Role-Playing, no Family & Companionship, at a price of EUR 299.99
  • B: 1TB, white, 1 Controller, no Accessories, Far Cry Primal, Final Fantasy X/X2, no Family & Companionship, at a price of EUR 349.99

The first thing we do is to calculate the total utility across all attributes except for the price in each iteration step after the Warm-up stage. To do so, we simply add up the Alphas of the corresponding attribute levels, as shown below:

d$Util_A_without_Price <- d$`500_gigabyte` + d$black + d$OneDualShockController + d$charging_station_for_2_PS4_controller + d$no_action_adventure_game + d$no_role_playing_game + d$no_game_for_family_companionship

d$Util_B_without_Price <- d$`1_terabyte` + d$white + d$OneDualShockController + d$no_accessories + d$Far_Cry_Primal + d$`Final Fantasy_X_X2` + d$no_game_for_family_companionship

We can now preview the first few values of the new variable:

head(d$Util_A_without_Price)
## [1] -0.103480282 -0.237708920 -0.189473538 -0.613720912  0.003270605
## [6] -0.435947492
head(d$Util_B_without_Price)
## [1] -1.611101 -1.166206 -1.702064 -2.133918 -2.246294 -1.847212

For the utility of the price of the bundle, we have to apply a linear interpolation (see slides of Chapter 2).

We use the approx() function to interpolate linearly. We pass 3 arguments to the function:

  • x, which takes the price breakpoints for which we want to see an interpolation in-between,
  • y, which takes the corresponding Alpha estimates for the breakpoints, and
  • xout, which specifies the point in the price dimension for which we want to achieve an estimate for the utility.

We first establish two new variables 'Util_A_price' and 'Util_B_price' that, later on, store the interpolated utility values. The challenge in our case is that we have to apply the interpolation for each row of the dataframe d.
Therefore, we use a for() loop.
'for(i in 1:nrow(d))' instructs R to apply a certain command for each i in the interval [1, number of rows in d]. The subsequent command is then applied for each row i in the dataframe d. By applying as.numeric() to the result of our linear interpolation, we make sure that the new variables Util_A_price and Util_B_price in d are filled with numeric values (approx() would otherwise return a list of values as its result).

d$Util_A_price <- 0

d$Util_B_price <- 0



for (i in 1:nrow(d)) {
  d$Util_A_price[i] <- as.numeric(approx(x = c(202.3, 775.489), y = c(d$price_202.3[i], d$price_775.489[i]), xout = 299.99)[2])

  d$Util_B_price[i] <- as.numeric(approx(x = c(202.3, 775.489), y = c(d$price_202.3[i], d$price_775.489[i]), xout = 349.99)[2])
}
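As an aside, because the interpolation is linear, the for() loop can be avoided entirely: the interpolated price utility is just a weighted average of the utilities at the two price breakpoints. The sketch below (the weights w_A, w_B and the *_vec variables are introduced here only for this comparison) should reproduce the values computed by the loop:

w_A <- (299.99 - 202.3) / (775.489 - 202.3) # weight of the upper price breakpoint for bundle A
w_B <- (349.99 - 202.3) / (775.489 - 202.3) # weight of the upper price breakpoint for bundle B

d$Util_A_price_vec <- (1 - w_A) * d$price_202.3 + w_A * d$price_775.489
d$Util_B_price_vec <- (1 - w_B) * d$price_202.3 + w_B * d$price_775.489

all.equal(d$Util_A_price_vec, d$Util_A_price) # should be TRUE
all.equal(d$Util_B_price_vec, d$Util_B_price) # should be TRUE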

Next, we build the total utility of the PS4 bundles A and B, by adding up the utilities without price and the utility of the price of A and B.

d$Total_Utility_A <- d$Util_A_without_Price + d$Util_A_price

d$Total_Utility_B <- d$Util_B_without_Price + d$Util_B_price

Finally, we evaluate whether A, on average, is significantly more liked than B, by counting the iterations for which the total utility of bundle A is higher than the total utility of bundle B.

nrow(d[d$Total_Utility_A > d$Total_Utility_B, ]) / nrow(d) * 100
## [1] 100

Interpretation: Across all of the iterations, the Alpha-based estimates for the mean total utility of the PS4 bundle A are higher than those obtained for bundle B. The Bayesian p-value corresponds to 0, a significant difference.


6.2 Visualizing the results

A description of the syntax is given below. We see that the posterior distributions are nearly non-overlapping, highlighting the significant difference in mean total utilities for A and B.

Density_Plot <- ggplot(d) +
  geom_density(aes(x = d$Total_Utility_A), alpha = .2, fill = "#31B404") +
  geom_density(aes(x = d$Total_Utility_B), alpha = .2, fill = "#FFBF00") +
  labs(title = "A posteriori density of Total utility estimates for A and B") +
  labs(x = "Part worth utilities", y = "Density") +
  geom_vline(aes(xintercept = bayestestR::hdi(d$Total_Utility_A, 0.95)$CI_low),
    color = "black", linetype = "dashed", size = 0.7
  ) +
  geom_vline(aes(xintercept = bayestestR::hdi(d$Total_Utility_A, 0.95)$CI_high),
    color = "black", linetype = "dashed", size = 0.7
  ) +
  geom_vline(aes(xintercept = bayestestR::hdi(d$Total_Utility_B, 0.95)$CI_low),
    color = "red", linetype = "longdash", size = 0.7
  ) +
  geom_vline(aes(xintercept = bayestestR::hdi(d$Total_Utility_B, 0.95)$CI_high),
    color = "red", linetype = "longdash", size = 0.7
  ) +
  annotate(
    geom = "text",
    x = (bayestestR::hdi(d$Total_Utility_A, 0.95)$CI_low) + 0.6, y = max(density(d$Total_Utility_A)$y), label = round(bayestestR::hdi(d$Total_Utility_A, 0.95)$CI_low, digits = 2)
  ) +
  annotate(
    geom = "text",
    x = (bayestestR::hdi(d$Total_Utility_A, 0.95)$CI_high) + 0.6, y = max(density(d$Total_Utility_A)$y), label = round(bayestestR::hdi(d$Total_Utility_A, 0.95)$CI_high, digits = 2)
  ) +
  annotate(
    geom = "text",
    x = (bayestestR::hdi(d$Total_Utility_B, 0.95)$CI_low) + 0.6, y = max(density(d$Total_Utility_B)$y), label = round(bayestestR::hdi(d$Total_Utility_B, 0.95)$CI_low, digits = 2)
  ) +
  annotate(
    geom = "text",
    x = (bayestestR::hdi(d$Total_Utility_B, 0.95)$CI_high) + 0.6, y = max(density(d$Total_Utility_B)$y), label = round(bayestestR::hdi(d$Total_Utility_B, 0.95)$CI_high, digits = 2)
  ) +
  annotate(
    geom = "text",
    x = median(d$Total_Utility_A), y = max(density(d$Total_Utility_A)$y) / 7, label = "A"
  ) +
  annotate(
    geom = "text",
    x = median(d$Total_Utility_B), y = max(density(d$Total_Utility_B)$y) / 7, label = "B"
  )


plot(Density_Plot)


Explanation of the syntax creating the above plot: The syntax is pretty much the same as for the figure in Example 1. However, we have to add some elements twice, as we are now plotting the values of two variables simultaneously. At the end of the chunk we added a new text label at the median of each bundle's distribution.


List of References

Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Amsterdam: Academic Press.

Orme, B. K., & Chrzan, K. (2017). Becoming an expert in conjoint analysis: Choice modeling for pros. Orem, UT: Sawtooth Software.


  1. Bayesian experts recently propose using 89% intervals, which are deemed to be more stable than 95% intervals (Kruschke, 2015). Moreover, 89% highlights the arbitrariness of the interval limits.