

R analysis script presenting one solution for conducting a benefit segmentation on individual part worth utilities elicited by conjoint analysis (an ACBC in this case). The segmentation uses the simple example we discussed during class (Chapter 6, ACBC on PlayStation 4 bundles). Ideally, part worth utilities are estimated through monotone regression (Johnson, 1975) instead of Hierarchical Bayes, so that individual utilities are not shrunk toward population means (for this illustration we use HB utilities). Make sure that the file ‘Utilities PS4 ACBC Zero Centered.xlsx’ is located in your RStudio project folder. It contains the utilities (i.e., means of the posterior draws after warmup) of the n=139 consumers who participated in the ACBC study.

As discussed during class, the consumer segmentation procedure comprises 6 steps:

  1. Exporting zero-centered part worth utilities to SPSS/Excel
  2. Importing the utilities to R
  3. Choosing an appropriate number of consumer segments
  4. Detecting and excluding multivariate outliers
  5. Conducting the cluster analysis / consolidating the segment solution
  6. Interpreting the consumer segments

Within the analysis script we cover steps 2-6.

1 Load packages that will be used throughout the analysis

Beware: R is a case-sensitive language. Thus, ‘data’ will not be interpreted in the same way as ‘Data’.

In R, most functionality is provided by additional packages.
Most of these packages are well documented; see https://cran.r-project.org/

The code chunk below first evaluates whether the package pacman is already installed on your machine; if not, R will install it.

Alternatively, you can do this manually by first executing install.packages("pacman") and then library(pacman).

The second line then loads the package pacman via the library() function.

The third line uses the function p_load() from the pacman package to install (if necessary) and load all packages that we provide as arguments (e.g., dplyr).

if(!"pacman" %in% rownames(installed.packages())) install.packages("pacman")

library(pacman)

pacman::p_load(knitr, tidyverse, readxl, ClustVarLV, tableone, kableExtra)

Here is the R session info:

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.3.1 tableone_0.12.0  ClustVarLV_2.0.1 readxl_1.3.1    
##  [5] forcats_0.5.0    stringr_1.4.0    dplyr_1.0.2      purrr_0.3.4     
##  [9] readr_1.4.0      tidyr_1.1.2      tibble_3.0.4     ggplot2_3.3.2   
## [13] tidyverse_1.3.0  knitr_1.30       pacman_0.5.1    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5        lattice_0.20-41   lubridate_1.7.9.2 assertthat_0.2.1 
##  [5] digest_0.6.27     foreach_1.5.1     R6_2.5.0          cellranger_1.1.0 
##  [9] plyr_1.8.6        backports_1.2.0   reprex_0.3.0      survey_4.0       
## [13] evaluate_0.14     httr_1.4.2        pillar_1.4.7      rlang_0.4.8      
## [17] rstudioapi_0.13   Matrix_1.2-18     rmarkdown_2.5     splines_4.0.3    
## [21] webshot_0.5.2     munsell_0.5.0     broom_0.7.2       compiler_4.0.3   
## [25] modelr_0.1.8      xfun_0.19         pkgconfig_2.0.3   mitools_2.4      
## [29] htmltools_0.5.0   tidyselect_1.1.0  codetools_0.2-18  viridisLite_0.3.0
## [33] fansi_0.4.1       crayon_1.3.4      dbplyr_2.0.0      withr_2.3.0      
## [37] grid_4.0.3        jsonlite_1.7.1    gtable_0.3.0      lifecycle_0.2.0  
## [41] DBI_1.1.0         magrittr_2.0.1    scales_1.1.1      cli_2.1.0        
## [45] stringi_1.5.3     fs_1.5.0          doParallel_1.0.16 xml2_1.3.2       
## [49] ellipsis_0.3.1    generics_0.1.0    vctrs_0.3.5       iterators_1.0.13 
## [53] tools_4.0.3       glue_1.4.2        hms_0.5.3         parallel_4.0.3   
## [57] survival_3.2-7    yaml_2.2.1        colorspace_2.0-0  rvest_0.3.6      
## [61] haven_2.3.1

2 Import the *.xlsx file from Excel

Make sure that the file ‘Utilities PS4 ACBC Zero Centered.xlsx’ (see the E-Learning platform) is located in your RStudio project’s directory.

We save the data into a new dataframe called d.

R is a functional language, meaning we can define a function that takes some arguments as input and delivers a result as output.

We can see which arguments a function understands by pressing ‘F1’ while placing the cursor in the function’s name (i.e., read_excel() below).
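
Alternatively, you can open the help page directly from the console, which is equivalent to pressing ‘F1’:

?read_excel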

In the code chunk below, we hand in the file name as well as the argument sheet = 1 to the read_excel() function.
In this way we tell the function to import the data of the first worksheet in the file.

To execute the call, just place the cursor somewhere in the corresponding code line and press ‘Ctrl’+‘Enter’.
If you encounter any problems with the read functions, please try to copy your *.xlsx file to the desktop and then use:

d <- as.data.frame(read_excel(file.choose(), sheet = 1))

This will open a dialogue that allows for a manual selection of the file.

d <- as.data.frame(read_excel("Utilities PS4 ACBC Zero Centered.xlsx", sheet = 1))

Now the object d (a dataframe) has become part of our working environment, and we can access it.

For example, we can view the dataframe d via the function View(). We can also preview the first part of the data set with head().

The code below returns the first few rows and columns of d.

View(head(d[,1:6]))
Brief Overview
Respondent ID 500 gigabyte 1 terabyte black white OneDualShockController
1 -41.1005467 41.1005467 23.0185427 -23.0185427 11.580779
2 -40.9207006 40.9207006 35.7039431 -35.7039431 -29.301277
15 -2.2086182 2.2086182 30.2616954 -30.2616954 -14.598786
17 -0.0248455 0.0248455 0.4503716 -0.4503716 -9.486536
18 -6.0335218 6.0335218 18.4052887 -18.4052887 -18.882059
19 -1.5296434 1.5296434 1.2160663 -1.2160663 -61.038314

We see that each row corresponds to one consumer. The first column contains a unique case ID, whereas the remaining columns present the individual part worth utilities corresponding to the attribute levels involved in the conjoint study.
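
As a quick structural check (output not shown here), d should have 139 rows, one per consumer, and one ID column plus one column per attribute level:

dim(d)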


3 Choosing an appropriate number of consumer segments

In the literature, countless algorithms for segmenting consumers exist (e.g., Kaufman & Rousseeuw, 2005; Reynolds, Richards, La Iglesia, & Rayward-Smith, 2006; Sarstedt & Mooi, 2019; Wedel & Kamakura, 2012). In this illustration we use only one approach: Clustering of Variables Around Latent Variables (CLV; Vigneau & Qannari, 2003; Vigneau, Qannari, Punter, & Knoops, 2001). We do so mainly for 2 reasons.

  1. It has a nice implementation in R, the ClustVarLV package (Vigneau, Chen, & Qannari, 2015).
  2. It implements steps 3-6 in easy-to-use functions (Vigneau, Qannari, Navez, & Cottet, 2016).

We discuss the algorithms behind CLV during class.

3.1 Data transformations

The functions provided in the package ClustVarLV expect our data set in a layout where consumers are columns and part worth utilities are rows. Thus, we have to transpose our data set. This is done by the t() function (see the code chunk below).

In the code chunk we transpose (flip rows and columns) all columns of d except for the first one, which contains the case IDs. We assign the results to a new dataframe named d_transposed.

d_transposed <- as.data.frame(t(d[, 2:ncol(d)]))

Next, we want to assign consumer IDs as column names in d_transposed. This is done below.

colnames(d_transposed) <- d$`Respondent ID`

We can review the first few columns and rows of the resultant dataframe.

View(head(d_transposed[, 1:6]))
Brief Overview
1 2 15 17 18 19
500 gigabyte -41.10055 -40.92070 -2.208618 -0.0248455 -6.033522 -1.529643
1 terabyte 41.10055 40.92070 2.208618 0.0248455 6.033522 1.529643
black 23.01854 35.70394 30.261695 0.4503716 18.405289 1.216066
white -23.01854 -35.70394 -30.261695 -0.4503716 -18.405289 -1.216066
OneDualShockController 11.58078 -29.30128 -14.598786 -9.4865363 -18.882059 -61.038314
TwoDualShockController -11.58078 29.30128 14.598786 9.4865363 18.882059 61.038314

3.2 An initial segmentation solution to determine number of segments

The next thing we do is choose an appropriate number of segments. To do so, we use a hierarchical CLV approach, which is then optimized by the kmeans-like algorithm outlined during class.

Hint: press F1 within functions to become familiar with the expected arguments and returned results.

In the code chunk below, we use the CLV() function with 5 arguments.

  • X: the transposed dataframe containing only the individual part worth utilities.
  • method: ‘local’ means that the covariance/correlation between cases is used as the measure of proximity. This is exactly what we want.
  • maxiter: at maximum, 200 iteration steps are allowed for the consolidation/post-hoc optimization of the hierarchical clustering solution by the kmeans-like algorithm.
  • nmax: here, 15 is the maximum number of consumer segments that will be considered.
  • sX: whether to z-standardize the columns of d_transposed. Since, in our case, the variance of each column conveys important information about an individual’s utility function, we turn off standardization.

We save the results of the segmentation to an object with the name PS4_hierarchical_CLV.

PS4_hierarchical_CLV <- CLV(X = d_transposed, method = "local", maxiter = 200, nmax = 15, sX = FALSE)

Think back to our lecture. Within the CLV approach, we try to maximize the criterion S (in essence, we try to maximize the covariance of each consumer profile with the mean profile of the segment the consumer is assigned to, the segment’s centroid). Since we have used a hierarchical version of CLV, we can use a dendrogram of S to decide on an appropriate number of consumer segments.
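
As a rough sketch in our notation (following Vigneau & Qannari, 2003), with \(x_j\) denoting the utility profile of consumer \(j\), \(G_k\) segment \(k\), and \(c_k\) its latent component (the segment centroid, scaled to unit variance), the ‘local’ CLV method searches for the partition that maximizes

\[S = \sum_{k=1}^{K} \sum_{j \in G_k} \operatorname{cov}(x_j, c_k).\]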

plot(PS4_hierarchical_CLV, "dendrogram")

Interpretation: The interpretation is the same as for every other dendrogram. The x-axis shows the individual participants. The y-axis shows how much S deteriorates when cases are successively merged until only one global consumer segment is left. We can see that the most pronounced drops in S occur at the steps where 4 segments are merged into 3 segments, and where 3 segments are merged into 2 segments.

The ClustVarLV package provides an even better visualization, the so-called ‘delta plot’, which visualizes the deterioration in S across different numbers of consumer segments after the additional consolidation stage that follows the hierarchical CLV approach.

plot(PS4_hierarchical_CLV, "delta")

Interpretation: The delta plot allows for a clearer decision in favor of 3 consumer segments (because the drop in S is relatively pronounced when moving from 3 to 2 segments).

It’s up to you to decide on the appropriate number of consumer segments. For example, you may consider

  • Interpretability in terms of distinct preference profiles and/or consumer characteristics.
  • Client preferences.
  • Profitability of special niche segments.
  • Targeted width of product assortments.

We can, additionally, validate how much the segment solution changes when moving from a 3- to a 2-segment solution.

In the code chunk below we combine the two functions table() and get_partition(). The table() function, in our case, takes two vectors with the case-wise segment labels as arguments and then builds a crosstab of both vectors. The first vector is given by the hierarchical CLV solution with 2 consumer segments (get_partition(PS4_hierarchical_CLV, K = 2)). To extract the vector with the segment labels, we use the get_partition() function of the ClustVarLV package, which expects a CLV solution and the number of segments as arguments. The second vector is given by the hierarchical CLV solution with 3 consumer segments (get_partition(PS4_hierarchical_CLV, K = 3)).

table(get_partition(PS4_hierarchical_CLV, K = 2), get_partition(PS4_hierarchical_CLV, K = 3))
Switches from a 2-segment to a 3-segment solution (rows: K = 2; columns: K = 3)
1 2 3
1 82 0 21
2 0 31 5

Interpretation: For 3 segments, even the smallest segment contains enough consumers (26/139 ≈ 19%). In the present case, we stay with K = 3.
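
If you prefer the relative segment sizes directly (output not shown here), a one-liner does the job:

prop.table(table(get_partition(PS4_hierarchical_CLV, K = 3)))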

4 Detecting and treating outlier cases / consolidation of the segment solution

Next, we evaluate whether we should exclude cases that are multivariate outliers, i.e., “observations that differ substantially from other observations in respect of one or more characteristics” (Sarstedt & Mooi, 2019, p. 386). Usually, these cases are excluded from segmentation because they are assumed to disturb the clustering process as well as the subsequent interpretation of the consumer segments.

The ClustVarLV package provides a function to automatically set aside affected cases.

CLV provides different strategies for outlier treatment (see the documentation of the ClustVarLV package). For this illustration, we use the ‘kplusone’ approach. This strategy simply sets atypical outlier consumers aside into a separate ‘noise’ segment (we discussed the algorithm during class).

Usually, we expect a consumer \(j\) to be allocated to a group \(G_k\) when the correlation \(\rho\) between the consumer’s utilities \(x_j\) and the centroid \(c_k\) is high and positive.

The ‘kplusone’ approach simply imposes a lower threshold on \(\rho\). If, for a consumer \(j\), the correlation \(\rho = \operatorname{cor}(x_j, c_k)\) fails to exceed this threshold for every segment centroid \(c_k\), then this consumer is set aside into the noise cluster. Choosing a threshold for \(\rho\) is arbitrary, but often 0.3 is used (Vigneau et al., 2016).
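
To make the rule concrete, here is a minimal sketch (not the package’s internal code): we extract the latent components of the 3-segment solution with get_comp() from ClustVarLV and check the threshold for the first consumer.

# Correlations of one consumer's utilities with all segment centroids
x_j <- d_transposed[[1]]                            # utilities of the first consumer
centroids <- get_comp(PS4_hierarchical_CLV, K = 3)  # latent components for K = 3
rho_jk <- cor(x_j, centroids)
max(rho_jk) < 0.3  # TRUE would send this consumer to the noise cluster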

In the code chunk below, we apply the CLV_kmeans() function instead of CLV(). CLV_kmeans() dispenses with the hierarchical clustering and instead uses nstart random starts of the kmeans-like algorithm discussed during the lecture, searching for the run with the highest resulting value of the target criterion S. Thus, outlier detection, treatment, and consolidation of the segmentation solution are combined into one function. That is why I like the package for introductory purposes.

CLV_kmeans(), in this case, uses 8 arguments.

  • X: the transposed dataframe containing only the individual part worth utilities.
  • method: ‘local’ means that the covariance/correlation between cases is used as the measure of proximity. This is exactly what we want.
  • iter.max: at maximum, 200 iteration steps are allowed in the kmeans-like algorithm.
  • clust: the number of consumer segments to be extracted.
  • nstart: the number of random starts; the run with the highest S is kept.
  • sX: whether to z-standardize the columns of d_transposed. Since, in our case, the variance of each column conveys important information about an individual’s utility function, we turn off standardization.
  • strategy: here we specify the way the algorithm treats outliers (“kplusone” in our case).
  • rho: the lower threshold for \(\rho\).

We save the results of the segmentation to an object with the name PS4_kmeans_PlusOne_CLV.

PS4_kmeans_PlusOne_CLV <- CLV_kmeans(X = d_transposed, method = "local", iter.max = 200, clust = 3, nstart = 200, sX = FALSE, strategy = "kplusone", rho = 0.3)

Next, we can briefly summarize the consolidated results.

summary(PS4_kmeans_PlusOne_CLV)
## $number
## noise cluster             1             2             3 
##             2            19            87            31 
## 
## $groups
## $groups[[1]]
##     cor in group  cor next group
## 176         0.80            0.42
## 43          0.79            0.50
## 54          0.79            0.56
## 57          0.79            0.56
## 186         0.76            0.68
## 46          0.73            0.48
## 174         0.72            0.57
## 65          0.69            0.37
## 171         0.69            0.52
## 106         0.64            0.49
## 194         0.64            0.56
## 27          0.62            0.50
## 87          0.62            0.33
## 47          0.59            0.53
## 64          0.56            0.02
## 130         0.47            0.12
## 210         0.46            0.22
## 56          0.40            0.30
## 34          0.36            0.14
## 
## $groups[[2]]
##     cor in group  cor next group
## 148         0.98            0.58
## 200         0.97            0.50
## 154         0.96            0.65
## 160         0.95            0.56
## 198         0.95            0.66
## 42          0.94            0.57
## 165         0.94            0.58
## 189         0.94            0.59
## 139         0.93            0.51
## 180         0.93            0.52
## 184         0.93            0.58
## 195         0.93            0.65
## 209         0.93            0.49
## 214         0.93            0.54
## 15          0.92            0.55
## 71          0.92            0.64
## 89          0.92            0.51
## 167         0.92            0.45
## 202         0.92            0.56
## 204         0.92            0.50
## 40          0.91            0.58
## 92          0.91            0.68
## 95          0.91            0.46
## 144         0.91            0.58
## 150         0.91            0.50
## 151         0.91            0.62
## 32          0.90            0.65
## 70          0.90            0.60
## 91          0.90            0.61
## 120         0.90            0.59
## 136         0.90            0.61
## 159         0.90            0.47
## 197         0.90            0.66
## 104         0.89            0.54
## 109         0.89            0.56
## 142         0.89            0.63
## 147         0.89            0.43
## 192         0.88            0.60
## 208         0.88            0.55
## 30          0.87            0.73
## 73          0.87            0.76
## 172         0.87            0.55
## 188         0.87            0.62
## 190         0.87            0.78
## 196         0.87            0.52
## 213         0.87            0.56
## 33          0.86            0.66
## 63          0.86            0.47
## 177         0.86            0.57
## 203         0.86            0.51
## 53          0.85            0.56
## 153         0.85            0.59
## 182         0.85            0.75
## 1           0.84            0.60
## 90          0.84            0.55
## 211         0.84            0.63
## 28          0.83            0.65
## 116         0.83            0.60
## 162         0.83            0.70
## 181         0.83            0.65
## 119         0.81            0.62
## 135         0.81            0.55
## 164         0.80            0.47
## 115         0.79            0.30
## 169         0.79            0.53
## 107         0.78            0.74
## 2           0.77            0.70
## 38          0.77            0.42
## 149         0.76            0.53
## 207         0.76            0.44
## 118         0.74            0.66
## 168         0.74            0.60
## 183         0.74            0.30
## 166         0.72            0.57
## 173         0.72            0.69
## 79          0.71            0.60
## 45          0.69            0.67
## 77          0.69            0.57
## 17          0.68            0.64
## 81          0.68            0.19
## 58          0.67            0.60
## 140         0.67            0.62
## 187         0.65            0.48
## 141         0.62            0.49
## 85          0.60            0.47
## 133         0.60            0.44
## 138         0.60            0.46
## 
## $groups[[3]]
##     cor in group  cor next group
## 35          0.85            0.50
## 163         0.84            0.68
## 156         0.81            0.80
## 74          0.79            0.54
## 44          0.77            0.58
## 86          0.77            0.51
## 29          0.76            0.43
## 55          0.76            0.52
## 191         0.75            0.60
## 78          0.70            0.58
## 96          0.70            0.63
## 129         0.70            0.42
## 178         0.70            0.63
## 212         0.69            0.61
## 19          0.68            0.26
## 75          0.68            0.59
## 103         0.68            0.63
## 131         0.68            0.51
## 132         0.68            0.09
## 170         0.68            0.63
## 62          0.64            0.12
## 69          0.64            0.53
## 18          0.63            0.49
## 193         0.62            0.36
## 41          0.61            0.43
## 31          0.59            0.12
## 114         0.54            0.27
## 175         0.51            0.42
## 88          0.50            0.27
## 36          0.44            0.14
## 99          0.39            0.13
## 
## 
## $set_aside
##  39 201 
##  18 128 
## 
## $cormatrix
##       Comp1 Comp2 Comp3
## Comp1  1.00  0.60  0.45
## Comp2  0.60  1.00  0.56
## Comp3  0.45  0.56  1.00

Interpretation: The summary() function provides us with a lot of useful information. First, we see the segment sizes after outlier treatment and consolidation. In our example, 2 consumers are set aside because of their atypical preference patterns (results may differ slightly in your analysis run because of the non-deterministic properties of the algorithm). At the bottom of the output we see that these are the consumers with the IDs 39 and 201 (the second row gives their positions in the data set). Furthermore, for each of the 3 consumer segments we see the corresponding IDs, the correlation with the assigned segment centroid, as well as the highest correlation with any of the other segment centroids. Our goal should be to find a segment solution where, for each consumer, the correlation with the own segment centroid is much higher than the correlation with the other segments’ centroids. The final part of the output shows how strongly the centroids (‘Comp1’ to ‘Comp3’) of the different consumer segments are correlated. The lower the correlation between two segments, the more dissimilar the part worth utilities of the consumer segments. Thus, in our example, consumer segment 1 is slightly more similar to segment 2 than to segment 3.
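
If you want to list the set-aside consumers programmatically, one way is the following sketch (ClustVarLV codes the noise cluster as partition label 0, as the count() output in Section 5 confirms; the columns of d_transposed are in the same order as the rows of d, so the indexing lines up):

d$`Respondent ID`[get_partition(PS4_kmeans_PlusOne_CLV) == 0]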

The ClustVarLV package additionally provides an intuitive graphical depiction of segment similarity based on principal component analysis (PCA; see other lectures such as Marketing Methods and Analysis, or Sarstedt & Mooi (2019), p. 259).

The code below generates segment-wise loading plots of the consumers on the first 2 principal components of the PCA. We use the function plot_var(), which expects a segment solution produced by the functions in the ClustVarLV package as an argument.

plot_var(PS4_kmeans_PlusOne_CLV, beside = TRUE) 

Interpretation: We start with the plot for consumer segment 1 (blue). The plot presents the loadings of each consumer in segment 1 on the first 2 principal components. We can extract several pieces of information. Vectors that are very close to each other in terms of their angle represent consumers that have a very similar part worth utility profile across attribute levels. Vectors that point in opposite directions represent consumers whose preferences are opposed: attribute levels preferred by one consumer are rejected by the other. Vectors that are orthogonal to each other represent consumers that have different preferences regarding different attribute levels. The x-axis presents the loadings on the first principal component (PC); the y-axis visualizes the loadings on the second PC. We see that across all consumers the first PC explains approx. 53% of the variation in part worth utilities, while the second PC explains an additional 8%. Thus, the visual depiction is a simplification, as it leaves 39% of the variation in preferences unexplained. We can further see that some consumers are better represented by the loading plot (long vectors) than others (short vectors). Furthermore, when comparing the plots of the different consumer segments, it becomes clear that segment 1 is, by tendency, closer to segment 2 than to segment 3, because the vectors of segments 1 and 2, by tendency, point in the same direction. The last plot in the output is devoted to the consumers in the noise cluster. As we can see, the preference vectors of these consumers are dissimilar to those of the other segments and, overall, not well explained by the PCA.

5 Interpretation of consumer segments

Next, we make more sense out of the 3 revealed consumer segments. Usually, this is done by:

  • Profiling consumer characteristics by segment (e.g., age, gender, income group) <- shows marketing which consumers to address.
  • Profiling differences in preferences (part worth utilities) by segment <- shows marketing how to address consumers in distinct segments.

The very first thing we want to know is which consumer belongs to which of the 3 segments.

The code chunk below applies the get_partition() function provided by the ClustVarLV package to the obtained segment solution (‘PS4_kmeans_PlusOne_CLV’). The call saves the segment assignment of each consumer as a new variable in the initial dataframe d.

d$Segment <- get_partition(PS4_kmeans_PlusOne_CLV)

We can preview the results. In the code below we use the select() function of the dplyr package to select only the variables “Respondent ID” and “Segment” of a dataframe consisting of the first 6 rows of d.

select(d[1:6, ], "Respondent ID", "Segment")
Brief Overview
Respondent ID Segment
1 2
2 2
15 2
17 2
18 3
19 3

Often, we want to use these segment assignments for further analysis in other programs (e.g., in Sawtooth Software Lighthouse Studio market simulations).

An easy way to use the segmentation results in other programs is to just copy and paste them. The code chunk below extracts a copy of the dataframe d to the clipboard. After its execution, we can, for example, paste d into a new sheet in MS Excel. We produce comma-separated values by using ‘,’ as the separator argument within the function write.table(). Based on this it is easy to profile consumer characteristics by segment.

write.table(d, "clipboard-16384", row.names = FALSE, col.names = TRUE, dec = ".", sep = ",")
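
Beware that the ‘clipboard-16384’ connection only works on Windows. A portable alternative (the file name below is arbitrary) is to write d to a *.csv file in the project folder:

write.csv(d, "PS4_segments.csv", row.names = FALSE)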

In a final step, we want to profile differences in preferences (part worth utilities) by segment. As often in R, many roads lead to Rome. In this illustration we use functions provided by the dplyr package, which easily summarizes continuous variables.

In the code chunk below, we first use the count() function to see the segment sizes again. Here we use the pipe operator (%>%) to build an analysis pipeline. The pipe operator is particularly useful when you have to build very long chained commands in R, which would otherwise be hard to read because of multiple nested parentheses (see here for more information on the pipe operator). In our small example we simply forward d as input to the function count(), which then counts the values of the variable Segment.

d %>% count(Segment)
Frequency of segment membership
Segment n
0 2
1 19
2 87
3 31
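
For comparison, the piped call above is equivalent to the nested form:

count(d, Segment)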

Next, we build a longer analysis pipeline to come up with segment-wise means. We assign the results of this pipe to a new object named “Segment_means”. The pipe starts by handing d to the select() function. This function helps us select only the columns containing the variables named “500 gigabyte” to “Segment”. We do so to exclude the first variable of d (“Respondent ID”). The remaining part of d then goes to the group_by() function. We use this function to group the pipeline’s results by the variable “Segment”. The pipeline then continues with summarise_all(.funs = c(mean)). By doing so, we tell R that we want the means of all handed-in variables. Afterwards, we continue the pipeline with the t() function, which simply transposes the output. Finally, we end the pipeline with the as.data.frame() function to convert the resulting table to a dataframe.

The next command stores the rownames of Segment_means in a new object named row_names.

Then we convert the results to numeric values, which, however, removes the rownames.

Finally, we re-assign the rownames.

Segment_means <- d %>%
  select("500 gigabyte": "Segment") %>%
  group_by(Segment) %>%
  summarise_all(.funs = c(mean)) %>%
  t(.) %>%
  as.data.frame(.)

row_names <- row.names(Segment_means)

Segment_means <- apply(Segment_means, 2, as.numeric)

row.names(Segment_means) <- row_names

Segment_means
V1 V2 V3 V4
Segment 0.0000000 1.000000 2.0000000 3.000000
500 gigabyte -0.3151681 -31.057344 -15.2463061 -17.002430
1 terabyte 0.3151681 31.057344 15.2463061 17.002430
black 30.5461976 5.326550 13.4115365 13.617044
white -30.5461976 -5.326550 -13.4115365 -13.617044
OneDualShockController 9.1163593 -12.059881 -24.8880320 -49.813288
TwoDualShockController -9.1163593 12.059881 24.8880320 49.813288
1 charging station for 2 PS4 controller 68.4219362 -1.923417 14.9145560 27.488213
1 PS4 wireless Stereo Headset 2.0 -41.6626303 11.510593 -14.4108099 3.272228
no accessories -26.7593059 -9.587176 -0.5037462 -30.760441
Far Cry Primal 63.5805969 7.787115 -4.6289195 4.340621
GTA V -8.8534334 -42.011125 14.8737598 55.621836
Life is strange 35.2212113 13.087580 -6.1458464 -39.342757
Tom Clancy’s The Division -52.3138536 24.337567 -3.5534350 -2.175507
no action-adventure game -37.6345212 -3.201137 -0.5455589 -18.444193
The_Witcher_3 -22.2702995 12.677612 -2.4169328 8.908047
Fallout_4 -1.6420465 1.598579 3.8661944 10.876971
Final_Fantasy 62.2997469 -30.910102 -3.1053181 -29.109783
Dark_Souls_3 -101.1308379 -2.877186 -4.7865641 7.028075
no_role_playing 62.7434369 19.511097 6.4426206 2.296690
Just_Dance_2016 54.6915688 -17.447325 -13.3935184 -53.457665
Guitar_Hero_Live 63.8429751 36.031875 -9.9128278 -22.944160
FIFA_16 22.0714598 -55.307497 22.7730747 68.834984
no_family_companionship -140.6060037 36.722948 0.5332715 7.566840
PRICE_207.95 5.5597022 80.408733 160.2424365 58.049792
PRICE_370.95 1.8334215 26.516367 52.8431061 19.143065
PRICE_436.95 0.3246208 4.694918 9.3562607 3.389420
PRICE_508.95 -1.3213437 -19.110299 -38.0839343 -13.796373
PRICE_730.95 -6.3964008 -92.509719 -184.3578690 -66.785904
NONE 90.2605177 83.765037 -24.0282087 89.891343

The results show a table presenting the mean part worth utilities grouped by segment. We can further style the results by adding column names and removing the first row of the table.

In the code chunk below, we first set column names that correspond to the first row in Segment_means. Then, we remove the first row by using the -c(1) notation.

colnames(Segment_means) <- Segment_means[1,]

Segment_means <- Segment_means[-c(1),] 

Segment_means
0 1 2 3
500 gigabyte -0.3151681 -31.057344 -15.2463061 -17.002430
1 terabyte 0.3151681 31.057344 15.2463061 17.002430
black 30.5461976 5.326550 13.4115365 13.617044
white -30.5461976 -5.326550 -13.4115365 -13.617044
OneDualShockController 9.1163593 -12.059881 -24.8880320 -49.813288
TwoDualShockController -9.1163593 12.059881 24.8880320 49.813288
1 charging station for 2 PS4 controller 68.4219362 -1.923417 14.9145560 27.488213
1 PS4 wireless Stereo Headset 2.0 -41.6626303 11.510593 -14.4108099 3.272228
no accessories -26.7593059 -9.587176 -0.5037462 -30.760441
Far Cry Primal 63.5805969 7.787115 -4.6289195 4.340621
GTA V -8.8534334 -42.011125 14.8737598 55.621836
Life is strange 35.2212113 13.087580 -6.1458464 -39.342757
Tom Clancy’s The Division -52.3138536 24.337567 -3.5534350 -2.175507
no action-adventure game -37.6345212 -3.201137 -0.5455589 -18.444193
The_Witcher_3 -22.2702995 12.677612 -2.4169328 8.908047
Fallout_4 -1.6420465 1.598579 3.8661944 10.876971
Final_Fantasy 62.2997469 -30.910102 -3.1053181 -29.109783
Dark_Souls_3 -101.1308379 -2.877186 -4.7865641 7.028075
no_role_playing 62.7434369 19.511097 6.4426206 2.296690
Just_Dance_2016 54.6915688 -17.447325 -13.3935184 -53.457665
Guitar_Hero_Live 63.8429751 36.031875 -9.9128278 -22.944160
FIFA_16 22.0714598 -55.307497 22.7730747 68.834984
no_family_companionship -140.6060037 36.722948 0.5332715 7.566840
PRICE_207.95 5.5597022 80.408733 160.2424365 58.049792
PRICE_370.95 1.8334215 26.516367 52.8431061 19.143065
PRICE_436.95 0.3246208 4.694918 9.3562607 3.389420
PRICE_508.95 -1.3213437 -19.110299 -38.0839343 -13.796373
PRICE_730.95 -6.3964008 -92.509719 -184.3578690 -66.785904
NONE 90.2605177 83.765037 -24.0282087 89.891343

Now, we can interpret the differences in mean part worth utilities across segments. For example, segment 1 strongly rejects the action-adventure game GTA V in a PS4 bundle, whereas this game is strongly preferred by segments 2 and 3 (your results might vary slightly because of the non-deterministic character of the segmentation algorithm).

We can finally copy the table with the segment means to the clipboard (as done above).

write.table(Segment_means, 'clipboard-16384', row.names = TRUE, col.names = FALSE, dec = ".", sep = ",")

List of References

Johnson, R. M. (1975). A simple method for pairwise monotone regression. Psychometrika, 40(2), 163–168. https://doi.org/10.1007/BF02291563

Kaufman, L., & Rousseeuw, P. J. (2005). Finding groups in data: An introduction to cluster analysis. Hoboken, NJ: Wiley-Interscience.

Reynolds, A. P., Richards, G., La Iglesia, B. de, & Rayward-Smith, V. J. (2006). Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modelling and Algorithms, 5(4), 475–504. https://doi.org/10.1007/s10852-005-9022-1

Sarstedt, M., & Mooi, E. (2019). A concise guide to market research: The process, data, and methods using IBM SPSS Statistics (3rd ed.). Berlin, Heidelberg: Springer.

Vigneau, E., Chen, M., & Qannari, E. M. (2015). ClustVarLV: An R package for the clustering of variables around latent variables. The R Journal, 7(2), 134–148.

Vigneau, E., & Qannari, E. M. (2003). Clustering of variables around latent components. Communications in Statistics - Simulation and Computation, 32(4), 1131–1150. https://doi.org/10.1081/SAC-120023882

Vigneau, E., Qannari, E. M., Navez, B., & Cottet, V. (2016). Segmentation of consumers in preference studies while setting aside atypical or irrelevant consumers. Food Quality and Preference, 47, 54–63. https://doi.org/10.1016/j.foodqual.2015.02.008

Vigneau, E., Qannari, E. M., Punter, P. H., & Knoops, S. (2001). Segmentation of a panel of consumers using clustering of variables around latent directions of preference. Food Quality and Preference, 12(5–7), 359–363. https://doi.org/10.1016/S0950-3293(01)00025-8

Wedel, M., & Kamakura, W. A. (2012). Market segmentation: Conceptual and methodological foundations (2nd ed.). Springer Science & Business Media.