R analysis script presenting one solution for conducting a benefit segmentation on individual part worth utilities elicited by conjoint analysis (an ACBC in this case). The segmentation uses the simple example we discussed during class (Chapter 6, ACBC on PlayStation 4 bundles). Ideally, part worth utilities are estimated through monotone regression (Johnson, 1975) instead of Hierarchical Bayes, so as not to shrink individual utilities toward population means (for this illustration we use HB utilities). Make sure that the file ‘Utilities PS4 ACBC Zero Centered.xlsx’ is located in your RStudio project folder. It contains the utilities of the n=139 consumers who participated in the ACBC study.
As discussed during class, the consumer segmentation procedure spreads over 6 steps.
Within the analysis script we cover steps 2–6.
Here is the R session info:
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.2 magrittr_1.5 tools_3.6.2 htmltools_0.4.0 yaml_2.2.0 Rcpp_1.0.1 stringi_1.4.3 rmarkdown_2.0 knitr_1.26 stringr_1.4.0 xfun_0.11 digest_0.6.23 rlang_0.4.0 evaluate_0.14
Beware: R is a case-sensitive language. Thus, ‘data’ is not interpreted in the same way as ‘Data’.
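A minimal illustration (the object name is made up for this example):

my_utilities <- 42   # creates an object named 'my_utilities'
my_utilities         # returns 42
# My_Utilities       # would fail: Error: object 'My_Utilities' not found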
In R, most functionality is provided by additional packages.
Most of these packages are well documented; see https://cran.r-project.org/
The code below checks whether the packages we will use in this chapter (e.g., readxl) are already installed on your machine. If yes, the corresponding package is loaded via the library() function. If not, R will install the package (install.packages()) and then load it.
if ("readxl" %in% rownames(installed.packages())) {
suppressPackageStartupMessages(library(readxl))
} else {
install.packages("readxl", repos = "https://cloud.r-project.org")
}
if ("ClustVarLV" %in% rownames(installed.packages())) {
suppressPackageStartupMessages(library(ClustVarLV))
} else {
install.packages("ClustVarLV", repos = "https://cloud.r-project.org")
}
if ("tableone" %in% rownames(installed.packages())) {
suppressPackageStartupMessages(library(tableone))
} else {
install.packages("tableone", repos = "https://cloud.r-project.org")
}
if ("dplyr" %in% rownames(installed.packages())) {
suppressPackageStartupMessages(library(dplyr))
} else {
install.packages("dplyr", repos = "https://cloud.r-project.org")
}
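As an aside, the same install-or-load logic can be written more compactly as a loop (a minimal sketch, equivalent to the four blocks above):

for (pkg in c("readxl", "ClustVarLV", "tableone", "dplyr")) {
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}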
Make sure that the file ‘Utilities PS4 ACBC Zero Centered.xlsx’ (see E-Learning platform) is located in your RStudio project’s directory.
Import the file’s contents into a new dataframe called d.
R is a functional language, meaning we can define a function that takes some arguments as input and delivers a result as output.
We can see which arguments a function understands by pressing ‘F1’ while placing the cursor on the function’s name (e.g., read_excel() below).
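Alternatively, you can query the help system from the console; for example:

?read_excel        # opens the help page for read_excel()
args(read_excel)   # prints the arguments the function accepts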
In the code chunk below, we hand the file name as well as the argument sheet = 1 to the read_excel() function.
In this way we tell the function to import the data from the first spreadsheet in the file.
To execute the call, just set the cursor somewhere in the corresponding code line and press ‘Ctrl’+‘Enter’.
If you encounter any problems with the read functions, please try to copy your *.xlsx file to the desktop and then use:
‘d<-as.data.frame(read_excel(file.choose(), sheet = 1))’.
This will open a dialogue that allows for a manual selection of the file.
d<-as.data.frame(read_excel("Utilities PS4 ACBC Zero Centered.xlsx", sheet = 1))
Now, the object d (a dataframe) is part of our working environment, and we can access it.
For example, we can view the dataframe d via the function View(). We can also preview the first part of the data set with head().
The code below returns the first few rows and columns of d.
View(head(d[,1:6]))
Respondent ID | 500 gigabyte | 1 terabyte | black | white | OneDualShockController |
---|---|---|---|---|---|
1 | -41.1005467 | 41.1005467 | 23.0185427 | -23.0185427 | 11.580779 |
2 | -40.9207006 | 40.9207006 | 35.7039431 | -35.7039431 | -29.301277 |
15 | -2.2086182 | 2.2086182 | 30.2616954 | -30.2616954 | -14.598786 |
17 | -0.0248455 | 0.0248455 | 0.4503716 | -0.4503716 | -9.486536 |
18 | -6.0335218 | 6.0335218 | 18.4052887 | -18.4052887 | -18.882059 |
19 | -1.5296434 | 1.5296434 | 1.2160663 | -1.2160663 | -61.038314 |
We see that each row corresponds to one consumer. The first column contains a unique case ID, whereas the remaining columns present the individual part worth utilities corresponding to the attribute levels involved in the conjoint study.
In the literature, countless algorithms exist for segmenting consumers (e.g., Kaufman & Rousseeuw, 2005; Reynolds, Richards, La Iglesia, & Rayward-Smith, 2006; Sarstedt & Mooi, 2019; Wedel & Kamakura, 2012). In this illustration we will use only one approach: Clustering of Variables Around Latent Variables (CLV; Vigneau & Qannari, 2003; Vigneau, Qannari, Punter, & Knoops, 2001). We do so mainly for 2 reasons.
We discuss the algorithms behind CLV during class.
The functions provided in the ClustVarLV package expect our data set in a layout where consumers are columns and part worth utilities are rows. Thus, we have to transpose our data set. This is done by the t() function (see the code chunk below).
In the code chunk we transpose (flip rows and columns) all columns of d except for the first one, which contains the case IDs. We assign the result to a new dataframe named d_transposed.
d_transposed<-as.data.frame(t(d[,2:ncol(d)]))
Next, we want to assign consumer IDs as column names in d_transposed. This is done below.
colnames(d_transposed)<-d$`Respondent ID`
We can review the first few columns and rows of the resultant dataframe.
View(head(d_transposed[,1:6]))
 | 1 | 2 | 15 | 17 | 18 | 19 |
---|---|---|---|---|---|---|
500 gigabyte | -41.10055 | -40.92070 | -2.208618 | -0.0248455 | -6.033522 | -1.529643 |
1 terabyte | 41.10055 | 40.92070 | 2.208618 | 0.0248455 | 6.033522 | 1.529643 |
black | 23.01854 | 35.70394 | 30.261695 | 0.4503716 | 18.405289 | 1.216066 |
white | -23.01854 | -35.70394 | -30.261695 | -0.4503716 | -18.405289 | -1.216066 |
OneDualShockController | 11.58078 | -29.30128 | -14.598786 | -9.4865363 | -18.882059 | -61.038314 |
TwoDualShockController | -11.58078 | 29.30128 | 14.598786 | 9.4865363 | 18.882059 | 61.038314 |
The next thing we do is choose an appropriate number of segments. To do so, we use a hierarchical CLV approach, which is then optimized by the kmeans-like algorithm outlined during class.
Hint: press F1 within functions to become familiar with the expected arguments and returned results.
In the code chunk below, we use the CLV() function with 5 arguments: the data set (X), the clustering criterion (method), the maximum number of iterations (maxiter), the maximum number of segments considered (nmax), and the standardization switch (sX).
We save the results of the segmentation to an object with the name PS4_hierarchical_CLV.
PS4_hierarchical_CLV<- CLV(X = d_transposed, method = "local", maxiter = 200, nmax = 15, sX=FALSE)
Think back to our lecture. Within the CLV approach, we try to maximize the criterion S (in essence, we try to maximize the covariance of each consumer profile with the mean profile of the segment the consumer is assigned to, the segment’s centroid). Since we have used a hierarchical version of CLV, we can use a dendrogram of S to decide on an appropriate number of consumer segments.
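As a reminder, for K consumer segments \(G_1, \ldots, G_K\) with centroids \(c_k\), the criterion can be sketched as follows (notation loosely follows Vigneau & Qannari, 2003, for the ‘local’ method):

\[ S = \sum_{k=1}^{K} \sum_{j \in G_k} \mathrm{cov}(x_j, c_k) \]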
plot(PS4_hierarchical_CLV, "dendrogram")
Interpretation: The interpretation is the same as for any other dendrogram. The x-axis shows each participant. The y-axis shows how much deterioration in S occurs when successively merging cases until only one global consumer segment is left. We can see that the most pronounced drops in S occur at the steps where 4 segments are merged into 3 segments, and where 3 segments are merged into 2 segments.
The ClustVarLV package provides an even better visualization, the so-called ‘delta plot’, which visualizes the deterioration in S across different numbers of consumer segments after the additional consolidation stage that follows the hierarchical CLV approach.
plot(PS4_hierarchical_CLV, "delta")
Interpretation: The delta plot enables a clearer decision in favor of 3 consumer segments (because the drop in S is relatively pronounced when moving from 3 to 2 segments).
It’s up to you to decide on the appropriate number of consumer segments. For example, you may consider the dendrogram, the delta plot, and the sizes of the resulting segments.
We can additionally check how much the segment solution changes when moving from a 3-segment to a 2-segment solution.
In the code chunk below we combine two functions: table() and get_partition(). The table() function, in our case, takes two vectors with the case-wise segment labels as arguments and builds a crosstab of them. The first vector is given by the hierarchical CLV solution with 2 consumer segments (get_partition(PS4_hierarchical_CLV, K=2)). To extract the vector with the segment labels, we use the get_partition() function of the ClustVarLV package, which expects a CLV solution and the desired number of segments as arguments. The second vector is given by the hierarchical CLV solution with 3 consumer segments (get_partition(PS4_hierarchical_CLV, K=3)).
table(get_partition(PS4_hierarchical_CLV,K=2), get_partition(PS4_hierarchical_CLV,K=3))
 | 1 | 2 | 3 |
---|---|---|---|
1 | 82 | 0 | 21 |
2 | 0 | 31 | 5 |
Interpretation: For 3 segments, even the smallest segment contains enough consumers (26/139 ≈ 19%). In the present case, we stay with K=3.
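If you want to compute the relative segment sizes directly, one way is the following one-liner (shares derived from the crosstab above: roughly .59, .22, and .19):

prop.table(table(get_partition(PS4_hierarchical_CLV, K = 3)))  # relative segment sizes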
Next, we evaluate if we should exclude cases that are multivariate outliers, which are “observations that differ substantially from other observations in respect of one or more characteristics.” (Sarstedt & Mooi, 2019, p. 386). Usually, these cases are excluded from segmentation because they are assumed to disturb the clustering process as well as the subsequent interpretation of consumer segments.
The ClustVarLV package provides a function to automatically set aside affected cases.
CLV offers different strategies for outlier treatment (see the documentation of the ClustVarLV package). For this illustration, we use the ‘kplusone’ approach. This strategy simply sets aside atypical outlier consumers into a separate ‘noise’ segment (we discussed the algorithm during class).
Usually, we expect a consumer j to be allocated to a group \(G_k\) when the correlation \(\rho\) between the consumer’s utilities \(x_j\) and the centroid \(c_k\) is high and positive.
The ‘kplusone’ approach simply imposes a lower threshold for \(\rho\). If, for a consumer j, the correlation \(\rho = \mathrm{Cor}(x_j, c_k)\) fails to exceed this threshold for every consumer segment centroid, then this consumer is set aside into the noise cluster. Choosing a threshold for \(\rho\) is arbitrary, but 0.3 is often used (Vigneau et al., 2016).
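To make the rule concrete, here is a minimal sketch of the assignment logic (illustrative only, not the internal code of ClustVarLV; x_j and centroids are assumed inputs):

# x_j: one consumer's part worth utilities; centroids: matrix with one centroid per column
assign_kplusone <- function(x_j, centroids, rho = 0.3) {
  cors <- apply(centroids, 2, function(c_k) cor(x_j, c_k))
  if (max(cors) < rho) 0 else which.max(cors)  # 0 denotes the noise cluster
}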
In the code chunk below, we apply the CLV_kmeans() function instead of CLV(). CLV_kmeans() dispenses with the hierarchical clustering and instead uses nstart random starts of the kmeans-like algorithm discussed during the lecture, keeping the run with the highest resulting value of S. Thus, outlier detection, outlier treatment, and consolidation of the segmentation solution are combined in one function. That is why I like the package for introductory purposes.
CLV_kmeans(), in this case, takes 8 arguments.
We save the results of the segmentation to an object with the name PS4_kmeans_PlusOne_CLV.
PS4_kmeans_PlusOne_CLV <- CLV_kmeans(X = d_transposed, method = "local", iter.max = 200, clust = 3, nstart = 200, sX = FALSE, strategy = "kplusone", rho = 0.3)
Next, we can briefly summarize the consolidated results.
summary(PS4_kmeans_PlusOne_CLV)
## $number
## noise cluster 1 2 3
## 2 30 19 88
##
## $groups
## $groups[[1]]
## cor in group cor next group
## 35 0.85 0.51
## 163 0.83 0.68
## 74 0.80 0.54
## 86 0.78 0.51
## 44 0.77 0.59
## 55 0.76 0.52
## 29 0.75 0.44
## 191 0.74 0.60
## 78 0.70 0.58
## 96 0.70 0.63
## 129 0.70 0.42
## 19 0.69 0.26
## 75 0.69 0.60
## 131 0.69 0.51
## 178 0.69 0.63
## 212 0.69 0.61
## 103 0.68 0.63
## 132 0.68 0.10
## 170 0.67 0.63
## 62 0.64 0.13
## 69 0.64 0.53
## 18 0.63 0.50
## 193 0.62 0.36
## 41 0.61 0.43
## 31 0.59 0.13
## 114 0.55 0.27
## 88 0.51 0.27
## 175 0.50 0.43
## 36 0.45 0.14
## 99 0.40 0.13
##
## $groups[[2]]
## cor in group cor next group
## 176 0.80 0.42
## 43 0.79 0.50
## 54 0.79 0.56
## 57 0.79 0.56
## 186 0.76 0.68
## 46 0.73 0.48
## 174 0.72 0.57
## 65 0.69 0.37
## 171 0.69 0.52
## 106 0.64 0.49
## 194 0.64 0.56
## 27 0.62 0.50
## 87 0.62 0.33
## 47 0.59 0.53
## 64 0.56 0.03
## 130 0.47 0.13
## 210 0.46 0.22
## 56 0.40 0.30
## 34 0.36 0.14
##
## $groups[[3]]
## cor in group cor next group
## 148 0.98 0.58
## 200 0.97 0.50
## 154 0.96 0.63
## 160 0.95 0.56
## 198 0.95 0.66
## 42 0.94 0.57
## 165 0.94 0.58
## 139 0.93 0.50
## 180 0.93 0.50
## 184 0.93 0.56
## 189 0.93 0.59
## 195 0.93 0.65
## 209 0.93 0.47
## 214 0.93 0.54
## 15 0.92 0.55
## 71 0.92 0.64
## 89 0.92 0.49
## 167 0.92 0.45
## 202 0.92 0.56
## 40 0.91 0.56
## 92 0.91 0.66
## 95 0.91 0.45
## 144 0.91 0.58
## 150 0.91 0.50
## 151 0.91 0.62
## 204 0.91 0.50
## 32 0.90 0.65
## 70 0.90 0.58
## 91 0.90 0.61
## 120 0.90 0.57
## 136 0.90 0.59
## 159 0.90 0.47
## 197 0.90 0.64
## 104 0.89 0.54
## 109 0.89 0.56
## 142 0.89 0.63
## 147 0.89 0.43
## 192 0.88 0.59
## 208 0.88 0.55
## 30 0.87 0.73
## 73 0.87 0.75
## 172 0.87 0.55
## 188 0.87 0.61
## 190 0.87 0.76
## 196 0.87 0.51
## 213 0.87 0.56
## 33 0.86 0.64
## 63 0.86 0.47
## 177 0.86 0.55
## 182 0.86 0.74
## 203 0.86 0.49
## 53 0.85 0.56
## 1 0.84 0.60
## 90 0.84 0.55
## 153 0.84 0.59
## 211 0.84 0.63
## 28 0.83 0.65
## 116 0.83 0.59
## 162 0.83 0.69
## 181 0.83 0.65
## 119 0.81 0.60
## 135 0.81 0.55
## 156 0.81 0.80
## 164 0.80 0.47
## 115 0.79 0.30
## 169 0.79 0.51
## 107 0.78 0.74
## 2 0.77 0.70
## 38 0.77 0.40
## 207 0.77 0.44
## 149 0.76 0.53
## 118 0.74 0.66
## 168 0.74 0.60
## 183 0.74 0.28
## 166 0.72 0.57
## 173 0.72 0.68
## 79 0.70 0.60
## 17 0.69 0.62
## 45 0.69 0.66
## 77 0.69 0.57
## 81 0.68 0.17
## 58 0.67 0.60
## 140 0.67 0.62
## 187 0.65 0.47
## 141 0.62 0.49
## 85 0.60 0.47
## 133 0.60 0.43
## 138 0.59 0.46
##
##
## $set_aside
## 39 201
## 18 128
##
## $cormatrix
## Comp1 Comp2 Comp3
## Comp1 1.00 0.44 0.55
## Comp2 0.44 1.00 0.60
## Comp3 0.55 0.60 1.00
Interpretation: The summary() function provides us with a lot of useful information. First we see the segment sizes after outlier treatment and consolidation. In our example, 2 consumers are set aside because of their atypical preference patterns (results may differ slightly in your analysis run because of the non-deterministic properties of the algorithm). At the bottom of the output we see that these consumers are those with the IDs 39 and 201. Furthermore, for each of the 3 consumer segments we see the corresponding IDs, the correlation with the assigned segment centroid, as well as the highest correlation with any of the other segment centroids. Our goal should be to find a segment solution where, for each consumer, the correlation with the own segment centroid is much higher than the correlation with the other segments’ centroids. The final part of the output shows how strongly the centroids (‘Comp1’ to ‘Comp3’) of the different consumer segments correlate. The lower the correlation between two segments, the more dissimilar are the part worth utilities of the consumer segments. Thus, in our example, consumer segment 1 is somewhat more similar to segment 3 (correlation of 0.55) than to segment 2 (0.44).
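As a side note, you should be able to reproduce this centroid correlation matrix yourself; the latent components (centroids) can be extracted with the get_comp() function of the ClustVarLV package:

round(cor(get_comp(PS4_kmeans_PlusOne_CLV)), 2)  # should mirror $cormatrix from the summary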
The ClustVarLV package additionally provides an intuitive graphical depiction of segment similarity based on principal component analysis (PCA; see other lectures such as Marketing Methods and Analysis, or Sarstedt & Mooi (2019), p. 259).
The code below generates segment-wise loading plots of each consumer using the first 2 principal components of the PCA. We use the function plot_var(), which expects a segment solution produced by the functions in the ClustVarLV package as argument.
plot_var(PS4_kmeans_PlusOne_CLV, beside = TRUE)
Interpretation: We start with the plot for consumer segment 1 (blue). The plot presents the loadings of each consumer in segment 1 on the first 2 principal components. We can extract several pieces of information. Vectors in this plot that are very close to each other in terms of their angle represent consumers with very similar part worth utility profiles across attribute levels. Vectors that point in opposite directions represent consumers whose preferences are very different; attribute levels preferred by one consumer are rejected by the other. Vectors that are orthogonal to each other represent consumers with differing preferences regarding different attribute levels. The x-axis presents the loadings on the first principal component (PC); the y-axis visualizes the loadings on the second PC. We see that across all consumers the first PC explains approx. 53% of the variation in part worth utilities, while the second PC explains an additional 8%. Thus, the visual depiction is a simplification, as it leaves 39% of the variation in preferences unexplained. We can further see that some consumers are better represented by the loading plot (long vectors) than others (short vectors). Furthermore, when comparing the plots of the different consumer segments, it becomes clear that segment 1 is, by tendency, closer to segment 3 than to segment 2, because the vectors of segments 1 and 3, by tendency, point in the same direction. The last plot in the output is devoted to the consumers in the noise cluster. As we can see, the preference vectors of these consumers are dissimilar to those of the other segments and, overall, not well explained by the PCA.
Next, we make more sense of the 3 revealed consumer segments. Usually, this is done by inspecting which consumer belongs to which segment and by profiling the segments (e.g., in terms of their part worth utilities).
The very first thing we want to know is which consumer belongs to which of the 3 segments.
The code chunk below applies the get_partition() function of the ClustVarLV package to the obtained segment solution (‘PS4_kmeans_PlusOne_CLV’). The call saves the segment assignment of each consumer as a new variable in the initial dataframe d.
d$Segment <- get_partition(PS4_kmeans_PlusOne_CLV)
We can preview the results. In the code below we use the select() function of the dplyr package to select only the variables “Respondent ID” and “Segment” from a dataframe consisting of the first 6 rows of d.
select(d[1:6,], "Respondent ID", "Segment")
Respondent ID | Segment |
---|---|
1 | 3 |
2 | 3 |
15 | 3 |
17 | 3 |
18 | 1 |
19 | 1 |
Often, we want to use these segment assignments for further analyses in other programs (e.g., in Sawtooth Software Lighthouse Studio market simulations).
An easy way to use the segmentation results in other programs is to just copy and paste them. The code chunk below sends a copy of the dataframe d to the clipboard. After its execution, we can, for example, paste d into a new sheet in MS Excel. We produce comma-separated values by using ‘,’ as the separator argument within the function write.table(). Based on this, it is easy to profile consumer characteristics by segment.
write.table(d, 'clipboard-16384', row.names = FALSE, col.names = TRUE, dec = ".", sep = ",")
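Beware that the ‘clipboard-16384’ connection only works on Windows. A more portable alternative is to write the dataframe to a CSV file in your project folder (the file name below is just a suggestion):

write.csv(d, "PS4_segment_assignments.csv", row.names = FALSE)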
In a final step, we want to profile differences in preferences (part worth utilities) by segment. As so often in R, many roads lead to Rome. In this illustration we use the CreateContTable() function from the tableone package, which conveniently summarizes continuous variables.
In the code chunk below, we first define the names of the variables we want to summarize. Here, we use the column names of d, beginning at the second column (we are not interested in the case IDs) and ending at the penultimate column (we are not interested in the last variable, which captures the segment assignments). We save the variable names to a new object ‘Variable_Names’.
Variable_Names <- colnames(d[,2:(ncol(d)-1)])
Afterwards, we apply the CreateContTable() function and save its returned results to a new object ‘Utilities_by_Segment’. Within the function call we set 4 arguments: the variables to summarize (vars), the grouping variable (strata), the dataframe to draw on (data), and whether group comparison tests should be conducted (test).
Utilities_by_Segment <- CreateContTable(vars = Variable_Names, strata = "Segment", data = d, test = FALSE)
We can inspect the summary table by executing the next code chunk.
kableone(Utilities_by_Segment, caption = 'Part worth utilities by consumer segment')
 | 0 | 1 | 2 | 3 |
---|---|---|---|---|
n | 2 | 30 | 19 | 88 |
500 gigabyte (mean (SD)) | -0.32 (0.32) | -17.23 (24.55) | -31.06 (34.80) | -15.19 (20.82) |
1 terabyte (mean (SD)) | 0.32 (0.32) | 17.23 (24.55) | 31.06 (34.80) | 15.19 (20.82) |
black (mean (SD)) | 30.55 (37.72) | 13.51 (28.63) | 5.33 (26.77) | 13.45 (25.10) |
white (mean (SD)) | -30.55 (37.72) | -13.51 (28.63) | -5.33 (26.77) | -13.45 (25.10) |
OneDualShockController (mean (SD)) | 9.12 (57.99) | -49.94 (33.96) | -12.06 (24.94) | -25.13 (27.86) |
TwoDualShockController (mean (SD)) | -9.12 (57.99) | 49.94 (33.96) | 12.06 (24.94) | 25.13 (27.86) |
1 charging station for 2 PS4 controller (mean (SD)) | 68.42 (24.60) | 26.56 (35.86) | -1.92 (34.85) | 15.37 (28.12) |
1 PS4 wireless Stereo Headset 2.0 (mean (SD)) | -41.66 (27.79) | 4.76 (48.66) | 11.51 (32.45) | -14.72 (28.54) |
no accessories (mean (SD)) | -26.76 (3.19) | -31.32 (53.67) | -9.59 (52.21) | -0.66 (27.25) |
Far Cry Primal (mean (SD)) | 63.58 (72.75) | 4.01 (55.75) | 7.79 (64.82) | -4.42 (33.35) |
GTA V (mean (SD)) | -8.85 (48.32) | 56.69 (62.82) | -42.01 (62.61) | 14.97 (36.55) |
Life is strange (mean (SD)) | 35.22 (49.22) | -39.25 (47.75) | 13.09 (57.19) | -6.55 (29.56) |
Tom Clancy’s The Division (mean (SD)) | -52.31 (13.36) | -1.76 (40.03) | 24.34 (65.46) | -3.68 (28.26) |
no action-adventure game (mean (SD)) | -37.63 (87.02) | -19.69 (42.26) | -3.20 (36.88) | -0.32 (30.86) |
The_Witcher_3 (mean (SD)) | -22.27 (89.72) | 8.54 (42.79) | 12.68 (48.69) | -2.16 (31.35) |
Fallout_4 (mean (SD)) | -1.64 (42.42) | 11.65 (36.03) | 1.60 (61.02) | 3.68 (30.78) |
Final_Fantasy (mean (SD)) | 62.30 (33.53) | -29.64 (47.98) | -30.91 (36.84) | -3.22 (31.52) |
Dark_Souls_3 (mean (SD)) | -101.13 (18.71) | 7.37 (34.45) | -2.88 (50.25) | -4.77 (32.22) |
no_role_playing (mean (SD)) | 62.74 (32.49) | 2.07 (53.74) | 19.51 (55.54) | 6.47 (38.76) |
Just_Dance_2016 (mean (SD)) | 54.69 (80.86) | -54.31 (38.36) | -17.45 (47.88) | -13.56 (36.52) |
Guitar_Hero_Live (mean (SD)) | 63.84 (84.38) | -21.60 (57.61) | 36.03 (78.49) | -10.52 (38.98) |
FIFA_16 (mean (SD)) | 22.07 (45.07) | 67.02 (69.53) | -55.31 (34.35) | 23.92 (53.97) |
no_family_companionship (mean (SD)) | -140.61 (210.31) | 8.89 (52.84) | 36.72 (61.42) | 0.16 (37.54) |
PRICE_207.95 (mean (SD)) | 5.56 (4.81) | 55.76 (34.16) | 80.41 (34.22) | 159.86 (33.09) |
PRICE_370.95 (mean (SD)) | 1.83 (1.59) | 18.39 (11.27) | 26.52 (11.28) | 52.72 (10.91) |
PRICE_436.95 (mean (SD)) | 0.32 (0.28) | 3.26 (1.99) | 4.69 (2.00) | 9.33 (1.93) |
PRICE_508.95 (mean (SD)) | -1.32 (1.14) | -13.25 (8.12) | -19.11 (8.13) | -37.99 (7.86) |
PRICE_730.95 (mean (SD)) | -6.40 (5.53) | -64.15 (39.31) | -92.51 (39.37) | -183.92 (38.07) |
NONE (mean (SD)) | 90.26 (23.64) | 91.21 (67.81) | 83.77 (45.61) | -23.18 (45.01) |
Now we can interpret the differences in mean part worth utilities across segments. For example, segment 2 strongly rejects the action-adventure game GTA V in a PS4 bundle (mean utility of -42.01), whereas this game is preferred by segments 1 and 3 (56.69 and 14.97, respectively; your results might vary slightly because of the non-deterministic character of the segmentation algorithm).
Again, we can copy the summary table (e.g., the means for each group) for further reporting in other software solutions. In order to copy the means of each segment, we first have to get familiar with the structure of the table we have created. We can use the generic str() function to get an overview of the table’s structure. I do not execute this function in this script because the output is lengthy.
str(Utilities_by_Segment)
However, if you do, you will see that the object ‘Utilities_by_Segment’ is made up of a list of 4 named matrices, one for each of the consumer segments (do not forget the noise cluster). Each of the matrices has 29 rows (one for each attribute level in the conjoint study) and 12 columns (one for each summary statistic). The mean is stored in the 4th column.
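To double-check where the means are stored, you can print the column names of one of the matrices; ‘mean’ should appear as the 4th entry:

colnames(Utilities_by_Segment[[1]])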
If we want to access, for example, the first few means (limited by the head() function) for the first segment in the object, we can call:
knitr::kable(head(Utilities_by_Segment[[1]][, 4]))
 | x |
---|---|
500 gigabyte | -0.3151681 |
1 terabyte | 0.3151681 |
black | 30.5461976 |
white | -30.5461976 |
OneDualShockController | 9.1163593 |
TwoDualShockController | -9.1163593 |
You see that the values correspond to the noise cluster. We can access the fourth column (the means) of the second list member with the code below, and so forth…
knitr::kable(head(Utilities_by_Segment[[2]][, 4]))
 | x |
---|---|
500 gigabyte | -17.23370 |
1 terabyte | 17.23370 |
black | 13.51053 |
white | -13.51053 |
OneDualShockController | -49.94446 |
TwoDualShockController | 49.94446 |
Thus, in R, we access list members by the ‘[[ ]]’ notation. By appending the ‘[ ]’ notation, we can access the elements of a specific list member.
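Here is a minimal, self-contained example of this notation (the list is made up for illustration):

example_list <- list(a = c(10, 20, 30), b = letters[1:3])
example_list[[1]]     # the first list member itself (a numeric vector)
example_list[[1]][2]  # the second element within that member: 20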
Next, we use this notation to create a copy-ready table of means for each consumer segment. In the code chunk below, we first establish a new dataframe ‘Segment_Means’ containing solely the mean utilities of the noise segment (the first list member). We subsequently use a for loop over the remaining list members to add the mean utilities of the remaining consumer segments via the generic cbind() function, which simply appends new variables to the dataframe ‘Segment_Means’.
Finally, we add variable (column) names to the dataframe. Here we use the seq() function with 3 arguments (from, to, and by); starting the sequence at 0 keeps the column labels consistent with the segment numbering used above (0 = noise cluster).
Segment_Means <- as.data.frame(Utilities_by_Segment[[1]][,4])
for (i in 2:length(Utilities_by_Segment)) {
Segment_Means <- cbind(Segment_Means, Utilities_by_Segment[[i]][,4])
}
names(Segment_Means) <- seq(0, length(Utilities_by_Segment) - 1, 1)  # 0 = noise cluster
We can finally copy the table with the segment means to the clipboard (as done above).
write.table(Segment_Means, 'clipboard-16384', row.names = TRUE, col.names = TRUE, dec = ".", sep = ",")
Johnson, R. M. (1975). A simple method for pairwise monotone regression. Psychometrika, 40(2), 163–168. doi: 10.1007/BF02291563
Kaufman, L., & Rousseeuw, P. J. (2005). Finding groups in data: An introduction to cluster analysis. Hoboken NJ: Wiley-Interscience.
Orme, B. K., & Chrzan, K. (2017). Becoming an expert in conjoint analysis: Choice modeling for pros. Orem, UT: Sawtooth Software.
Reynolds, A. P., Richards, G., La Iglesia, B. de, & Rayward-Smith, V. J. (2006). Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modelling and Algorithms, 5(4), 475–504. doi: 10.1007/s10852-005-9022-1
Sarstedt, M., & Mooi, E. (2019). A concise guide to market research: The process, data, and methods using ibm spss statistics (3. ed.). Berlin - Heidelberg: Springer.
Swait, J., & Louviere, J. (1993). The role of the scale parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research, 30(3), 305–314. doi: 10.1177/002224379303000303
Vigneau, E., Chen, M., & Qannari, E. M. (2015). ClustVarLV: An r package for the clustering of variables around latent variables. The R Journal, 7(2), 134–148.
Vigneau, E., & Qannari, E. M. (2003). Clustering of variables around latent components. Communications in Statistics - Simulation and Computation, 32(4), 1131–1150. doi: 10.1081/SAC-120023882
Vigneau, E., Qannari, E. M., Navez, B., & Cottet, V. (2016). Segmentation of consumers in preference studies while setting aside atypical or irrelevant consumers. Food Quality and Preference, 47(January), 54–63. doi: 10.1016/j.foodqual.2015.02.008
Vigneau, E., Qannari, E. M., Punter, P. H., & Knoops, S. (2001). Segmentation of a panel of consumers using clustering of variables around latent directions of preference. Food Quality and Preference, 12(5-7), 359–363. doi: 10.1016/S0950-3293(01)00025-8
Wedel, M., & Kamakura, W. A. (2012). Market segmentation: Conceptual and methodological foundations (2nd ed.). Springer Science & Business Media.
(There are also specialized tests that assess the difference between sets of part worth utilities for different consumer groups. One example is the Swait and Louviere test for between-group differences; Swait & Louviere, 1993.)