R analysis script presenting one solution for conducting a benefit segmentation on individual part worth utilities elicited by conjoint analysis (an ACBC in this case). The segmentation uses the simple example we discussed during class (Chapter 6, ACBC on PlayStation 4 bundles). Ideally, part worth utilities are estimated through monotone regression (Johnson, 1975) instead of Hierarchical Bayes, so that individual utilities are not shrunk toward population means (for this illustration we nevertheless use HB utilities). Make sure that the file ‘Utilities PS4 ACBC Zero Centered.xlsx’ is located in your RStudio project folder. It contains the utilities of the n=139 consumers who participated in the ACBC study.

As discussed during class, the consumer segmentation procedure comprises 6 steps:

  1. Exporting zero-centered part worth utilities to SPSS/Excel
  2. Importing the utilities into R
  3. Choosing an appropriate number of consumer segments
  4. Detecting and excluding multivariate outliers
  5. Conducting the cluster analysis/consolidating the segment solution
  6. Interpreting the consumer segments

Within the analysis script we cover steps 2-6.

Here is the R session info:

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    LC_TIME=German_Germany.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0 yaml_2.2.0      Rcpp_1.0.1      stringi_1.4.3   rmarkdown_2.0   knitr_1.26      stringr_1.4.0   xfun_0.11       digest_0.6.23   rlang_0.4.0     evaluate_0.14

1 Load packages that will be used throughout the analysis

Beware: R is a case-sensitive language. Thus, ‘data’ is not interpreted in the same way as ‘Data’.

In R, most functionality is provided by additional packages.
Most of these packages are well documented; see https://cran.r-project.org/

The code below checks whether the packages we will use in this chapter (e.g., readxl) are already installed on your machine. If yes, the corresponding package is loaded via the library() function. If not, R first installs the package (install.packages()) and then loads it.

if ("readxl" %in% rownames(installed.packages())) {
  suppressPackageStartupMessages(library(readxl))
} else {
  install.packages("readxl", repos = "https://cloud.r-project.org")
}

if ("ClustVarLV" %in% rownames(installed.packages())) {
  suppressPackageStartupMessages(library(ClustVarLV))
} else {
  install.packages("ClustVarLV", repos = "https://cloud.r-project.org")
}

if ("tableone" %in% rownames(installed.packages())) {
  suppressPackageStartupMessages(library(tableone))
} else {
  install.packages("tableone", repos = "https://cloud.r-project.org")
}

if ("dplyr" %in% rownames(installed.packages())) {
  suppressPackageStartupMessages(library(dplyr))
} else {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
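
The four if/else blocks above repeat the same pattern; a compact alternative (a sketch doing exactly the same thing) loops over the package names:

for (pkg in c("readxl", "ClustVarLV", "tableone", "dplyr")) {
  # install the package only if it is missing, then load it quietly
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}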

2 Import the *.xlsx file from Excel

Make sure that the file ‘Utilities PS4 ACBC Zero Centered.xlsx’ (see E-Learning platform) is located in your RStudio project’s directory.

We save the data into a new dataframe called d.

R is a functional language, meaning we can define a function that takes some arguments as input and delivers a result as output.

We can see which arguments a function understands by pressing ‘F1’ while placing the cursor on the function’s name (i.e., read_excel() below).
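
Outside of RStudio, the same information is available from the R console:

?read_excel        # opens the help page of read_excel()
args(read_excel)   # prints the function’s argument list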

In the code chunk below, we hand in the file name as well as the argument sheet = 1 to the read_excel() function.
In this way we tell the function to import the data of the first spreadsheet in the file.

To execute the call, just set the cursor somewhere in the corresponding code line and press ‘Ctrl’+‘Enter’.
If you encounter any problems with the read functions, please try to copy your *.xlsx file to the desktop and then use:

d<-as.data.frame(read_excel(file.choose(), sheet = 1))

This will open a dialogue that allows for a manual selection of the file.

d<-as.data.frame(read_excel("Utilities PS4 ACBC Zero Centered.xlsx", sheet = 1))

Now the object d (a dataframe) is part of our working environment, and we can access it.

For example, we can view the dataframe d via the function View(). We can also preview the first part of the data set with head().

The code below returns the first few rows and columns of d.

View(head(d[,1:6]))
Brief Overview
Respondent ID 500 gigabyte 1 terabyte black white OneDualShockController
1 -41.1005467 41.1005467 23.0185427 -23.0185427 11.580779
2 -40.9207006 40.9207006 35.7039431 -35.7039431 -29.301277
15 -2.2086182 2.2086182 30.2616954 -30.2616954 -14.598786
17 -0.0248455 0.0248455 0.4503716 -0.4503716 -9.486536
18 -6.0335218 6.0335218 18.4052887 -18.4052887 -18.882059
19 -1.5296434 1.5296434 1.2160663 -1.2160663 -61.038314

We see that each row corresponds to one consumer. The first column contains a unique case ID, whereas the remaining columns present the individual part worth utilities corresponding to the attribute levels involved in the conjoint study.


3 Choosing an appropriate number of consumer segments

In the literature, countless algorithms exist for segmenting consumers (e.g., Kaufman & Rousseeuw, 2005; Reynolds, Richards, La Iglesia, & Rayward-Smith, 2006; Sarstedt & Mooi, 2019; Wedel & Kamakura, 2012). In this illustration we will use only one approach: Clustering of Variables Around Latent Variables (CLV; Vigneau & Qannari, 2003; Vigneau, Qannari, Punter, & Knoops, 2001). We do so mainly for 2 reasons.

  1. It has a nice implementation in R, for example in the ClustVarLV package (Vigneau, Chen, & Qannari, 2015).
  2. The package implements steps 3-6 in easy-to-use functions (Vigneau, Qannari, Navez, & Cottet, 2016).

We discuss the algorithms behind CLV during class.

3.1 Data transformations

The functions provided in the package ClustVarLV expect our data set in a layout where consumers are columns and part worth utilities are rows. Thus, we have to transpose our data set. This is done by the t() function (see the code chunk below).

In the code chunk we transpose (flip rows and columns) all columns of d except for the first one, which contains the case IDs. We assign the results to a new dataframe named d_transposed.

d_transposed<-as.data.frame(t(d[,2:ncol(d)]))

Next, we want to assign consumer IDs as column names in d_transposed. This is done below.

colnames(d_transposed)<-d$`Respondent ID`

We can review the first few columns and rows of the resultant dataframe.

View(head(d_transposed[,1:6]))
Brief Overview
1 2 15 17 18 19
500 gigabyte -41.10055 -40.92070 -2.208618 -0.0248455 -6.033522 -1.529643
1 terabyte 41.10055 40.92070 2.208618 0.0248455 6.033522 1.529643
black 23.01854 35.70394 30.261695 0.4503716 18.405289 1.216066
white -23.01854 -35.70394 -30.261695 -0.4503716 -18.405289 -1.216066
OneDualShockController 11.58078 -29.30128 -14.598786 -9.4865363 -18.882059 -61.038314
TwoDualShockController -11.58078 29.30128 14.598786 9.4865363 18.882059 61.038314

3.2 An initial segmentation solution to determine number of segments

The next thing we do is to choose an appropriate number of segments. To do so, we use a hierarchical CLV approach, which is then optimized by the kmeans-like algorithm outlined during class.

Hint: press ‘F1’ within functions to become familiar with the expected arguments and returned results.

In the code chunk below, we use the CLV() function with 5 arguments.

  • X: the transposed dataframe containing only the individual part worth utilities.
  • method: ‘local’ means that the covariance/correlation between cases is used as the measure of proximity. This is exactly what we want.
  • maxiter: at maximum 200 iteration steps are allowed for the consolidation/post-hoc optimization of the hierarchical clustering solution by the kmeans-like algorithm.
  • nmax: here 15 is the maximum number of consumer segments that will be considered.
  • sX: whether to z-standardize the columns of d_transposed. Since, in our case, the variance of each column conveys important information about an individual’s utility function, we turn off standardization.

We save the results of the segmentation to an object with the name PS4_hierarchical_CLV.

PS4_hierarchical_CLV<- CLV(X = d_transposed, method = "local", maxiter = 200, nmax = 15,  sX=FALSE)

Think back to our lecture. Within the CLV approach, we try to maximize the criterion S (in essence, we try to maximize the covariance of each consumer profile with the mean profile of the segment that the consumer is assigned to, the segment’s centroid). Since we have used a hierarchical version of CLV, we can use a dendrogram of S to decide on an appropriate number of consumer segments.
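
As a reminder, here is a sketch of the criterion for the ‘local’ option (following Vigneau & Qannari, 2003; notation slightly simplified): with consumers \(x_j\) partitioned into segments \(G_1, \dots, G_K\) with centroids \(c_k\), the algorithm maximizes

\[S = \sum_{k=1}^{K} \sum_{j \in G_k} \operatorname{cov}(x_j, c_k)\]

over both the partition and the centroids.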

plot(PS4_hierarchical_CLV, "dendrogram")

Interpretation: The interpretation is the same as for every other dendrogram. The x-axis shows each participant. The y-axis shows how much S deteriorates when cases are successively merged until only one global consumer segment is left. We can see that the most pronounced drops in S occur at the steps where 4 segments are merged into 3 segments, and where 3 segments are merged into 2 segments.

The ClustVarLV package provides an even better visualization, the so-called ‘delta plot’, which visualizes the deterioration in S across different numbers of consumer segments after the additional consolidation stage that follows the hierarchical CLV approach.

plot(PS4_hierarchical_CLV, "delta")

Interpretation: The delta plot enables a clearer decision in favor of 3 consumer segments (because the drop in S is relatively pronounced when moving from 3 to 2 segments).

It’s up to you to decide on the appropriate number of consumer segments. For example, you may consider:

  • Interpretability in terms of distinct preference profiles and/or consumer characteristics.
  • Client preferences.
  • Profitability of special niche segments.
  • Targeted width of product assortments.

We can, additionally, validate how much the segment solution changes when moving from a 3-segment to a 2-segment solution.

In the code chunk below we combine two functions, table() and get_partition(). The table() function, in our case, takes two vectors with the case-wise segment labels as arguments and then builds a crosstab of both vectors. The first vector is given by the hierarchical CLV solution drawing on 2 consumer segments (get_partition(PS4_hierarchical_CLV,K=2)). To extract the vector with the segment labels, we use the get_partition() function of the ClustVarLV package, which expects a CLV solution and the number of segments as arguments. The second vector is given by the hierarchical CLV solution drawing on 3 consumer segments (get_partition(PS4_hierarchical_CLV,K=3)).

table(get_partition(PS4_hierarchical_CLV,K=2), get_partition(PS4_hierarchical_CLV,K=3))
Switches from a 2 to a 3 segment solution
1 2 3
1 82 0 21
2 0 31 5

Interpretation: With 3 segments, even the smallest segment contains enough consumers (26/139: 19%). In the present case, we stay with K=3.
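
If you want the relative segment sizes directly, the same partition vector can be tabulated (a quick sketch):

round(prop.table(table(get_partition(PS4_hierarchical_CLV, K = 3))), 2)  # shares of the 3 segments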

4 Detecting and treating outlier cases / consolidation of the segment solution

Next, we evaluate if we should exclude cases that are multivariate outliers, which are “observations that differ substantially from other observations in respect of one or more characteristics.” (Sarstedt & Mooi, 2019, p. 386). Usually, these cases are excluded from segmentation because they are assumed to disturb the clustering process as well as the subsequent interpretation of consumer segments.

The ClustVarLV package provides a function to automatically set aside affected cases.

CLV provides different strategies for outlier treatment (see the documentation of the ClustVarLV package). For this illustration, we use the ‘kplusone’ approach. This strategy simply sets aside atypical outlier consumers into a separate ‘noise’ segment (we discussed the algorithm during class).

Usually, we expect a consumer j to be allocated to a segment \(G_k\) when the correlation \(\rho\) between the consumer’s utilities \(x_j\) and the centroid \(c_k\) is high and positive.

The ‘kplusone’ approach simply imposes a lower threshold for \(\rho\). If, for a consumer j, the correlation \(\rho = \operatorname{cor}(x_j, c_k)\) fails to exceed this threshold for every consumer segment centroid, then this consumer is set aside into the noise cluster. Choosing a threshold for \(\rho\) is arbitrary, but often 0.3 is used (Vigneau et al., 2016).
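
In code, the assignment rule boils down to the following sketch (illustration only, not the package internals; x_j is one consumer’s utility vector, centroids a matrix holding one segment centroid per column):

# A consumer is set aside to the noise cluster (label 0) if no segment
# centroid correlates with his/her utilities above the threshold rho.
assign_consumer <- function(x_j, centroids, rho = 0.3) {
  r <- apply(centroids, 2, function(c_k) cor(x_j, c_k))
  if (max(r) < rho) 0L else which.max(r)
}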

In the code chunk below, we apply the CLV_kmeans() function instead of CLV(). CLV_kmeans() dispenses with the hierarchical clustering and instead uses nstart random starts of the kmeans-like algorithm, discussed during the lecture, to search for the run with the highest resulting value of the criterion S. Thus, outlier detection, outlier treatment, and consolidation of the segmentation solution are combined in one function. That is why I like the package for introductory purposes.

CLV_kmeans(), in this case, uses 8 arguments.

  • X: the transposed dataframe containing only the individual part worth utilities.
  • method: ‘local’ means that the covariance/correlation between cases is used as the measure of proximity. This is exactly what we want.
  • iter.max: at maximum 200 iteration steps are allowed in the kmeans-like algorithm.
  • clust: the number of consumer segments to be extracted.
  • nstart: the number of random starts of the algorithm (200 here); the run with the highest S is kept.
  • sX: whether to z-standardize the columns of d_transposed. Since, in our case, the variance of each column conveys important information about an individual’s utility function, we turn off standardization.
  • strategy: here we specify the way the algorithm treats outliers (“kplusone” in our case).
  • rho: the lower threshold for \(\rho\).

We save the results of the segmentation to an object with the name PS4_kmeans_PlusOne_CLV.

PS4_kmeans_PlusOne_CLV<-CLV_kmeans(X = d_transposed, method = "local", iter.max = 200, clust = 3, nstart = 200, sX=FALSE, strategy="kplusone" ,rho = 0.3)

Next, we can briefly summarize the consolidated results.

summary(PS4_kmeans_PlusOne_CLV)
## $number
## noise cluster             1             2             3 
##             2            30            19            88 
## 
## $groups
## $groups[[1]]
##     cor in group  cor next group
## 35          0.85            0.51
## 163         0.83            0.68
## 74          0.80            0.54
## 86          0.78            0.51
## 44          0.77            0.59
## 55          0.76            0.52
## 29          0.75            0.44
## 191         0.74            0.60
## 78          0.70            0.58
## 96          0.70            0.63
## 129         0.70            0.42
## 19          0.69            0.26
## 75          0.69            0.60
## 131         0.69            0.51
## 178         0.69            0.63
## 212         0.69            0.61
## 103         0.68            0.63
## 132         0.68            0.10
## 170         0.67            0.63
## 62          0.64            0.13
## 69          0.64            0.53
## 18          0.63            0.50
## 193         0.62            0.36
## 41          0.61            0.43
## 31          0.59            0.13
## 114         0.55            0.27
## 88          0.51            0.27
## 175         0.50            0.43
## 36          0.45            0.14
## 99          0.40            0.13
## 
## $groups[[2]]
##     cor in group  cor next group
## 176         0.80            0.42
## 43          0.79            0.50
## 54          0.79            0.56
## 57          0.79            0.56
## 186         0.76            0.68
## 46          0.73            0.48
## 174         0.72            0.57
## 65          0.69            0.37
## 171         0.69            0.52
## 106         0.64            0.49
## 194         0.64            0.56
## 27          0.62            0.50
## 87          0.62            0.33
## 47          0.59            0.53
## 64          0.56            0.03
## 130         0.47            0.13
## 210         0.46            0.22
## 56          0.40            0.30
## 34          0.36            0.14
## 
## $groups[[3]]
##     cor in group  cor next group
## 148         0.98            0.58
## 200         0.97            0.50
## 154         0.96            0.63
## 160         0.95            0.56
## 198         0.95            0.66
## 42          0.94            0.57
## 165         0.94            0.58
## 139         0.93            0.50
## 180         0.93            0.50
## 184         0.93            0.56
## 189         0.93            0.59
## 195         0.93            0.65
## 209         0.93            0.47
## 214         0.93            0.54
## 15          0.92            0.55
## 71          0.92            0.64
## 89          0.92            0.49
## 167         0.92            0.45
## 202         0.92            0.56
## 40          0.91            0.56
## 92          0.91            0.66
## 95          0.91            0.45
## 144         0.91            0.58
## 150         0.91            0.50
## 151         0.91            0.62
## 204         0.91            0.50
## 32          0.90            0.65
## 70          0.90            0.58
## 91          0.90            0.61
## 120         0.90            0.57
## 136         0.90            0.59
## 159         0.90            0.47
## 197         0.90            0.64
## 104         0.89            0.54
## 109         0.89            0.56
## 142         0.89            0.63
## 147         0.89            0.43
## 192         0.88            0.59
## 208         0.88            0.55
## 30          0.87            0.73
## 73          0.87            0.75
## 172         0.87            0.55
## 188         0.87            0.61
## 190         0.87            0.76
## 196         0.87            0.51
## 213         0.87            0.56
## 33          0.86            0.64
## 63          0.86            0.47
## 177         0.86            0.55
## 182         0.86            0.74
## 203         0.86            0.49
## 53          0.85            0.56
## 1           0.84            0.60
## 90          0.84            0.55
## 153         0.84            0.59
## 211         0.84            0.63
## 28          0.83            0.65
## 116         0.83            0.59
## 162         0.83            0.69
## 181         0.83            0.65
## 119         0.81            0.60
## 135         0.81            0.55
## 156         0.81            0.80
## 164         0.80            0.47
## 115         0.79            0.30
## 169         0.79            0.51
## 107         0.78            0.74
## 2           0.77            0.70
## 38          0.77            0.40
## 207         0.77            0.44
## 149         0.76            0.53
## 118         0.74            0.66
## 168         0.74            0.60
## 183         0.74            0.28
## 166         0.72            0.57
## 173         0.72            0.68
## 79          0.70            0.60
## 17          0.69            0.62
## 45          0.69            0.66
## 77          0.69            0.57
## 81          0.68            0.17
## 58          0.67            0.60
## 140         0.67            0.62
## 187         0.65            0.47
## 141         0.62            0.49
## 85          0.60            0.47
## 133         0.60            0.43
## 138         0.59            0.46
## 
## 
## $set_aside
##  39 201 
##  18 128 
## 
## $cormatrix
##       Comp1 Comp2 Comp3
## Comp1  1.00  0.44  0.55
## Comp2  0.44  1.00  0.60
## Comp3  0.55  0.60  1.00

Interpretation: The summary() function provides us with a lot of useful information. First, we see the segment sizes after outlier treatment and consolidation. In our example, 2 consumers are set aside because of their atypical preference patterns (results may differ slightly in your analysis run because of the non-deterministic properties of the algorithm). At the bottom of the output ($set_aside) we see which consumers these are (here the respondents labeled 39 and 201). Furthermore, for each of the 3 consumer segments we see the corresponding IDs, the correlation with the assigned segment centroid, as well as the highest correlation with any of the other consumer segments. Our goal should be to find a segment solution where, for each consumer, the correlation with the own segment centroid is much higher than the correlations with the other segments’ centroids. The final part of the output shows how strongly the centroids (‘Comp1’ to ‘Comp3’) of the different consumer segments correlate. The lower the correlation between two segments, the more dissimilar are the part worth utilities of the consumer segments. Thus, in our example, consumer segment 1 is slightly more similar to segment 3 (r = 0.55) than to segment 2 (r = 0.44).

The ClustVarLV package additionally provides an intuitive graphical depiction of segment similarity based on principal component analysis (PCA; see other lectures such as Marketing Methods and Analysis, or Sarstedt & Mooi (2019), p. 259).

The code below generates segment-wise loading plots of the consumers using the first 2 principal components of the PCA. We use the function plot_var(), which expects a segment solution produced by the functions of the ClustVarLV package as its argument.

plot_var(PS4_kmeans_PlusOne_CLV, beside = TRUE) 

Interpretation: We start with the plot for consumer segment 1 (blue). The plot presents the loadings of each consumer in segment 1 on the first 2 principal components. We can extract several pieces of information. Vectors in this plot that are very close to each other in terms of their angle represent consumers with very similar part worth utility profiles across attribute levels. Vectors that point in opposite directions represent consumers whose preferences are diametrically opposed; attribute levels preferred by one consumer are rejected by the other. Vectors that are orthogonal to each other represent consumers whose preferences concern different attribute levels. The x-axis presents the loadings on the first principal component (PC); the y-axis visualizes the loadings on the second PC. We see that across all consumers the first PC explains approx. 53% of the variation in part worth utilities, while the second PC explains an additional 8%. Thus, the visual depiction is a simplification, as it leaves 39% of the variation in preferences unexplained. We can further see that some consumers are better represented by the loading plot (long vectors) than others (short vectors). Furthermore, when comparing the plots of the different consumer segments, we can relate the segments to each other; in line with the centroid correlations above, segment 1 is, by tendency, closer to segment 3 than to segment 2, because their vectors tend to point in the same direction. The last plot in the output is devoted to the consumers in the noise cluster. As we can see, the preference vectors of these consumers are dissimilar to those of the other segments and, overall, not well explained by the PCA.

5 Interpretation of consumer segments

Next, we make more sense of the 3 revealed consumer segments. Usually, this is done by:

  • Profiling consumer characteristics by segment (e.g., age, gender, income group) <- show marketing which consumers to address.
  • Profiling differences in preferences (part-worth utilities) by segment <- show marketing how to address consumers in distinct segments.

The very first thing we want to know is which consumer belongs to which of the 3 segments.

The code chunk below applies the get_partition() function provided by the ClustVarLV package to the obtained segment solution (‘PS4_kmeans_PlusOne_CLV’). The call saves the segment assignment of each consumer as a new variable in the initial dataframe d.

d$Segment <- get_partition(PS4_kmeans_PlusOne_CLV)
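
A quick sanity check of the resulting labels (0 denotes the noise cluster of set-aside consumers):

table(d$Segment)  # frequency of each segment label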

We can preview the results. In the code below we use the select() function of the dplyr package to select only the variables “Respondent ID” and “Segment” from a dataframe consisting of the first 6 rows of d.

select(d[1:6,], "Respondent ID", "Segment")
Brief Overview
Respondent ID Segment
1 3
2 3
15 3
17 3
18 1
19 1

Often, we want to use these segment assignments for further analysis in other programs (e.g., in Sawtooth Software Lighthouse Studio market simulations).

An easy way to use the segmentation results in other programs is to just copy and paste them. The code chunk below extracts a copy of the dataframe d to the clipboard. After its execution, we can, for example, paste d into a new sheet in MS Excel. We produce comma-separated values by using ‘,’ as the separator argument within the function write.table(). Based on this, it is easy to profile consumer characteristics by segment.

write.table(d, 'clipboard-16384', row.names = FALSE, col.names = TRUE, dec = ".", sep = ",")
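
Note that the ‘clipboard-16384’ connection is available on Windows only. On other systems, a simple alternative (the file name below is just an example) is to write a *.csv file and open it in Excel:

write.csv(d, "PS4_segment_assignments.csv", row.names = FALSE)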

In a final step, we want to profile differences in preferences (part worth utilities) by segment. As so often in R, many roads lead to Rome. In this illustration we use the CreateContTable() function from the tableone package, which easily summarizes continuous variables.

In the code chunk below, we first define the names of the variables we want to summarize. Here, we use the column names of d, beginning with the second column (we are not interested in the case IDs) and ending with the penultimate column (we are not interested in the last variable capturing the segment assignments). We save the variable names to a new object ‘Variable_Names’.

Variable_Names <- colnames(d[,2:(ncol(d)-1)])

Afterwards, we apply the CreateContTable() function and save its returned results to a new object ‘Utilities_by_Segment’. Within the function call we set 4 arguments:

  • vars: the names of the variables to summarize (Variable_Names).
  • strata: the strata criterion to apply in the summary table. The table will provide summary statistics (e.g., means and SDs) for each level of the strata variable.
  • data: our initial dataframe d.
  • test: we explicitly set test = FALSE, because we do not want to test for significant differences between groups of consumers drawing on classical null-hypothesis tests (see our discussion of Bayesian tests in Chapter 4, and in particular Orme & Chrzan (2017), p. 185)1.

Utilities_by_Segment <- CreateContTable(vars = Variable_Names, strata = "Segment", data = d, test = FALSE)

We can inspect the summary table by executing the next code chunk.

kableone(Utilities_by_Segment, caption = 'Part worth utilities by consumer segment')
Part worth utilities by consumer segment
0 1 2 3
n 2 30 19 88
500 gigabyte (mean (SD)) -0.32 (0.32) -17.23 (24.55) -31.06 (34.80) -15.19 (20.82)
1 terabyte (mean (SD)) 0.32 (0.32) 17.23 (24.55) 31.06 (34.80) 15.19 (20.82)
black (mean (SD)) 30.55 (37.72) 13.51 (28.63) 5.33 (26.77) 13.45 (25.10)
white (mean (SD)) -30.55 (37.72) -13.51 (28.63) -5.33 (26.77) -13.45 (25.10)
OneDualShockController (mean (SD)) 9.12 (57.99) -49.94 (33.96) -12.06 (24.94) -25.13 (27.86)
TwoDualShockController (mean (SD)) -9.12 (57.99) 49.94 (33.96) 12.06 (24.94) 25.13 (27.86)
1 charging station for 2 PS4 controller (mean (SD)) 68.42 (24.60) 26.56 (35.86) -1.92 (34.85) 15.37 (28.12)
1 PS4 wireless Stereo Headset 2.0 (mean (SD)) -41.66 (27.79) 4.76 (48.66) 11.51 (32.45) -14.72 (28.54)
no accessories (mean (SD)) -26.76 (3.19) -31.32 (53.67) -9.59 (52.21) -0.66 (27.25)
Far Cry Primal (mean (SD)) 63.58 (72.75) 4.01 (55.75) 7.79 (64.82) -4.42 (33.35)
GTA V (mean (SD)) -8.85 (48.32) 56.69 (62.82) -42.01 (62.61) 14.97 (36.55)
Life is strange (mean (SD)) 35.22 (49.22) -39.25 (47.75) 13.09 (57.19) -6.55 (29.56)
Tom Clancy’s The Division (mean (SD)) -52.31 (13.36) -1.76 (40.03) 24.34 (65.46) -3.68 (28.26)
no action-adventure game (mean (SD)) -37.63 (87.02) -19.69 (42.26) -3.20 (36.88) -0.32 (30.86)
The_Witcher_3 (mean (SD)) -22.27 (89.72) 8.54 (42.79) 12.68 (48.69) -2.16 (31.35)
Fallout_4 (mean (SD)) -1.64 (42.42) 11.65 (36.03) 1.60 (61.02) 3.68 (30.78)
Final_Fantasy (mean (SD)) 62.30 (33.53) -29.64 (47.98) -30.91 (36.84) -3.22 (31.52)
Dark_Souls_3 (mean (SD)) -101.13 (18.71) 7.37 (34.45) -2.88 (50.25) -4.77 (32.22)
no_role_playing (mean (SD)) 62.74 (32.49) 2.07 (53.74) 19.51 (55.54) 6.47 (38.76)
Just_Dance_2016 (mean (SD)) 54.69 (80.86) -54.31 (38.36) -17.45 (47.88) -13.56 (36.52)
Guitar_Hero_Live (mean (SD)) 63.84 (84.38) -21.60 (57.61) 36.03 (78.49) -10.52 (38.98)
FIFA_16 (mean (SD)) 22.07 (45.07) 67.02 (69.53) -55.31 (34.35) 23.92 (53.97)
no_family_companionship (mean (SD)) -140.61 (210.31) 8.89 (52.84) 36.72 (61.42) 0.16 (37.54)
PRICE_207.95 (mean (SD)) 5.56 (4.81) 55.76 (34.16) 80.41 (34.22) 159.86 (33.09)
PRICE_370.95 (mean (SD)) 1.83 (1.59) 18.39 (11.27) 26.52 (11.28) 52.72 (10.91)
PRICE_436.95 (mean (SD)) 0.32 (0.28) 3.26 (1.99) 4.69 (2.00) 9.33 (1.93)
PRICE_508.95 (mean (SD)) -1.32 (1.14) -13.25 (8.12) -19.11 (8.13) -37.99 (7.86)
PRICE_730.95 (mean (SD)) -6.40 (5.53) -64.15 (39.31) -92.51 (39.37) -183.92 (38.07)
NONE (mean (SD)) 90.26 (23.64) 91.21 (67.81) 83.77 (45.61) -23.18 (45.01)

Now we can interpret the differences in mean part worth utilities across segments. For example, segment 2 strongly rejects the action adventure GTA V in a PS4 bundle, whereas this game is strongly preferred by segment 1 and, less pronounced, by segment 3 (your results might vary slightly because of the non-deterministic character of the segmentation algorithm).

Again, we can copy the summary table (e.g., the means for each group) for further reporting in other software solutions. In order to copy the means of each segment, we first have to get familiar with the structure of the table we have created. We can use the generic str() function to create an overview of the table’s structure. I do not execute this function in this script because the output is hefty.

str(Utilities_by_Segment)

However, if you do, you will see that the object ‘Utilities_by_Segment’ is made up of a list of 4 named matrices, one matrix for each of the consumer segments (do not forget the noise cluster). Each of the matrices has 29 rows (one for each attribute level in the conjoint study) and 12 columns (one for each summary statistic). The “mean” is saved in the 4th column.

If we want to access, for example, the first few means (limited by the head() function) for the first segment in the object, we can call:

knitr::kable(head(Utilities_by_Segment[[1]][, 4]))
x
500 gigabyte -0.3151681
1 terabyte 0.3151681
black 30.5461976
white -30.5461976
OneDualShockController 9.1163593
TwoDualShockController -9.1163593

You see that the values correspond to the noise cluster. We can access the fourth column (the means) of the second list member by the code below, and so forth…

knitr::kable(head(Utilities_by_Segment[[2]][, 4]))
x
500 gigabyte -17.23370
1 terabyte 17.23370
black 13.51053
white -13.51053
OneDualShockController -49.94446
TwoDualShockController 49.94446

Thus, in R, we access list members by the ‘[[]]’ notation. By appending the ‘[]’ notation, we can access the elements of a specific list member.

Next, we use this notation to create a copy-ready table of means for each consumer segment. In the code chunk below, we first establish a new dataframe ‘Segment_Means’ containing solely the mean utilities of the noise segment (the first list member). We subsequently use a for loop over the remaining list members to add the mean utilities of the remaining consumer segments via the generic cbind() function. This function simply appends new variables to the dataframe ‘Segment_Means’.

Finally, we add variable (column) names to the dataframe. Here we use the seq() function with 3 arguments.

  • The 1 specifies that we start a sequence of numbers at 1.
  • length(Utilities_by_Segment) tells the function to stop at the number of consumer segments in ‘Utilities_by_Segment’.
  • The final 1 tells the function to step forward in steps of 1. So we allocate the names 1, 2, 3, 4 to the dataframe ‘Segment_Means’ (with 1 corresponding to the noise cluster).

Segment_Means <- as.data.frame(Utilities_by_Segment[[1]][,4])

for (i in 2:length(Utilities_by_Segment)) {
  
  Segment_Means <- cbind(Segment_Means, Utilities_by_Segment[[i]][,4])
  
}

names(Segment_Means) <- seq(1, length(Utilities_by_Segment), 1)
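
The loop above is instructive, but the same table can be built in a single call (a sketch performing the equivalent extraction):

# Pull the 4th column ("mean") out of every list member at once and
# coerce the result to a dataframe.
Segment_Means_alt <- as.data.frame(sapply(Utilities_by_Segment, function(m) m[, 4]))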

We can finally copy the table with the segment means to the clipboard (as done above).

write.table(Segment_Means, 'clipboard-16384', row.names = TRUE, col.names = TRUE, dec = ".", sep = ",")

List of References

Johnson, R. M. (1975). A simple method for pairwise monotone regression. Psychometrika, 40(2), 163–168. doi: 10.1007/BF02291563

Kaufman, L., & Rousseeuw, P. J. (2005). Finding groups in data: An introduction to cluster analysis. Hoboken NJ: Wiley-Interscience.

Orme, B. K., & Chrzan, K. (2017). Becoming an expert in conjoint analysis: Choice modeling for pros. Orem, UT: Sawtooth Software.

Reynolds, A. P., Richards, G., La Iglesia, B. de, & Rayward-Smith, V. J. (2006). Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modelling and Algorithms, 5(4), 475–504. doi: 10.1007/s10852-005-9022-1

Sarstedt, M., & Mooi, E. (2019). A concise guide to market research: The process, data, and methods using ibm spss statistics (3. ed.). Berlin - Heidelberg: Springer.

Swait, J., & Louviere, J. (1993). The role of the scale parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research, 30(3), 305–314. doi: 10.1177/002224379303000303

Vigneau, E., Chen, M., & Qannari, E. M. (2015). ClustVarLV: An R package for the clustering of variables around latent variables. The R Journal, 7(2), 134–148.

Vigneau, E., & Qannari, E. M. (2003). Clustering of variables around latent components. Communications in Statistics - Simulation and Computation, 32(4), 1131–1150. doi: 10.1081/SAC-120023882

Vigneau, E., Qannari, E. M., Navez, B., & Cottet, V. (2016). Segmentation of consumers in preference studies while setting aside atypical or irrelevant consumers. Food Quality and Preference, 47(January), 54–63. doi: 10.1016/j.foodqual.2015.02.008

Vigneau, E., Qannari, E. M., Punter, P. H., & Knoops, S. (2001). Segmentation of a panel of consumers using clustering of variables around latent directions of preference. Food Quality and Preference, 12(5-7), 359–363. doi: 10.1016/S0950-3293(01)00025-8

Wedel, M., & Kamakura, W. A. (2012). Market segmentation: Conceptual and methodological foundations (2nd ed.). Springer Science & Business Media.


  1. There are also specialized tests that assess the difference between sets of part worth utilities for different consumer groups. One example is the Swait and Louviere test for between-group differences (Swait & Louviere, 1993).