# Conjoint Analysis & Segmentation

In microeconomics and marketing research, measuring consumers’ preferences is one of the most important tasks. It helps to explain the reasons behind consumers’ decisions. Statistical methods such as conjoint analysis make it possible to quantify preferences and answer the question: which product will the consumer choose?

Some of the more important issues for which modern conjoint analysis is used are the following:

1. Predicting the market share of a proposed new product, given the current offerings of competitors

2. Predicting the impact of a new competitive product on the market share of any given product in the marketplace

3. Determining consumers’ willingness to pay for a proposed new product

4. Quantifying the tradeoffs customers or potential customers are willing to make among the various attributes or features that are under consideration in the new product design

### Experimental Design Considerations

Define product attributes that can be quantified, along with their corresponding levels. A level is a specific value or realization of an attribute.

Part-worths are estimated by ordinary least squares (OLS) regression, the method of calculation traditionally used in most conjoint studies. However, OLS is not appropriate for conjoint data consisting of rank orders.

For OLS to be appropriate, we must assume the data are “scaled at the interval level.” By this, we mean that the data are scaled so that real differences in the things being measured are communicated by the arithmetic differences in their values. Fahrenheit temperature, for instance, has an interval scale. The difference between 70 and 80 degrees is exactly as large as the difference between 80 and 90 degrees. In the social sciences and in marketing research we are usually willing to assume that rating scale values possess this kind of scaling.

Some important notes on experimental design:

1. The more tangible and understandable the levels of each attribute are to the respondents, the more valid the results of the research will be. For example, attribute levels such as “really roomy” are vague, meaning different things to different people, and should be avoided.

2. The greater the number of attribute levels to be tested, the more data will be needed to achieve the same degree of output accuracy.

3. For quantitative variables (price and horsepower, in this example), the greater the distance between any two consecutive levels, the harder it will be to get a good idea of how a consumer might evaluate something in between the two (e.g., \$24,000).

4. The Number of Levels Effect: The number of levels specified for an attribute can influence its inferred importance, for both psychological and algorithmic reasons. Attributes defined with more levels tend to appear more important, which can introduce systematic bias.
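To make the second note concrete, the following sketch counts the profiles in a full factorial design. It uses the four tea attributes and level names that appear in the example later in this document. The object names (`attributes`, `full_design`, and so on) are illustrative and not part of any package.

```r
# Attribute levels taken from the tea example later in this document.
attributes <- list(
  price   = c("low", "medium", "high"),
  variety = c("black", "green", "red"),
  kind    = c("bags", "granulated", "leafy"),
  aroma   = c("yes", "no")
)

# A full factorial design contains every combination of levels.
full_design <- expand.grid(attributes, stringsAsFactors = FALSE)
n_profiles <- nrow(full_design)            # 3 * 3 * 3 * 2 = 54

# A hypothetical fourth price level would raise the count to 4 * 3 * 3 * 2 = 72.
n_with_extra_level <- n_profiles / 3 * 4

# A main-effects model needs one intercept plus (levels - 1) per attribute.
n_parameters <- 1 + sum(lengths(attributes) - 1)   # 1 + 2 + 2 + 2 + 1 = 8
```

The tea example asks each respondent to rate 13 profiles rather than all 54, which still leaves enough observations to estimate the 8 parameters of a main-effects model.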

### Understanding the Output

The basic results of a conjoint analysis are the estimated attribute-level utilities. Keeping with the example in Table 1, conjoint output might look like the output shown below:

Within a given attribute, the estimated utilities are generally scaled in such a way that they add up to zero. So a negative number does not mean that a given level has “negative utility”; it just means that this level is on average less preferred than a level with an estimated utility that is positive.

T-Value Interpretation:

Because of the way conjoint utilities are scaled, the standard interpretation of t-values can be misleading. For example, the level “Saturn” of the attribute “Brand” has a t-value of 0.87. In general, a t-value of this magnitude would fail a test of statistical significance. Here, however, the low t-value arises because, within the attribute “Brand,” the level “Saturn” has neither a very high nor a very low relative preference; it sits roughly in the middle in terms of overall preference. Because of the scaling, levels with moderate preference within a given attribute are likely to have estimated utilities close to zero, which tends to produce very low t-values (recall that the t-test assesses whether the true value of a parameter differs from zero).

Note:

At a practical level, it is rare for an attribute not to be significant. If you do find one, it probably should not have been included in the experimental design in the first place, because respondents are not considering that attribute’s information when they make choices.
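The difference between level-wise t-values and the significance of a whole attribute can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original analysis: the attribute `brand`, its levels A, B and C, and the true utilities are invented, and the joint F-test is one common way to test an attribute as a whole.

```r
set.seed(1)

# Hypothetical three-level attribute with true utilities 2, 0 and -2.
brand <- factor(rep(c("A", "B", "C"), each = 30))
true_utils <- c(A = 2, B = 0, C = -2)
rating <- 5 + unname(true_utils[as.character(brand)]) + rnorm(90, sd = 0.5)

# Effects (sum-to-zero) coding: coefficients are deviations from the mean.
fit <- lm(rating ~ brand, contrasts = list(brand = contr.sum))
coefs <- summary(fit)$coefficients   # rows: (Intercept), brand1 (A), brand2 (B)

# Joint F-test of the whole attribute against an intercept-only model.
attribute_test <- anova(lm(rating ~ 1), fit)
p_attribute <- attribute_test[["Pr(>F)"]][2]
```

In this setup the estimate for the middle level (`brand2`) lies close to zero, so its t-value is typically small, while `p_attribute` shows that the attribute as a whole clearly affects the ratings.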

### Example

library(conjoint)
data(tea)
#calculating the model for the first respondent
#caModel(y = one respondent's vector of profile preferences, x = matrix of profiles)
caModel(y=tprefm[1,], x=tprof)
##
## Call:
## lm(formula = frml)
##
## Residuals:
##       1       2       3       4       5       6       7       8       9
##  1.1345 -1.4897  0.3103 -0.2655  0.3103  0.1931  1.5931 -1.4310 -1.4310
##      10      11      12      13
##  1.1207  0.3690  1.1931 -1.6069
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          3.3937     0.5439   6.240  0.00155 **
## factor(x\$price)1    -1.5172     0.7944  -1.910  0.11440
## factor(x\$price)2    -1.1414     0.6889  -1.657  0.15844
## factor(x\$variety)1  -0.4747     0.6889  -0.689  0.52141
## factor(x\$variety)2  -0.6747     0.6889  -0.979  0.37234
## factor(x\$kind)1      0.6586     0.6889   0.956  0.38293
## factor(x\$kind)2     -1.5172     0.7944  -1.910  0.11440
## factor(x\$aroma)1     0.6293     0.5093   1.236  0.27150
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.78 on 5 degrees of freedom
## Multiple R-squared:  0.8184, Adjusted R-squared:  0.5642
## F-statistic:  3.22 on 7 and 5 DF,  p-value: 0.1082
#returning vector of 12 utilities: the intercept followed by the 11 attribute levels
caUtilities(y=tprefm[1,], x=tprof, z=tlevn)
##
## Call:
## lm(formula = frml)
##
## Residuals:
##       1       2       3       4       5       6       7       8       9
##  1.1345 -1.4897  0.3103 -0.2655  0.3103  0.1931  1.5931 -1.4310 -1.4310
##      10      11      12      13
##  1.1207  0.3690  1.1931 -1.6069
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          3.3937     0.5439   6.240  0.00155 **
## factor(x\$price)1    -1.5172     0.7944  -1.910  0.11440
## factor(x\$price)2    -1.1414     0.6889  -1.657  0.15844
## factor(x\$variety)1  -0.4747     0.6889  -0.689  0.52141
## factor(x\$variety)2  -0.6747     0.6889  -0.979  0.37234
## factor(x\$kind)1      0.6586     0.6889   0.956  0.38293
## factor(x\$kind)2     -1.5172     0.7944  -1.910  0.11440
## factor(x\$aroma)1     0.6293     0.5093   1.236  0.27150
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.78 on 5 degrees of freedom
## Multiple R-squared:  0.8184, Adjusted R-squared:  0.5642
## F-statistic:  3.22 on 7 and 5 DF,  p-value: 0.1082
##  [1]  3.3936782 -1.5172414 -1.1413793  2.6586207 -0.4747126 -0.6747126
##  [7]  1.1494253  0.6586207 -1.5172414  0.8586207  0.6293103 -0.6293103
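The 12 utilities above follow from the 8 regression coefficients because, within each attribute, the utilities sum to zero. The sketch below reproduces that step in base R for the first respondent, using the coefficients printed by caModel(). The helper `complete()` is a hypothetical name introduced here, and the mapping of coefficients to level names is taken from the part-worth table in this example.

```r
# Estimated levels for respondent 1, copied from the caModel() output.
price   <- c(low = -1.5172, medium = -1.1414)
variety <- c(black = -0.4747, green = -0.6747)
kind    <- c(bags = 0.6586, granulated = -1.5172)
aroma   <- c(yes = 0.6293)

# Append the omitted level as minus the sum of the estimated ones,
# so that the utilities within the attribute add up to zero.
complete <- function(estimated, omitted) {
  c(estimated, setNames(-sum(estimated), omitted))
}

utilities <- c(intercept = 3.3937,
               complete(price, "high"),
               complete(variety, "red"),
               complete(kind, "leafy"),
               complete(aroma, "no"))
round(utilities, 4)
```

The reconstructed values for `high`, `red`, `leafy` and `no` agree with positions 4, 7, 10 and 12 of the caUtilities() vector up to rounding.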
#calculating individual part-worth utilities for the first 6 respondents
caPartUtilities(y=tprefm[1:6,], x=tprof, z=tlevn)
##      intercept    low medium   high  black  green    red   bags granulated
## [1,]     3.394 -1.517 -1.141  2.659 -0.475 -0.675  1.149  0.659     -1.517
## [2,]     5.049  3.391 -0.695 -2.695 -1.029  0.971  0.057  1.105     -0.609
## [3,]     4.029  2.563 -1.182 -1.382 -0.248  2.352 -2.103 -0.382     -2.437
## [4,]     5.856 -1.149 -0.025  1.175 -0.492  1.308 -0.816 -0.825     -0.149
## [5,]     6.250 -2.333  2.567 -0.233 -0.033 -0.633  0.667 -0.233     -0.333
## [6,]     1.578 -0.713 -0.144  0.856  1.456 -0.744 -0.713  0.656     -0.713
##       leafy    yes     no
## [1,]  0.859  0.629 -0.629
## [2,] -0.495 -0.681  0.681
## [3,]  2.818  0.776 -0.776
## [4,]  0.975  0.121 -0.121
## [5,]  0.567 -1.250  1.250
## [6,]  0.056  1.595 -1.595
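Individual part-worths like these are the input for predicting choices between competing products. The following base R sketch applies a simple first-choice rule to the six respondents above. The two products and the names `product_a`, `product_b` and `total_utility()` are hypothetical and introduced only for illustration; the conjoint package has its own simulation functions, which are not used here.

```r
# Part-worths of the first six respondents, copied from the table above.
level_names <- c("intercept", "low", "medium", "high", "black", "green", "red",
                 "bags", "granulated", "leafy", "yes", "no")
pw <- matrix(c(
  3.394, -1.517, -1.141,  2.659, -0.475, -0.675,  1.149,  0.659, -1.517,  0.859,  0.629, -0.629,
  5.049,  3.391, -0.695, -2.695, -1.029,  0.971,  0.057,  1.105, -0.609, -0.495, -0.681,  0.681,
  4.029,  2.563, -1.182, -1.382, -0.248,  2.352, -2.103, -0.382, -2.437,  2.818,  0.776, -0.776,
  5.856, -1.149, -0.025,  1.175, -0.492,  1.308, -0.816, -0.825, -0.149,  0.975,  0.121, -0.121,
  6.250, -2.333,  2.567, -0.233, -0.033, -0.633,  0.667, -0.233, -0.333,  0.567, -1.250,  1.250,
  1.578, -0.713, -0.144,  0.856,  1.456, -0.744, -0.713,  0.656, -0.713,  0.056,  1.595, -1.595
), nrow = 6, byrow = TRUE, dimnames = list(NULL, level_names))

# Two hypothetical products, each described by one level per attribute.
product_a <- c("low", "black", "bags", "yes")
product_b <- c("high", "green", "leafy", "no")

# Total utility = intercept + sum of the part-worths of the product's levels.
total_utility <- function(pw, levels) {
  pw[, "intercept"] + rowSums(pw[, levels, drop = FALSE])
}

u_a <- total_utility(pw, product_a)
u_b <- total_utility(pw, product_b)

# First-choice rule: each respondent picks the product with the higher utility.
share_a <- mean(u_a > u_b)
```

With these six respondents, two prefer `product_a` and four prefer `product_b`. Six respondents are far too few for a real share estimate, so the result only illustrates the mechanics.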
#estimate parameters for the entire sample of 100 respondents (assumes the sample is homogeneous)
#z = matrix of level names
#Conjoint() also reports the average importance of each factor
Conjoint(y=tpref, x=tprof, z=tlevn)
##
## Call:
## lm(formula = frml)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.1888 -2.3761 -0.7512  2.2128  7.5134
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)         3.55336    0.09068  39.184  < 2e-16 ***
## factor(x\$price)1    0.24023    0.13245   1.814    0.070 .
## factor(x\$price)2   -0.14311    0.11485  -1.246    0.213
## factor(x\$variety)1  0.61489    0.11485   5.354 1.02e-07 ***
## factor(x\$variety)2  0.03489    0.11485   0.304    0.761
## factor(x\$kind)1     0.13689    0.11485   1.192    0.234
## factor(x\$kind)2    -0.88977    0.13245  -6.718 2.76e-11 ***
## factor(x\$aroma)1    0.41078    0.08492   4.837 1.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.967 on 1292 degrees of freedom
## Multiple R-squared:  0.09003,    Adjusted R-squared:  0.0851
## F-statistic: 18.26 on 7 and 1292 DF,  p-value: < 2.2e-16
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
##        levnms    utls
## 1   intercept  3.5534
## 2         low  0.2402
## 3      medium -0.1431
## 4        high -0.0971
## 5       black  0.6149
## 6       green  0.0349
## 7         red -0.6498
## 8        bags  0.1369
## 9  granulated -0.8898
## 10      leafy  0.7529
## 11        yes  0.4108
## 12         no -0.4108
## [1] "Average importance of factors (attributes):"
## [1] 24.76 32.22 27.15 15.88
## [1] Sum of average importance:  100.01
## [1] "Chart of average factors importance"
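Attribute importance is conventionally derived from the range of part-worths within each attribute. The sketch below applies that convention to the first respondent's part-worths from the individual-level table in this example. It assumes, without having checked the package source, that Conjoint() computes such importances per respondent and then averages them; the object names are illustrative.

```r
# Part-worths of respondent 1, grouped by attribute.
part_worths <- list(
  price   = c(low = -1.517, medium = -1.141, high = 2.659),
  variety = c(black = -0.475, green = -0.675, red = 1.149),
  kind    = c(bags = 0.659, granulated = -1.517, leafy = 0.859),
  aroma   = c(yes = 0.629, no = -0.629)
)

# Range of utilities within each attribute.
ranges <- sapply(part_worths, function(u) max(u) - min(u))

# Importance = the attribute's share of the total range, in percent.
importance <- round(100 * ranges / sum(ranges), 2)
importance
```

For this respondent, price has by far the largest range and therefore the highest importance. The figures differ from the sample averages reported above because they describe a single respondent. Rounding each value to two decimals also explains why a reported sum can come out slightly off 100.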

### Segmentation

Assuming the sample is not homogeneous, we can also divide respondents into three (or any other number of) clusters using the K-means clustering method. The function for this is caSegmentation(), where the argument c sets the number of clusters:

library(cluster)
segments <- caSegmentation(y=tpref, x=tprof, c=3)
print(segments)
## K-means clustering with 3 clusters of sizes 29, 31, 40
##
## Cluster means:
##       [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## 1 4.808000 5.070759 2.767310 7.132138 6.843172 2.649483 3.656379 1.539724
## 2 3.330226 5.582000 5.214258 4.207645 3.859419 4.740871 5.173129 5.334710
## 3 5.480275 2.938100 1.368100 4.540275 1.973100 3.782900 1.382900 0.965750
##       [,9]    [,10]    [,11]    [,12]    [,13]
## 1 2.063862 1.030862 6.691448 5.980517 6.801207
## 2 3.366968 4.838194 4.612129 6.050548 5.108613
## 3 2.820750 0.111225 3.450750 0.442900 0.692900
##
## Clustering vector:
##   [1] 1 2 1 2 2 3 1 2 1 1 1 1 3 3 3 3 2 3 2 3 3 1 3 2 2 1 2 2 2 2 3 1 2 1 1
##  [36] 1 1 3 3 3 3 2 3 2 3 1 1 3 3 3 1 3 3 3 2 1 3 2 3 2 3 3 1 2 2 1 3 3 3 2
##  [71] 1 3 1 2 1 2 2 3 1 1 2 2 2 1 3 3 3 3 2 3 2 3 2 3 3 1 3 2 1 1
##
## Within cluster sum of squares by cluster:
## [1] 1540.596 2316.512 1605.654
##  (between_SS / total_SS =  41.0 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"
## [5] "tot.withinss" "betweenss"    "size"         "iter"
## [9] "ifault"
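The number of clusters is a choice the analyst has to make. One common way to inform it is to compare the total within-cluster sum of squares across several values of k. The sketch below does this with the base R function kmeans() on simulated data shaped like the input above (100 respondents by 13 profiles). The data, the object names, and the use of plain kmeans() rather than caSegmentation() are all assumptions made for illustration.

```r
set.seed(42)

# Simulated stand-in for respondents' profile utilities:
# 100 respondents x 13 profiles, built from three artificial groups.
centers <- matrix(rnorm(3 * 13, mean = 4, sd = 2), nrow = 3)
group <- rep(1:3, times = c(30, 30, 40))
utilities <- centers[group, ] + matrix(rnorm(100 * 13, sd = 1), nrow = 100)

# Total within-cluster sum of squares for k = 2..6 (several random starts each).
wss <- sapply(2:6, function(k) {
  kmeans(utilities, centers = k, nstart = 20)$tot.withinss
})

# Fit the chosen solution and inspect the segment sizes.
fit <- kmeans(utilities, centers = 3, nstart = 20)
segment_sizes <- table(fit$cluster)

# Share of total variation explained by the segments
# (the between_SS / total_SS figure in the printed output).
explained <- fit$betweenss / fit$totss
```

A clear flattening of `wss` beyond some k suggests that additional clusters add little. The `explained` ratio corresponds to the 41.0 % reported for the three-cluster solution above.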